Safety monitoring method of moving target in underground coal mine based on computer vision processing


Computer vision concepts

Computer vision refers to using computers to realize human visual functions: perceiving, recognizing, and understanding the objective world. It includes bionic vision and machine vision: the first studies the mechanism and function of vision by imitating human vision, while the second studies visual perception, processing, recognition, and classification in order to replace human vision9. At present, visual information acquisition devices mainly include CCD cameras, CMOS cameras, X-ray cameras, mid-infrared cameras, pinhole radar imaging equipment, microwave imagers, and so on. These devices are connected to a computer to form an optical system.

Fast image filtering algorithm

The underground environment of a coal mine is dark and the imaging of surveillance video is very poor, which complicates safe production, so intelligent video surveillance plays an important role. Because artificial lighting in the mine is poor, underground video images are unclear and have many limitations. Therefore, before studying target detection and tracking, the video images must be enhanced. Enhancement, however, tends to amplify any noise present and degrade image quality, so many scholars use wavelet-transform techniques to handle image noise. Wavelet denoising generally adopts threshold filtering, which is simple to operate and enhances images well, but it does not process the low-frequency and high-frequency wavelet coefficients separately; the same enhancement is applied to both. As a result, the enhancement after reconstruction is poor and the edge information of the image becomes blurred. To address these problems, a method combining the wavelet transform with dark-channel prior knowledge is proposed. It effectively removes image noise while enhancing detail, handles dense fog and uneven illumination, improves the contrast of the processed image, and highlights its details, making the enhanced image clearer.

Neighborhood pixel smoothing filtering

Because high-frequency noise is random, the gray values of noise points differ sharply from those of adjacent pixels, which degrades the image10. Neighborhood pixel smoothing replaces the gray value of the point being processed with the average gray value of its neighborhood. Let q(a, b) be the gray values of the pixels in the neighborhood of point (x, y), with a = 0,…,c − 1, b = 1,…,c − 1, where c is the side length of the neighborhood in pixels, and let g(x, y) be the gray value of the point being processed, x = 0,…,n − 1, y = 0,…,m − 1, where n is the image length and m is the image width. Then:

$$g(x,y) = \frac{1}{c^{2} - 1}\sum\limits_{\begin{subarray}{l} a = 0 \\ b = 1 \end{subarray}}^{c - 1} q(a,b)$$


Noise points can be smoothed using formula (1), but the edges of the image are smoothed as well, blurring them, which is harmful to subsequent edge detection. The larger the neighborhood, the stronger the blur. Therefore, the following improvement is made:

$$g(x,y) = \begin{cases} \dfrac{1}{c^{2} - 1}\sum\limits_{\begin{subarray}{l} a = 0 \\ b = 1 \end{subarray}}^{c - 1} q(a,b), & \text{if } \left| \dfrac{1}{c^{2} - 1}\sum\limits_{\begin{subarray}{l} a = 0 \\ b = 1 \end{subarray}}^{c - 1} q(a,b) - g(x,y) \right| > th \\ g(x,y), & \text{otherwise} \end{cases}$$


The threshold is th. If the difference between the neighborhood's average gray level and the gray level of the point being processed exceeds th, the point is smoothed; otherwise its gray level remains unchanged11.

This method is simple and requires very little computation, but still produces blur at the edges of the image when used.
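The thresholded smoothing of formula (2) can be sketched as follows (a minimal NumPy version; the function name, the 3 × 3 default window, and the threshold value are illustrative assumptions, not from the original):

```python
import numpy as np

def threshold_smooth(img, c=3, th=20):
    """Thresholded neighborhood smoothing, sketching formula (2).

    Each pixel is replaced by the mean of its c x c neighborhood
    (excluding the pixel itself) only when that mean differs from
    the pixel by more than th; otherwise it is left unchanged.
    """
    img = img.astype(np.float64)
    pad = c // 2
    padded = np.pad(img, pad, mode="edge")   # replicate edges to handle borders
    out = img.copy()
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            win = padded[y:y + c, x:x + c]
            mean = (win.sum() - img[y, x]) / (c * c - 1)  # exclude center pixel
            if abs(mean - img[y, x]) > th:
                out[y, x] = mean
    return out
```

An isolated impulse is flattened to its neighborhood mean, while uniform regions (difference below the threshold) pass through untouched, which is exactly the edge-sparing behavior described above.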

Fast median filter

The fast median filter algorithm improves on the classic median filter. Median filtering is nonlinear and preserves edges to some extent, although this ability is lost as the window grows. In the classic method, a window is first selected and moved across the image from left to right and top to bottom. For the pixel at the center of each window, all pixels in the window are sorted by gray level from small to large, and the center pixel is replaced by the middle (median) value. Because sorting every window takes a long time, processing is slow. Fast median filtering does not sort the window directly; instead, it maintains a histogram of the pixels in the window and reads the median from the bin counts. As the window shifts, only the column leaving on the left and the column entering on the right are updated, so the remaining pixels need not be counted again, which increases processing speed.
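A one-dimensional sketch of this running-histogram idea (illustrative; the function name and window width are my assumptions): the median at each position is read from a 256-bin histogram that is updated with only the pixel entering and the pixel leaving, rather than re-sorting the whole window.

```python
import numpy as np

def running_median_row(row, w=5):
    """Median-filter one image row with a running 256-bin histogram."""
    n = len(row)
    pad = w // 2
    padded = np.pad(row, pad, mode="edge")   # replicate edges at the borders
    hist = np.zeros(256, dtype=int)
    for v in padded[:w]:                     # build histogram of first window
        hist[v] += 1
    out = np.empty(n, dtype=row.dtype)

    def hist_median():
        # walk the bins until half the window's pixels are counted
        target = (w + 1) // 2
        count = 0
        for g in range(256):
            count += hist[g]
            if count >= target:
                return g

    out[0] = hist_median()
    for i in range(1, n):
        hist[padded[i - 1]] -= 1             # pixel leaving on the left
        hist[padded[i + w - 1]] += 1         # pixel entering on the right
        out[i] = hist_median()
    return out
```

Each shift costs two histogram updates plus a bounded scan of the bins, instead of a full sort of the window.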

Fast median filtering method with direction

The above methods are widely used in general image processing and filter well. However, the neighborhood method damages edges somewhat during processing12. When processing an N × M image with a 3-pixel-wide window, the computation is close to 9·N·M additions and N·M/9 divisions. For fast median filtering, the window is typically at least 3 pixels wide for good results; taking 3 pixels as an example, its computation is comparable to that of the neighborhood method.

Here, the images are collected in a specific environment, the quality of each image is relatively consistent, and filtering is mainly aimed at high-frequency impulse noise. The requirements for the algorithm are: important edge information must not be destroyed during filtering; the filtered image should be clean, with no damage to the visual effect; and processing should be short and fast. Based on these requirements, the fast median filtering method is improved.

After analyzing a large number of experimental images, it is found that noise points are mainly distributed in the image as discrete points and are generally small, mostly a single pixel. Therefore, the filter window does not need to be large; a narrow rectangular window reduces the amount of computation. The selected window size is 1 × 5, as shown in Fig. 1.

Figure 1

Filter window and grayscale values.

Video image enhancement method in underground coal mine

Overview of video image enhancement methods in coal mines

Improving coal mine video involves processing each frame with image enhancement techniques. Processing is divided into two main stages: denoising and enhancement. Denoising removes noise interference from the image, and enhancement improves the clarity and brightness of the video image. Note that when processing coal mine video images, the image must be denoised first and enhanced second; otherwise the noise in the image is amplified along with the image.

Coal mine video image enhancement has four main steps: first, input the video image; second, remove the noise in the video image; third, enhance the underground video image; finally, reassemble the enhanced frames into the output underground video13. The whole procedure is shown in Fig. 2.

Figure 2

Image enhancement process diagram.

Spatial domain enhancement method

The spatial domain, also known as the image domain, is the space composed of the image's pixels. Spatial enhancement methods therefore modify the gray levels of image pixels directly, which is convenient and effective. The main algorithms are gray-level correction, gray-level mapping transformation, and histogram transformation. A spatial enhancement algorithm can process the pixels of the whole image, or process sub-images based on a template14. Several classical spatial-domain algorithms are introduced below.

Grayscale transformation stretches the dynamic range of gray levels and improves contrast by changing the gray values of the original image's pixels, as in formula (3):

$$K(m,n) = S[f(m,n)]$$


In the formula, f(m,n) is the original image, and applying the mapping function S yields the enhanced image K(m,n). The key to this method is the function S, which directly determines whether the grayscale transformation is linear or nonlinear. Linear grayscale transformation, usually applied to underexposed images, enhances them by linearly stretching the gray levels of the pixels. As shown in Fig. 3, its expression is:

$$K(m,n) = \begin{cases} b, & 0 \le f(m,n) < c \\ \dfrac{a - b}{d - c}[f(m,n) - c] + b, & c \le f(m,n) \le d \\ a, & d < f(m,n) \le L \end{cases}$$


Figure 3

Linearly transformed grayscale image.


However, for images with a large dynamic range, linear transformation leads to information loss, so nonlinear grayscale transformation is more suitable. The widely used nonlinear transformations are mainly the exponential, logarithmic, and power transformations15. Their expressions are, in turn:

Exponential transformation:

$$K(m,n) = c^{[f(m,n)]}$$


Logarithmic transformation:

$$K(m,n) = \log_{c} f(m,n)$$


Power transformation:

$$K(m,n) = [f(m,n)]^{\gamma}$$


The function images are shown in Fig. 4, Fig. 5, and Fig. 6 in turn:

Figure 4

Exponentially transformed image.

Figure 5

Logarithmically transformed image.

Figure 6

Power-transformed image.

As the function curves show, the exponential transformation suppresses the low-gray range and enhances the high-gray range, while the logarithmic transformation enhances the low-gray range and suppresses the high-gray range. The power transformation can be tuned through the γ value: when γ = 1 it is a linear transformation, and when γ > 1 it behaves like an exponential transformation. Grayscale transformation is simple and adjusts brightness quickly, but applied directly to mine images its effect is poor.
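The gray-level mappings above can be sketched concretely as follows (a hypothetical NumPy implementation; the function names and the scaling constants that keep outputs in [0, 255] are my own choices, and linear_stretch assumes b ≤ a):

```python
import numpy as np

def linear_stretch(img, c, d, a=255, b=0):
    """Piecewise-linear stretch: maps gray range [c, d] onto [b, a],
    clamping values below c to b and above d to a (assumes b <= a)."""
    f = img.astype(np.float64)
    k = np.clip((a - b) / (d - c) * (f - c) + b, b, a)
    return np.round(k).astype(np.uint8)

def gamma_transform(img, gamma):
    """Power (gamma) transform K = f**gamma on a [0, 1]-normalized image."""
    f = img.astype(np.float64) / 255.0
    return np.round(255.0 * f ** gamma).astype(np.uint8)

def log_transform(img):
    """Logarithmic transform, rescaled so [0, 255] maps into [0, 255]."""
    f = img.astype(np.float64)
    return np.round(255.0 * np.log1p(f) / np.log(256.0)).astype(np.uint8)
```

With γ < 1 the power transform brightens dark regions (similar to the logarithmic curve), while γ > 1 darkens them, matching the curve behavior discussed above.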

Histogram equalization

A histogram is a mathematical model of an image that intuitively reflects the distribution of its pixel values. The basic idea of equalization is to use normalization to convert the histogram of the original image into a uniform distribution, thereby maximizing the dynamic range of gray values and improving the overall brightness and contrast of the image. Histogram normalization:

$$Q_{T}(T_{j}) = n_{j}/n, \quad 0 \le T_{j} \le 1, \; j = 0,1,\ldots,L - 1$$


Cumulative distribution function:

$$F(T_{j}) = \int_{-\infty}^{T_{j}} Q_{T}(x)\,dx = \sum\limits_{b = 0}^{j} Q_{T}(T_{b})$$


The relationship between the input image gray level $v_{b}$ and the output gray level $T_{b}$:

$$T_{b} = P(v_{b}), \quad P^{-1}(T_{b}) = v_{b}$$


This function satisfies the requirements for the mapping: it is monotonically increasing, and both its domain and range lie uniformly in [0, 1]. The probability densities are $Q_{T}(T_{b})$ and $Q_{v}(v_{b})$, and the corresponding cumulative distribution functions are:

$$CDF_{T}(T_{b}) = \int_{0}^{T_{b}} Q_{T}(x)\,dx, \quad CDF_{v}(v_{b}) = \int_{0}^{v_{b}} Q_{v}(x)\,dx$$


Since T is a function of the random variable v, the two cumulative distribution functions are equal:

$$CDF_{T}(T_{b}) = CDF_{v}(v_{b})$$


$$Q_{T}(T_{b}) = \frac{dCDF_{T}(T_{b})}{dT_{b}} = \frac{d\int_{0}^{v_{b}} Q_{v}(x)\,dx}{dT_{b}} = Q_{v}(v_{b})\frac{dv_{b}}{dT_{b}}$$


$$\begin{gathered} \frac{dT_{b}}{dv_{b}} = \frac{dP(v_{b})}{dv_{b}} = \frac{d\left[\int_{0}^{v_{b}} Q_{v}(x)\,dx\right]}{dv_{b}} = Q_{v}(v_{b}), \hfill \\ Q_{T}(T_{b}) = Q_{v}(v_{b})\frac{dv_{b}}{dT_{b}} = Q_{v}(v_{b})\frac{1}{Q_{v}(v_{b})} = 1 \hfill \\ \end{gathered}$$


That is, $T_{b}$ has a uniform density distribution.
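The equalization mapping derived above can be sketched as (an illustrative NumPy version assuming 256 gray levels; the function name is mine):

```python
import numpy as np

def hist_equalize(img):
    """Histogram equalization via the cumulative distribution function.

    The normalized histogram Q(T_j) = n_j / n is accumulated into a CDF,
    which then serves as the monotone gray-level mapping T = P(v).
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist) / img.size          # in [0, 1], monotone increasing
    lut = np.round(255.0 * cdf).astype(np.uint8)
    return lut[img]                           # apply mapping as a lookup table
```

A low-contrast image whose gray levels cluster in a narrow band is spread across the full [0, 255] range, which is the brightness and contrast gain described above.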

Retinex algorithm

The Retinex algorithm is an image enhancement method grounded in experiments and observations on human vision. Its principle is that the image information the human eye obtains is determined not only by the absolute light entering the eye; the color and brightness of the surroundings also play an important role. The most widely used Retinex algorithms are the SSR algorithm and the MSR algorithm. The theory holds that the image people observe is mainly determined by illumination and reflection16, namely:

$$H(x,y) = K(x,y)J(x,y)$$


In the above formula, H(x,y) is the image observed by the human eye, J(x,y) is the illumination component, which is determined by the surrounding environment, and K(x,y) is the reflection component, which carries the intrinsic properties of the scene. The principle of the Retinex algorithm is to reduce or even eliminate the influence of the illumination component J(x,y) and recover the original reflection component K(x,y) as faithfully as possible. Transformed to the logarithmic domain, the reflection component is obtained as:

$$k = h - j = \log\left(\frac{H(x,y)}{J(x,y)}\right)$$


$$j = \log(H(x,y) * F(x,y))$$

where $F(x,y)$ is the Gaussian surround function and $*$ denotes convolution.


The principle of the MSR algorithm is to bring the illumination estimate closer to reality through a multi-scale weighting strategy, which can be expressed as:

$$\log K(x,y) = \sum\limits_{a = 1}^{A} \beta_{a} \left\{ \log H(x,y) - \log\big(H(x,y) * G_{a}(x,y)\big) \right\}$$


The number of Gaussian functions is A, and the weight of the a-th Gaussian is $\beta_{a}$, which satisfies $\sum_{a = 1}^{A} \beta_{a} = 1$. In general, A is set to 3, the weights are equal, and the scales are 15, 80, and 250 (small, medium, and large) respectively. Because normalized Gaussian filtering lacks edge-preserving ability, bilateral filtering and guided filtering have in recent years been gradually applied to estimating the illumination component, with satisfactory practical results.
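A minimal MSR sketch along these lines (illustrative assumptions: a separable Gaussian blur stands in for the surround $G_{a}$, a +1 offset avoids log(0), and the default scales follow the 15/80/250 choice above):

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur used as the surround function."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()                              # normalize so flat areas are preserved
    pad = np.pad(img, radius, mode="edge")
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

def msr(img, sigmas=(15, 80, 250), weights=None):
    """Multi-scale Retinex: weighted sum of log(H) - log(H * G_a)."""
    img = img.astype(np.float64) + 1.0        # +1 avoids log(0)
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    out = np.zeros_like(img)
    for w, s in zip(weights, sigmas):
        out += w * (np.log(img) - np.log(gaussian_blur(img, s)))
    return out
```

On a uniformly lit region the blurred estimate equals the image itself, so the log difference vanishes; only deviations from the local illumination estimate (the reflectance detail) survive.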

Template matching tracking method

Template matching is a classic and simple method of target tracking. It is easy to implement, widely applicable, and resists noise well. Its shortcomings are equally obvious: it is unsuitable for scenes with large changes in lighting or drastic changes in the target's appearance. After importing the video image, the target-template method finds the position of the tracked target in the current frame: the region that matches the template to the highest degree is the target. Template matching only needs to compare the template with every sub-region of the full image; the region closest to the template determines the target's location. The following explains how the comparison between a sub-region and the template is realized. The most direct and convenient measure is the correlation coefficient between the two17,18.

Mathematically, $r$ is a distance that describes how close two vectors are. The correlation coefficient applies the law of cosines, $\cos(A) = (b^{2} + c^{2} - a^{2})/(2bc)$, to measure the included angle A between two vectors. When r = 1, the two vectors are completely similar; as r tends to 0, the similarity between the two vectors decreases; when r = −1, the vectors are completely opposite. In vector form the cosine is:

$$\cos (A) = \langle b,c \rangle /(\left| b \right| \cdot \left| c \right|)$$


Which is:

$$\cos (A) = (b_{1} c_{1} + b_{2} c_{2} + \cdots + b_{n} c_{n})/\sqrt{(b_{1}^{2} + b_{2}^{2} + \cdots + b_{n}^{2})(c_{1}^{2} + c_{2}^{2} + \cdots + c_{n}^{2})}$$


The denominator is the product of the vectors' moduli, and the numerator is the inner product of the vectors.

In practical use, to strengthen the correlation measure, the correlation coefficient is usually centered19: the common (mean) part of each vector is removed by subtracting that vector's mean in both the numerator and the denominator. The formula is:

$$r = \frac{\sum {(x_{e} - \overline{x})(y_{e} - \overline{y})} }{\sqrt{\sum {(x_{e} - \overline{x})^{2}} \sum {(y_{e} - \overline{y})^{2}} }}$$


Suppose a 9 × 9 target template is used for matching; the template can then be regarded as an 81-dimensional vector in which each dimension is the gray value of one pixel. Each sub-region of the whole image is matched against the template, and the correlation coefficient identifies the region with the highest degree of matching, which determines the position of the target.
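The search just described can be sketched as a brute-force NumPy routine (names are illustrative; the scan is exhaustive over all sub-regions, maximizing the centered correlation coefficient r from the formula above):

```python
import numpy as np

def match_template(image, template):
    """Locate the template by maximizing the correlation coefficient r
    between the mean-removed template and every same-size sub-region."""
    ih, iw = image.shape
    th_, tw = template.shape
    t = template.astype(np.float64)
    t = t - t.mean()                          # remove the template's mean once
    best, best_pos = -2.0, (0, 0)
    for y in range(ih - th_ + 1):
        for x in range(iw - tw + 1):
            s = image[y:y + th_, x:x + tw].astype(np.float64)
            s = s - s.mean()                  # remove the sub-region's mean
            denom = np.sqrt((t**2).sum() * (s**2).sum())
            if denom == 0:                    # constant region: r undefined, skip
                continue
            r = (t * s).sum() / denom
            if r > best:
                best, best_pos = r, (y, x)
    return best_pos, best
```

Where the sub-region exactly equals the template, r reaches 1, so the returned position is the target location; in production this exhaustive scan is usually replaced by an FFT-based or pyramid search for speed.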