Modern video analysis technologies address complex tasks. They provide a convenient tool for detecting objects in real time and for searching an archive by various criteria: size, human presence, specific facial features, and so on. The more specific the search criteria, the more accurate the results.
The most accurate results come from pattern search (using a photo or an image from an archive) and facial search. In this article, we describe algorithms that detect and recognize people's faces.
Experts in computer vision have studied face detection for a long time, and the results achieved in this area are quite impressive. Here we describe the most popular approach to date: face detection using Haar’s cascades.
Haar’s cascades are sets of masks (rectangular windows), each of which represents an image with a black and white pattern (a combination of black and white parts). There can be an unlimited number of such masks, and the number and complexity of the patterns may vary.
Masks are rectangular windows, each of which represents a certain image with a black and white pattern
To determine whether a frame contains a face, the software places masks over different parts of the frame. Applying a mask to a portion of the frame yields a numeric value that shows how well that portion matches the mask. The software sums the brightness of all pixels under the white part of the mask, sums the brightness of all pixels under the black part, and calculates the difference between the two sums. The result is then compared to a threshold value.
This approach is popular because the calculation is quick and simple. Given a precomputed table of cumulative pixel sums (an integral image), only three operations are needed for each rectangular element of a mask.
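The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular product: the integral image (summed-area table) is the standard technique for getting a rectangle sum in three additions or subtractions, and the function names, the toy 4x4 patch, and the threshold value are all assumptions made for the example.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) table where ii[y][x] is the sum of all
    pixels above and to the left of (x, y) in the source image."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Brightness sum of the rectangle with top-left corner (x, y):
    four table lookups combined with three additions/subtractions."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-part mask: white top half minus black bottom half."""
    half = h // 2
    white = rect_sum(ii, x, y, w, half)
    black = rect_sum(ii, x, y + half, w, half)
    return white - black


# Toy 4x4 brightness patch: light upper half, dark lower half.
patch = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
ii = integral_image(patch)
value = two_rect_feature(ii, 0, 0, 4, 4)

THRESHOLD = 500  # hypothetical; a trained detector learns this value
is_match = value > THRESHOLD
```

Because the table is built once per frame, every mask at every position reuses it, which is what keeps the per-mask cost constant regardless of the mask size.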
Face detector training
Human face images have several distinguishing characteristics:
1. From a frontal standpoint, a human face has dark and light areas: the eyes and lips are dark, while the forehead, cheeks, and chin are light.
2. Faces are similar to each other. They differ in details but, in general, human faces are of the same type.
This means that you can select a set of masks (a Haar’s cascade) and build a classifier (an algorithm that detects a particular object in a snapshot) that takes these features into account and detects faces as accurately as possible.
During mask selection, the classifier learns to improve its detection accuracy. The AdaBoost algorithm is used to train the classifier and improve its performance. A training sample is created for machine learning purposes; it includes a large number of pictures of people. Each of the classifier’s masks is applied to it in turn.
The positive training sample consists of a large number of pictures of people’s faces
There can be a huge number of masks with different variations of black and white patterns. Each mask yields a certain value when compared with an image. If this value is above a threshold, the mask reports that a human face is present in the frame. Along with the positive training sample, which contains images of human faces, a negative sample is created that contains no human faces. It is also used to train the classifier: for a negative image, the comparison should return a value below the threshold.
If a mask makes a mistake on an image, the weight (importance) of that image increases for the masks that follow.
Based on the comparisons with the positive and negative samples, a mask is placed into the cascade classifier together with a coefficient that reflects its face detection error and the proportion of images on which it made no mistake. The face detection module combines the responses of all masks in the cascade classifier, weighted by their individual error coefficients, and compares the result with a threshold value. If the result is greater than the threshold, the face detector signals that a human face is present in the frame.
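The training loop described in this section can be illustrated with a compact AdaBoost sketch. It assumes that the value of every mask on every training image has already been computed, uses a simple threshold test per mask as the weak classifier, and builds a single boosted classifier; a full cascade would chain several such stages. The function names and the toy feature values are hypothetical.

```python
import math


def reweight(weights, labels, preds, alpha):
    """Raise the weight of misclassified images, lower the rest, renormalize."""
    new = [w * math.exp(-alpha * y * p)
           for w, y, p in zip(weights, labels, preds)]
    total = sum(new)
    return [w / total for w in new]


def train_adaboost(features, labels, rounds):
    """features[i][j] is the value of mask j on image i;
    labels[i] is +1 (face) or -1 (no face).
    Returns a list of (mask_index, threshold, polarity, alpha)."""
    n = len(labels)
    weights = [1.0 / n] * n
    strong = []
    for _ in range(rounds):
        best = None
        # Pick the mask/threshold pair with the lowest weighted error.
        for j in range(len(features[0])):
            for thr in sorted({row[j] for row in features}):
                for pol in (1, -1):
                    preds = [pol if row[j] >= thr else -pol for row in features]
                    err = sum(w for w, p, y in zip(weights, preds, labels)
                              if p != y)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, preds)
        err, j, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # low error -> high weight
        strong.append((j, thr, pol, alpha))
        weights = reweight(weights, labels, preds, alpha)
    return strong


def predict(strong, row):
    """Weighted vote of all selected masks, compared with zero."""
    score = sum(alpha * (pol if row[j] >= thr else -pol)
                for j, thr, pol, alpha in strong)
    return 1 if score > 0 else -1


# Toy data: two masks evaluated on three faces and three non-faces.
features = [[1200, 10], [900, -40], [1100, 30],
            [100, 20], [-200, -10], [300, 5]]
labels = [1, 1, 1, -1, -1, -1]
strong = train_adaboost(features, labels, rounds=3)
```

The coefficient alpha plays the role of the per-mask ratio mentioned above: masks that make fewer weighted mistakes get a larger say in the final vote.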
More often than not, a training sample contains frontal images of faces, because faces are easier to detect from the front. However, a classifier can be trained to detect faces in other positions given an appropriate sample.
Several algorithms can be used for human face recognition. By recognition, we mean comparing a face detected in a snapshot with a reference image in a database.
Working with 2D-images
The most commonly used algorithm works with special points of a human face and with the distances between them. The combination of these points and distances is unique to each person. That is why comparing them with the reference values in a database makes it possible to tell whether the person in a frame is the same person as in a reference image. A face in a snapshot is compared with the reference images in the database, and if a sufficiently similar one is found, the most similar image is reported to an operator.
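As a rough illustration of this comparison, the sketch below turns a set of facial points into a list of pairwise distances and looks for the closest reference entry. It assumes the points have already been located by some other component; the four points per face, the database names, the scaling by the largest distance, and the acceptance limit are all hypothetical choices made for the example.

```python
import math
from itertools import combinations


def distance_signature(points):
    """Pairwise distances between facial points, divided by the largest
    one so that the result does not depend on image scale."""
    dists = [math.dist(a, b) for a, b in combinations(points, 2)]
    scale = max(dists)
    return [d / scale for d in dists]


def best_match(probe, database, max_distance):
    """Return (name, score) of the most similar reference face,
    or (None, score) if even the best one is not similar enough."""
    probe_sig = distance_signature(probe)
    best_name, best_score = None, float("inf")
    for name, ref_points in database.items():
        score = math.dist(probe_sig, distance_signature(ref_points))
        if score < best_score:
            best_name, best_score = name, score
    if best_score > max_distance:
        return None, best_score
    return best_name, best_score


# Hypothetical reference database: left eye, right eye, nose tip, mouth.
database = {
    "person_a": [(30, 40), (70, 40), (50, 60), (50, 80)],
    "person_b": [(25, 40), (75, 40), (50, 70), (50, 85)],
}

# The same face as person_a, photographed at twice the resolution.
probe = [(60, 80), (140, 80), (100, 120), (100, 160)]
name, score = best_match(probe, database, max_distance=0.05)
```

A production system would use many more points and a tuned similarity measure, but the structure stays the same: extract, normalize, compare, and report the closest reference only if it is close enough.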
This method can identify human faces that appear in frontal view, in high definition, and without glare. The algorithm is rather sensitive to head tilts and turns, facial expressions, lighting, and so on. That is why it is not suitable for recognizing faces in unorganized flows (crowds, street traffic, etc.). However, it can be used at sites that require access control (checkpoints of enterprises, factories, government agencies), where it is often integrated with access control systems.
The strict requirements for a facial image in a frame can be relaxed in several ways by using 3D-models. One method requires that special stereo cameras are installed and synchronized with each other. When a person appears in the surveillance zone, the cameras take a series of shots from different angles. A 3D-model of the person’s face is then built, and the special facial points and the distances between them are analyzed. The subsequent comparison with a reference database of human faces is carried out by analogy with the method described above.
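One way to see why a 3D-model helps is that distances between points in 3D do not change when the head turns, while the same distances measured in a flat image do. The sketch below demonstrates this with a hypothetical set of four 3D facial points (coordinates in millimetres are invented for the example) and a simple orthographic projection onto the image plane; building the 3D-model from stereo shots is outside its scope.

```python
import math
from itertools import combinations


def rotate_yaw(points, degrees):
    """Rotate 3D points around the vertical (y) axis, as in a head turn."""
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def pairwise(points):
    """Distances between every pair of points (2D or 3D)."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def project(points):
    """Orthographic projection: drop the depth coordinate."""
    return [(x, y) for x, y, _ in points]


# Hypothetical 3D points: left eye, right eye, nose tip, mouth.
face_3d = [(-30, 0, 0), (30, 0, 0), (0, -20, 25), (0, -45, 5)]
turned_3d = rotate_yaw(face_3d, 30)

# Distances measured on the 3D-model before and after the head turn.
d3_front, d3_turned = pairwise(face_3d), pairwise(turned_3d)

# Distances measured in the flat image before and after the head turn.
d2_front, d2_turned = pairwise(project(face_3d)), pairwise(project(turned_3d))

max_3d_change = max(abs(a - b) for a, b in zip(d3_front, d3_turned))
eye_gap_2d_change = abs(d2_front[0] - d2_turned[0])
```

Here `max_3d_change` stays at floating-point noise level, while `eye_gap_2d_change` is several millimetres for a 30-degree turn, which is the kind of sensitivity to head turns that the 2D method suffers from.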
Dealing with particular issues
Nowadays, the method based on Haar’s cascades is one of the most popular in face detection. Faces in frontal view are detected most accurately, but a classifier can also be trained to detect faces in other positions.
Different technologies and algorithms can be used to identify people. They all have different requirements for facial images.
Some methods address access control and personnel monitoring tasks. They are primarily used to automate staff identification and admission to the premises of enterprises or institutions.
Other methods allow one to identify and search for people in unorganized flows and in videos with a variety of scenes. A comparison of the faces in a frame and in a database is performed by referring to the special points and distances between them.