Smart Eyes: The Integration of AI and ML in Surveillance Cameras for Enhanced Security
2019 – present
About the Client
Our client is a technological company in the medical field that implements virtual and augmented reality solutions for medical purposes. This allows them to revolutionize the field and provide practical and precise patient treatment methods. The clinic offers its clients virtual reality technologies for the treatment of chronic pain, anxiety, and post-traumatic stress disorders; rehabilitation classes for patients with physical disabilities, etc.
Amazinum’s outstaff Data Science team joined our client’s team to implement the computer vision solution. The service is based on the following technologies:
- Face detection;
- Face recognition;
- Person detection;
- Person re-identification;
- Object tracking.
Amazinum Data Scientists in Action
The system associates images of the same person taken by different cameras, or by the same camera on different occasions, so that it does not raise alerts for people who have been marked as trusted.
The main challenge was to find a solution that would produce similar embeddings for the same person regardless of the image source. The system uses a pre-trained OSNet model, fine-tuned on a custom-labeled dataset.
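To illustrate why similar embeddings matter, here is a minimal sketch of how a trusted-person check could work once embeddings are available. It assumes the embeddings have already been produced by the re-identification model; the function names, the 0.7 threshold, and the toy three-dimensional vectors are illustrative assumptions, not values from the project.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)

def is_trusted(query, trusted_gallery, threshold=0.7):
    """True if the query embedding is close enough to any trusted embedding.

    The threshold is a hypothetical value; in practice it would be tuned
    on validation data.
    """
    return any(cosine_similarity(query, ref) >= threshold
               for ref in trusted_gallery)

# Toy embeddings (real re-identification embeddings have hundreds of dimensions).
trusted_gallery = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.2]]
same_person_other_camera = [0.8, 0.2, 0.1]
stranger = [0.0, 0.1, 1.0]

print(is_trusted(same_person_other_camera, trusted_gallery))  # True: no alert
print(is_trusted(stranger, trusted_gallery))                  # False: alert
```

If the model gives the same person very different embeddings on different cameras, no threshold can separate trusted people from strangers, which is why the embedding quality was the main challenge.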
Based on the data the client provided, we researched algorithms that would meet our needs. From the available pre-trained models, we chose the one that suited the task best and integrated it into the application. Once the model was chosen, Amazinum’s Data Science team evaluated it. Based on the results, the team retrained it using the existing Torchreid project.
Among the difficulties the Amazinum Data Scientists encountered was person re-identification in night-vision footage.
Since the client gave our specialists access to the installed surveillance cameras, we used their data to train the model, trying two approaches. In the first, we trained the model directly on night images. In the second, we trained it on daytime images first and then continued training on night images.
Evaluating the resulting models allowed us to choose the better training approach and implement it in the service.
The client’s next task was to teach the system to work with faces, that is, to detect and recognize them. This would enable a personalized experience for each surveillance camera and improve the security of the monitored sites. Facial recognition technology compares a detected face against an existing database of faces, allowing surveillance cameras to identify known individuals and flag potential security threats.
In this task, our data scientists followed steps similar to those in the previous one:
- Search for a model;
- Evaluation of models and selection of the best one;
- Deployment of the model to production.
To work with faces, we chose three models.
RetinaFace is used to find every face in the frame, regardless of quality or distance.
A second model selects the best-quality faces among those found by RetinaFace, so that they can be used for face recognition.
The third model, for face recognition, was chosen in the same way as the previous ones: selection, testing, training, and implementation. To train it, we used the CASIA-WebFace dataset, which contains 453,453 images for 10,575 identities after face detection.
Among the challenges the Amazinum Data Scientists faced was that the face detector we used did not produce results good enough for training the recognition model.
To solve this problem, we tested other detectors. One face and facial landmark detector that has proven to perform well in these conditions is the multi-task CNN (MTCNN).
After that, it was possible to proceed to the face recognition model. Its operation consists of several steps:
- Face detection. An algorithm analyzes a photo or video and determines the areas that contain faces.
- Feature extraction. A feature extraction algorithm identifies the facial features that are unique to each person.
- Comparison of faces. At this stage, the model compares the face with the existing database using a chosen similarity metric.
- Identification. If the detected face matches a face in the existing database, the model identifies the person.
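The four steps above (finding faces, extracting features, comparing against the database, and identifying the person) can be sketched as a small pipeline. This is a simplified illustration rather than the production code: `identify_faces`, the stand-in `fake_detect` and `fake_embed` stages, the squared Euclidean distance, and the `max_distance` value are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def identify_faces(frame, detect, embed, database, max_distance=0.5):
    """Run detection, feature extraction, comparison, and identification.

    detect(frame) returns a list of face crops; embed(crop) returns a
    feature vector; database maps a person's name to a reference vector.
    """
    results = []
    for crop in detect(frame):                       # 1. find faces
        vector = embed(crop)                         # 2. extract features
        best_name, best_dist = None, float("inf")
        for name, reference in database.items():     # 3. compare with database
            dist = squared_distance(vector, reference)
            if dist < best_dist:
                best_name, best_dist = name, dist
        # 4. identify only when the closest match is close enough
        results.append(best_name if best_dist <= max_distance else "unknown")
    return results

# Stand-in stages so the sketch runs without any model (hypothetical).
def fake_detect(frame):
    return frame["faces"]

def fake_embed(crop):
    return crop["vector"]

database = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
frame = {"faces": [{"vector": [0.9, 0.1]}, {"vector": [-1.0, -1.0]}]}

print(identify_faces(frame, fake_detect, fake_embed, database))
# ['alice', 'unknown']
```

Passing the detector and the feature extractor in as functions keeps the stages independent, so one detector can be swapped for another without touching the comparison logic.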
To evaluate the model, the Amazinum data scientists used AP metrics (including mAP@.5:.95), precision, recall, F1 score, and a confusion matrix.
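As a reminder of how several of these metrics relate, the sketch below derives per-class precision, recall, and F1 score from a confusion matrix. The labels and predictions are made-up toy data, and the helper names are hypothetical. mAP@.5:.95 is not computed here; in the COCO convention it averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Sparse confusion matrix: counts of (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))

def precision_recall_f1(counts, label):
    """Precision, recall, and F1 for one class, from confusion counts."""
    tp = counts[(label, label)]
    fp = sum(n for (t, p), n in counts.items() if p == label and t != label)
    fn = sum(n for (t, p), n in counts.items() if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy ground truth and predictions (not project data).
y_true = ["alice", "alice", "bob", "bob", "bob", "unknown"]
y_pred = ["alice", "bob", "bob", "bob", "unknown", "unknown"]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts, "bob")
print(f"bob: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# bob: precision=0.67 recall=0.67 f1=0.67
```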
Person detection is the core model: all further processing depends on its output.
To accomplish this task, Amazinum’s Data Scientists researched available solutions and candidate models. The candidates were then evaluated and the best one was chosen based on two criteria:
- The performance of the model, that is, how well it detects people;
- The architecture of the model, that is, how many resources it needs and how fast it runs.
After that, our specialists retrained the model on data from surveillance cameras. To do this, we received videos from different locations and different clients, cut them into frames, labeled them, and formed a training dataset.
Among the challenges the Amazinum Data Scientists team faced was improving the performance of the pre-trained model.
To address it, we experimented with model hyperparameters, cleaned and corrected the data, and added external data to the dataset, i.e. photos and videos taken from the Internet.
The Amazinum Data Scientists also needed a model that could track objects. For this, we looked for models, evaluated them, and chose the one that best met our needs. For the object detector, our specialists used the new YOLOv7 together with the DeepSORT tracking algorithm. Torchreid was used to compare people who had already been detected.
Since the metrics were lower than we expected, the team decided to create custom object tracking logic. The resulting solution better suited the client’s requirements for object tracking.
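The custom tracking logic itself is not described here, so the following is only a generic illustration of what tracking-by-detection involves: a minimal greedy tracker that links each new detection to the previous-frame box it overlaps most. The class name, the `min_iou` value, and the box format `(x1, y1, x2, y2)` are assumptions for the example. Trackers such as DeepSORT additionally use a motion model and appearance embeddings.

```python
from itertools import count

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

class GreedyIouTracker:
    """Links detections across frames by best box overlap (illustrative only)."""

    def __init__(self, min_iou=0.3):
        self.min_iou = min_iou   # hypothetical matching threshold
        self.tracks = {}         # track id -> box from the previous frame
        self._ids = count(1)

    def update(self, detections):
        """Assign a track id to every detection box in the current frame."""
        assigned = {}
        free_tracks = dict(self.tracks)
        for box in detections:
            best_id, best_iou = None, self.min_iou
            for track_id, last_box in free_tracks.items():
                score = iou(box, last_box)
                if score >= best_iou:
                    best_id, best_iou = track_id, score
            if best_id is None:
                best_id = next(self._ids)   # no overlap: start a new track
            else:
                del free_tracks[best_id]    # each track matches at most once
            assigned[best_id] = box
        self.tracks = assigned              # unmatched tracks are dropped
        return assigned

tracker = GreedyIouTracker()
frame_1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame_2 = tracker.update([(52, 50, 62, 60), (1, 0, 11, 10)])
print(sorted(frame_1))  # [1, 2]
print(frame_2[2])       # (52, 50, 62, 60): same id although detection order changed
```

A tracker this simple loses identities whenever people overlap or leave the frame briefly, which is one reason appearance-based comparison of detected people is useful alongside box overlap.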
The use of computer vision technologies in surveillance cameras made the protection of monitored areas more reliable. End users get:
- Improved security. Facial recognition and object detection can help identify potential security threats in real time, allowing security personnel to act quickly to prevent crime and ensure public safety.
- Greater efficiency and effectiveness. Machine learning algorithms can automate the monitoring and analysis of large volumes of data.
- Cost savings. Automating many processes reduces security costs while increasing efficiency.
- Personalized experience. Computer vision technologies such as face detection and face recognition make it possible to customize surveillance for each user.
- Real-time statistics. Users can get up-to-date information about people, their behavior, and other key indicators.
For our client, as the owner of the surveillance camera software and service, the implementation of machine learning technologies provided:
- Increased trust from its clients;
- Optimized processes and increased efficiency;
- A larger number of customers;
- Market leadership among competitors;
- More effective services.