For a computer to learn to recognize objects in a photo, it usually must first be shown thousands of pictures that have been labeled with data. To simplify computer image recognition, six researchers from Facebook’s Artificial Intelligence Research lab (FAIR) used the Transformer neural network architecture to create an end-to-end object detection AI.
DETR can predict final detection results directly (in parallel)
The researchers named the tool DETR (Detection Transformer) and said it simplifies the components needed to detect objects in images.
In its official blog post, FAIR said DETR is the first tool to successfully integrate the Transformer architecture into the core of an object detection pipeline. The Transformer architecture could revolutionize computer vision, just as it has transformed natural language processing in recent years, and could help bridge the gap between the two fields.
“By combining a common CNN with a Transformer architecture, DETR can predict the final detection results directly (in parallel). Unlike many other modern detectors, the new model is conceptually simple and does not require a specialized library,” the researchers wrote in their paper.
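The "in parallel" part can be illustrated with a toy sketch: instead of emitting detections one at a time, a DETR-style model produces a fixed set of query embeddings and runs small prediction heads over all of them in a single pass. The sizes and weight matrices below are hypothetical stand-ins, not Facebook's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
num_queries, d_model, num_classes = 5, 16, 3  # toy sizes, chosen for illustration

# Stand-in for the Transformer decoder output: one embedding per object query,
# all produced together in a single forward pass.
decoder_out = rng.standard_normal((num_queries, d_model))

# Two small linear "heads" predict, for every query at once,
# class logits (including a "no object" class) and a bounding box.
W_cls = rng.standard_normal((d_model, num_classes + 1))
W_box = rng.standard_normal((d_model, 4))

class_logits = decoder_out @ W_cls                 # shape (num_queries, num_classes + 1)
boxes = 1 / (1 + np.exp(-(decoder_out @ W_box)))   # sigmoid keeps boxes in [0, 1]

print(class_logits.shape, boxes.shape)  # (5, 4) (5, 4)
```

Every query yields a prediction simultaneously, which is why no sequential region-proposal stage is needed.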
Created in 2017 by Google researchers, the Transformer architecture was originally designed to improve machine translation, but has since become a cornerstone of machine learning, used to train some of the most popular pre-trained language models, such as Google’s BERT and Facebook’s RoBERTa. The Transformer architecture uses attention functions instead of recurrent neural networks to predict the next step in a sequence. When applied to object detection, the Transformer lets the model drop hand-designed steps such as creating spatial anchors and custom layers.
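The attention function mentioned above can be shown in a few lines. This is a generic scaled dot-product self-attention sketch in plain numpy, not code from DETR itself: each token's output is a weighted average of all tokens, computed at once rather than step by step as in a recurrent network.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    Each row of the output is a weighted sum over the rows of V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # numerically stable softmax over each row
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 8))  # 6 tokens, each an 8-dimensional vector
out = attention(x, x, x)         # self-attention: every token attends to every token
print(out.shape)  # (6, 8)
```

Because all pairwise interactions are computed in one matrix product, there is no recurrence over sequence positions, which is what makes parallel prediction possible.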
DETR’s results are comparable to those of Faster R-CNN, the researchers said in their paper. Faster R-CNN, an object detection model created by Microsoft Research, has received nearly 10,000 citations since its introduction in 2015.
Although the results are good, the researchers also point out one of the model’s main weaknesses in the paper: DETR is more accurate at identifying large objects than small ones. “It will likely take several years of improvements for the current model to cope with similar issues, and we expect future work to address them successfully,” the researchers said.
It’s worth noting that DETR is Facebook’s latest AI project to apply language-model techniques to the challenges of computer vision. Before that, Facebook introduced the Hateful Memes dataset challenge in response to rumors and false news circulating on its own platform. Facebook believes hateful memes are an interesting challenge for machine learning programs, one that cannot be perfectly solved in a short time. Facebook wants developers to build models that identify memes whose combined image and accompanying text violate Facebook policy.