AI face swapping has once again flooded social feeds. With "cloud graduation" season in full swing, tech companies have rushed to launch their own AI face-swap features, only to have students gleefully push them to their limits. Naturally, industry heavyweights couldn't stay out of the face-swap game either. It has to be said that good looks are gender-neutral.
On closer inspection, AI face-swapping technology has matured considerably in recent years: overall facial fit and detail handling have improved markedly. Recently, deepfake technology made another important breakthrough. Disney reportedly released a new study claiming that its face-swapping technology is the best in the industry.
Judging from the demo images, it is genuinely hard to find fault.
Disney Research, in collaboration with ETH Zurich, has reportedly developed a new GAN-based algorithm that automatically swaps faces in images and videos while delivering megapixel-level resolution.
More notably, the technique is initially slated for Hollywood blockbuster production, where it is expected to improve film quality and reduce post-production costs.
Deepfakes enter Hollywood blockbusters
Face swapping is nothing new in the film industry. Hollywood blockbusters often rely on stunt doubles to perform difficult professional moves, and post-production then spends heavily to composite the lead actor's face onto the double so the final cut looks right. Conventional computer-graphics compositing, however, often yields poor results and can even force reshoots.
All of this costs enormous amounts of time and money, which is why Disney teamed up with ETH Zurich for this study.
Recently, Disney said it is working on a new face-swapping technology for movies and TV series. The company claims it produces high-resolution, realistic images and videos during face swaps, making it well suited to large-screen playback.
Local fusion is an even tougher test of face-swapping technology. To probe the algorithm's performance, the researchers fused only local regions of the face, such as the eyes and lips, and the results were impressive.
Building on Figure 1, Figures 2 and 3 show local fusion of the lips and eyes, respectively. The blending is nearly seamless: high-definition, natural, with no visible cracks. Lip movement is tracked in real time without jumps. The researchers also confirmed that face swapping in video generally works better than in still images.
This fusion advantage of local face swapping in dynamic video is essential for movie scenes.
What's more interesting is its megapixel output resolution. The researchers used a progressive approach to pre-train on the source videos and images, from which the algorithm can extract higher-resolution results. The image below shows that faces from the trained model are far sharper than untrained results.
The researchers say the new algorithm, combining high-resolution output with local fusion, maximizes the usefulness of face swapping in film. Beyond full face swaps for stunt doubles, if a role calls for an aging character or an elderly man in his twilight years, subtle wrinkles, hairstyles, and postures can be added as needed.
In addition, a face can be swapped in from other footage: the background and lighting of the original video can be specially processed so the actor fits the new movie scene. This offers a fresh alternative to traditional post-production.
A new algorithm based on a comb model
So how does this AI face-swapping technology work? Let's start with the complete face-swapping pipeline:
Steps 1 and 2: run face detection on the source image, extract facial features, and crop to a normalized size (1024×1024);
Step 3: feed the cropped image into the shared encoder for model training;
Step 4: apply multi-band blending to merge the decoded output with the target image, producing the final face-swapped result.
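The four steps above can be sketched in code. This is a minimal, hypothetical skeleton with NumPy stand-ins for each stage (mean-pooling for the encoder, nearest-neighbour upsampling for the decoder, alpha compositing for blending); only the 1024×1024 crop size comes from the article, everything else is illustrative.

```python
import numpy as np

CROP = 1024  # normalized crop size from the pipeline description

def detect_and_crop(image: np.ndarray) -> np.ndarray:
    """Steps 1-2: stand-in for face detection + normalized cropping (center crop here)."""
    h, w = image.shape[:2]
    top, left = (h - CROP) // 2, (w - CROP) // 2
    return image[top:top + CROP, left:left + CROP]

def encode(face: np.ndarray) -> np.ndarray:
    """Step 3: stand-in for the shared encoder (8x downsample by mean pooling)."""
    return face.reshape(CROP // 8, 8, CROP // 8, 8, -1).mean(axis=(1, 3))

def decode(latent: np.ndarray) -> np.ndarray:
    """Step 3: stand-in decoder (nearest-neighbour upsample back to 1024)."""
    return latent.repeat(8, axis=0).repeat(8, axis=1)

def blend(swapped: np.ndarray, target: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Step 4: stand-in for multi-band blending (plain alpha compositing here)."""
    return mask * swapped + (1.0 - mask) * target

# Dummy data standing in for real frames.
source = np.random.rand(1200, 1400, 3)
target = np.random.rand(CROP, CROP, 3)
mask = np.zeros((CROP, CROP, 1)); mask[256:768, 256:768] = 1.0

face = detect_and_crop(source)
swapped = decode(encode(face))
result = blend(swapped, target, mask)
print(result.shape)  # (1024, 1024, 3)
```

The point is the data flow, not the components: each stand-in would be replaced by a real detector, the trained encoder/decoder, and a proper multi-band blender.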
Training the shared encoder is the key step. Here the researchers used a progressive comb-shaped network structure (the "comb model") for face swapping, achieved mainly through domain transfer. A shared encoder embeds the preprocessed images into a common latent space, and a corresponding decoder maps each embedding back into pixel space. Domain transfer usually switches between just two such spaces, but here the researchers extend the approach.
As the diagram shows, the encoded image branches through separate decoders into multiple domains; the researchers call this architecture a comb model, with the per-identity decoders forming the "teeth" of the comb.
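The comb structure can be illustrated with a toy model: one shared encoder, plus one decoder "tooth" per identity. Plain linear maps stand in for the paper's actual convolutional networks, and all dimensions and identity names below are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LATENT = 64, 16  # toy face-vector and latent sizes (not from the paper)

class CombModel:
    """One shared encoder; one decoder 'tooth' per identity."""
    def __init__(self, identities):
        self.enc = rng.standard_normal((LATENT, DIM)) * 0.1       # shared encoder
        self.decoders = {name: rng.standard_normal((DIM, LATENT)) * 0.1
                         for name in identities}                   # comb teeth

    def swap(self, face_vec, target_identity):
        z = self.enc @ face_vec                    # embed into the shared latent space
        return self.decoders[target_identity] @ z  # decode as the chosen identity

model = CombModel(["actor_a", "stunt_double"])
face = rng.standard_normal(DIM)
out = model.swap(face, "actor_a")
print(out.shape)  # (64,)
```

Because the latent space is shared, any input face can be decoded through any tooth, which is what lets a single comb model serve multiple source-target pairs.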
A single comb model can handle face fusion for multiple source identities, which cuts training time and markedly improves image fidelity compared with pairwise two-domain models.
As mentioned earlier, model training takes a progressive approach: it starts with low-resolution images and gradually moves to higher resolutions, expanding the network's capacity step by step and ultimately producing high-fidelity images.
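The idea of a progressive schedule is simple: train at a coarse resolution first, then double it stage by stage until the target resolution is reached. The start and end resolutions below are illustrative, not the paper's exact values.

```python
def progressive_schedule(start: int = 64, final: int = 1024) -> list[int]:
    """Return the sequence of training resolutions, doubling each stage."""
    stages = []
    res = start
    while res <= final:
        stages.append(res)
        res *= 2
    return stages

print(progressive_schedule())  # [64, 128, 256, 512, 1024]
```

At each new stage, network layers (and capacity) are added to handle the finer detail, so the model never has to learn megapixel structure from scratch.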
Note, however, that the final output resolution is limited by the resolution of the original dataset. If the dataset lacks high-resolution images, they can be preprocessed with super-resolution, though face-specific SR training methods work best.
In addition, the researchers describe combining the comb model with multi-band blending, which helps preserve the lighting and contrast of the background during fusion.
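Multi-band blending merges two images per frequency band via Laplacian pyramids, so low frequencies (lighting, contrast) transition smoothly while fine detail stays sharp. Below is a minimal NumPy sketch; real pipelines use smoothed Gaussian filtering, whereas 2×2 mean pooling and nearest-neighbour upsampling keep this self-contained.

```python
import numpy as np

def down(img: np.ndarray) -> np.ndarray:
    """Halve resolution by 2x2 mean pooling."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def up(img: np.ndarray) -> np.ndarray:
    """Double resolution by nearest-neighbour upsampling."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img: np.ndarray, levels: int) -> list:
    """Band-pass residuals per level, plus the low-frequency base."""
    pyr = []
    for _ in range(levels):
        small = down(img)
        pyr.append(img - up(small))
        img = small
    pyr.append(img)
    return pyr

def multiband_blend(a: np.ndarray, b: np.ndarray, mask: np.ndarray, levels: int = 4):
    """Blend a and b per frequency band, weighted by a mask pyramid."""
    la, lb = laplacian_pyramid(a, levels), laplacian_pyramid(b, levels)
    masks = [mask]
    for _ in range(levels):
        masks.append(down(masks[-1]))
    out = masks[-1] * la[-1] + (1 - masks[-1]) * lb[-1]   # blend the base band
    for lvl in range(levels - 1, -1, -1):                  # re-add detail bands
        out = up(out) + masks[lvl] * la[lvl] + (1 - masks[lvl]) * lb[lvl]
    return out

a = np.ones((64, 64, 3))                 # stand-in for the swapped face
b = np.zeros((64, 64, 3))                # stand-in for the target frame
mask = np.zeros((64, 64, 1)); mask[:, :32] = 1.0
result = multiband_blend(a, b, mask)
print(result.shape)  # (64, 64, 3)
```

Because the mask is also downsampled per level, the seam is soft in the low-frequency bands — which is precisely what keeps background lighting and contrast consistent across the swap boundary.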
Comparative analysis: the advantages are clear
The researchers compared the progressive comb model with three open-source face-swapping methods: DeepFakes, DeepFaceLab, and Nirkin et al. Nirkin et al. uses a 3D morphable model that requires no pre-training; the other two are built on Y-shaped autoencoder architectures.
The experiment compared five groups of faces. The first two columns show the source and target images to be fused; the later columns show that the study's model outperforms the other algorithms in detail fusion, image resolution, and shadow handling.
Moreover, its multi-band blending is significantly better at eliminating artifacts than Poisson blending, which both DeepFakes and DeepFaceLab rely on.
The study does have clear limitations: for example, it cannot produce a stable face swap when the subject wears glasses — not because the glasses cannot be rendered, but because the face fails to blend with the surrounding image. The researchers tried matching the input source, but the results were mixed.
The researchers note, however, that in practical applications and movie scenes this may have little impact.