VOLUMETRIC HUMANS CONTINUE TO EVOLVE

By TREVOR HOGG

Reshaping how volumetric humans are brought to life digitally and realistically are NeRFs, multi-camera arrays, AI motion synthesis and dynamic texture systems. These technological advances, among many others, have allowed for more than proxy versions to appear in previs and have elevated the creation of indistinguishable stunt doubles and crowd simulations.

Certain fundamentals have to be kept in mind when producing volumetric humans. “We are most concerned that we are capturing the subtle nuance of the actors’ performances,” states Kimball Thurston, CTO at Wētā FX. “The slight eye twitch, crinkle of a nose, whatever it might be that then allows us to transport the emotion from the actor to the virtual character they are portraying, even if it is themselves in a situation that would be unsafe or impossible. With that, we focus all our work to ensure we have multiple avenues to capture results, not only in real-time while on stage, but also afterwards, with higher-resolution processing that refines the raw images for better detail. That then enables us to represent those details better, more accurately, and allows additional artists to help ensure that the performance is being accurately transported into the final scene.”

Over the years, the methodology, workflow, technology and techniques have changed. “The FACS system still provides a good reference language to talk about human facial performance, which we used for Gollum 25 years ago, but the technology we deploy today can represent fine details not possible with FACS alone,” Thurston remarks. “We are also able to work in a wider variety of environments when filming performance capture. Along with that, it has also become clearer what works and doesn’t in the workflow that assists artists in transforming performances, even down to how we apply reference footage. While facial performance is often the focus, innovations in capturing motion of the rest of the body also evolve every day as new methods become available, allowing a more comfortable experience for actors. Additionally, the ability to provide better real-time feedback while on set continues to help enable directors to make decisions faster and have confidence in what they have in the can.”

FACS capture of faces was a key moment that occurred a quarter century ago, as well as motion capture innovations over the past three decades. “They resulted in a steady stream of improvements, from the quality of capturing the rest of the body to even integrating costumes,” Thurston notes. “However, more recently, with the use of machine learning for computer vision tasks, the quality of ‘markerless’ capture is approaching film quality. This will truly unlock volumetric humans, where we would no longer be talking just about a face capture and a body rig, but the ability to capture the unique, subtle deformations and skin textures more easily.” The pipeline is shrinking to help with previs and postvis. Thurston states, “The faster you can create the data you need, the easier it is. For those early-stage portions of the movie-making process, there’s sometimes a decision to sacrifice quality for speed. This results in a pipeline that can capture the data and produce a fast thing fast, with all the data running a slower process trailing behind, so ideally you would try to push the technology to merge the two processes to just be fast.”

The hologram of Jor-El and Lara Lor-Van in *Superman* was achieved by Framestore through the volume-rendering technique known as Gaussian splatting. (Image courtesy of Framestore and Warner Bros. Pictures)

Given that the hologram message in *Superman* gets played five different times at various scales, Framestore opted to utilize Gaussian splatting. (Image courtesy of Framestore and Warner Bros. Pictures)

Digital Domain used Masquerade facial capture to bring Thanos to life for *Avengers: Endgame*. (Image courtesy of Digital Domain and Marvel Studios)

When Framestore was approached to create a hologram of Jor-El and Lara Lor-Van talking to their son Kal-El in the Fortress of Solitude for Superman, the decision was made to explore a volume-rendering technique called Gaussian splatting, which generates more complete point clouds to produce highly photorealistic 3D models. “We wanted to shoot a two-minute continuous take of these two parents delivering dialogue to their child in this heartfelt emotional scene,” remarks Kevin Sears, CG Supervisor at Framestore. “We didn’t know from how many different angles and how big these holograms were going to be on set.” The message gets played five different times at various scales. “That could have been done with motion control, but it would have been time-consuming, and it also means that you are locked into exactly what that camera move was. The nature of the film required a lot of wide-angle lenses, so we didn’t want to have to put a big camera in front of the actors’ faces while they delivered this performance. Also, the message had to be corrupt and unfinished. Superman doesn’t know what the second half of the message says until later in the film.”

Framestore partnered with Infinite Realities, a scanning company, which has a rig with 192 cameras. “Instead of a normal photogrammetry rig, where an actor stands and gets captured, now all of those cameras are doing live video feedback rather than just one snapshot,” Sears explains. “They’re capturing the two-minute performance, and every camera that sees a certain pixel on the subject is capturing that one point in space in a multitude of different values and colors from all of these different camera angles.” With the assistance of machine learning, Framestore was able to extrapolate between two different points to rasterize and visualize them at high fidelity. “What ended up happening is we had this dense point cloud per frame that is changing, of the actors doing this dialogue, and we could cut it up in editorial, then scale it and put the camera down here or look up at them, or animate the cameras to swing around their head and change the focal length. At 4K, we could get hologram images that looked like live video footage because they had so much detail. For the glitching and creative exercise of making them stutter, we used phrases like fuzzing and fritzing. We were using a plug-in that was available in Houdini to take this point cloud and slice it into pieces and partition it. Now, we could animate whole sections of the human body and rotate them almost like an anatomy textbook. That was an interesting way to treat a hologram because it was a true 3D representation of a human who was glitching and rotating.”

“If our clients agree… we are now incorporating GenAI into post-workflows within our previs/postvis pipeline. This allows us to implement improved lip sync, more realistic digital humans, and achieve much higher fidelity real-time renders. These advancements help us focus on the story, minimizing distracting visual elements that could otherwise take early audiences out of the experience.”

—Jan Philip Cramer, Visual Effects Supervisor, Digital Domain

The ambition of previs is growing larger. “There are virtually no constraints on what can be achieved,” states Jan Philip Cramer, Visual Effects Supervisor at Digital Domain. “While in the past you would shy away from very complicated take-overs or close detail studies of CG to live-action interaction, these can now be done to a higher difficulty level. On a technical side, and if our clients agree to the use of GenAI, we are now incorporating GenAI into post-workflows within our previs/postvis pipeline. This allows us to implement improved lip sync, more realistic digital humans, and achieve much higher fidelity real-time renders. These advancements help us focus on the story, minimizing distracting visual elements that could otherwise take early audiences out of the experience.” AI, machine learning and real-time game engines have influenced the animation process. “In animation, we utilize AI tools for generic motion and help with crowd work, but the hero action is still handcrafted by an artist. For crowd assets, we are able to generate the crowds via AI workflows and leverage pictures to model tools to speed up the modeling process, but this is not yet possible for hero creatures. On the other hand, Unreal Engine provides a lot of great integration into our pipeline, from planning and scouting to mocap re-targeting.”

“Currently, there is a huge push to use Generative AI in production,” Cramer observes. “This can affect every department and will also heavily impact the creative process over the next few years. The goal for most companies will be to generate and modify designs, implement them with motion in video form, and use tools to convert them into production-ready assets. However, there are also significant restrictions and sensitivities around the use of GenAI, especially from clients who are concerned about data privacy, intellectual property and ethical implications. Navigating these restrictions while still leveraging the power of AI will be a key challenge for the industry moving forward. It will be critical in all this not to lose touch with the creative, artistic input. Curation and artist-driven input will remain essential, as filmmakers and creatives will seek to control details that AI cannot yet master. Positioning yourself as the creative force behind AI will be a critical aspect of this development. No one wants to work with just prompters; they will want to get ideas from animators and lighting artists. Using tools more effectively than ever before allows us to merge technology and artistry.”

Using a plug-in that was available in Houdini, Framestore was able to take the point cloud generated by the Deus system and slice it in pieces and partition it. (Image courtesy of Framestore)

Infinite Realities’ AeonX full-body motion-scanning system, Deus, utilizes Ximea’s machine-vision cameras and Sony Pregius sensors. (Image courtesy of Framestore)

Every camera sees a certain pixel on the subject and captures that one point in space in a multitude of different values and colors from various angles. (Image courtesy of Framestore)

There is certainly room for improvement when producing volumetric humans. “The more we can do to get actors out of specialized suits for capture and have them performing in their character’s costume, while still capturing their precise motion and the physics of the cape or other feature they might be wearing, the more natural and engaging the performance we will be in the movie,” Thurston believes. “There is a balance of increasing fidelity, inferring motion with costumes partially obscuring, and increasing the ease of capture, even if that is just setup or prep time in makeup, it would allow for better performances, and that is what we all strive for.” The shifting landscape of media consumption is driving creative and technical trends. Thurston remarks, “With some of the recent investments, you can see experiments where studios are looking to expand what new and shifting media consumption patterns mean and how to provide more immersive worlds. This can enable creatives and artists to create big, giant worlds that inspire entire generations of people, while also allowing everyone to create their own content [UGC] based on that world, contributing a small piece of themselves to that larger world. If this does become a sustaining trend from a creative perspective, some of the volumetric human technology we hint at here will enable that UGC to feel more integrated and less ‘fake.’”

VOLUMETRIC HUMANS CONTINUE TO EVOLVE

You May Also Like:

New Tech in VFX: Fall Edition

UNCOMPLICATING HOW TO BUILD COMPLEX VFX ASSETS

New VFX Tools, and How VFX Pros Are Using Them