VFX Voice

The award-winning definitive authority on all things visual effects in the world of film, TV, gaming, virtual reality, commercials, theme parks, and other new media.

Winner of three prestigious Folio Awards for excellence in publishing.



January 06, 2026

ISSUE: Winter 2026

Overview of Generative Artificial Intelligence (GenAI)

By LUCIEN HARRIOT

Edited for this publication by Jeffrey A. Okun, VES

Abstracted from The VES Handbook of Visual Effects, 4th Edition

Edited by Jeffrey A. Okun, VES, Susan Zwerman, VES and Susan Thurmond O’Neal

Figure 5.2 Image generation from text prompts: “Fantastical creature,” “Futuristic cityscape.”

Understanding Generative AI and Learning Models

Generative AI refers to algorithms that can produce novel content – images, videos, and even entire scenes – based on what they have learned from vast datasets. Unlike traditional tools that rely on hand-crafted rules, GenAI learns patterns from existing material and then extrapolates to form new, unseen results. Early Artificial Intelligence (AI) research dates back to the 1960s, but the past few years have seen access to tremendous computational power that enabled a transformative jump to Large Language Models (LLMs), which in turn enabled Generative AI (GenAI) to write text like humans. GenAI technology can be trained on anything digital and, at the time of this writing, can recognize and/or generate sound, images, and video based on large datasets/models trained on billions of data files.

Machine learning is the act of training a model. At its core, AI is simply statistics and probability used to make predictions. Predictive AI is at work when a phone tries to finish a sentence or suggests the next item in a social media feed. The same technology can now be applied to any data, from images to motion capture. Models need vast, diverse datasets to understand a wide range of visual styles, lighting conditions and objects. The broader the dataset, the better the model's ability to generate high-quality, contextually relevant outputs.
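The "statistics and probability" behind prediction can be illustrated with a toy next-word model, a minimal sketch in Python (the corpus and function names are illustrative only, not drawn from any production system): it counts which word most often follows another, then predicts on that basis, which is the same idea LLMs apply at vastly greater scale.

```python
from collections import Counter, defaultdict

def train_bigram(corpus_words):
    """Count how often each word follows another."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus_words, corpus_words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the statistically most likely next word."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

Modern models replace raw word counts with learned probability distributions over billions of examples, but the prediction step is conceptually the same.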

Data Processing and Feature Extraction

Converting visual features into numerical representations means breaking down images and videos into mathematical forms. Before the model can learn, it examines pixels, edges, textures and color distributions to identify patterns and correlations. These insights enable it to reconstruct or generate new visuals that align with the original features.
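The idea of reducing pixels, edges and color distributions to numbers can be sketched in a few lines of NumPy (a simplified illustration; real models learn far richer features, and these particular statistics are chosen only for clarity):

```python
import numpy as np

def extract_features(image):
    """Turn an image (H x W x 3, values 0-255) into a numeric feature vector."""
    img = image.astype(float) / 255.0
    # Color distribution: mean intensity per channel.
    color_means = img.mean(axis=(0, 1))
    # Edges: average gradient magnitude of the luminance.
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    edge_strength = np.hypot(gx, gy).mean()
    return np.concatenate([color_means, [edge_strength]])

# A flat gray image has no edges; a half-black/half-white one does.
flat = np.full((8, 8, 3), 128, dtype=np.uint8)
split = np.zeros((8, 8, 3), dtype=np.uint8)
split[:, 4:] = 255
print(extract_features(flat)[-1] < extract_features(split)[-1])  # True
```

Once visuals are vectors like these, the model can compare, cluster and statistically recombine them, which is what makes reconstruction and generation possible.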

Creation and Manipulation Capabilities

Users can provide text descriptions or reference images, and the AI will generate new content that reflects these inputs, whether it is a fantastical creature, a futuristic cityscape, or a specific lighting setup. Beyond visuals, certain AI models understand text prompts and use this information to guide image and video creation. This interplay between language and imagery allows for more intuitive creative direction, letting artists describe what they envision and have the AI bring those ideas to life.

Custom Model Training for Specific Elements

Fine-tuning AI models with LoRA (Low-Rank Adaptation) methods allows artists to integrate small, specialized models into larger ones, achieving unique character designs or distinct texture palettes. By training these models on curated datasets, teams can ensure consistent characters, objects and environments across multiple shots and projects.
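The "low-rank" in LoRA refers to its core trick: the large pretrained weight matrix stays frozen, and only two small factor matrices are trained and added on top. A minimal NumPy sketch (dimensions and scaling are illustrative; real LoRA operates inside a neural network's attention and projection layers):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 4          # full dimension vs. low rank (r << d)

W = rng.normal(size=(d, d))         # frozen pretrained weight matrix
A = rng.normal(size=(r, d)) * 0.01  # small trainable low-rank factor
B = np.zeros((d, r))                # B starts at zero: no change at init
alpha = 1.0

def adapted_forward(x):
    """Apply the frozen weights plus the low-rank LoRA update B @ A."""
    return x @ W.T + alpha * (x @ A.T @ B.T)

# The adapter trains only 2*d*r parameters instead of d*d.
print(2 * d * r, "trainable vs.", d * d, "frozen parameters")
```

Because the adapter is a few thousand parameters rather than hundreds of thousands, a studio can train one per character or texture palette on a curated dataset and swap them in and out of the shared base model.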

Inpainting, Outpainting and Object Removal

Inpainting techniques focus on regenerating specific parts of an image. For instance, if a building’s facade requires correction, AI can seamlessly fill in missing details or repair damaged areas. This method is also ideal for removing unwanted elements, such as props, stray equipment or crew members, making cleanup tasks quick and efficient.

Outpainting, conversely, extends an image beyond its original boundaries. When reframing or enlarging a composition, AI intuitively adds new details that blend naturally with the existing lighting, texture and perspective, creating a cohesive and expanded visual.
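Production inpainting and outpainting models regenerate content with learned priors, but the underlying mechanics of masking and canvas extension can be sketched with toy fills in NumPy (mean-fill and edge replication stand in for the generative step; they are not how the real models synthesize detail):

```python
import numpy as np

def inpaint_mean(image, mask):
    """Fill masked pixels with the mean of the unmasked ones (toy object removal)."""
    out = image.astype(float).copy()
    out[mask] = image[~mask].mean()
    return out

def outpaint_pad(image, border):
    """Extend the canvas by replicating edge values (toy outpainting)."""
    return np.pad(image, border, mode="edge")

img = np.arange(16, dtype=float).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = True                 # one "unwanted object" pixel to remove
clean = inpaint_mean(img, mask)
bigger = outpaint_pad(img, 2)     # canvas grows from 4x4 to 8x8
```

A real model replaces the mean-fill and edge-replication steps with synthesis conditioned on the surrounding lighting, texture and perspective, which is why its results blend naturally.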

Purchase your copy here: http://bit.ly/3JnG2yT


