By LUCIEN HARRIOT
Edited for this publication by Jeffrey A. Okun, VES
Abstracted from The VES Handbook of Visual Effects, 4th Edition
Edited by Jeffrey A. Okun, VES, Susan Zwerman, VES and Susan Thurmond O’Neal


Understanding Generative AI and Learning Models
Generative AI refers to algorithms that can produce novel content – images, videos, and even entire scenes – based on what they have learned from vast datasets. Unlike traditional tools that rely on hand-crafted rules, GenAI learns patterns from existing material and then extrapolates to form new, unseen results. Early work in Artificial Intelligence (AI) dates back to the 1960s. In the past few years, access to tremendous computational power has enabled a transformative jump to Large Language Models (LLMs), which in turn enabled Generative AI (GenAI) to write text like humans. GenAI technology can be trained on anything digital and, at the time of this writing, can recognize and/or generate sound, images, and video based on large datasets/models trained on billions of data files.
Machine learning is the act of training a model. At its core, AI is simply statistics and probability used to make predictions. Predictive AI is at work when a phone tries to finish a sentence or suggests the next item in a social media feed. This technology can now be applied to any data, from images to motion capture. Models need vast, diverse datasets to understand a wide range of visual styles, lighting conditions and objects. The broader the dataset, the better the model’s ability to generate high-quality, contextually relevant outputs.
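To make “statistics and probability used to predict” concrete, here is a minimal Python sketch of next-word suggestion based purely on counting which word follows which in a tiny sample corpus; the corpus and function name are illustrative only, not how any product actually works.

```python
from collections import Counter, defaultdict

# A minimal sketch of prediction as statistics: count which word follows
# which in a tiny sample corpus, then suggest the most frequent continuation.
# The corpus and function name are illustrative only.

corpus = "the shot needs a new matte the shot needs a review".split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest_next(word):
    """Return the most frequently observed word after `word`, if any."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest_next("shot"))  # prints "needs" - the statistically likeliest continuation
```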
Data Processing and Feature Extraction
Converting visual features into numerical representations means breaking down images and videos into mathematical forms. Before the model can learn, it examines pixels, edges, textures and color distributions to identify patterns and correlations. These insights enable it to reconstruct or generate new visuals that align with the original features.
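As a rough illustration of what “converting visual features into numerical representations” can mean, the sketch below reduces an image to a short feature vector (per-channel color histograms plus an average edge strength) using NumPy and Pillow. The file path is a placeholder, and real generative models learn far richer representations; the point is simply that pixels become numbers a model can reason about.

```python
import numpy as np
from PIL import Image

# Illustrative sketch: reduce an image to simple numerical features.
# "plate.png" is a placeholder path.

img = np.asarray(Image.open("plate.png").convert("RGB"), dtype=np.float32) / 255.0

# Color distribution: an 8-bin histogram per channel (24 numbers total).
color_hist = np.concatenate(
    [np.histogram(img[..., c], bins=8, range=(0.0, 1.0))[0] for c in range(3)]
)

# Edge content: mean gradient magnitude of the luminance channel.
luma = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
gy, gx = np.gradient(luma)
edge_strength = float(np.sqrt(gx**2 + gy**2).mean())

feature_vector = np.concatenate([color_hist, [edge_strength]])
print(feature_vector.shape)  # (25,) - the image as a point in feature space
```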
Creation and Manipulation Capabilities
Users can provide text descriptions or reference images, and the AI will generate new content that reflects these inputs, whether it is a fantastical creature, a futuristic cityscape, or a specific lighting setup. Beyond visuals, certain AI models understand text prompts and use this information to guide image and video creation. This interplay between language and imagery allows for more intuitive creative direction, letting artists describe what they envision and have the AI bring those ideas to life.
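As one example of this text-to-image interplay, the sketch below uses the open-source Hugging Face diffusers library with a publicly available Stable Diffusion checkpoint; the model ID, prompt and sampler settings are illustrative assumptions, not a recommendation of a specific toolchain.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative sketch: generate an image from a text prompt with an
# open-source diffusion model. Model ID and settings are examples only.

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

prompt = "a futuristic cityscape at dusk, volumetric fog, cinematic lighting"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("cityscape_concept.png")
```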
Custom Model Training for Specific Elements
Fine-tuning AI models with LoRA (Low-Rank Adaptation) methods allows artists to integrate small, specialized models into larger ones, achieving unique character designs or distinct texture palettes. By training these models on curated datasets, teams can ensure consistent characters, objects and environments across multiple shots and projects.
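The sketch below illustrates the core idea behind LoRA in PyTorch: the pretrained weight is frozen and only a small low-rank update is trained on top of it. The layer sizes, rank and scaling factor are illustrative; in practice, artists typically train LoRAs with existing fine-tuning tooling rather than hand-writing layers.

```python
import torch
import torch.nn as nn

# Sketch of the idea behind LoRA: keep the original (large) weight frozen and
# learn only a small low-rank update on top of it.

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # the pretrained weights stay fixed
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # starts as a no-op, matching the base model
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 768]) - same interface, far fewer trainable weights
```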
Inpainting, Outpainting and Object Removal
Inpainting techniques focus on regenerating specific parts of an image. For instance, if a building’s facade requires correction, AI can seamlessly fill in missing details or repair damaged areas. This method is also ideal for removing unwanted elements, such as props, stray equipment or crew members, making cleanup tasks quick and efficient.
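A minimal sketch of mask-based cleanup is shown below using OpenCV’s classical inpainting; diffusion-based inpainting tools follow the same pattern of an image plus a mask marking the pixels to regenerate. The file paths are placeholders.

```python
import cv2

# Minimal sketch of mask-based object removal with OpenCV's classical
# inpainting. White pixels in the mask mark the region to regenerate.
# File paths are placeholders.

plate = cv2.imread("plate_with_crew.png")
mask = cv2.imread("crew_mask.png", cv2.IMREAD_GRAYSCALE)

cleaned = cv2.inpaint(plate, mask, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
cv2.imwrite("plate_cleaned.png", cleaned)
```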
Outpainting, conversely, extends an image beyond its original boundaries. When reframing or enlarging a composition, AI intuitively adds new details that blend naturally with the existing lighting, texture and perspective, creating a cohesive and expanded visual.
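One common way to set up an outpaint, sketched below with Pillow, is to enlarge the canvas and build a mask that covers only the newly added border, which an inpainting-style model is then asked to fill; the border size and file paths are placeholders.

```python
from PIL import Image, ImageOps

# Sketch of preparing an outpaint: enlarge the canvas, then build a mask that
# covers only the newly added border. An inpainting-style model fills the
# white region. Border size and file paths are placeholders.

border = 256
original = Image.open("framed_shot.png").convert("RGB")

expanded = ImageOps.expand(original, border=border, fill=(0, 0, 0))

mask = Image.new("L", expanded.size, 255)              # white = generate here
mask.paste(0, (border, border,                         # black = keep original pixels
               border + original.width, border + original.height))

expanded.save("outpaint_input.png")
mask.save("outpaint_mask.png")
```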
Purchase your copy here: http://bit.ly/3JnG2yT