
MT. Iturbide

Gen AI / Motion Compositing / VFX
Personal Project · 2023

I named a mountain after myself. Starting from a single roadside photograph taken in Mexico, I used Stable Diffusion with ControlNet's Illusion Diffusion to generate mountain terrain shaped like my own face — embedding my likeness into the topography itself. The landscape was then expanded with Generative Fill, populated with rotoscoped wildlife, layered with atmospheric fog, and animated with 2.5D parallax to create a living, breathing composite. MT. Iturbide is both a technical demonstration and a personal statement about identity, land, and origin.

8K Canvas Upscale
15+ Composited Layers
2.5D Camera Mapping
7 Process Steps
Role: AI Artist / Compositor
Origin: Roadside Photo, Mexico
Type: Gen AI Motion Composite
AI Tools: Generative Fill, ControlNet
Compositing: After Effects
Technique: Illusion Diffusion
Output: Animated Landscape

Final Piece

The completed composite - a living landscape with 2.5D parallax camera drift, rotoscoped wildlife, atmospheric fog layers, and a mountain whose topography resolves into a human face when viewed at distance. Seven distinct production stages, unified into a single cohesive piece.

Conceptual Framework

Personal & Cultural Origin

The piece begins with a photograph taken on a roadside in Mexico - a landscape connected to family, heritage, and the physical terrain that carries the Iturbide surname. Embedding a facial likeness into that terrain is not a technical gimmick; it is a deliberate statement about the relationship between identity and land. The mountain becomes a self-portrait, and the title becomes literal: MT. Iturbide.

Generative AI as Craft Tool

This project treats generative AI not as a push-button creative shortcut but as one instrument in a larger compositional pipeline. Every AI-generated element - the outscaled periphery, the face-embedded mountain - required significant manual refinement, color matching, compositing judgment, and animation craft to integrate into a cohesive final piece. The AI accelerates specific steps; the artistic intent and technical execution remain entirely human-directed.

Process Breakdown

Original roadside photograph taken in Mexico - person sitting on rocks on a hillside
01

Source Photograph

The foundation: a single iPhone photograph taken roadside in Mexico, overlooking a green mountainous hillside from a limestone outcrop. This raw 4x5 capture provides the terrain texture, natural lighting, and compositional anchor that every subsequent generative and compositing operation builds upon. Starting from a real photograph - rather than a fully synthetic image - grounds the final piece in physical reality.

02

Live Photo Capture

The iPhone's Live Photo mode captured several seconds of ambient motion around the still frame - wind through grass, shifting leaves, clouds drifting across the hillside. This micro-movement data served as the motion reference for the final animated scene, establishing the organic breathing rhythm that prevents the composite from feeling static or artificially generated.

03

Generative Outfill

Adobe Generative Fill expanded the original 4x5 frame into a substantially wider canvas, synthesizing the surrounding environment while preserving the grain, lighting angle, and botanical detail of the source photograph. The critical constraint: the AI-generated periphery must be indistinguishable from the real captured center. Multiple generation passes with varied seed values were blended to eliminate visible seam boundaries.
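The seam-hiding idea can be sketched in a few lines of numpy: two full-canvas generation passes (different seeds) are cross-faded over a feathered overlap so no hard boundary survives. This is an illustrative stand-in for the manual Photoshop blending, not the actual project files; the function name and parameters are hypothetical.

```python
import numpy as np

def blend_at_seam(left_pass, right_pass, seam, overlap):
    """Cross-fade two same-size generation passes across a vertical seam.

    left_pass / right_pass: (H, W, 3) float arrays, each a full-canvas
    pass generated with a different seed. Pixels left of the seam come
    from left_pass, pixels right of it from right_pass, with a linear
    cross-fade `overlap` pixels wide centred on the seam column.
    """
    h, w, c = left_pass.shape
    weight = np.zeros(w)                      # per-column weight of right_pass
    start = seam - overlap // 2
    weight[start + overlap:] = 1.0
    weight[start:start + overlap] = np.linspace(0.0, 1.0, overlap)
    weight = weight[None, :, None]            # broadcast over rows, channels
    return left_pass * (1.0 - weight) + right_pass * weight
```

The same feathered-weight trick generalises to any seam shape by replacing the 1-D ramp with a blurred 2-D mask.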

04

Living Elements

Rotoscoped wildlife footage was composited into the landscape - birds crossing the frame, insects, ambient animal movement. Each element was individually color graded, motion tracked, and positioned within the correct parallax depth layer of the environment. This is the step where the composition transitions from a manipulated photograph into a functioning ecosystem with biological presence.
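The per-element colour grade can be approximated with a Reinhard-style statistics transfer: shift each rotoscoped element's per-channel mean and spread toward the plate it is being composited into. A minimal sketch, assuming float RGB in [0, 1]; the real grade was done by eye in After Effects.

```python
import numpy as np

def match_color(element, plate):
    """Match a rotoscoped element's per-channel mean/std to the plate.

    element, plate: (H, W, 3) float arrays in [0, 1]. A crude stand-in
    for a manual grade: normalise each channel of the element, then
    rescale it to the plate's statistics.
    """
    out = np.empty_like(element, dtype=float)
    for ch in range(3):
        e = element[..., ch].astype(float)
        p = plate[..., ch].astype(float)
        e_std = e.std() or 1.0               # avoid divide-by-zero on flat channels
        out[..., ch] = (e - e.mean()) / e_std * p.std() + p.mean()
    return np.clip(out, 0.0, 1.0)
```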

Selfie portrait in a Mexican cornfield - the facial likeness source for Illusion Diffusion
05

Likeness Source

A self-portrait taken in a cornfield in Mexico provides the facial geometry that ControlNet's Illusion Diffusion model will encode into the mountain's topography. The choice of source image is intentional: a face framed by growing maize, rooted in the same Mexican landscape that the composite depicts. The personal connection between subject and terrain is the conceptual core of the entire piece.

Illusion Diffusion result - aerial mountain ridge with human facial features embedded in the topography
06

Illusion Diffusion

ControlNet's Illusion Diffusion pipeline uses the facial portrait as a structural conditioning map. The text prompt requests “rocky mountain terrain” while the neural network distributes shadows, ridgelines, and highlights to conform to the face's contours. The result is an aerial mountain vista whose topography subtly resolves into a human face at the right viewing distance - the kind of pareidolia that feels discovered rather than designed. MT. Iturbide: a mountain named after its maker.

Technical Deep Dive

Illusion Diffusion Architecture

ControlNet models accept the facial likeness as a structural depth map that constrains the diffusion process. The network generates photorealistic mountain terrain while being forced to distribute light, shadow, and geological features along the face's alpha mask boundaries. Multiple generation iterations were evaluated to find the optimal balance between terrain plausibility and facial recognition at distance.
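Before the portrait can drive ControlNet, it is typically reduced to a single-channel structure map: broad luminance shapes rather than fine detail, since it is those shapes the diffusion model redraws as terrain. A minimal, hedged sketch of that preprocessing step (the exact pipeline and parameters used in the project are not documented here):

```python
import numpy as np

def conditioning_map(rgb, gamma=0.8):
    """Reduce a portrait to a single-channel structure map in [0, 1].

    ControlNet conditions on maps like this: dark regions steer the
    diffusion toward shadowed terrain, bright regions toward sunlit
    ridgelines. rgb: (H, W, 3) float array in [0, 1]. gamma < 1 lifts
    midtones so facial contours survive as broad luminance shapes.
    """
    lum = rgb @ np.array([0.2126, 0.7152, 0.0722])   # Rec. 709 luma weights
    lo, hi = lum.min(), lum.max()
    lum = (lum - lo) / (hi - lo + 1e-8)              # full contrast stretch
    return lum ** gamma
```

In a diffusers-style workflow this map (tiled to three channels) would be passed as the ControlNet conditioning image, with the conditioning scale controlling how strongly the terrain is forced toward the face.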

Seamless Generative Outscaling

The primary technical challenge in outscaling was matching the source photograph's granular noise profile, film grain characteristics, and lens distortion across the AI-generated peripheral pixels. Multiple passes with different seed values were composited and blended at the boundaries to eliminate the visible generation seams that typically betray AI-expanded imagery.
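Grain matching can be sketched as: estimate the source photograph's high-frequency residual (pixel minus a local blur), then inject Gaussian noise of the same magnitude into the synthetically smooth AI pixels. A simplified single-channel version, purely illustrative of the idea:

```python
import numpy as np

def match_grain(generated, source_patch, rng=None):
    """Add noise so AI-generated pixels match a real patch's grain level.

    generated, source_patch: 2-D float arrays (one channel). The grain
    estimate is the residual from a 3x3 box blur of the source patch;
    Gaussian noise with the same std is added to the generated region.
    """
    rng = rng or np.random.default_rng(0)
    h, w = source_patch.shape
    pad = np.pad(source_patch, 1, mode="edge")
    # 3x3 box blur via shifted windows
    blur = sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    grain_sigma = (source_patch - blur).std()
    return generated + rng.normal(0.0, grain_sigma, generated.shape)
```

Real grain is not white Gaussian noise (it has spatial and luminance structure), which is why the project still needed blended multi-seed passes rather than noise injection alone.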

2.5D Camera Projection

The flat composite was segmented into discrete depth layers - foreground rocks, midground vegetation, background mountain, sky dome. These layers were displaced in 3D space within After Effects, enabling a virtual camera to drift through the landscape with convincing parallax depth. The motion is subtle and continuous, creating the perception of a three-dimensional environment from purely two-dimensional source material.
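The parallax arithmetic behind that camera drift is simple: a layer's horizontal shift is inversely proportional to its distance, so near layers sweep while the sky barely moves. A small sketch with assumed layer depths (the actual After Effects depth values are not documented):

```python
import numpy as np

def parallax_offsets(camera_x, depth, focal=1.0):
    """Horizontal pixel shift of a layer under a camera pan.

    camera_x: scalar or array of camera positions in pixels.
    depth: layer distance from camera. Nearer layers (small depth)
    move more: shift = focal * camera_x / depth.
    """
    return focal * np.asarray(camera_x, dtype=float) / float(depth)

# one second of 24 fps drift: the camera eases 0 -> 40 px and back
frames = np.sin(np.linspace(0.0, np.pi, 24)) * 40.0
depths = {"rocks": 1.0, "vegetation": 2.5, "mountain": 8.0, "sky": 50.0}
per_layer = {name: parallax_offsets(frames, d) for name, d in depths.items()}
```

Because the ratios between layer shifts stay constant across the move, the eye reads consistent depth even though every layer is a flat card.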

Atmospheric Compositing

Fractal noise layers simulate rolling fog at multiple altitude bands. Simulated light rays and particle dust are aligned to a consistent lighting direction across all depth layers. Rotoscoped wildlife is individually color graded to sit naturally within the depth stack. The aggregate effect: every element breathes as a single unified ecosystem rather than a collage of disparate sources.
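The fog plates follow the same recipe as After Effects' Fractal Noise: sum random grids at doubling frequencies with decaying amplitude. A compact numpy equivalent (nearest-neighbour upsampling keeps the sketch short; production fog would use smooth interpolation and animate the grids over time):

```python
import numpy as np

def fractal_noise(shape, octaves=4, persistence=0.5, seed=0):
    """Fractal (multi-octave value) noise normalised to [0, 1].

    Each octave is a coarse random grid upsampled to `shape`; octave o
    doubles the grid frequency and scales amplitude by persistence**o.
    """
    rng = rng_out = np.random.default_rng(seed)
    h, w = shape
    out = np.zeros(shape)
    amp, total = 1.0, 0.0
    for o in range(octaves):
        gh, gw = 2 ** (o + 2), 2 ** (o + 2)          # grid doubles per octave
        grid = rng.random((gh, gw))
        y0 = np.linspace(0, gh - 1, h).astype(int)    # nearest-neighbour
        x0 = np.linspace(0, gw - 1, w).astype(int)    # upsample indices
        out += amp * grid[np.ix_(y0, x0)]
        total += amp
        amp *= persistence
    return out / total
```

Stacking several such plates at different scales and opacities, offset per depth layer, gives the altitude-banded fog described above.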

Tech Stack

Stable Diffusion · ControlNet · Illusion Diffusion · Adobe Generative Fill · After Effects · Photoshop · Rotoscoping · 2.5D Parallax · Fractal Noise · Color Grading
