Speech-to-animation pipeline with emotion, visemes, and upper-body dynamics — powered by THA3 + LLMs
Create compelling and emotionally expressive 2D character animations from spoken text using the THA3 model. Each animation mimics the natural rhythm of speech by blending:
- Lip-sync (visemes + coarticulation)
- Emotion (facial expression, eye shape, mouth tweaks)
- Subtle body motion (posture, head tilt, breathing)
- Timing finesse (blinks, focus shifts, emphasis)
This system produces only 10–20 expressive keyframes per sentence, which are then smoothly interpolated, achieving high emotional fidelity at a low frame count.
Unlike traditional talking-head animation that relies purely on phoneme-to-mouth mapping, this system:
- Embeds emotion and tone via brows, eyes, head/body
- Adds gaze dynamics, blinks, breathing
- Uses a viseme+emotion composition model
- Follows coarticulation rules and speech rhythm
- Generates animation plans using a language model prompt grounded in animation principles
Key features:
- 🎙 LLM-driven animation planning (Gemini 2.5)
- 👄 Viseme-aligned lip sync using standard phoneme groups
- 🤨 Emotional mouth/brow overlays (smirk, smile, raised_corner, etc.)
- 👀 Eye blinking, iris tracking, gaze aversion
- 🫁 Breathing modulation linked to emotion
- 💃 Head tilt, posture shift, lean-in emphasis
- 🧘 Relaxed idle keyframes at start and end
- 🔁 Interpolated motion using smoothstep easing
- 🎞 Output as GIF, built with THA3-compatible pose vectors
Based on established speech animation literature (e.g., Parke & Waters, 1996), phonemes are grouped into visual equivalents:
| Phoneme Group | Viseme Type |
|---|---|
| AA, AH, AO, AE | aaa |
| IY, IH | iii |
| UW, UH | uuu |
| EH, EY | eee |
| OW, OY | ooo |
| M, B, P, N, D, T | delta |
| Silences / Pauses | delta |
These phoneme groups are mapped to the `mouth_type` parameter used by THA3, as in the sketch below.
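A minimal sketch of this lookup in Python (the dict and helper names are illustrative, not the project's actual identifiers):

```python
# Illustrative phoneme-to-viseme lookup mirroring the table above.
VISEME_MAP = {
    **{p: "aaa" for p in ("AA", "AH", "AO", "AE")},
    **{p: "iii" for p in ("IY", "IH")},
    **{p: "uuu" for p in ("UW", "UH")},
    **{p: "eee" for p in ("EH", "EY")},
    **{p: "ooo" for p in ("OW", "OY")},
    **{p: "delta" for p in ("M", "B", "P", "N", "D", "T")},
}

def phoneme_to_viseme(phoneme: str) -> str:
    """Map an ARPAbet phoneme (stress digit stripped) to a THA3 mouth_type."""
    return VISEME_MAP.get(phoneme.rstrip("012"), "delta")  # silences/pauses fall back to delta
```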
Facial expressions are derived from punctuation, tone, and semantic cues:
- `!` → excitement: use `happy_wink`, `raised_corner`, fast breathing
- `.` → neutral or relaxed: soft blink, lower breathing
- `?` → questioning: raised eyebrows, head tilt (`neck_z`), surprised eyes
This mimics animation acting principles such as "anticipation", "accent", and "reaction".
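As a rough sketch, these punctuation cues could be encoded as a simple lookup (field names and values here are illustrative assumptions, not the project's actual data structures):

```python
# Illustrative mapping from sentence-final punctuation to expression cues.
EMOTION_CUES = {
    "!": {"eye_type": "happy_wink", "mouth_overlay": "raised_corner", "breathing": 0.8},
    ".": {"eye_type": "relaxed", "mouth_overlay": None, "breathing": 0.3},
    "?": {"eyebrow_type": "raised", "eye_type": "surprised", "neck_z": 0.4, "breathing": 0.5},
}
```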
To avoid robotic frame-to-frame switching:
- Consecutive visemes blend smoothly
- Emotional mouth shapes (e.g., smile) are layered over visemes
- `mouth_left` and `mouth_right` allow asymmetric shaping during transitions (see the sketch below)
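For example, layering a smile over an `aaa` viseme with slight asymmetry might look like this (assuming the `create_pose` interface documented below; the specific values are illustrative):

```python
# Emotional overlay on a viseme: base "aaa" mouth with an asymmetric smile,
# the right corner raised more than the left during the transition.
pose = create_pose(
    mouth_type="aaa",
    mouth_left=0.6,   # left corner opens less
    mouth_right=0.9,  # right corner carries the smile
    eyebrow_type="happy",
)
```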
Inspired by real-time avatar animation (e.g., VRM, VUP), movement cues are used to simulate attention and expression:
| Component | Purpose |
|---|---|
| `head_y` | nodding, emphasis |
| `neck_z` | head tilt for questions or sarcasm |
| `body_y` | body sway (left/right weight shift) |
| `body_z` | lean-in/lean-back posture dynamics |
Breathing (`breathing`) is modulated based on energy or emotion.
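A question beat might combine these parameters roughly as follows (values are illustrative, using the `create_pose` interface documented below):

```python
# Question beat: head tilt, lean-in, subtle weight shift, energized breathing.
question_pose = create_pose(
    neck_z=0.4,     # head tilt signalling a question
    body_z=0.3,     # lean-in for emphasis
    body_y=-0.1,    # slight left/right weight shift
    head_y=0.2,     # small upward nod
    breathing=0.6,  # elevated breathing for an energized tone
)
```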
Blinks and eye focus are critical for human realism.
- Expressive blinks every 2–4 seconds
- Occasional double blink for emotion beats
- Iris rotates subtly ±0.1–0.3 to simulate gaze
- `eye_type` toggles between `relaxed`, `surprised`, `wink`, etc.
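A naive scheduler consistent with these rules might look like the following sketch (timings, probabilities, and helper names are assumptions):

```python
import random

def schedule_blinks(duration_s: float) -> list[float]:
    """Place blinks every 2-4 seconds, with an occasional double blink."""
    t, blinks = 0.0, []
    while t < duration_s:
        t += random.uniform(2.0, 4.0)
        blinks.append(t)
        if random.random() < 0.2:      # occasional double blink on an emotion beat
            blinks.append(t + 0.15)
    return [b for b in blinks if b < duration_s]

def random_gaze() -> float:
    """Subtle iris rotation in the +/-0.1 to 0.3 band to simulate a wandering gaze."""
    return random.choice([-1, 1]) * random.uniform(0.1, 0.3)
```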
Only 10–20 keyframes are generated per sentence, one for each:
- Viseme switch
- Blink
- Emotional shift
- Gaze change
- Posture reweight
All keyframes are marked with `"interpolate_to_next": true`, except the final one.
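A plausible shape for such a keyframe list (only `interpolate_to_next` is confirmed above; the other field names are illustrative):

```python
# Sketch of a keyframe plan for a short sentence.
keyframes = [
    {"time": 0.00, "mouth_type": "delta", "eye_type": "relaxed",
     "breathing": 0.3, "interpolate_to_next": True},   # relaxed idle start
    {"time": 0.18, "mouth_type": "aaa", "eye_type": "relaxed",
     "head_y": 0.1, "interpolate_to_next": True},      # first viseme plus a slight nod
    {"time": 0.55, "mouth_type": "iii", "eye_type": "wink",
     "interpolate_to_next": True},                     # viseme switch on a blink beat
    {"time": 1.20, "mouth_type": "delta", "eye_type": "relaxed",
     "breathing": 0.2, "interpolate_to_next": False},  # relaxed idle end
]
```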
`create_pose(...)` encodes all facial and body parameters (eyebrows, eyes, mouth, iris size/rotation, head tilt, neck twist, body lean, breathing) into a `torch.Tensor` compatible with the THA3 model.
This function allows:
- Simple high-level API use (e.g., `mouth_type="aaa"`, `eyebrow_type="raised"`)
- Asymmetric expressions (`mouth_left`, `mouth_right`)
- Pose exaggeration (via a float multiplier)
- Validation and clamping within THA3's pose parameter ranges
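A call exercising these options might look like this (parameter names follow the description above; the exact signature and valid ranges live in the project's code):

```python
import torch

# High-level call: named expression types plus numeric body parameters,
# returning a THA3-compatible pose tensor.
pose_vector: torch.Tensor = create_pose(
    eyebrow_type="raised",
    eye_type="surprised",
    mouth_type="ooo",
    neck_z=0.4,        # head tilt
    body_z=0.3,        # lean-in
    breathing=0.5,
    exaggeration=1.2,  # hypothetical name for the float multiplier; clamped to THA3's ranges
)
```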
Smoothly interpolates between each pair of keyframes using cubic easing (smoothstep). This:
- Ensures natural coarticulation between visemes
- Preserves emotion over time
- Prevents robotic “jump cuts” in expression
Each intermediate frame blends:
- All numeric values (e.g., `head_y`, `breathing`)
- Expression types (`eye_type`, `mouth_type`) are held constant within each step
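The easing and per-parameter blend can be sketched as a minimal reimplementation of the idea (not the project's exact code):

```python
def smoothstep(t: float) -> float:
    """Cubic easing: zero velocity at both endpoints, smooth acceleration between."""
    return t * t * (3.0 - 2.0 * t)

def blend_frames(a: dict, b: dict, t: float) -> dict:
    """Blend numeric parameters from keyframe a toward b; keep discrete types from a."""
    w = smoothstep(t)
    out = dict(a)  # discrete fields (eye_type, mouth_type) stay fixed for the step
    for key, va in a.items():
        vb = b.get(key, va)
        if isinstance(va, (int, float)) and not isinstance(va, bool) \
                and isinstance(vb, (int, float)):
            out[key] = va + (vb - va) * w
    return out
```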
Main rendering loop that:
- Loads a base image (RGBA, 512×512)
- Applies each interpolated pose using `poser.pose(...)`
- Converts each output frame to PIL and collects all frames
- Saves the animation as a `.gif` (typically 2–5 seconds long)
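In outline, that loop could look like this sketch (`poser.pose(...)` is named above; the tensor-to-PIL conversion and output value range are assumptions about THA3's output format):

```python
import torch
from PIL import Image

def render_gif(poser, base_image: torch.Tensor, poses: list[torch.Tensor],
               out_path: str = "animation.gif", fps: int = 20) -> None:
    """Apply each interpolated pose with THA3's poser and save the frames as a GIF."""
    frames = []
    for pose in poses:
        frame = poser.pose(base_image, pose)          # THA3 forward pass
        array = ((frame.clamp(-1, 1) + 1) * 127.5)    # assumes output in [-1, 1]
        array = array.byte().permute(1, 2, 0).cpu().numpy()
        frames.append(Image.fromarray(array, "RGBA").convert("RGB"))
    frames[0].save(out_path, save_all=True, append_images=frames[1:],
                   duration=1000 // fps, loop=0)
```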
This is the master animation scripting prompt, passed to an LLM (Google Gemini). It includes:
- Full function interface docstring (for `create_pose`)
- Viseme mapping table
- Detailed behavioral rules (emotion, eye direction, head/body motion, breathing, blinks)
- Output format structure
- Emphasis on “expressive beats” and natural speech timing
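A fragment of such a prompt template might look like this (a Jinja2 sketch; the placeholder names are illustrative):

```python
from jinja2 import Template

# Illustrative skeleton of the animation-scripting prompt.
PROMPT = Template("""\
You are an expressive 2D animation director.

Function interface:
{{ create_pose_docstring }}

Viseme mapping:
{{ viseme_table }}

Rules: plan 10-20 keyframes for the sentence below. Mark expressive beats
(blinks, gaze shifts, posture changes), follow the coarticulation and
breathing rules, and set "interpolate_to_next": true on every keyframe
except the last. Return the keyframes as a JSON list.

Sentence: "{{ sentence }}"
""")
```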
Script that brings everything together:
- Renders the prompt using Jinja2
- Sends it to the LLM via the Gemini API
- Parses the returned keyframes
- Interpolates and renders the final animation
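End to end, the orchestration might be sketched as follows, reusing the earlier sketches; the Gemini client call, model name, and JSON parsing are assumptions about the script's internals:

```python
import json
import google.generativeai as genai  # assumed SDK; configure an API key beforehand

META_KEYS = {"time", "interpolate_to_next"}  # keyframe fields that are not pose params

def animate_sentence(sentence: str, poser, base_image) -> None:
    # 1. Render the scripting prompt (PROMPT template from the sketch above).
    prompt = PROMPT.render(
        create_pose_docstring=create_pose.__doc__,
        viseme_table=VISEME_TABLE,  # hypothetical string constant holding the table
        sentence=sentence,
    )
    # 2. Ask Gemini for a keyframe plan (model name is a placeholder).
    model = genai.GenerativeModel("gemini-2.5-flash")
    keyframes = json.loads(model.generate_content(prompt).text)
    # 3. Interpolate between consecutive keyframes and render the GIF.
    poses = []
    for a, b in zip(keyframes, keyframes[1:] + [keyframes[-1]]):
        steps = 8 if a.get("interpolate_to_next") else 1
        for i in range(steps):
            params = {k: v for k, v in blend_frames(a, b, i / steps).items()
                      if k not in META_KEYS}
            poses.append(create_pose(**params))
    render_gif(poser, base_image, poses)
```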
License: MIT
Credits:
- Core animation model and poser logic by THA3
- Animation logic and expressive pose authoring developed on top of THA3’s APIs
- Prompt-based pose generation powered by Google Gemini LLM
- Character portrait from Crypko.ai (non-commercial use only)
- THA3: Torch-based 2D poser and avatar animation system
- Google Gemini: High-speed LLM used for pose planning
- Foundational works in animation research:
  - Cassell et al., "Animated Conversation" (1994)
  - Parke & Waters, "Computer Facial Animation" (1996)
  - Beskow, "Rule-based visual speech synthesis" (2003)
