a model that translates text into realistic 3D sign language animations

1997MarsRover/Motion-S

Motion-S: Semantic Driven Sign Motion Generation

Abstract

We introduce Motion-S, a novel framework for text-driven sign language motion generation. Motion-S adapts the generative masked modeling approach of MoMask to the domain of sign language, enabling high-quality 3D sign motion synthesis from natural language descriptions. Our approach employs a hierarchical residual vector quantization (RVQ) scheme to represent sign motions as multi-layer discrete tokens, preserving fine-grained details essential for accurate sign language expression. The framework consists of two key components: a Masked Transformer that generates base-layer motion tokens conditioned on text input through iterative masked token prediction, and a Residual Transformer that progressively refines the motion by predicting residual-layer tokens. This design enables efficient bidirectional generation of sign motions with precise semantic alignment to textual descriptions, making it suitable for applications in accessibility, education, and human-computer interaction.
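The residual quantization idea behind the multi-layer tokens can be illustrated with a toy, pure-Python sketch. It assumes squared-L2 nearest-neighbour lookup and two hand-picked 2-D codebooks. The names `nearest`, `rvq_encode` and `rvq_decode` are hypothetical and do not come from the Motion-S code. The real RVQ-VAE (`models/rvq_vae_best.pth`) presumably quantizes learned encoder latents with trained codebooks rather than raw vectors.

```python
def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((c - v) ** 2 for c, v in zip(code, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def rvq_encode(vec, codebooks):
    """Quantize vec layer by layer; each layer quantizes the residual left by the layers before it."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Reconstruct by summing the selected code vector from every layer."""
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse base layer and a finer residual layer.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],         # layer 0 (base)
    [[0.0, 0.0], [0.25, -0.25], [-0.25, 0.25]],   # layer 1 (residual)
]
tokens = rvq_encode([1.2, 0.8], codebooks)   # → [1, 1]
recon = rvq_decode(tokens, codebooks)        # → [1.25, 0.75]
```

Decoding only the base token gives `[1.0, 1.0]`, with a squared error of 0.08. Adding the residual layer gives `[1.25, 0.75]` and cuts the error to 0.005. This is the fine-grained detail the extra token layers are meant to preserve.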

Training the Transformers

Train the masked transformer for 500 epochs:

uv run python -m transformer.train_transformer \
    --vq_path models/rvq_vae_best.pth \
    --epochs 500 \
    --use_amp \
    --batch_size 64 \
    --gradient_accumulation_steps 2 \
    --num_workers 4 \
    --output_dir transformer_checkpoints
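The masked transformer learns to predict base-layer tokens that have been hidden from it. The sketch below shows how one training sequence could be corrupted. It assumes Motion-S keeps MoMask's cosine masking schedule, γ(τ) = cos(πτ/2) with τ drawn uniformly from [0, 1]. `MASK_ID`, `cosine_mask_ratio` and `mask_tokens` are hypothetical names. Text conditioning and other refinements used in the real training script are omitted.

```python
import math
import random

MASK_ID = -1  # hypothetical [MASK] id; the real vocabulary reserves its own


def cosine_mask_ratio(tau: float) -> float:
    """Fraction of tokens to mask: gamma(tau) = cos(pi * tau / 2) for tau in [0, 1]."""
    return math.cos(math.pi * tau / 2)


def mask_tokens(tokens: list[int], rng: random.Random) -> tuple[list[int], list[int]]:
    """Mask a random subset of base-layer tokens for one training sequence.

    Returns the corrupted input and the sorted masked positions; the
    cross-entropy loss would be computed only at those positions.
    """
    ratio = cosine_mask_ratio(rng.random())              # tau ~ U(0, 1)
    num_masked = max(1, math.ceil(ratio * len(tokens)))  # always mask at least one token
    positions = sorted(rng.sample(range(len(tokens)), num_masked))
    masked = set(positions)
    corrupted = [MASK_ID if i in masked else tok for i, tok in enumerate(tokens)]
    return corrupted, positions


# Toy usage: the targets are the original ids at the masked positions.
rng = random.Random(0)
base_tokens = [12, 7, 7, 30, 5, 18, 2, 9]
corrupted, positions = mask_tokens(base_tokens, rng)
targets = [base_tokens[i] for i in positions]
```

Sampling τ per sequence exposes the model to everything from nearly unmasked to fully masked inputs. This matches the range of states it sees during iterative decoding at generation time.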

Then train the residual transformer, also for 500 epochs, loading the best masked-transformer checkpoint:

uv run python -m transformer.train_transformer \
    --vq_path models/rvq_vae_best.pth \
    --train_residual_only \
    --mask_checkpoint transformer_checkpoints/best_model.pth \
    --residual_epochs 500 \
    --use_amp \
    --batch_size 64 \
    --gradient_accumulation_steps 2 \
    --num_workers 4 \
    --output_dir residual_checkpoints
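The residual transformer predicts the tokens of one residual layer from the layers below it. The sketch below shows how a single training example could be assembled. It assumes Motion-S follows MoMask: pick a random residual layer j, sum the token embeddings of layers 0 to j-1, and train the model to predict layer j. The function names and the toy embedding tables are hypothetical, and the real script works on batched tensors.

```python
import random


def residual_training_example(token_layers, rng):
    """Pick one residual layer to train on.

    token_layers[q][t] is the token id of quantizer layer q at time step t;
    layer 0 is the base layer handled by the masked transformer.
    Returns (j, context_layers, target_layer).
    """
    num_layers = len(token_layers)
    j = rng.randint(1, num_layers - 1)  # target layer, never the base layer
    return j, token_layers[:j], token_layers[j]


def summed_context(context_layers, embeddings):
    """Sum the per-layer token embeddings of all context layers at each time step.

    embeddings[q][k] is the (toy) embedding vector of token k in layer q.
    """
    steps = len(context_layers[0])
    dim = len(embeddings[0][0])
    summed = [[0.0] * dim for _ in range(steps)]
    for q, layer in enumerate(context_layers):
        for t, tok in enumerate(layer):
            summed[t] = [s + e for s, e in zip(summed[t], embeddings[q][tok])]
    return summed


# Toy usage: 3 quantizer layers, 3 time steps, 1-D embeddings.
token_layers = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
embeddings = [[[float(10 * q + k)] for k in range(10)] for q in range(3)]
j, context, target = residual_training_example(token_layers, random.Random(0))
model_input = summed_context(context, embeddings)  # fed to the model together with the text and j
```

Under the same assumption, generation would apply the residual transformer one layer at a time (j = 1, 2, ...). Each step sums the layers produced so far, then predicts the next layer.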

This repository is a public, minimal excerpt of work done at Signvrse.