wip(neuron): add neuron integration for SFT#5125

Draft
michaelbenayoun wants to merge 7 commits into huggingface:main from michaelbenayoun:neuron_integration

Conversation

michaelbenayoun (Member) commented Feb 18, 2026

What does this PR do?

Reference script

MODEL_NAME=Qwen/Qwen3-0.6B
NUM_PROC=4
export TP_SIZE=4

echo "Running SFT with the following configuration:"
echo "Model Name: $MODEL_NAME"
echo "Number of Processes: $NUM_PROC"
echo "Tensor Parallel Size: $TP_SIZE"

export TORCH_NEURONX_ENABLE_STABLEHLO=0
export ON_NEURON_EAGER=1
export TORCH_NEURONX_MLIR_ATEN_OPS=1
export TORCH_NEURONX_NEFF_CACHE_DIR="/home/ubuntu/neff_cache"
export TORCH_NEURONX_NEFF_LOCAL_CACHE_DIR="/home/ubuntu/neff_local_cache/"
export ON_NEURON=1
export TENSOR_DUMPER_OUTPUT_DIR="./tensor_dumps_neuron"

export TORCH_NEURONX_FALLBACK_ONLY_FOR_UNIMPLEMENTED_OPS=1

export OMP_NUM_THREADS=128

# Disable asynchronous loading to avoid hanging; investigate why it does not work in async mode.
export HF_DEACTIVATE_ASYNC_LOAD=1

uv run torchrun --nproc_per_node=$NUM_PROC examples/scripts/sft_neuron.py \
    --model_name_or_path $MODEL_NAME \
    --dataset_name trl-lib/Capybara \
    --learning_rate 2.0e-5 \
    --num_train_epochs 1 \
    --packing \
    --bf16 \
    --fp16 false \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --eos_token '<|im_end|>' \
    --eval_strategy no \
    --logging_steps 1 \
    --use_peft false \
    --lora_r 32 \
    --lora_alpha 16 \
    --report_to wandb \
    --output_dir $MODEL_NAME-SFT
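As a quick sanity check on the hyperparameters above, the effective global batch size can be computed from the per-device batch size, gradient accumulation steps, and process count. This is a small illustrative snippet using the values from the reference script; it is not part of the PR itself:

```shell
# Values taken from the reference script above.
PER_DEVICE_BATCH=1   # --per_device_train_batch_size
GRAD_ACCUM=16        # --gradient_accumulation_steps
NUM_PROC=4           # torchrun --nproc_per_node
# Effective global batch size seen by the optimizer per step.
EFFECTIVE_BATCH=$((PER_DEVICE_BATCH * GRAD_ACCUM * NUM_PROC))
echo "Effective global batch size: $EFFECTIVE_BATCH"  # 64
```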

Changes needed to integrate torch_neuron well with the HF ecosystem:

- Transformers
- Accelerate
- Kernels

Kernels integration to be tested once huggingface/kernels#285 is merged.
