---
title: Fine-tune a model
description: Learn how to fine-tune a large language model on Runpod using Axolotl.
---
import { InferenceTooltip } from "/snippets/tooltips.jsx";
Fine-tuning is the process of taking a pre-trained large language model (LLM) and further training it on a smaller, specific dataset. This process adapts the model to a particular task or domain, improving its performance and accuracy for your use case.
This guide explains how to use Runpod's fine-tuning feature, powered by Axolotl, to customize an LLM. You'll learn how to select a base model, choose a dataset, configure your training environment, and deploy your fine-tuned model.
For more information about fine-tuning with Axolotl, see the Axolotl Documentation.
Before you begin, you'll need:
- A Runpod account.
- (Optional) A Hugging Face account and an access token if you plan to use gated models or upload your fine-tuned model.
The base model is the starting point for your fine-tuning process, while the dataset provides the specific knowledge needed to adapt the base model to your task.
You can choose from thousands of models and datasets on Hugging Face.
Navigate to the [Fine-Tuning](https://console.runpod.io/fine-tuning) section in the Runpod console. In the **Base Model** field, enter the Hugging Face model ID. In the **Dataset** field, enter the Hugging Face dataset ID.

If this is your first time fine-tuning and you're just experimenting, try:

```text
# Base model
TinyLlama/TinyLlama_v1.1

# Dataset (alpaca)
mhenrichsen/alpaca_2k_test
```

Click **Connect** and choose your preferred connection method:
- Jupyter Notebook: A browser-based notebook interface.
- Web Terminal: A browser-based terminal.
- SSH: A secure connection from your local machine.
Your training environment is located in the `/workspace/fine-tuning/` directory and has the following structure:

- `examples/` contains sample configurations and scripts.
- `outputs/` contains your training results and model outputs.
- `config.yaml` is the main configuration file for your training parameters.
The system generates an initial `config.yaml` based on your selected base model and dataset. This is where you define all the hyperparameters for your fine-tuning job. You may need to experiment with these settings to achieve the best results.
Navigate to the fine-tuning directory (`/workspace/fine-tuning/`) and open the configuration file (`config.yaml`) in JupyterLab or your preferred text editor to review and adjust the fine-tuning parameters.
If you're using the web terminal, the fine-tuning directory should open automatically. Use `nano` to edit the `config.yaml` file:

```sh
nano config.yaml
```

The `config.yaml` file will look something like this (`base_model` and `datasets` will be replaced with the model and dataset you selected in Step 2):
```yaml
adapter: lora
base_model: TinyLlama/TinyLlama_v1.1
bf16: auto
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: null
gradient_accumulation_steps: 1
learning_rate: 0.0002
load_in_8bit: true
lora_alpha: 16
lora_dropout: 0.05
lora_r: 8
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
micro_batch_size: 16
num_epochs: 1
optimizer: adamw_bnb_8bit
output_dir: ./outputs/mymodel
sequence_len: 4096
train_on_inputs: false
```

Here's a breakdown of the `config.yaml` file:
**Model and precision:**

- `base_model`: The base model you want to fine-tune.
- `bf16: auto`: This tells the GPU to use bfloat16 precision if it can. It's more stable than standard FP16 and helps prevent the model's math from "overflowing" (exploding) during training.
- `load_in_8bit: true`: This is a memory-saving trick. It squashes the base model weights into 8 bits so the model takes up less VRAM, allowing you to train on smaller GPUs.
**LoRA settings:**

- `lora_r: 8`: The rank of the LoRA adapter. 8 is a standard starting point; higher values (like 16 or 32) let the model learn more complex patterns but use more VRAM.
- `lora_alpha: 16`: This scales the learned weights.
- `lora_target_modules`: This list tells Axolotl exactly which parts of the Transformer architecture to attach the adapters to.
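To get a feel for what `lora_r` costs, here is a rough sketch of how many trainable parameters one adapter adds to a single weight matrix, compared to training that matrix in full. The 2048 hidden size is an assumption for illustration (roughly TinyLlama's), and the helper name is ours:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    # One LoRA adapter adds two small matrices: A (r x d_in) and B (d_out x r),
    # so the trainable parameter count scales linearly with the rank r.
    return r * d_in + d_out * r

# Full fine-tuning of one 2048 x 2048 projection would train ~4.2M parameters;
# a rank-8 adapter on the same matrix trains only 32,768.
full_params = 2048 * 2048
adapter_params = lora_trainable_params(2048, 2048, 8)
print(full_params, adapter_params)
```

Doubling `lora_r` doubles the adapter's parameter count (and VRAM use), which is why 8 or 16 is a common starting point.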
**Dataset logic:**

- `path`: Where the data is coming from (Hugging Face).
- `type: null`: This tells Axolotl how to format the text into prompts.
**Training mechanics:**

- `micro_batch_size: 16`: How many examples the GPU processes at a time.
- `gradient_accumulation_steps: 1`: How many batches to "save up" before actually updating the model's weights.
- `learning_rate: 0.0002`: How fast the model changes. Too high and it "forgets" everything; too low and it never learns.
- `optimizer: adamw_bnb_8bit`: A special version of the AdamW optimizer that uses 8-bit math to save even more VRAM.
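To see how `micro_batch_size` and `gradient_accumulation_steps` interact: the effective batch size per weight update is their product (times the number of GPUs). A quick sketch, assuming a single-GPU pod and the ~2,000-example dataset from this guide:

```python
import math

micro_batch_size = 16            # from config.yaml
gradient_accumulation_steps = 1  # from config.yaml
num_gpus = 1                     # assumption: single-GPU pod

# Examples processed per optimizer (weight-update) step
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus

# Optimizer steps in one epoch over a 2,000-example dataset
dataset_size = 2000
steps_per_epoch = math.ceil(dataset_size / effective_batch)
print(effective_batch, steps_per_epoch)
```

If you hit out-of-memory errors, a common trade-off is to halve `micro_batch_size` and double `gradient_accumulation_steps`, which keeps the effective batch size the same while using less VRAM per step.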
The dataset type is set to `null` by default. You'll need to change this value depending on the dataset you selected. For example, if you selected the `mhenrichsen/alpaca_2k_test` dataset, you'll need to change `type: null` to `type: alpaca` to load the dataset correctly.
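With the example dataset from earlier, the `datasets` section of `config.yaml` would look like this:

```yaml
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
```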
Once you've changed the dataset type, save the file (config.yaml) and continue to the next step.
If you're not sure what dataset type to use, you can find an overview of common dataset types below:
`chat_template` for chat-based datasets:

```json
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
  ]
}
```

You'll also need to add the `field_messages` key to `datasets` to specify the field that contains the messages:

```yaml
datasets:
  - path: your/dataset
    type: chat_template
    field_messages: messages
```

`completion` for raw text datasets:

```json
{
  "text": "The quick brown fox jumps over the lazy dog."
}
```

`input_output` for template-free datasets:

```json
{
  "input": "User: What is the capital of France?\nAssistant: ",
  "output": "The capital is Paris.</s>"
}
```

`alpaca` for instruction-following datasets:

```json
{
  "instruction": "Summarize the following text.",
  "input": "The sun is a star at the center of the Solar System.",
  "output": "The sun is the central star of our solar system."
}
```

`sharegpt` for conversational datasets:

```json
{
  "conversations": [
    {"from": "human", "value": "What are the three laws of thermodynamics?"},
    {"from": "gpt", "value": "1. Energy cannot be created or destroyed. 2. Entropy always increases. 3. Absolute zero cannot be reached."}
  ]
}
```

You'll also need to add the `conversation` key to `datasets` to specify the name of the list field that contains the messages:

```yaml
datasets:
  - path: your/dataset
    type: sharegpt
    conversation: conversations
```

Once you're satisfied with your configuration, you can start the training.
Run the following command in your terminal:

```sh
axolotl train config.yaml
```

Monitor the training progress in your terminal. The output will show the training loss, validation loss, and other metrics.
Once the fine-tuning process is complete, you can test the capabilities of your fine-tuned model with vLLM.
For example, to serve the fine-tuned TinyLlama model from the steps above, run the following command:
```sh
vllm serve TinyLlama/TinyLlama_v1.1 --enable-lora --lora-modules my-adapter=/workspace/fine-tuning/outputs/mymodel --port 8000
```

To test your model, first you'll need to start a new terminal window, tab, or pane.
If you're using the web terminal, `tmux` is already installed, and you can create a new horizontal pane by running:

```sh
tmux split-window -h
```

In the new window, tab, or pane, you can send a test request to the vLLM server using `curl`:
```sh
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-adapter",
    "prompt": "### Instruction:\nExplain gravity in one sentence.\n\n### Response:\n",
    "max_tokens": 50
  }'
```

You should see the response from your model in the terminal.
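Because the model was fine-tuned on Alpaca-formatted data, prompts at inference time should follow the same `### Instruction` / `### Response` layout shown in the request above. If you're calling the server from code, a small helper like this keeps the template consistent (a sketch; the function name is ours, not part of Axolotl or vLLM):

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the Alpaca layout used by the curl example above.

    The optional input_text maps to the dataset's "input" field; leave it
    empty for instructions that need no extra context.
    """
    if input_text:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(alpaca_prompt("Explain gravity in one sentence."))
```

Pass the returned string as the `prompt` field of the `/v1/completions` request body.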
After the fine-tuning process is complete, you can upload your model to the Hugging Face Hub to share it with the community or use it in your applications.
Run this command to log in to your Hugging Face account:

```sh
huggingface-cli login
```

Then upload your model:

```sh
huggingface-cli upload YOUR_USERNAME/MODEL_NAME ./outputs/mymodel
```

Replace `YOUR_USERNAME` with your Hugging Face username and `MODEL_NAME` with your desired model name.
Now that you've successfully fine-tuned a model, you can deploy it for inference using Runpod Serverless. If you've uploaded your model to Hugging Face, you can deploy it as a cached model to reduce cost and cold start times.