---
title: Fine-tune a model
description: Learn how to fine-tune a large language model on Runpod using Axolotl.
---
import { InferenceTooltip } from "/snippets/tooltips.jsx";
Fine-tuning is the process of taking a pre-trained large language model (LLM) and further training it on a smaller, specific dataset. This process adapts the model to a particular task or domain, improving its performance and accuracy for your use case.
This guide explains how to use Runpod's fine-tuning feature, powered by Axolotl, to customize an LLM. You'll learn how to select a base model, choose a dataset, configure your training environment, and deploy your fine-tuned model.
For more information about fine-tuning with Axolotl, see the Axolotl Documentation.
Before you begin, you'll need:
- A Runpod account.
- (Optional) A Hugging Face account and an access token if you plan to use gated models or upload your fine-tuned model.
The base model is the starting point for your fine-tuning process, while the dataset provides the specific knowledge needed to adapt the base model to your task.
You can choose from thousands of models and datasets on Hugging Face.
Navigate to the [Fine-Tuning](https://console.runpod.io/fine-tuning) section in the Runpod console. In the **Base Model** field, enter the Hugging Face model ID. In the **Dataset** field, enter the Hugging Face dataset ID.

If this is your first time fine-tuning and you're just experimenting, try:

```text
# Base model
TinyLlama/TinyLlama_v1.1

# Dataset (alpaca)
mhenrichsen/alpaca_2k_test
```

Click **Connect** and choose your preferred connection method:
- Jupyter Notebook: A browser-based notebook interface.
- Web Terminal: A browser-based terminal.
- SSH: A secure connection from your local machine.
Your training environment is located in the `/workspace/fine-tuning/` directory and has the following structure:

- `examples/` contains sample configurations and scripts.
- `outputs/` contains your training results and model outputs.
- `config.yaml` is the main configuration file for your training parameters.
The system generates an initial `config.yaml` based on your selected base model and dataset. This is where you define all the hyperparameters for your fine-tuning job. You may need to experiment with these settings to achieve the best results.
Navigate to the fine-tuning directory (`/workspace/fine-tuning/`) and open the configuration file (`config.yaml`) in JupyterLab or your preferred text editor to review and adjust the fine-tuning parameters.
If you're using the web terminal, the fine-tuning directory should open automatically. Use `nano` to edit the `config.yaml` file:

```sh
nano config.yaml
```

The `config.yaml` file will look something like this (`base_model` and `datasets` will be replaced with the model and dataset you selected in Step 2):
```yaml
adapter: lora
base_model: TinyLlama/TinyLlama_v1.1
bf16: auto
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: null
gradient_accumulation_steps: 1
learning_rate: 0.0002
load_in_8bit: true
lora_alpha: 16
lora_dropout: 0.05
lora_r: 8
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
micro_batch_size: 16
num_epochs: 1
optimizer: adamw_bnb_8bit
output_dir: ./outputs/mymodel
sequence_len: 4096
train_on_inputs: false
```

Here's a breakdown of the `config.yaml` file:
**Model and precision:**

- `base_model`: The base model you want to fine-tune.
- `bf16: auto`: This tells the GPU to use bfloat16 precision if it can. It's more stable than standard FP16 and helps prevent the model's math from "overflowing" (exploding) during training.
- `load_in_8bit: true`: This is a memory-saving trick. It squashes the base model weights into 8 bits so the model takes up less VRAM, allowing you to train on smaller GPUs.
**LoRA settings:**

- `lora_r: 8`: The rank of the LoRA adapter. 8 is a standard starting point; higher values (like 16 or 32) let the model learn more complex patterns but use more VRAM.
- `lora_alpha: 16`: This scales the learned weights.
- `lora_target_modules`: This list tells Axolotl exactly which parts of the Transformer architecture to attach the adapters to.
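To get a feel for what `lora_r` costs, here is a rough sketch of how many trainable parameters one adapter adds to a single weight matrix, compared to training that matrix in full. The 2048 hidden size is an assumption for illustration (roughly TinyLlama's), and the helper name is ours:

```python
def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    # One LoRA adapter adds two small matrices: A (r x d_in) and B (d_out x r),
    # so the trainable parameter count scales linearly with the rank r.
    return r * d_in + d_out * r

# Full fine-tuning of one 2048 x 2048 projection would train ~4.2M parameters;
# a rank-8 adapter on the same matrix trains only 32,768.
full_params = 2048 * 2048
adapter_params = lora_trainable_params(2048, 2048, 8)
print(full_params, adapter_params)
```

Doubling `lora_r` doubles the adapter's parameter count (and VRAM use), which is why 8 or 16 is a common starting point.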
**Dataset logic:**

- `path`: Where the data is coming from (Hugging Face).
- `type: null`: This tells Axolotl how to format the text into prompts.
**Training mechanics:**

- `micro_batch_size: 16`: How many examples the GPU processes at a time.
- `gradient_accumulation_steps: 1`: How many batches to "save up" before actually updating the model's weights.
- `learning_rate: 0.0002`: How fast the model changes. Too high and it "forgets" everything; too low and it never learns.
- `optimizer: adamw_bnb_8bit`: A special version of the AdamW optimizer that uses 8-bit math to save even more VRAM.
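To see how `micro_batch_size` and `gradient_accumulation_steps` interact: the effective batch size per weight update is their product (times the number of GPUs). A quick sketch, assuming a single-GPU pod and the ~2,000-example dataset from this guide:

```python
import math

micro_batch_size = 16            # from config.yaml
gradient_accumulation_steps = 1  # from config.yaml
num_gpus = 1                     # assumption: single-GPU pod

# Examples processed per optimizer (weight-update) step
effective_batch = micro_batch_size * gradient_accumulation_steps * num_gpus

# Optimizer steps in one epoch over a 2,000-example dataset
dataset_size = 2000
steps_per_epoch = math.ceil(dataset_size / effective_batch)
print(effective_batch, steps_per_epoch)
```

If you hit out-of-memory errors, a common trade-off is to halve `micro_batch_size` and double `gradient_accumulation_steps`, which keeps the effective batch size the same while using less VRAM per step.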
The dataset type is set to `null` by default. You'll need to change this value depending on the dataset you selected. For example, if you selected the `mhenrichsen/alpaca_2k_test` dataset, you'll need to change `type: null` to `type: alpaca` to load the dataset correctly.
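With the example dataset from earlier, the `datasets` section of `config.yaml` would look like this:

```yaml
datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
```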
Once you've changed the dataset type, save the file (config.yaml) and continue to the next step.
If you're not sure what dataset type to use, you can find an overview of common dataset types below:
`chat_template` for chat-based datasets:

```json
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."}
  ]
}
```

You'll also need to add the `field_messages` key to `datasets` to specify the field that contains the messages:

```yaml
datasets:
  - path: your/dataset
    type: chat_template
    field_messages: messages
```

`completion` for raw text datasets:

```json
{
  "text": "The quick brown fox jumps over the lazy dog."
}
```

`input_output` for template-free datasets:

```json
{
  "input": "User: What is the capital of France?\nAssistant: ",
  "output": "The capital is Paris.</s>"
}
```

`alpaca` for instruction-following datasets:

```json
{
  "instruction": "Summarize the following text.",
  "input": "The sun is a star at the center of the Solar System.",
  "output": "The sun is the central star of our solar system."
}
```

`sharegpt` for conversational datasets:

```json
{
  "conversations": [
    {"from": "human", "value": "What are the three laws of thermodynamics?"},
    {"from": "gpt", "value": "1. Energy cannot be created or destroyed. 2. Entropy always increases. 3. Absolute zero cannot be reached."}
  ]
}
```

You'll also need to add the `conversation` key to `datasets` to specify the name of the list field that contains the messages:

```yaml
datasets:
  - path: your/dataset
    type: sharegpt
    conversation: conversations
```

Once you're satisfied with your configuration, you can start the training.
Run the following command in your terminal:

```sh
axolotl train config.yaml
```

Monitor the training progress in your terminal. The output will show the training loss, validation loss, and other metrics.
Once the fine-tuning process is complete, you can test the capabilities of your fine-tuned model with vLLM.
For example, to serve the fine-tuned TinyLlama model from the steps above, run the following command:
```sh
vllm serve TinyLlama/TinyLlama_v1.1 --enable-lora --lora-modules my-adapter=/workspace/fine-tuning/outputs/mymodel --port 8000
```

To test your model, first you'll need to start a new terminal window, tab, or pane.
If you're using the web terminal, `tmux` is already installed, and you can create a new horizontal pane by running:

```sh
tmux split-window -h
```

In the new window, tab, or pane, you can send a test request to the vLLM server using `curl`:
```sh
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-adapter",
    "prompt": "### Instruction:\nExplain gravity in one sentence.\n\n### Response:\n",
    "max_tokens": 50
  }'
```

You should see the response from your model in the terminal.
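Because the model was fine-tuned on Alpaca-formatted data, prompts at inference time should follow the same `### Instruction` / `### Response` layout shown in the request above. If you're calling the server from code, a small helper like this keeps the template consistent (a sketch; the function name is ours, not part of Axolotl or vLLM):

```python
def alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Build a prompt in the Alpaca layout used by the curl example above.

    The optional input_text maps to the dataset's "input" field; leave it
    empty for instructions that need no extra context.
    """
    if input_text:
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    return f"### Instruction:\n{instruction}\n\n### Response:\n"

print(alpaca_prompt("Explain gravity in one sentence."))
```

Pass the returned string as the `prompt` field of the `/v1/completions` request body.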
After the fine-tuning process is complete, you can upload your model to the Hugging Face Hub to share it with the community or use it in your applications.
Run this command to log in to your Hugging Face account:

```sh
huggingface-cli login
```

Then upload your model:

```sh
huggingface-cli upload YOUR_USERNAME/MODEL_NAME ./outputs/mymodel
```

Replace `YOUR_USERNAME` with your Hugging Face username and `MODEL_NAME` with your desired model name.
Now that you've successfully fine-tuned a model, you can deploy it for inference using Runpod Serverless. If you've uploaded your model to Hugging Face, you can deploy it as a cached model to reduce cost and cold start times.