3 changes: 3 additions & 0 deletions README.md
@@ -38,6 +38,9 @@ pip install ".[extras]"
```

## Usage <a name="usage"></a>

Check out our [examples directory](examples/) for detailed tutorials!

There are two main stages: **inference** and **pretraining** (optional but
recommended).

156 changes: 156 additions & 0 deletions examples/01_getting_started.ipynb
@@ -0,0 +1,156 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Getting Started with Regress-LM\n",
"\n",
"This tutorial provides a step-by-step walkthrough of how to use Regress-LM for a simple regression task. We will predict house prices from text descriptions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Installation (Jupyter runs this notebook from examples/, so '..' is the repository root)\n",
"!pip install -q -e '..[extras]'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from regress_lm import core\n",
"from regress_lm import rlm\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Create Examples\n",
"\n",
"First, we create a few example (x, y) pairs to fine-tune our model. The input `x` is a text description of a house, and the output `y` is its price."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"examples = [\n",
" core.Example(x='A charming 2-bedroom bungalow with a spacious garden.', y=300000.0),\n",
" core.Example(x='A modern 5-bedroom mansion with a swimming pool.', y=1200000.0),\n",
" core.Example(x='A small studio apartment in the city center.', y=150000.0),\n",
" core.Example(x='A 3-bedroom house in the suburbs with a garage.', y=450000.0),\n",
" core.Example(x='A luxurious penthouse with stunning city views.', y=2000000.0)\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Initialize and Fine-Tune the Model\n",
"\n",
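"Next, we initialize a Regress-LM model from scratch (no pretrained weights) and fine-tune it on our examples. With only five examples, expect the predictions to be illustrative rather than accurate."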
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"reg_lm = rlm.RegressLM.from_scratch(max_input_len=2048)\n",
"reg_lm.fine_tune(examples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Make Predictions\n",
"\n",
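"Now that the model is fine-tuned, we query it with new, unseen house descriptions. `sample` draws `num_samples` predictions per query, so each result is a distribution of plausible prices rather than a single number."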
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"queries = [\n",
" core.ExampleInput(x='A cozy 1-bedroom apartment with a balcony.'),\n",
" core.ExampleInput(x='A large 4-bedroom family home with a big yard.')\n",
"]\n",
"\n",
"samples1, samples2 = reg_lm.sample(queries, num_samples=128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Visualize Results\n",
"\n",
"Finally, we visualize the distribution of predictions for each query."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 6))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"plt.hist(samples1, bins=20, alpha=0.7)\n",
"plt.title('Predictions for: \"A cozy 1-bedroom apartment...\"')\n",
"plt.xlabel('Predicted Price')\n",
"plt.ylabel('Frequency')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"plt.hist(samples2, bins=20, alpha=0.7)\n",
"plt.title('Predictions for: \"A large 4-bedroom family home...\"')\n",
"plt.xlabel('Predicted Price')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
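Because `reg_lm.sample` returns many draws per query rather than one number, a point estimate and an uncertainty band have to be computed from the samples. The sketch below assumes `samples1` and `samples2` can each be converted to a NumPy array of predicted values (a trailing dimension of size 1 is flattened away); `summarize_samples` is a hypothetical helper, not part of Regress-LM.

```python
import numpy as np


def summarize_samples(samples, lower_q=10.0, upper_q=90.0):
    """Reduce the sampled predictions for ONE query to summary statistics.

    Hypothetical helper (not a Regress-LM API). Assumes `samples` is
    array-like and holds the sampled values for a single query.
    """
    # Flatten, e.g. shape (num_samples, 1) -> (num_samples,).
    values = np.asarray(samples, dtype=float).ravel()
    return {
        "median": float(np.median(values)),
        "mean": float(np.mean(values)),
        # Central interval between the lower_q and upper_q percentiles
        # (10th to 90th by default, i.e. an 80% interval).
        "lower": float(np.percentile(values, lower_q)),
        "upper": float(np.percentile(values, upper_q)),
    }


# Possible use after step 3 of the notebook:
# for name, s in [("cozy 1-bedroom", samples1), ("4-bedroom home", samples2)]:
#     stats = summarize_samples(s)
#     print(f"{name}: median {stats['median']:,.0f} "
#           f"(80% interval {stats['lower']:,.0f} to {stats['upper']:,.0f})")
```

The median is used as the point estimate because it is less sensitive than the mean to a few extreme samples.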
69 changes: 69 additions & 0 deletions examples/02_end_to_end_training.py
@@ -0,0 +1,69 @@
"""End-to-end training example for Regress-LM.

Trains a model on a CSV file with one text column and one numeric target
column, and reports the validation loss after every epoch.

Usage:
    python examples/02_end_to_end_training.py --dataset_path path/to/data.csv \
        --text_column TEXT_COL --target_column TARGET_COL
"""
import argparse
import random

import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from torch import optim

from regress_lm import core
from regress_lm.pytorch import model as model_lib


def to_examples(df, text_column, target_column):
    """Converts DataFrame rows into Regress-LM (x, y) examples."""
    return [
        core.Example(x=str(row[text_column]), y=float(row[target_column]))
        for _, row in df.iterrows()
    ]


def main(args):
    # Seed every RNG used below so runs are reproducible.
    random.seed(args.seed)
    np.random.seed(args.seed)
    torch.manual_seed(args.seed)

    # Load the data and hold out 20% of the rows for validation.
    data = pd.read_csv(args.dataset_path)
    train_data, val_data = train_test_split(
        data, test_size=0.2, random_state=args.seed
    )
    train_examples = to_examples(train_data, args.text_column, args.target_column)
    val_examples = to_examples(val_data, args.text_column, args.target_column)

    # Initialize model.
    model = model_lib.PyTorchModelConfig(
        architecture_kwargs=dict(num_encoder_layers=6, num_decoder_layers=6)
    ).make_model()

    # Initialize optimizer.
    optimizer = optim.AdamW(model.parameters(), lr=args.learning_rate)

    for epoch in range(args.num_epochs):
        # Training: reshuffle each epoch so the batches differ between epochs.
        model.train()
        random.shuffle(train_examples)
        for batch_idx, i in enumerate(range(0, len(train_examples), args.batch_size)):
            batch = train_examples[i:i + args.batch_size]
            tensor_examples = model.converter.convert_examples(batch)

            optimizer.zero_grad()
            loss, _ = model.compute_losses_and_metrics(tensor_examples)
            loss.mean().backward()
            optimizer.step()

            # Log every `log_interval` batches (not every `log_interval` examples).
            if batch_idx % args.log_interval == 0:
                print(
                    f"Epoch {epoch + 1}/{args.num_epochs}, Batch {batch_idx}, "
                    f"Loss: {loss.mean().item():.4f}"
                )

        # Validation: no gradients are needed. Each batch is weighted by its
        # size so that a smaller final batch does not skew the average.
        model.eval()
        val_losses, val_sizes = [], []
        with torch.no_grad():
            for i in range(0, len(val_examples), args.batch_size):
                batch = val_examples[i:i + args.batch_size]
                tensor_examples = model.converter.convert_examples(batch)
                loss, _ = model.compute_losses_and_metrics(tensor_examples)
                val_losses.append(loss.mean().item())
                val_sizes.append(len(batch))

        avg_val_loss = np.average(val_losses, weights=val_sizes)
        print(f"Epoch {epoch + 1}/{args.num_epochs}, Validation Loss: {avg_val_loss:.4f}")


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Train Regress-LM on a CSV of (text, target) pairs."
    )
    parser.add_argument("--dataset_path", type=str, required=True, help="Path to the dataset (CSV).")
    parser.add_argument("--text_column", type=str, required=True, help="Name of the column containing text descriptions.")
    parser.add_argument("--target_column", type=str, required=True, help="Name of the column containing target values.")
    parser.add_argument("--num_epochs", type=int, default=10, help="Number of training epochs.")
    parser.add_argument("--batch_size", type=int, default=32, help="Training and validation batch size.")
    parser.add_argument("--learning_rate", type=float, default=1e-4, help="AdamW learning rate.")
    parser.add_argument("--seed", type=int, default=42, help="Random seed for the data split and training.")
    parser.add_argument("--log_interval", type=int, default=10, help="Log the training loss every N batches.")
    args = parser.parse_args()
    main(args)
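The script expects a CSV with one text column and one numeric target column. To smoke-test it, one option is to write the five house descriptions from the getting-started notebook to a file. The helper `write_toy_dataset`, the file name `houses.csv`, and the column names `description`/`price` are illustrative assumptions, not something the script requires; five rows only show that the pipeline runs, not that it learns anything useful.

```python
import csv

# Toy rows reused from examples/01_getting_started.ipynb (illustration only).
HOUSES = [
    ("A charming 2-bedroom bungalow with a spacious garden.", 300000.0),
    ("A modern 5-bedroom mansion with a swimming pool.", 1200000.0),
    ("A small studio apartment in the city center.", 150000.0),
    ("A 3-bedroom house in the suburbs with a garage.", 450000.0),
    ("A luxurious penthouse with stunning city views.", 2000000.0),
]


def write_toy_dataset(path, rows=HOUSES, text_column="description", target_column="price"):
    """Hypothetical helper: writes (text, target) rows to a CSV with a header.

    Returns the number of data rows written.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[text_column, target_column])
        writer.writeheader()
        for text, target in rows:
            writer.writerow({text_column: text, target_column: target})
    return len(rows)
```

After `write_toy_dataset("houses.csv")`, the script could be invoked as `python examples/02_end_to_end_training.py --dataset_path houses.csv --text_column description --target_column price --batch_size 2`. With `test_size=0.2`, the validation split of five rows contains a single row.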
155 changes: 155 additions & 0 deletions examples/03_multi_objective_regression.ipynb
@@ -0,0 +1,155 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Objective Regression with Regress-LM\n",
"\n",
"This tutorial demonstrates how to use Regress-LM for multi-objective regression, where we predict multiple target values simultaneously."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Installation (Jupyter runs this notebook from examples/, so '..' is the repository root)\n",
"!pip install -q -e '..[extras]'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from regress_lm import core\n",
"from regress_lm import rlm\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Create Multi-Objective Examples\n",
"\n",
"We create examples with multiple target values: here, each house has both a price and a size (in square feet)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"examples = [\n",
" core.Example(x='A charming 2-bedroom bungalow with a spacious garden.', y=[300000.0, 1200.0]),\n",
" core.Example(x='A modern 5-bedroom mansion with a swimming pool.', y=[1200000.0, 5000.0]),\n",
" core.Example(x='A small studio apartment in the city center.', y=[150000.0, 400.0]),\n",
" core.Example(x='A 3-bedroom house in the suburbs with a garage.', y=[450000.0, 2000.0]),\n",
" core.Example(x='A luxurious penthouse with stunning city views.', y=[2000000.0, 3000.0])\n",
"]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Initialize and Fine-Tune the Model\n",
"\n",
"We initialize the model with `max_num_objs=2` to handle two target values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"reg_lm = rlm.RegressLM.from_scratch(max_input_len=2048, max_num_objs=2)\n",
"reg_lm.fine_tune(examples)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Make Predictions\n",
"\n",
"We make predictions on new house descriptions. Each sample now contains two values per query: a predicted price and a predicted size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"queries = [\n",
" core.ExampleInput(x='A cozy 1-bedroom apartment with a balcony.'),\n",
" core.ExampleInput(x='A large 4-bedroom family home with a big yard.')\n",
"]\n",
"\n",
"samples1, samples2 = reg_lm.sample(queries, num_samples=128)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Visualize Results\n",
"\n",
"We visualize the joint distribution of the two predicted values."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"plt.figure(figsize=(12, 6))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"plt.scatter(samples1[:, 0], samples1[:, 1], alpha=0.5)\n",
"plt.title('Predictions for: \"A cozy 1-bedroom apartment...\"')\n",
"plt.xlabel('Predicted Price')\n",
"plt.ylabel('Predicted Size (sq ft)')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"plt.scatter(samples2[:, 0], samples2[:, 1], alpha=0.5)\n",
"plt.title('Predictions for: \"A large 4-bedroom family home...\"')\n",
"plt.xlabel('Predicted Price')\n",
"plt.ylabel('Predicted Size (sq ft)')\n",
"\n",
"plt.tight_layout()\n",
"plt.show()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
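The plotting cell indexes `samples1[:, 0]` and `samples1[:, 1]`, i.e. it treats each query's samples as an array of shape `(num_samples, 2)`. Under that assumption, per-objective medians and the correlation between the two objectives can be read directly off the samples, for example to check whether higher predicted prices tend to come with larger predicted sizes. `summarize_objectives` is a hypothetical helper, not a Regress-LM API.

```python
import numpy as np


def summarize_objectives(samples, names=("price", "size_sqft")):
    """Per-objective medians plus the correlation between two objectives.

    Hypothetical helper. Assumes `samples` has shape (num_samples, 2),
    matching how the notebook indexes samples1[:, 0] and samples1[:, 1].
    """
    arr = np.asarray(samples, dtype=float)
    if arr.ndim != 2 or arr.shape[1] != 2 or len(names) != 2:
        raise ValueError(f"expected shape (num_samples, 2), got {arr.shape}")
    medians = {name: float(np.median(arr[:, j])) for j, name in enumerate(names)}
    # Pearson correlation of the two objectives across samples
    # (NaN if either column is constant).
    corr = float(np.corrcoef(arr[:, 0], arr[:, 1])[0, 1])
    return medians, corr
```

For instance, `medians, corr = summarize_objectives(samples1)` would give the median predicted price and size for the first query, plus how strongly the two move together across samples.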