This repository provides tools and utilities for fine-tuning the HydraGNN Graph Foundation Model (GFM) ensemble on materials science datasets. The framework enables transfer learning from pre-trained graph neural network models to domain-specific tasks.
The Graph Foundation Model (GFM) ensemble is a collection of pre-trained HydraGNN models that can be fine-tuned for various molecular and materials property prediction tasks. This repository includes:
- Utilities for fine-tuning model ensembles
- Example configurations for common datasets (QM9)
- Tools for model adaptation and head configuration
- Data preprocessing utilities
```
.
├── README.md
├── examples/
│   └── qm9/
│       ├── ensemble_fine_tune.py    # Main fine-tuning script for QM9
│       ├── finetuning_config.json   # Configuration for fine-tuning heads
│       └── qm9_preonly.py           # QM9 preprocessing script
└── utils/
    ├── __init__.py
    ├── ensemble_utils.py            # Core fine-tuning utilities
    └── update_model.py              # Model architecture modification tools
```
- HydraGNN: Install the latest version from the main branch:

  ```bash
  git clone https://github.com/ORNL/HydraGNN.git
  cd HydraGNN
  pip install -e .
  ```
- Python Dependencies: Install the dependencies required by HydraGNN:
  - Follow the installation instructions in the HydraGNN repository
  - All required dependencies are installed automatically when you install HydraGNN with `pip install -e .`
- Environment Setup: Update your PYTHONPATH to include both directories:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"
  ```

  Or add this line to your `.bashrc` or `.zshrc`:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"
  ```
Download the pre-trained GFM ensemble from HuggingFace:

```bash
# Download all model checkpoints and configuration files
# Each ensemble member will be fine-tuned independently
```

The model ensemble contains multiple pre-trained models with their respective configuration files organized in a structured directory format.
Important: Before running any scripts, ensure your PYTHONPATH includes both the HydraGNN_GFM_FineTuning4Materials and HydraGNN directories:
```bash
# Option 1: Set temporarily for current session
export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"

# Option 2: Add to your shell profile (~/.bashrc or ~/.zshrc)
echo 'export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"' >> ~/.bashrc
source ~/.bashrc
```
- Navigate to the QM9 example directory:

  ```bash
  cd examples/qm9/
  ```
- Prepare your dataset (if not using QM9):
  - Prepare your data in the appropriate format
  - Update the feature schema in the fine-tuning script if needed
- Configure fine-tuning parameters by modifying `finetuning_config.json` to specify:
  - Output head architecture
  - Task weights
  - Layer dimensions
  - Number of tasks
- Run fine-tuning:

  ```bash
  python ensemble_fine_tune.py
  ```
The fine-tuning process is controlled by JSON configuration files that specify:
- Output Heads: Define the architecture of task-specific prediction heads
- Task Configuration: Specify output dimensions, types, and weights
- Training Parameters: Learning rates, batch sizes, and optimization settings
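As a minimal sketch of how such a configuration can be read in Python (the key layout follows the example structure shown in this README; the full schema used by HydraGNN may contain additional fields):

```python
import json

# Illustrative only: a trimmed configuration string standing in for the
# contents of finetuning_config.json.
config_text = """
{
  "NeuralNetwork": {
    "Architecture": {
      "output_heads": {
        "graph": [{"type": "branch-0",
                   "architecture": {"dim_pretrained": 50,
                                    "num_headlayers": 2,
                                    "dim_headlayers": [50, 25]}}]
      },
      "output_dim": [1],
      "output_type": ["graph"]
    }
  }
}
"""
config = json.loads(config_text)
arch = config["NeuralNetwork"]["Architecture"]
head = arch["output_heads"]["graph"][0]["architecture"]
print(head["dim_headlayers"])   # layer widths for the graph head
print(arch["output_dim"])       # one scalar target per graph
```

In practice you would pass the path to `finetuning_config.json` to `json.load` instead of embedding the text.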
Example configuration structure:

```json
{
  "NeuralNetwork": {
    "Architecture": {
      "output_heads": {
        "graph": [{
          "type": "branch-0",
          "architecture": {
            "dim_pretrained": 50,
            "num_sharedlayers": 2,
            "dim_sharedlayers": 5,
            "num_headlayers": 2,
            "dim_headlayers": [50, 25]
          }
        }]
      },
      "output_dim": [1],
      "output_type": ["graph"]
    }
  }
}
```

For custom datasets, ensure your data includes:
- Graph Features: Energy or other global molecular properties
- Node Features: Atomic numbers, coordinates, and other atomic properties
- Proper Formatting: The framework supports various data formats depending on your use case
The framework expects specific feature schemas that can be customized in the fine-tuning scripts. Data format requirements may vary based on your specific dataset and configuration.
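As an illustration of such a schema, the hypothetical check below uses plain Python dicts to stand in for graph samples (the actual framework works with graph objects, and the field names here are assumptions, not the framework's API): each sample must carry per-atom node features and a scalar graph-level target.

```python
# Hypothetical schema check for a custom dataset sample: atomic numbers
# and 3D coordinates per node, plus one scalar graph-level property.
def validate_sample(sample):
    n_atoms = len(sample["atomic_numbers"])
    assert n_atoms > 0, "graph has no nodes"
    assert len(sample["positions"]) == n_atoms, "one xyz triple per atom"
    assert all(len(p) == 3 for p in sample["positions"]), "3D coordinates"
    assert isinstance(sample["energy"], float), "scalar graph-level target"
    return True

water = {
    "atomic_numbers": [8, 1, 1],          # O, H, H
    "positions": [[0.0, 0.0, 0.0],
                  [0.96, 0.0, 0.0],
                  [-0.24, 0.93, 0.0]],
    "energy": -76.4,                      # e.g. a total-energy target
}
print(validate_sample(water))
```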
`utils/ensemble_utils.py` provides core utilities for ensemble fine-tuning, including:
- Argument parsing for fine-tuning parameters
- Distributed training setup
- Model loading and configuration
- Training loop management
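For the distributed-setup piece, the usual pattern is to read the rank and world size from environment variables exported by the launcher. The helper below is a hypothetical sketch of that pattern (the variable names `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` follow common PyTorch launcher conventions, not this repository's exact code):

```python
import os

# Hypothetical helper: resolve distributed-training identity from
# launcher-provided environment variables, falling back to a
# single-process configuration when none are set.
def get_dist_info(env=os.environ):
    rank = int(env.get("RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    local_rank = int(env.get("LOCAL_RANK", rank % max(world_size, 1)))
    return rank, world_size, local_rank

# Single-process fallback when no launcher variables are present:
print(get_dist_info({}))   # (0, 1, 0)
```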
`utils/update_model.py` provides tools for modifying model architectures:
- Creating custom MLP heads for different tasks
- Adapting pre-trained models to new output dimensions
- Handling different prediction types (graph-level, node-level)
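To illustrate how the configuration fields could map onto a new MLP head, the sketch below derives the `(in_features, out_features)` shape of each linear layer from `dim_pretrained`, `dim_headlayers`, and the output dimension (the helper name is hypothetical; the real head construction lives in `utils/update_model.py`):

```python
# Hypothetical sketch: the pretrained embedding feeds the first head
# layer, intermediate widths come from dim_headlayers, and the final
# layer maps to the task's output dimension.
def head_layer_shapes(dim_pretrained, dim_headlayers, output_dim):
    widths = [dim_pretrained] + list(dim_headlayers) + [output_dim]
    return list(zip(widths[:-1], widths[1:]))

# Values from the example configuration: 50-dim embedding, two head
# layers of width 50 and 25, one scalar graph-level output.
print(head_layer_shapes(50, [50, 25], 1))   # [(50, 50), (50, 25), (25, 1)]
```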
- `examples/qm9/ensemble_fine_tune.py`: Complete example for QM9 molecular property prediction
- `examples/qm9/qm9_preonly.py`: Data preprocessing utilities for QM9
To use your own dataset:
- Prepare data in the appropriate format for your use case
- Define feature schema in your fine-tuning script
- Create appropriate configuration JSON
- Modify output heads to match your tasks
The framework supports multi-task learning scenarios:
- Configure multiple output heads in the JSON configuration
- Specify task weights for balanced training
- Define different architectures for different task types
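Task weights enter the training objective as a weighted sum of the per-task losses. A minimal sketch of that combination (the function name is hypothetical; the actual weighting is handled inside the framework's training loop):

```python
# Hypothetical sketch of balanced multi-task training: per-task losses
# are combined using the task weights from the JSON configuration.
def combine_losses(task_losses, task_weights):
    assert len(task_losses) == len(task_weights)
    return sum(w * l for w, l in zip(task_weights, task_losses))

# Two tasks, the second down-weighted relative to the first:
print(combine_losses([0.8, 2.0], [1.0, 0.5]))   # 0.8 + 1.0 = 1.8
```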
- Import Errors: If you encounter `ModuleNotFoundError` for HydraGNN or project modules:
  - Verify your PYTHONPATH includes both directories:

    ```bash
    echo $PYTHONPATH
    ```

  - Check that the paths are correct and the directories exist
  - For VS Code debugging, the PYTHONPATH is automatically configured in `.vscode/launch.json`
- Environment Variables: Ensure you've sourced your shell profile after adding PYTHONPATH:

  ```bash
  source ~/.bashrc  # or ~/.zshrc
  ```

- Virtual Environment: If using a virtual environment, activate it before setting PYTHONPATH:

  ```bash
  source .venv/bin/activate
  export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"
  ```
This project is part of the ORNL HydraGNN ecosystem. Contributions should follow the established patterns and maintain compatibility with the broader HydraGNN framework.
This project follows the same license as HydraGNN. Please refer to the main HydraGNN repository for licensing information.
If you use this code in your research, please cite the relevant HydraGNN and GFM papers.