This repository provides tools and utilities for fine-tuning the HydraGNN Graph Foundation Model (GFM) ensemble on materials science datasets. The framework enables transfer learning from pre-trained graph neural network models to domain-specific tasks.
The Graph Foundation Model (GFM) ensemble is a collection of pre-trained HydraGNN models that can be fine-tuned for various molecular and materials property prediction tasks. This repository includes:
- Utilities for fine-tuning model ensembles
- Example configurations for common datasets (QM9)
- Tools for model adaptation and head configuration
- Data preprocessing utilities
```
.
├── README.md
├── examples/
│   └── qm9/
│       ├── ensemble_fine_tune.py    # Main fine-tuning script for QM9
│       ├── finetuning_config.json   # Configuration for fine-tuning heads
│       └── qm9_preonly.py           # QM9 preprocessing script
└── utils/
    ├── __init__.py
    ├── ensemble_utils.py            # Core fine-tuning utilities
    └── update_model.py              # Model architecture modification tools
```
- HydraGNN: Install the latest version from the main branch:

  ```bash
  git clone https://github.com/ORNL/HydraGNN.git
  cd HydraGNN
  pip install -e .
  ```
- Python Dependencies: Install the dependencies required by HydraGNN:
  - Follow the installation instructions in the HydraGNN repository
  - All required dependencies are installed automatically when you install HydraGNN with `pip install -e .`
- Environment Setup: Update your PYTHONPATH to include both directories:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"
  ```

  Or add this line to your `.bashrc` or `.zshrc`:

  ```bash
  export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"
  ```
Download the pre-trained GFM ensemble from HuggingFace:

```bash
# Download all model checkpoints and configuration files
# Each ensemble member will be fine-tuned independently
```

The model ensemble contains multiple pre-trained models with their respective configuration files organized in a structured directory format.
Important: Before running any scripts, ensure your PYTHONPATH includes both the HydraGNN_GFM_FineTuning4Materials and HydraGNN directories:
```bash
# Option 1: Set temporarily for current session
export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"

# Option 2: Add to your shell profile (~/.bashrc or ~/.zshrc)
echo 'export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"' >> ~/.bashrc
source ~/.bashrc
```
- Navigate to the QM9 example directory:

  ```bash
  cd examples/qm9/
  ```
- Prepare your dataset (if not using QM9):
  - Prepare your data in the appropriate format
  - Update the feature schema in the fine-tuning script if needed
- Configure fine-tuning parameters by modifying `finetuning_config.json` to specify:
  - Output head architecture
  - Task weights
  - Layer dimensions
  - Number of tasks
- Run fine-tuning:

  ```bash
  python ensemble_fine_tune.py
  ```
The fine-tuning process is controlled by JSON configuration files that specify:
- Output Heads: Define the architecture of task-specific prediction heads
- Task Configuration: Specify output dimensions, types, and weights
- Training Parameters: Learning rates, batch sizes, and optimization settings
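As a minimal sketch of how such a configuration can be read in Python (the key layout follows the example structure shown in this README; the full schema used by HydraGNN may contain additional fields):

```python
import json

# Illustrative only: a trimmed configuration string standing in for the
# contents of finetuning_config.json.
config_text = """
{
  "NeuralNetwork": {
    "Architecture": {
      "output_heads": {
        "graph": [{"type": "branch-0",
                   "architecture": {"dim_pretrained": 50,
                                    "num_headlayers": 2,
                                    "dim_headlayers": [50, 25]}}]
      },
      "output_dim": [1],
      "output_type": ["graph"]
    }
  }
}
"""
config = json.loads(config_text)
arch = config["NeuralNetwork"]["Architecture"]
head = arch["output_heads"]["graph"][0]["architecture"]
print(head["dim_headlayers"])   # layer widths for the graph head
print(arch["output_dim"])       # one scalar target per graph
```

In practice you would pass the path to `finetuning_config.json` to `json.load` instead of embedding the text.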
Example configuration structure:

```json
{
  "NeuralNetwork": {
    "Architecture": {
      "output_heads": {
        "graph": [{
          "type": "branch-0",
          "architecture": {
            "dim_pretrained": 50,
            "num_sharedlayers": 2,
            "dim_sharedlayers": 5,
            "num_headlayers": 2,
            "dim_headlayers": [50, 25]
          }
        }]
      },
      "output_dim": [1],
      "output_type": ["graph"]
    }
  }
}
```

For custom datasets, ensure your data includes:
- Graph Features: Energy or other global molecular properties
- Node Features: Atomic numbers, coordinates, and other atomic properties
- Proper Formatting: The framework supports various data formats depending on your use case
The framework expects specific feature schemas that can be customized in the fine-tuning scripts. Data format requirements may vary based on your specific dataset and configuration.
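As an illustration of such a schema, the hypothetical check below uses plain Python dicts to stand in for graph samples (the actual framework works with graph objects, and the field names here are assumptions, not the framework's API): each sample must carry per-atom node features and a scalar graph-level target.

```python
# Hypothetical schema check for a custom dataset sample: atomic numbers
# and 3D coordinates per node, plus one scalar graph-level property.
def validate_sample(sample):
    n_atoms = len(sample["atomic_numbers"])
    assert n_atoms > 0, "graph has no nodes"
    assert len(sample["positions"]) == n_atoms, "one xyz triple per atom"
    assert all(len(p) == 3 for p in sample["positions"]), "3D coordinates"
    assert isinstance(sample["energy"], float), "scalar graph-level target"
    return True

water = {
    "atomic_numbers": [8, 1, 1],          # O, H, H
    "positions": [[0.0, 0.0, 0.0],
                  [0.96, 0.0, 0.0],
                  [-0.24, 0.93, 0.0]],
    "energy": -76.4,                      # e.g. a total-energy target
}
print(validate_sample(water))
```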
`utils/ensemble_utils.py` provides core utilities for ensemble fine-tuning, including:
- Argument parsing for fine-tuning parameters
- Distributed training setup
- Model loading and configuration
- Training loop management
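For the distributed-setup piece, the usual pattern is to read the rank and world size from environment variables exported by the launcher. The helper below is a hypothetical sketch of that pattern (the variable names `RANK`, `WORLD_SIZE`, and `LOCAL_RANK` follow common PyTorch launcher conventions, not this repository's exact code):

```python
import os

# Hypothetical helper: resolve distributed-training identity from
# launcher-provided environment variables, falling back to a
# single-process configuration when none are set.
def get_dist_info(env=os.environ):
    rank = int(env.get("RANK", 0))
    world_size = int(env.get("WORLD_SIZE", 1))
    local_rank = int(env.get("LOCAL_RANK", rank % max(world_size, 1)))
    return rank, world_size, local_rank

# Single-process fallback when no launcher variables are present:
print(get_dist_info({}))   # (0, 1, 0)
```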
`utils/update_model.py` provides tools for modifying model architectures:
- Creating custom MLP heads for different tasks
- Adapting pre-trained models to new output dimensions
- Handling different prediction types (graph-level, node-level)
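To illustrate how the configuration fields could map onto a new MLP head, the sketch below derives the `(in_features, out_features)` shape of each linear layer from `dim_pretrained`, `dim_headlayers`, and the output dimension (the helper name is hypothetical; the real head construction lives in `utils/update_model.py`):

```python
# Hypothetical sketch: the pretrained embedding feeds the first head
# layer, intermediate widths come from dim_headlayers, and the final
# layer maps to the task's output dimension.
def head_layer_shapes(dim_pretrained, dim_headlayers, output_dim):
    widths = [dim_pretrained] + list(dim_headlayers) + [output_dim]
    return list(zip(widths[:-1], widths[1:]))

# Values from the example configuration: 50-dim embedding, two head
# layers of width 50 and 25, one scalar graph-level output.
print(head_layer_shapes(50, [50, 25], 1))   # [(50, 50), (50, 25), (25, 1)]
```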
- `examples/qm9/ensemble_fine_tune.py`: Complete example for QM9 molecular property prediction
- `examples/qm9/qm9_preonly.py`: Data preprocessing utilities for QM9
To use your own dataset:
- Prepare data in the appropriate format for your use case
- Define feature schema in your fine-tuning script
- Create appropriate configuration JSON
- Modify output heads to match your tasks
The framework supports multi-task learning scenarios:
- Configure multiple output heads in the JSON configuration
- Specify task weights for balanced training
- Define different architectures for different task types
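Task weights enter the training objective as a weighted sum of the per-task losses. A minimal sketch of that combination (the function name is hypothetical; the actual weighting is handled inside the framework's training loop):

```python
# Hypothetical sketch of balanced multi-task training: per-task losses
# are combined using the task weights from the JSON configuration.
def combine_losses(task_losses, task_weights):
    assert len(task_losses) == len(task_weights)
    return sum(w * l for w, l in zip(task_weights, task_losses))

# Two tasks, the second down-weighted relative to the first:
print(combine_losses([0.8, 2.0], [1.0, 0.5]))   # 0.8 + 1.0 = 1.8
```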
- Import Errors: If you encounter `ModuleNotFoundError` for HydraGNN or project modules:
  - Verify your PYTHONPATH includes both directories:

    ```bash
    echo $PYTHONPATH
    ```

  - Check that the paths are correct and the directories exist
  - For VS Code debugging, the PYTHONPATH is automatically configured in `.vscode/launch.json`
- Environment Variables: Ensure you've sourced your shell profile after adding PYTHONPATH:

  ```bash
  source ~/.bashrc  # or ~/.zshrc
  ```

- Virtual Environment: If using a virtual environment, activate it before setting PYTHONPATH:

  ```bash
  source .venv/bin/activate
  export PYTHONPATH="${PYTHONPATH}:/path/to/HydraGNN_GFM_FineTuning4Materials:/path/to/HydraGNN"
  ```
This project is part of the ORNL HydraGNN ecosystem. Contributions should follow the established patterns and maintain compatibility with the broader HydraGNN framework.
This project follows the same license as HydraGNN. Please refer to the main HydraGNN repository for licensing information.
If you use this code in your research, please cite the relevant HydraGNN and GFM papers.