Investigating information dynamics in BERT models during fine-tuning

Codebase for the paper "From Performance to Process: Temporal Information Dynamics in Language Model Fine-tuning". The paper uses methods from information theory to extract and examine the temporal learning dynamics of internal representations in BERT models as they evolve during fine-tuning.

Repository structure

bert-infodynamics/
├── src/
│   ├── configs/
│   │   └── *  # experiment-specific configs
│   ├── utils/
│   │   ├── experiments.py  # class for creating an experiment
│   │   ├── utils_finetune.py  # methods for fine-tuning
│   │   ├── utils_infodynamics.py  # methods for extracting information signals
│   │   └── utils_visualizations.py  # methods for visualizing results
│   ├── main-finetune.py  # main driver for fine-tuning with fixed hyperparameters
│   ├── main-finetune-best-model.py  # main driver for fine-tuning with optimized hyperparameters
│   ├── main-hyperparameter-optimization.py  # main driver for hyperparameter optimization
│   └── main-infodynamics.py  # main driver for information dynamics

Pipeline

  1. Define experiment configs in the configs folder (src/configs/).

  2. To run experiments with fixed hyperparameters, run main-finetune.py. This saves the model checkpoints and the final fine-tuned model. To use optimized hyperparameters instead, first run main-hyperparameter-optimization.py, which saves a .yaml file with the optimal hyperparameters for the given experiment. Then update the experiment config with those hyperparameters and run main-finetune-best-model.py, which likewise saves the model checkpoints and the final fine-tuned model.

  3. To extract information signals from the fine-tuned model, run main-infodynamics.py. This saves a .json file with the novelty, transience, and resonance signals for the given experiment.

  4. utils_visualizations.py holds the algorithms for visualizing the time series data (i.e. the extracted information signals), including filtering, smoothing, detrending, and more.
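
To illustrate what the signals in step 3 measure, here is a minimal sketch of one commonly used windowed formulation: novelty is the mean divergence of a time step from the preceding window, transience is the mean divergence from the following window, and resonance is their difference. This is not the code in utils_infodynamics.py. The function names, the choice of KL divergence, and the assumption that each time step is already a probability distribution are all illustrative assumptions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps guards against division by zero; this smoothing is an assumption.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def novelty_transience_resonance(dists, window):
    """Windowed signals for a time-ordered list of probability distributions.

    novelty[t]    = mean KL(dists[t] || dists[t - k]) for k = 1..window
    transience[t] = mean KL(dists[t] || dists[t + k]) for k = 1..window
    resonance[t]  = novelty[t] - transience[t]

    Positions without a full window on both sides are left as None.
    """
    n = len(dists)
    novelty = [None] * n
    transience = [None] * n
    resonance = [None] * n
    for t in range(window, n - window):
        nov = sum(
            kl_divergence(dists[t], dists[t - k]) for k in range(1, window + 1)
        ) / window
        tra = sum(
            kl_divergence(dists[t], dists[t + k]) for k in range(1, window + 1)
        ) / window
        novelty[t] = nov
        transience[t] = tra
        resonance[t] = nov - tra
    return novelty, transience, resonance


# Hypothetical toy series: the distribution shifts at step 2 and then persists.
toy = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
nov, tra, res = novelty_transience_resonance(toy, window=1)
print(res)
```

In the toy series, the step where the distribution changes and then stays changed has positive resonance (it is novel relative to its past but not transient relative to its future), while the step just before the change has negative resonance.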

Technicalities

The codebase relies on Python 3.12.3.

License

This project is licensed under the MIT License - see the LICENSE file for details.