
transformer_nuggets

A grab-bag of experimental transformer kernels and utilities (mostly PyTorch + Triton).


What’s in here

  • FlashAttention experiments: removed; the useful pieces have been upstreamed into PyTorch as FlexAttention.
  • NF4 / QLoRA quantization experiments: removed; that work now lives in torchao.
  • transformer_nuggets/fp8: FP8 casting / scaled-quantization kernels (Triton); a reference sketch follows this list.
  • transformer_nuggets/cute: CUTE DSL experiments and tooling (includes an intra-kernel profiler).
  • transformer_nuggets/misc: Odds and ends (e.g. attention wrappers, utilities).
  • transformer_nuggets/llama: LLaMA-ish model + training/fine-tuning scripts (research-grade).
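
As a point of reference for what the fp8 kernels compute, here is a rough eager-mode sketch of dynamic scaled FP8 casting in plain PyTorch. The function name and details below are illustrative, not this package's API; the Triton kernels exist to fuse and speed up this kind of work:

import torch

def scaled_fp8_cast(x: torch.Tensor, dtype: torch.dtype = torch.float8_e4m3fn):
    # Illustrative reference, not the package's API: scale so the tensor's
    # absmax maps onto the fp8 format's largest representable value.
    fp8_max = torch.finfo(dtype).max
    scale = fp8_max / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x * scale).clamp(-fp8_max, fp8_max).to(dtype)
    return x_fp8, scale  # keep the scale around to dequantize later

x = torch.randn(256, 256)
x_fp8, scale = scaled_fp8_cast(x)
x_restored = x_fp8.to(torch.float32) / scale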

This repository is research code: APIs are not stable and may change.

Install

You’ll need a working PyTorch install first (CPU or CUDA). Follow the official PyTorch install instructions.

To install from PyPI:

pip install transformer_nuggets

To hack on the code locally:

git clone https://github.com/drisspg/transformer_nuggets.git
cd transformer_nuggets
pip install -e .

Optional extras:

pip install "transformer_nuggets[llama]"  # llama training utilities

Quick examples

Quantization (NF4 / QLoRA): use torchao, where that work now lives.
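
For example, the NF4 tensor subclass that started here is now in torchao. A minimal sketch (exact names and block-size defaults may differ across torchao releases; check torchao's docs):

import torch
from torchao.dtypes import to_nf4

weight = torch.randn(1024, 1024, dtype=torch.bfloat16)
# Quantize to NF4; these block sizes are commonly used defaults.
nf4_weight = to_nf4(weight, block_size=64, scaler_block_size=256)
# Recover a bf16 approximation of the original weight.
restored = nf4_weight.get_original_weight()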

Attention: use PyTorch's FlexAttention instead of the old local FlashAttention experiments.
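
A minimal example of a causal mask expressed as a FlexAttention score_mod (needs PyTorch 2.5+; shown here on a CUDA device):

import torch
from torch.nn.attention.flex_attention import flex_attention

# Mask out attention to future positions by setting their scores to -inf.
def causal_mask(score, b, h, q_idx, kv_idx):
    return torch.where(q_idx >= kv_idx, score, -float("inf"))

# (batch, heads, seq_len, head_dim)
q, k, v = (torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16) for _ in range(3))
out = flex_attention(q, k, v, score_mod=causal_mask)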

CUTE intra-kernel profiling (writes a Perfetto trace):

python -m transformer_nuggets.cute.profiler.example
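
The resulting trace can be opened in the Perfetto UI at https://ui.perfetto.dev.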

Repo layout

  • transformer_nuggets/: Python package.
  • benchmarks/: Microbenchmarks and profiling scripts.
  • examples/: Small runnable examples.
  • scripts/: One-off utilities.
  • test/: pytest suite.

Development

pip install -e ".[dev]"
pre-commit install
pytest
