31 commits
52dd605
Updating the github nightly build with uv to get optional deps better.
coreyjadams Feb 19, 2026
eb0ff85
Ensure transformer engine is skipped in CI build until cuda13 fix com…
coreyjadams Feb 19, 2026
cc929a4
Include hidden files in ci venv uploads
coreyjadams Feb 19, 2026
3fa856b
Use a custom action for the install steps of the package.
coreyjadams Feb 19, 2026
42616ff
Try adding an env variable to use headless pyvista off screen
coreyjadams Feb 19, 2026
8917d23
Skip pv pplotter errors
coreyjadams Feb 19, 2026
7345dd8
Update CI
coreyjadams Feb 20, 2026
bfc1931
Attempt to fix torch scatter build
coreyjadams Apr 9, 2026
b91e9b7
Try again with pyg
coreyjadams Feb 20, 2026
6269e2c
Add cmake. try again
coreyjadams Feb 20, 2026
0e9184d
Testing another way
coreyjadams Feb 20, 2026
a280777
Add debuging options
coreyjadams Feb 20, 2026
d77a0f5
Add a dockerfile build action. Switch to cuda 12
coreyjadams Feb 20, 2026
002b015
make the cache pull robust
coreyjadams Feb 20, 2026
46ad16e
Trying again
coreyjadams Feb 20, 2026
84bca8d
Trying again again
coreyjadams Feb 20, 2026
3aec4a6
trying again again again
coreyjadams Feb 20, 2026
e715ce9
Trying again again again again
coreyjadams Feb 20, 2026
0078252
Trying again again again again again
coreyjadams Feb 21, 2026
e2b4cb8
try more agains
coreyjadams Feb 21, 2026
ac6843a
who knows
coreyjadams Feb 21, 2026
ac2c555
Increase test tolerance. Upload test report as artifact
coreyjadams Feb 23, 2026
cb7ecbe
Turn off 3D convnd test, it's not numerically stable
coreyjadams Feb 23, 2026
a042cbc
upload better report.
coreyjadams Feb 24, 2026
7a3d64d
fix workspace permissions
coreyjadams Feb 24, 2026
66fe969
revert workspace changes, upload reports for coverage path and genera…
coreyjadams Feb 24, 2026
27a87ca
Restore container pipeline against main.
coreyjadams Mar 12, 2026
0fa7535
rmove container build action from this pr
coreyjadams Mar 12, 2026
5e5df27
reintroduce pytorch-g deps on torch.
coreyjadams Mar 12, 2026
464734d
updates in this pr:
coreyjadams Mar 17, 2026
db29cba
add explicit git repo
coreyjadams Mar 17, 2026
52 changes: 52 additions & 0 deletions .github/actions/bootstrap-cudnn-ci/action.yml
@@ -0,0 +1,52 @@
name: Bootstrap cuDNN CI container
Collaborator

General question: does this test trigger any packages to be built from source, or are these just core deps for which we have binaries for everything?

Collaborator

@NickGeneva NickGeneva Mar 12, 2026

For context, I'm just curious whether these are all the deps needed to install everything (and to build deps from source when needed).

Collaborator Author

This does trigger some deps to build from source, sometimes. torch_geometric and family, natten both come to mind. transformer_engine will be in the pile eventually too.

It does not do it every time, though: uv caches the built wheels locally, so the wheel built last night is available tonight, if that makes sense. And the next night, and the next night, and so on until the cache is invalid or the lock file requires a new build. So the build doesn't trigger everything all the time.

The first build took forever though.
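To make the "until the lock file requires a new build" part concrete, here is a minimal sketch of deriving a cache key from the lock file contents. The helper name `lock_cache_key`, the `uv-nightly` prefix, and the 16-character digest length are assumptions for illustration, not what this PR does; it also assumes `sha256sum` from GNU coreutils is available in the container.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of this PR): build a cache key from a prefix
# and a digest of the lock file. The key only changes when the lock file
# changes, so an unchanged lock keeps hitting the same cached wheels.
lock_cache_key() {
  local prefix="$1" lockfile="$2"
  local digest
  # sha256sum prints "<hex digest>  <file>"; keep the first 16 hex characters.
  digest="$(sha256sum "$lockfile" | cut -c1-16)"
  printf '%s-%s\n' "$prefix" "$digest"
}

# Example: only runs when a uv.lock exists in the current directory.
if [[ -f uv.lock ]]; then
  lock_cache_key "uv-nightly" uv.lock
fi
```

A key built this way could be passed as the suffix input of a cache step, so that a source rebuild is only forced when the locked dependency set actually changes.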

Collaborator Author

And no, we're not catching everything yet. I need to get TE for sure, and I think we're still missing a few others. I've got a lot of the deps, though. I was hoping to roll out incrementally from here: part of the reporting stage was to help ID which tests are skipped due to missing software deps and fix that.

Collaborator

@NickGeneva NickGeneva Mar 13, 2026

Yeah, it's the building of packages from source that is raising my previous questions about cache invalidation / refreshes.

This is where leaning hard on caching and using it aggressively (and not refreshing it all the time) is very, very useful. Initial cache builds for e2s with e2grid, natten, flash-attn, torch-harmonics, etc. can take hours... something that is practical to do at most once per week, imo.

Managing the cache is uv's problem, right? It does not need a fresh cache all the time. Creating a new cache really just checks that PyPI is still alive (I hope so) and that source builds still operate as intended. That could be nightly... but when things go wrong, you don't want all new PRs to get stuck building new caches as well because the lock file changed and now you're having an issue with TE or another source build.
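One way to get the "at most once per week" refresh suggested above is to put the ISO week into the cache key, so that every run in the same week resolves to the same key. This is only a sketch: `weekly_suffix` and the `uv-cache` prefix are hypothetical names, and it assumes GNU `date` (for the `-d` option), which the Ubuntu-based CUDA images normally provide.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: return the ISO year and week (e.g. 2026-W08) for a
# given date, defaulting to the current time. Requires GNU date for -d.
weekly_suffix() {
  local when="${1:-now}"
  date -u -d "$when" +%G-W%V
}

# All runs within the same ISO week (Monday to Sunday, UTC) share this key,
# so a from-scratch cache build happens at most once per week.
printf 'uv-cache-%s\n' "$(weekly_suffix)"
```

The trade-off is that a broken source build is only discovered when the week rolls over, which is why it could be paired with a separate scheduled job that rebuilds the cache off the PR critical path.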

description: Install OS dependencies and uv in CUDA cuDNN container jobs
inputs:
  python-version:
    description: Python major.minor expected in the container
    required: false
    default: "3.12"
runs:
  using: composite
  steps:
    - name: Install system dependencies
      shell: bash
      run: |
        set -euo pipefail
        export DEBIAN_FRONTEND=noninteractive
        apt-get update
        apt-get install -y --no-install-recommends \
          ca-certificates \
          curl \
          git \
          gh \
          build-essential \
          cmake \
          pkg-config \
          python3 \
          python3-dev \
          python3-venv \
          python3-pip \
          zstd
        ln -sf /usr/bin/python3 /usr/bin/python
        rm -rf /var/lib/apt/lists/*

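    - name: Install uv
      shell: bash
      run: |
        set -euo pipefail
        curl -LsSf https://astral.sh/uv/install.sh | sh
        # Recent uv installers place the binary in ~/.local/bin by default;
        # older releases used ~/.cargo/bin, so both are added to PATH.
        echo "$HOME/.local/bin" >> "$GITHUB_PATH"
        echo "$HOME/.cargo/bin" >> "$GITHUB_PATH"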

    - name: Print toolchain versions
      shell: bash
      run: |
        set -euo pipefail
        python3 --version
        uv --version
        gcc --version | head -n 1
        cmake --version | head -n 1
        if command -v nvcc >/dev/null 2>&1; then
          nvcc --version
        else
          echo "nvcc not found on PATH"
        fi
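A note on the "Install uv" step in this action: where the installer puts the binary has changed between uv releases (to my knowledge, `~/.local/bin` in recent versions and `~/.cargo/bin` in older ones). Below is a sketch of locating the binary instead of assuming one directory. `find_uv_bin_dir` is a hypothetical helper and the candidate list is an assumption, not something this PR defines.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the first directory (from the arguments) that
# contains an executable named "uv"; return non-zero if none does.
find_uv_bin_dir() {
  local dir
  for dir in "$@"; do
    if [[ -x "$dir/uv" ]]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Candidate install locations (assumed): recent default first, legacy second.
if uv_dir="$(find_uv_bin_dir "$HOME/.local/bin" "$HOME/.cargo/bin")"; then
  echo "uv found in $uv_dir"
  # GITHUB_PATH is only set inside GitHub Actions; skip the append elsewhere.
  if [[ -n "${GITHUB_PATH:-}" ]]; then
    echo "$uv_dir" >> "$GITHUB_PATH"
  fi
else
  echo "uv not found in the expected directories" >&2
fi
```

Probing for the binary means the later `uv --version` check in "Print toolchain versions" would not depend on a particular installer release.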
83 changes: 83 additions & 0 deletions .github/actions/setup-uv-env/action.yml
@@ -0,0 +1,83 @@
name: Setup uv environment
description: Restore uv and venv caches, and run uv sync on cache miss
inputs:
  uv-cache-key-prefix:
    description: Prefix for uv package cache key
    required: true
  venv-cache-key-prefix:
    description: Prefix for virtual environment cache key
    required: true
  venv-cache-key-suffix:
    description: Deterministic suffix appended to venv cache key prefix
    required: true
  uv-cache-key-suffix:
    description: Suffix for uv cache key (e.g. "latest" for a static key)
    required: true
outputs:
  uv_cache_hit:
    description: Whether uv package cache had an exact key hit
    value: ${{ steps.restore-uv-cache.outputs.cache-hit }}
  venv_cache_hit:
    description: Whether venv cache had an exact key hit
    value: ${{ steps.restore-venv-cache.outputs.cache-hit }}
runs:
  using: composite
  steps:
    - name: Restore uv package cache
      id: restore-uv-cache
      uses: actions/cache/restore@v4
      with:
        path: ~/.cache/uv
        key: ${{ inputs.uv-cache-key-prefix }}-${{ inputs.uv-cache-key-suffix }}
        fail-on-cache-miss: false
        restore-keys: |
          ${{ inputs.uv-cache-key-prefix }}-

    - name: Restore venv cache
      id: restore-venv-cache
      uses: actions/cache/restore@v4
      with:
        path: .venv
        key: ${{ inputs.venv-cache-key-prefix }}-${{ inputs.venv-cache-key-suffix }}
        fail-on-cache-miss: false
        restore-keys: |
          ${{ inputs.venv-cache-key-prefix }}-

    - name: Debug cache and environment context
      shell: bash
      run: |
        set -euo pipefail
        echo "::group::setup-uv-env debug context"
        echo "uv cache key: ${{ inputs.uv-cache-key-prefix }}-${{ inputs.uv-cache-key-suffix }}"
        echo "venv cache key: ${{ inputs.venv-cache-key-prefix }}-${{ inputs.venv-cache-key-suffix }}"
        echo "uv cache exact hit: ${{ steps.restore-uv-cache.outputs.cache-hit }}"
        echo "venv cache exact hit: ${{ steps.restore-venv-cache.outputs.cache-hit }}"
        echo "workspace: $GITHUB_WORKSPACE"
        df -h
        echo "::endgroup::"

    - name: Install dependencies with uv (dev + cu12)
      if: steps.restore-venv-cache.outputs.cache-hit != 'true'
      shell: bash
      run: |
        set -euo pipefail
        export UV_LINK_MODE=copy
        echo "::group::uv sync (dev + cu12)"
        uv sync \
          --frozen \
          --group dev \
          --extra cu12
        echo "::endgroup::"
        uv run python -c "import torch; print(f'torch={torch.__version__} cuda={torch.version.cuda}')"

    - name: Report cache sizes
      shell: bash
      run: |
        echo "::group::cache sizes"
        echo "uv package cache:"
        du -sh ~/.cache/uv 2>/dev/null || echo " (not present)"
        echo ".venv:"
        du -sh .venv 2>/dev/null || echo " (not present)"
        df -h
        echo "::endgroup::"
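For reviewers less familiar with how the restore steps interact with the `cache-hit != 'true'` condition in this action, here is a simplified simulation of the lookup. It is a sketch, not the real `actions/cache` implementation: the function names are hypothetical, and it ignores details such as which of several prefix matches is chosen. The point is that a `restore-keys` prefix match restores files but does not count as an exact hit, so `uv sync` still runs on top of the restored `.venv`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical simulation: classify_cache_lookup KEY PREFIX [EXISTING_KEY...]
# Prints "exact" if KEY is among the existing keys, "partial" if only a key
# starting with PREFIX exists, and "miss" otherwise.
classify_cache_lookup() {
  local key="$1" prefix="$2"
  shift 2
  local existing result="miss"
  for existing in "$@"; do
    if [[ "$existing" == "$key" ]]; then
      echo "exact"
      return 0
    elif [[ "$existing" == "$prefix"* ]]; then
      result="partial"
    fi
  done
  echo "$result"
}

# Mirrors the step condition: sync runs for anything other than an exact hit.
needs_sync() {
  [[ "$1" != "exact" ]]
}

# Example: yesterday's venv exists under a different suffix (made-up keys).
outcome="$(classify_cache_lookup "venv-abc123" "venv-" "venv-old999" "uv-latest")"
echo "lookup: $outcome"
if needs_sync "$outcome"; then
  echo "would run: uv sync --frozen --group dev --extra cu12"
fi
```

In the partial case the sync starts from a mostly populated environment, which is usually much cheaper than a cold install.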
