Derivative functionals#1545

Open
loliverhennigh wants to merge 13 commits into NVIDIA:main from loliverhennigh:pr4-derivative-functionals

Conversation

@loliverhennigh
Collaborator

PhysicsNeMo Pull Request

Description

Various derivative functionals

@greptile-apps
Contributor

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR introduces seven derivative functionals (uniform_grid_gradient, rectilinear_grid_gradient, mesh_lsq_gradient, mesh_green_gauss_gradient, spectral_grid_gradient, and two meshless FD ops) under physicsnemo/nn/functional/derivatives, each with PyTorch and optional Warp backends registered via FunctionSpec. The implementations, benchmark cases, and tests are thorough, but two backend-inconsistency issues in the unstructured mesh functionals need attention before merging.

  • P1 (mesh_green_gauss_gradient torch backend, _torch_impl.py line 208): the torch path silently propagates partial/incorrect gradients w.r.t. points through the geometry coefficients, while the warp path correctly raises ValueError("warp … does not support gradients w.r.t. points"). Switching backends changes numerical outputs without any error signal.
  • P1 (mesh_lsq_gradient torch backend, _torch_impl.py line 1133): same pattern—dx = neigh_points - center traces back to points, giving a partial gradient that is inconsistent with the warp backend's explicit return None for grad_points.
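
Both P1s share the same fix shape: make the torch path fail fast the way the warp path already does. A minimal sketch (the helper name and error message below are illustrative, not the PR's actual code):

```python
import torch


def check_points_grad(points: torch.Tensor, op_name: str) -> None:
    """Illustrative guard: mirror the warp backend's hard failure in the torch path."""
    if points.requires_grad:
        raise ValueError(
            f"{op_name} does not support gradients w.r.t. points; "
            "call points.detach() before passing them in"
        )
```

Calling such a guard at the top of both torch implementations would make switching backends change performance characteristics, not numerics.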

Important Files Changed

| Filename | Overview |
| --- | --- |
| `physicsnemo/nn/functional/derivatives/mesh_green_gauss_gradient/_torch_impl.py` | Torch implementation of the Green-Gauss gradient; silently allows partial/incorrect gradients w.r.t. points, inconsistent with the warp backend, which raises ValueError. |
| `physicsnemo/nn/functional/derivatives/mesh_green_gauss_gradient/utils.py` | Builds Green-Gauss geometry tensors; correct algorithm, but creates O(n_cells×n_faces) per-element GPU tensor allocations inside a nested Python loop, a significant performance issue at scale. |
| `physicsnemo/nn/functional/derivatives/mesh_green_gauss_gradient/_warp_impl.py` | Warp kernel implementation with correct forward/backward Green-Gauss kernels and autograd bridge; correctly raises ValueError for points.requires_grad. |
| `physicsnemo/nn/functional/derivatives/mesh_green_gauss_gradient/mesh_green_gauss_gradient.py` | FunctionSpec class with warp/torch backend registration, benchmark cases, and parity tolerances; well structured. |
| `physicsnemo/nn/functional/derivatives/mesh_lsq_gradient/_torch_impl.py` | Torch batched-lstsq LSQ gradient implementation; allows gradient flow w.r.t. points through dx, inconsistent with the warp backend, which explicitly returns None for grad_points. |
| `physicsnemo/nn/functional/derivatives/mesh_lsq_gradient/_warp_impl.py` | Warp kernels for 1D/2D/3D LSQ gradient with analytical 2×2/3×3 inverses in-kernel and matching backward kernels; returns None for grad_points. |
| `physicsnemo/nn/functional/derivatives/meshless_finite_difference/meshless_finite_difference.py` | MeshlessFDStencilPoints and MeshlessFDDerivatives FunctionSpec classes; uses jaxtyping annotations inconsistently with other functionals in this PR. |
| `physicsnemo/nn/functional/derivatives/meshless_finite_difference/_torch_impl.py` | Torch stencil-point construction and central finite-difference derivatives; clean, correct implementation with good validation. |
| `physicsnemo/nn/functional/derivatives/spectral_grid_gradient/spectral_grid_gradient.py` | SpectralGridGradient FunctionSpec class; uses jaxtyping annotations inconsistently with sibling functionals; otherwise well structured. |
| `physicsnemo/nn/functional/derivatives/spectral_grid_gradient/_torch_impl.py` | Spectral differentiation via FFT with correct wavenumber construction and mixed-derivative support; handles float16 upcast and the Nyquist frequency correctly. |
| `physicsnemo/nn/functional/derivatives/uniform_grid_gradient/uniform_grid_gradient.py` | UniformGridGradient with a 3-tier auto-dispatch heuristic (warp/torch_compiled/torch) and lru_cache for the compiled variant; well designed. |
| `physicsnemo/nn/functional/derivatives/rectilinear_grid_gradient/rectilinear_grid_gradient.py` | RectilinearGridGradient FunctionSpec with warp/torch backends for nonuniform periodic grids; correct nonuniform central-difference scheme documented with LaTeX. |
| `physicsnemo/nn/functional/derivatives/__init__.py` | Package init exporting all derivative functionals and FunctionSpec classes; complete and consistent. |
| `test/nn/functional/derivatives/test_mesh_green_gauss_gradient.py` | Tests torch accuracy on a linear field, warp/torch parity, and backward gradient parity; good coverage of the main paths. |
| `benchmarks/physicsnemo/nn/functional/registry.py` | Adds all seven new derivative functionals to the ASV benchmark registry; correct and complete. |

Comments Outside Diff (3)

  1. physicsnemo/nn/functional/derivatives/mesh_green_gauss_gradient/_torch_impl.py, lines 208-216

    P1 Inconsistent points.requires_grad handling vs. warp backend

    The torch implementation silently allows—and partially computes—gradients w.r.t. points, because coeff is derived from points via build_geometry and the autograd graph is never severed. The warp backend explicitly raises a ValueError for this case ("warp mesh_green_gauss_gradient does not support gradients w.r.t. points"). When a user switches between backends, they will get either an incorrect partial gradient (torch) or a hard failure (warp) for the same input, with no warning in the torch path.

    The gradient through coeff is incomplete: it captures the sensitivity of the face-area coefficients to point positions, but not the topological sensitivity (which neighbors a point has), making the gradient numerically incorrect for mesh-optimization use cases.

  2. physicsnemo/nn/functional/derivatives/mesh_lsq_gradient/_torch_impl.py, lines 1133-1148

    P1 Inconsistent gradient behavior for points vs. warp backend

    dx = neigh_points - center traces back to points, so autograd will compute a gradient w.r.t. points through the lstsq solution. The warp backend (_MeshLSQGradientWarpAutograd.backward) returns None for the points gradient, meaning the two backends disagree on this fundamental behavior. A training loop that runs on CPU (torch) will silently produce a gradient for points; the same loop on GPU (warp) will produce zero for the same input—no error, just different numbers.

    Consider guarding against points.requires_grad in the torch path, or at minimum document that gradients w.r.t. points are not supported and are silently zeroed in the warp path.

  3. physicsnemo/nn/functional/derivatives/mesh_green_gauss_gradient/utils.py, lines 930-945

    P2 Per-element device tensor allocation inside nested Python loop

    torch.tensor(verts, device=points.device, ...) is called once per face per cell inside a nested for cell_idx / for face_idx loop—O(n_cells × (dims+1)) tiny device allocations. On CUDA this triggers a device allocation and a host-to-device copy per iteration, making this extremely slow for even moderate meshes. The face_vertices tensor is pre-allocated; the inner assignment could be vectorised instead:

    ```python
    # Collect all (cell, face, vertex_idx) mappings in Python first
    all_verts = [
        [int(cell[v]) for v in local_face]
        for cell in cells_cpu
        for local_face in local_faces
    ]
    face_vertices = torch.tensor(
        all_verts, dtype=torch.int64, device=points.device
    ).reshape(n_cells, n_faces, dims)
    ```

    This reduces allocations to a single torch.tensor call regardless of mesh size.
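
As a quick sanity check of the single-allocation pattern, here is a self-contained toy run on a hypothetical two-triangle 2D mesh (variable names mirror the snippet above, but the data is made up and the `device=` argument is dropped so it runs on CPU):

```python
import torch

# Hypothetical toy 2D triangular mesh: 2 cells, 3 faces per cell,
# 2 vertices per face (an edge in 2D).
cells_cpu = [[0, 1, 2], [1, 3, 2]]
local_faces = [(0, 1), (1, 2), (2, 0)]  # local vertex pairs per face
n_cells, n_faces, dims = len(cells_cpu), len(local_faces), 2

# Build all (cell, face) -> global vertex mappings in pure Python,
# then materialize them with a single torch.tensor call.
all_verts = [
    [int(cell[v]) for v in local_face]
    for cell in cells_cpu
    for local_face in local_faces
]
face_vertices = torch.tensor(all_verts, dtype=torch.int64).reshape(
    n_cells, n_faces, dims
)
```

The resulting tensor has shape `(n_cells, n_faces, dims)` and is built with exactly one host-to-device transfer when a CUDA device is passed.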

Reviews (1): Last reviewed commit: "Add spectral and meshless finite-differe..."

Comment on lines +60 to +71
```python
@FunctionSpec.register(name="torch", rank=0, baseline=True)
def torch_forward(
    points: Float[Tensor, "num_points dim"],
    spacing: float | Sequence[float] = 1.0,
    include_center: bool = True,
) -> Float[Tensor, "num_points stencil_size dim"]:
    """Dispatch stencil-point construction to the PyTorch backend."""
    return meshless_fd_stencil_points_torch(
        points=points,
        spacing=spacing,
        include_center=bool(include_center),
    )
```
Contributor

P2 Inconsistent use of jaxtyping annotations across the PR

MeshlessFDStencilPoints, MeshlessFDDerivatives, and SpectralGridGradient use jaxtyping (Float[Tensor, "..."]) in their FunctionSpec.register-decorated methods, while MeshGreenGaussGradient, MeshLSQGradient, RectilinearGridGradient, and UniformGridGradient in the same PR use plain torch.Tensor. Consistent annotation style across all functionals in this module would improve maintainability.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

```diff
@@ -0,0 +1,19 @@
# SPDX-FileCopyrightText: Copyright (c) 2023 - 2026 NVIDIA CORPORATION & AFFILIATES.
```
Collaborator Author

This is currently unused in the mesh module; however, I already had a really nice implementation of it and wanted to add it.

```python
)


def meshless_fd_stencil_points_torch(
```
Collaborator Author

This is just used for generating the data for testing. Probably can remove honestly but ok...

```python
### Forward kernels (periodic central differences)
### ============================================================
```


Collaborator Author

I might break this file up...

Collaborator

@ktangsali left a comment

Looks great, thanks @loliverhennigh! I also tested the PhysicsInformer integration with this PR and we have everything needed for supporting that higher level abstraction as well. Thank you for addressing the mixed derivatives comment and getting it to feature parity with Sym's implementations.

I have added a few comments. Some comments (like the docstring one for functions) are applicable to all the different gradient methods but I have just added the comment to one of the functions / implementations. Kindly take a look.

Overall this is looking in great shape! Don't forget to update the Changelog.

```python
    spacing: float | Sequence[float] = 1.0,
    derivative_orders: int | Sequence[int] = 1,
    include_mixed: bool = False,
) -> Float[Tensor, "num_derivs num_points channels"]:
```
Collaborator

I think as Greptile also pointed out, there is some inconsistency in jaxtyping across the different gradient methods.

```python
        neighbor_indices=neighbor_indices,
        min_neighbors=int(min_neighbors),
    )
    if points.requires_grad:
```
Collaborator

From my understanding, torch will support this, right? Any specific reason why we cannot support this with Warp too?

```python
) -> torch.Tensor:
    """Compute Green-Gauss cell-centered gradients with Warp kernels."""
    validate_inputs(points=points, cells=cells, neighbors=neighbors, values=values)
    if points.requires_grad:
```
Collaborator

Same comment here.

```python
        include_mixed=bool(include_mixed),
    )

    orig_dtype = field.dtype
```
Collaborator

I believe this is a common pattern across the kernels - which makes sense to me, but maybe we add it as a note somewhere? Something like:

Note: Warp backends internally compute in float32. Float64 inputs are accepted but derivative accuracy is limited to float32 precision.

Because we would be losing precision if someone decides for some reason to pass fp64 (which folks might do to get more accurate grads).
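
A hedged sketch of the upcast/restore pattern under discussion (names and the stand-in computation below are illustrative; the actual Warp kernels differ):

```python
import torch


def run_in_float32(field: torch.Tensor) -> torch.Tensor:
    """Illustrative only: compute in float32 regardless of input dtype,
    then cast the result back. A float64 input round-trips through
    float32, so derivative accuracy is limited to fp32 precision."""
    orig_dtype = field.dtype
    out = field.to(torch.float32) * 2.0  # stand-in for the real kernel
    return out.to(orig_dtype)
```

The returned tensor has the caller's dtype, which is exactly why the precision loss is silent and worth a documented note.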

```python
_BENCHMARK_CASES = (
    ("2d-tri-24x24-scalar", 24, 24, False),
    ("2d-tri-36x36-scalar", 36, 36, False),
    ("2d-tri-36x36-vector", 36, 36, True),
```
Collaborator

Seems like we don't have a 3D test here? Would be good to test the 3D path too.

used: on CUDA, 1D/2D fields prefer ``torch``; 3D fields use a single-threshold
crossover (``torch`` -> ``warp``) as problem size grows. Inputs requiring
gradients prefer ``warp`` to use the explicit custom backward kernels.
Collaborator

This would be the public-facing API, right? Is my understanding correct? If so, can we add minimal examples to them? And maybe the parameters can be further expanded?

Examples
--------
>>> field = torch.sin(torch.linspace(0, 2*torch.pi, 64))
>>> grads = uniform_grid_gradient(field, spacing=2*torch.pi/64)
>>> grads.shape
torch.Size([1, 64])

Collaborator

Because I see that we are rendering these functions in the docs...
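
Relatedly, the 3-tier auto-dispatch heuristic quoted above could be sketched as follows. This is a simplified illustration with a made-up threshold, and it omits the torch_compiled tier; the real heuristic differs:

```python
def pick_backend(ndim: int, numel: int, on_cuda: bool, needs_grad: bool) -> str:
    """Illustrative dispatch sketch only: gradients prefer warp,
    large 3D CUDA problems cross over to warp, everything else uses torch."""
    crossover = 1_000_000  # hypothetical element-count threshold
    if needs_grad:
        return "warp"  # explicit custom backward kernels
    if on_cuda and ndim == 3 and numel >= crossover:
        return "warp"
    return "torch"
```

Something like this, inlined in the docstring, would make the dispatch behavior concrete for readers of the rendered docs.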

Collaborator

We should also add a CHANGELOG entry for this PR :)

@ktangsali ktangsali mentioned this pull request Apr 13, 2026