Batch API #540

Merged
amontoison merged 42 commits into main from am/batch_api on Feb 25, 2026
Conversation

@amontoison
Member

@klamike klamike mentioned this pull request Feb 4, 2026
@github-actions
Contributor

github-actions bot commented Feb 4, 2026

Package name latest stable
ADNLPModels
AdaptiveRegularization
AmplNLReader
BundleAdjustmentModels
CUTEst
CaNNOLeS
DCISolver
FletcherPenaltySolver
FluxNLPModels
JSOSolvers
JSOSuite
LLSModels
ManualNLPModels
NLPModelsIpopt
NLPModelsJuMP
NLPModelsKnitro
NLPModelsModifiers
NLPModelsTest
NLSProblems
PDENLPModels
PartiallySeparableNLPModels
PartiallySeparableSolvers
Percival
QuadraticModels
RegularizedOptimization
RegularizedProblems
SolverBenchmark
SolverTest
SolverTools

@amontoison
Member Author

Michael, I finished what I wanted.
You can do a pass on the tests when you have time.

@klamike
Collaborator

klamike commented Feb 5, 2026

@amontoison this is probably why the VI is not in the regular meta (see klamike@c54ded6): we can't infer it.

@klamike
Collaborator

klamike commented Feb 5, 2026

A more realistic example, batched QuadraticModel where we vary only the RHS: JuliaSmoothOptimizers/QuadraticModels.jl@main...klamike:QuadraticModels.jl:mk/rhsbatch
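The shared-data idea behind such a model can be sketched in miniature (hypothetical names and layout, not the actual RHSBatchQuadraticModel from the linked branch):

```julia
# Hypothetical sketch: a batch of quadratic programs sharing Q, A, c
# and varying only the constraint right-hand sides.
struct RHSBatchQP{T}
    Q::Matrix{T}   # shared quadratic term
    A::Matrix{T}   # shared constraint matrix
    c::Vector{T}   # shared linear term
    b::Matrix{T}   # ncon × nbatch: one RHS column per batch element
end

# Batched constraint residual A*x .- b for all batch elements at once;
# X is nvar × nbatch, one column per element.
cons_batch(m::RHSBatchQP, X::AbstractMatrix) = m.A * X .- m.b
```

Because only `b` carries a batch dimension, the per-element storage overhead is limited to one RHS column.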

@amontoison
Member Author

Amazing, Michael!

@amontoison
Member Author

Should we hardcode VI = Vector{Float64} like in the non-batch case?
Is this API what we need in MadIPM.jl, or should we adjust a few things?

@klamike
Collaborator

klamike commented Feb 5, 2026

> Should we hardcode VI = Vector{Float64} like in the non-batch case?

I think so, yes, to be consistent with the regular API. Both can be updated at the same time later (maybe only in 0.22).

> Is this API what we need in MadIPM.jl, or should we adjust a few things?

We probably want to define some more meta functions like get_nvar, get_x0, etc. I'll try to integrate the batch API into the MadIPM UniformBatch over the next few days and get back.
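As a rough sketch of what such accessors could look like (the BatchMeta container and its field layout here are illustrative assumptions, not the actual types in this PR):

```julia
# Illustrative batch-meta container; get_nvar / get_x0 mirror the
# accessor names of the regular NLPModels API.
struct BatchMeta{T}
    nvar::Int        # number of variables per batch element
    nbatch::Int      # number of batch elements
    x0::Matrix{T}    # nvar × nbatch matrix of starting points
end

get_nvar(meta::BatchMeta) = meta.nvar
get_nbatch(meta::BatchMeta) = meta.nbatch
get_x0(meta::BatchMeta) = meta.x0
```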

@klamike
Collaborator

klamike commented Feb 8, 2026

@amontoison what do you think about having an API for updating the nbatch? And maybe an optional get_nlpmodel_at_index returning an AbstractNLPModel?

@amontoison
Member Author

@klamike I don't understand what you mean by updating the nbatch.
Do you want to dynamically increase or reduce the batch?
For get_nlpmodel_at_index, you assume that the models are independent, but with ExaModels.jl all models are compiled into one expression and we can't split them.

@klamike
Collaborator

klamike commented Feb 8, 2026

Yes, I meant updating the nbatch throughout the solve. At the NLPModels level it would just change the size of the buffers in the out-of-place API. The motivation is to have some way of skipping the evaluation of batch elements that have already converged, for batch models where that can be done efficiently. Models with non-independent elements can override it to error, of course. Similar story for get_nlpmodel_at_index!.

In the ExaModels case, of course it depends on how you do the batching. When I added parameters to ExaModels I specifically made the lower level functions all take the parameter vector as an input, to make it possible to implement the batching the way I did in BatchNLPKernels. It is based on a single ExaModel and does the same number of kernel launches for the batch evaluation as ExaModels would for a regular model, just with 2D grids. Since the base parametric ExaModel is built "unbatched", it is trivial to implement the set_nbatch and get_nlpmodel_at_index!.
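The single-model-with-parameters idea can be illustrated in miniature (the residual r and the data below are made up; BatchNLPKernels launches GPU kernels over 2D grids rather than looping over columns):

```julia
# One parametric residual r(x, p); batching evaluates the same code
# path once per parameter column instead of building nbatch models.
r(x, p) = x .^ 2 .- p
P = [1.0 4.0 9.0]                   # 1 × 3 batch of parameter values
x = [2.0]
R = reduce(hcat, (r(x, view(P, :, j)) for j in 1:size(P, 2)))
```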

@klamike
Collaborator

klamike commented Feb 8, 2026

Actually the set_nbatch should really be set_active_batch_idx which would itself update the nbatch... I need to think on it some more, it feels more like a solver concern than an NLPModels concern. Might make more sense to implement it on some DynamicBatchNLPModel wrapper that lives in e.g. MadNLP. At least in the meantime, we can just force full-batch all the time here.

I do still think the get_nlpmodel_at_index! would be useful. It would simplify the incremental implementation of batch versions of existing solvers.

@amontoison
Member Author

@klamike Feel free to add what you need to the API in NLPModels.jl with this PR.

@klamike
Collaborator

klamike commented Feb 10, 2026

I think it's good to go, I was overcomplicating things. I got the MadIPM UniformBatch + RHSBatchQuadraticModel working locally, will clean it up and push to the MadIPM PR soon.

@amontoison
Member Author

amontoison commented Feb 11, 2026

@klamike Do you have any benchmarks with RHSBatchQuadraticModel or anything related to batch MadIPM?
I am giving a talk at Los Alamos National Laboratory tomorrow and would be happy to include any numerical results!

@klamike
Collaborator

klamike commented Feb 13, 2026

Sorry about that, I missed your message. The latest result is on a batch of 128 9241_pegase DCOPF instances: ~6.5x faster than sequential, and ~1.85x faster than multi-threading with 8 threads (comparing 1 task per mini-batch vs 1 task per problem).

@amontoison
Member Author

amontoison commented Feb 14, 2026

> Sorry about that, I missed your message. The latest result is on a batch of 128 9241_pegase DCOPF instances: ~6.5x faster than sequential, and ~1.85x faster than multi-threading with 8 threads (comparing 1 task per mini-batch vs 1 task per problem).

Good 🥇
For MadIPM.jl, should we provide jfix, jlow, ... as boolean vectors of size ncon * nbatch instead of only local indices?
It is hard to get the coefficients for each batch element with the current version.

Also, should I add a batch_ prefix to the availability flags in AbstractBatchNLPModelMeta?

@michel2323 worked on BatchExaModel today, and apart from the function batch_obj, we just need to rename the routines to follow this batch API.
--> exanauts/ExaModels.jl#216

@klamike
Collaborator

klamike commented Feb 15, 2026

For jfix etc., maybe it would be more consistent with the non-batch API to have a Vector{Vector{Int}}?
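Both encodings are cheap to convert between; a small illustration with made-up data:

```julia
# Per-element fixed-constraint indices: a Bool mask of size
# ncon × nbatch versus a Vector{Vector{Int}} of local indices.
jfix_mask = Bool[1 0; 0 0; 1 1]     # ncon = 3, nbatch = 2
jfix_idx  = [findall(view(jfix_mask, :, j)) for j in 1:size(jfix_mask, 2)]
```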

Regarding the batch_ prefix, I think it is fine as is. I would instead suggest that we change the expected field for meta to be batch_meta or bmeta, in case one wants to implement both the regular and the batch APIs in one type.

BatchExaModel looks very nice!

@sshin23

sshin23 commented Feb 15, 2026

Which GPU did you run the benchmark on @klamike?

@klamike
Collaborator

klamike commented Feb 15, 2026

I believe it was RTX 6000 Pro Blackwell

@sshin23

sshin23 commented Feb 15, 2026 via email

@klamike
Collaborator

klamike commented Feb 16, 2026

We are indeed reusing the symbolic factorization, using cuDSS's uniform batch feature. But the overall performance of the solver is not quite optimized, currently a lot of time is spent on packing/unpacking from individual to batched buffers since that made it simpler to batch-ify incrementally. Now that (most of) the solver is batched, I plan to revisit this.

I think we have H100 and H200 but no B100. I'll give it a try soon, with more CPU threads too.

@amontoison
Member Author

amontoison commented Feb 24, 2026

@klamike We discussed the batch API with Sungho last week and converged on a storage layout where everything is a multi-dimensional array whose last dimension is the batch size.
Even if it is equivalent, it may be easier to implement kernels this way (pure syntactic sugar).
We can easily call vec on the array if needed.
What do you think?
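The last-dimension layout and its flat view can be sketched as follows (illustrative sizes):

```julia
# A batch of nvar-vectors stored as an nvar × nbatch array; the last
# dimension indexes the batch. `vec` gives the flat view without
# copying, and `reshape` recovers the batched shape.
nvar, nbatch = 3, 4
X = reshape(collect(1.0:nvar * nbatch), nvar, nbatch)  # column j = batch element j
x = vec(X)                        # flat length-12 view, shares memory
Xback = reshape(x, nvar, nbatch)  # back to the batched layout
```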

I already updated CUDSS.jl last weekend for that:

@klamike
Collaborator

klamike commented Feb 24, 2026

I like that approach! But for the KKT system nzval, I think a strided vector is actually nicer (we can reuse the transfer kernels by just building a batched map). A Matrix is definitely more natural for a user-facing API, so it makes sense to have both in CUDSS.

As I am working on this version, I have come across several kernels that can be written exactly the same way (differing only in argument types), or with only one or two words changed (e.g. the same mapreduce but with dims=1), between batched and unbatched. I'm trying to make them match as closely as possible so we can eventually have just one kernel, shared between batched and unbatched. For that, keeping things as a Matrix is very helpful. The name change you made from batch_obj -> obj is nice for that too.
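A toy version of a kernel shared between the two cases (sumsq is a made-up example, not code from this PR):

```julia
# Sum-of-squares "objective": the unbatched method takes a Vector,
# the batched method takes an nvar × nbatch Matrix; only the `dims`
# keyword differs between the two method bodies.
sumsq(x::AbstractVector) = sum(abs2, x)
sumsq(X::AbstractMatrix) = vec(sum(abs2, X; dims = 1))  # one value per batch element
```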

@amontoison
Member Author

@klamike Do you need anything else before I merge the PR?
I just need to polish the documentation.

@amontoison amontoison merged commit 257de26 into main Feb 25, 2026
74 checks passed
@amontoison amontoison deleted the am/batch_api branch February 25, 2026 05:39