Hi authors,
I recently read your paper (arXiv:2502.07529v2), and it's an awesome piece of work! The idea of unifying optimizers like Muon and SignSGD under a single LMO (linear minimization oracle) framework is super cool.
In the paper, you contrast your a priori approach of fixing a norm up front with on-the-fly methods like Adam. This got me thinking, and I'd love to hear your thoughts: is it possible to also understand the Adam optimizer from a norm-based perspective?
I understand that Adam's update step size is variable, which is different from the fixed-norm output of an LMO. However, the operation in Adam that divides by the root of the accumulated squared gradients ($\sqrt{v_t}$) feels like it's performing some kind of geometric normalization, which might have an intrinsic connection to norms.
More specifically, I was wondering:
- Could Adam be viewed as using a "dynamic norm ball," where the ball itself changes at each step based on the gradient history?
- I found the connection you drew between weight decay and norm constraints very insightful, and it immediately made me think of AdamW. You also cited the paper linking AdamW to the $l_\infty$ norm, which seems to suggest a real connection here. Do you think this could be a path to fitting Adam into your framework?
- Alternatively, is there a fundamental reason why this idea wouldn't work? For instance, is the key obstacle to unification that the LMO output is scale-invariant, while Adam's update is sensitive to the gradient's scale?
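To make the normalization intuition concrete, here's a tiny sketch (my own, not from the paper) of Adam's per-coordinate update. In the degenerate case $\beta_1 = \beta_2 = 0$ and small $\epsilon$, the step $g/\sqrt{v}$ collapses to $\mathrm{sign}(g)$, i.e. exactly the SignSGD / $l_\infty$-LMO direction, which is what makes me suspect the connection is more than a coincidence:

```python
import numpy as np

def adam_step(g, m, v, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update direction (bias correction omitted for brevity)."""
    m = beta1 * m + (1 - beta1) * g          # first-moment EMA
    v = beta2 * v + (1 - beta2) * g**2       # second-moment EMA
    return m / (np.sqrt(v) + eps), m, v      # normalized step

g = np.array([0.5, -2.0, 0.01])

# With no history (beta1 = beta2 = 0), the step is g / (|g| + eps),
# i.e. elementwise approximately sign(g) -- the l_inf LMO output.
step, _, _ = adam_step(g, np.zeros(3), np.zeros(3), beta1=0.0, beta2=0.0)
assert np.allclose(step, np.sign(g), atol=1e-3)
```

With nonzero $\beta_2$, the denominator $\sqrt{v_t}$ mixes in past gradients, which is what made me wonder whether the norm ball should be thought of as history-dependent rather than fixed.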
Sorry for the long question; I'm just a student who is very interested in optimizers, and your paper has been very inspiring.
Thanks for any thoughts you might have and for the great work!
Best regards.