A question about Adam and the norm-based perspective #4

@Sun2018421

Hi authors,

I recently read your paper (arXiv:2502.07529v2), and it's an awesome piece of work! The idea of unifying optimizers like Muon and SignSGD under a single lmo framework is super cool.
In the paper, you contrast your a priori approach of setting a norm with on-the-fly methods like Adam. This got me thinking, and I'd love to hear your thoughts on this: Is it possible to also understand the Adam optimizer from a norm-based perspective?

I understand that Adam's update step size is variable, which is different from the fixed-norm output of an lmo. However, the operation in Adam that divides by the square root of the running average of squared gradients ($\sqrt{v_t}$) feels like it's performing some kind of geometric normalization, which might have an intrinsic connection to norms.

More specifically, I was wondering:

  1. Could Adam be viewed as using a "dynamic norm-ball," where the norm-ball itself changes at each step based on the history of gradients?
  2. I found the connection you drew between weight decay and norm constraints very insightful. It immediately made me think of AdamW. You also cited the paper linking AdamW to the $l_\infty$ norm, which seems to suggest a real connection here. Do you think this could be a potential path to fit Adam into your framework?
  3. Alternatively, is there a fundamental reason why this idea wouldn't work? For instance, is the key difference that prevents unification the fact that the lmo is scale-invariant, while Adam's update is sensitive to the gradient's scale?
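
To make the comparison behind these questions concrete, here is a minimal NumPy sketch (not taken from the paper or its code). The helper names `lmo_linf` and `adam_direction` are hypothetical, and I'm assuming the standard bias-corrected Adam update. In this toy setting, Adam with $\beta_1 = \beta_2 = 0$ and $\epsilon = 0$ reduces to the sign of the gradient, which is what an lmo over an $l_\infty$ ball returns, and with $\epsilon = 0$ the Adam direction does not change when every gradient is rescaled by the same positive constant.

```python
import numpy as np


def lmo_linf(grad, radius=1.0):
    """LMO over an l_inf ball: argmin of <grad, d> subject to ||d||_inf <= radius."""
    return -radius * np.sign(grad)


def adam_direction(grads, beta1=0.9, beta2=0.999, eps=1e-8):
    """Bias-corrected Adam direction (learning rate omitted) after seeing `grads` in order."""
    m = np.zeros_like(grads[0])
    v = np.zeros_like(grads[0])
    t = 0
    for g in grads:
        t += 1
        m = beta1 * m + (1 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return -m_hat / (np.sqrt(v_hat) + eps)


# Toy gradient with no zero entries, so eps = 0 is safe here.
g = np.array([0.5, -2.0, 3.0])

# No momentum, no second-moment averaging, eps = 0: same as the l_inf lmo.
print(adam_direction([g], beta1=0.0, beta2=0.0, eps=0.0), lmo_linf(g))

# Rescaling the whole gradient history by 100 with eps = 0.
print(adam_direction([g, 2 * g], eps=0.0))
print(adam_direction([100 * g, 200 * g], eps=0.0))
```

If this sketch is right, the scale sensitivity I mention in question 3 would only come from $\epsilon$, and the difference from a fixed norm ball would come from the gradient history stored in $m_t$ and $v_t$, but I may be missing something.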

Sorry for the long question; I'm just a student who is very interested in optimizers, and your paper has been very inspiring.

Thanks for any thoughts you might have and for the great work!

Best regards.
