support branch parallel for evoformer #14
GuoxiaWang wants to merge 2 commits into dptech-corp:main
Conversation
Thank you, I will review this over the weekend.
), "Must specify batch size either with --batch-size"
metrics.reset()
args.seed += args.dp_rank
When using a hybrid distributed parallel strategy such as DP-BP, the parameters and data within the same BP group need to be identical, so the seeds within a BP group must be the same.
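A minimal sketch of this seeding rule. It assumes, hypothetically, that consecutive global ranks form one BP group (so `dp_rank = global_rank // bp_degree`); `compute_seed` is an illustrative helper, not the actual Uni-Core code:

```python
def compute_seed(base_seed: int, global_rank: int, bp_degree: int) -> int:
    """Offset the seed by dp_rank only, so every rank inside one BP group
    (all sharing the same dp_rank) draws identical parameters and data."""
    dp_rank = global_rank // bp_degree  # hypothetical rank layout
    return base_seed + dp_rank

# With bp_degree=2: ranks 0 and 1 share a seed; ranks 2 and 3 share another.
print([compute_seed(42, r, 2) for r in range(4)])  # → [42, 42, 43, 43]
```

This is why the diff adds only `args.dp_rank` to the seed: ranks that differ only in their BP index keep the same seed, while different DP replicas diverge.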
if torch.cuda.is_available():
    dist.all_reduce(torch.zeros(1).cuda())
scg.init_group(bp_degree=args.bp_degree, dap_degree=1)
Will this affect the normal c10d and no_c10d modes?
Can we make "bp" a selectable option, like the current c10d and no_c10d choices?
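For context, `scg.init_group` presumably partitions the world into branch-parallel process groups. A hedged pure-Python sketch of one plausible partitioning (consecutive ranks per group; `build_bp_groups` is a hypothetical helper, not the actual scg API):

```python
def build_bp_groups(world_size: int, bp_degree: int):
    """Partition global ranks into branch-parallel groups of size bp_degree.
    Each inner list would back one torch.distributed process group."""
    assert world_size % bp_degree == 0, "world size must be divisible by bp_degree"
    return [list(range(start, start + bp_degree))
            for start in range(0, world_size, bp_degree)]

# 4 GPUs, bp_degree=2 → two BP groups:
print(build_bp_groups(4, 2))  # → [[0, 1], [2, 3]]
```

With `dap_degree=1` as in the diff, only the BP dimension is active, so this grouping alone determines the communicator layout.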
I'm not quite sure about this question. This PR is just meant to show how to use BP, not to be merged into UniCore.
Sorry, I may be missing some context.
return outer_grad.clone(), msa_grad.clone(), pair_grad.clone()
def sync_evoformer_results(outer, msa, pair, training):
I feel like the functions in this file would be better placed in Uni-Fold.
Same issue as above. The code needs to be designed together and then merged into UniFold and UniCore respectively.
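To illustrate what `sync_evoformer_results` might do under Branch Parallelism (each BP rank computes one branch, then partial results are combined across the BP group), here is a hedged single-process simulation of a sum all-reduce. The summation semantics and the `allreduce_sum` helper are assumptions for illustration, not the actual Uni-Fold implementation:

```python
def allreduce_sum(per_rank_values):
    """Simulate dist.all_reduce(op=SUM) across a BP group: every rank ends
    up holding the element-wise sum of all ranks' partial results."""
    total = [sum(col) for col in zip(*per_rank_values)]
    return [list(total) for _ in per_rank_values]

# Two BP ranks each hold a partial (outer, msa, pair) contribution;
# after the simulated all-reduce both ranks see the combined result.
rank0 = [1.0, 0.0, 2.0]   # hypothetical partial outputs on rank 0
rank1 = [0.5, 3.0, 0.0]   # hypothetical partial outputs on rank 1
print(allreduce_sum([rank0, rank1]))
# → [[1.5, 3.0, 2.0], [1.5, 3.0, 2.0]]
```

The matching backward pass would then return cloned gradients per branch, as the `return outer_grad.clone(), msa_grad.clone(), pair_grad.clone()` line in the diff suggests.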
Supports Branch Parallelism as described in the paper "Efficient AlphaFold2 Training using Parallel Evoformer and Branch Parallelism".