support branch parallel for evoformer#14

Open
GuoxiaWang wants to merge 2 commits into dptech-corp:main from GuoxiaWang:feature_bp

GuoxiaWang commented Nov 10, 2022

@guolinke
Member

Thank you, I will review this over the weekend.

), "Must specify batch size either with --batch-size"
metrics.reset()

args.seed += args.dp_rank
Member

Is this change needed?

Author

With a hybrid distributed parallel strategy such as DP-BP, the parameters and data within the same BP group must be identical, so every rank in that group needs the same seed.
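
To illustrate the point about seeds, here is a minimal sketch. It assumes that consecutive global ranks form one BP group, so that `dp_rank = global_rank // bp_degree`; this layout and the helper names `dp_rank_of` and `seed_for_rank` are hypothetical and not taken from the PR.

```python
def dp_rank_of(global_rank: int, bp_degree: int) -> int:
    """Data-parallel rank under an ASSUMED layout: ranks
    [0, bp_degree) form BP group 0, the next bp_degree ranks group 1, ..."""
    return global_rank // bp_degree


def seed_for_rank(base_seed: int, global_rank: int, bp_degree: int) -> int:
    """Offset the seed by the data-parallel rank only, so that all ranks
    inside one BP group end up with the same seed."""
    return base_seed + dp_rank_of(global_rank, bp_degree)


if __name__ == "__main__":
    # Hypothetical setup: 4 processes, BP degree 2, base seed 1.
    world_size, bp_degree, base_seed = 4, 2, 1
    for rank in range(world_size):
        print(rank, seed_for_rank(base_seed, rank, bp_degree))
```

Under this assumed layout, ranks 0 and 1 share one seed and ranks 2 and 3 share another, which mirrors the effect of `args.seed += args.dp_rank`.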

if torch.cuda.is_available():
    dist.all_reduce(torch.zeros(1).cuda())

scg.init_group(bp_degree=args.bp_degree, dap_degree=1)
Member

Will this affect the normal c10d and no_c10d modes?
Can we make "bp" a selectable choice, like the current c10d and no_c10d options?
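
One way the suggestion could look is sketched below with plain `argparse`. The flag names `--parallel-mode` and `--bp-degree` and the helper `uses_branch_parallel` are hypothetical illustrations, not existing Uni-Core options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag: "bp" is offered next to the existing modes,
    # so the default behavior stays unchanged unless it is selected.
    parser.add_argument(
        "--parallel-mode",
        choices=["c10d", "no_c10d", "bp"],
        default="c10d",
    )
    # Hypothetical flag mirroring args.bp_degree from the diff.
    parser.add_argument("--bp-degree", type=int, default=1)
    return parser


def uses_branch_parallel(args: argparse.Namespace) -> bool:
    """Branch-parallel setup would run only when explicitly requested."""
    return args.parallel_mode == "bp" and args.bp_degree > 1
```

With a guard like this, a call such as `scg.init_group(...)` could be skipped entirely in the c10d and no_c10d modes.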

Author

I'm not quite sure about this question. This PR is only meant to show how to use BP; it is not intended to be merged into UniCore.

Member

Sorry, I may be missing some context.


return outer_grad.clone(), msa_grad.clone(), pair_grad.clone()

def sync_evoformer_results(outer, msa, pair, training):
Member

I feel the functions in this file would fit better in Uni-Fold.

Author

Same issue as above. The code needs to be designed together and then merged into UniFold and UniCore respectively.
