[Bug] Revise the _remove_state_dict_prefix and _add_state_dict_prefix functions in timm.py to adapt to the case of multiple submodels.#1295
Open
wilxy wants to merge 3 commits into open-mmlab:dev-1.x
Conversation
Collaborator
Please sign the CLA so that I can review your PR.
Member
Hello, can you sign the CLA and fix the lint problem? Then we can merge the PR. @wilxy
Author
Thanks for the reminder, I've signed the CLA and fixed the lint problem.
Collaborator
Hi @wilxy, can you migrate this PR to the main branch?
When using `TimmClassifier` as the student or teacher model in a knowledge distillation algorithm, there are bugs in `save_checkpoint` and `load_checkpoint`.

`save_checkpoint`

When saving a checkpoint with `save_checkpoint(self.state_dict(), 'xxx.pth')`, where `self` is a knowledge distillation algorithm containing the submodels `self.student` and `self.teacher`, `self.state_dict()` recursively calls the `state_dict` function. The `_remove_state_dict_prefix` function of the `TimmClassifier` class is registered as a hook to modify the original `destination`. Specifically, `_remove_state_dict_prefix` creates a `new_state_dict`, whose memory address differs from that of the original `destination`, and returns it as the `hook_result` for the `student` and `teacher` submodels. However, the `state_dict` function of the knowledge distillation algorithm model never receives this modification, so the memory address and values of `destination` remain unchanged. To solve this problem, we change `_remove_state_dict_prefix` to modify the `state_dict` in place instead of creating a `new_state_dict`.
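The difference between the two behaviors can be sketched with plain dicts (a hypothetical simulation of the hook mechanics; the key names, the `model.` wrapper prefix, and the helper names are illustrative, not the actual timm.py code):

```python
# Buggy version: builds a new dict and returns it. PyTorch's state_dict()
# only rebinds its local `destination` to a hook's return value inside the
# submodule's own call; the parent keeps its original dict, so for nested
# submodels (student/teacher) the renamed keys are lost.
def remove_prefix_buggy(state_dict, prefix):
    internal = prefix + 'model.'
    return {prefix + k[len(internal):] if k.startswith(internal) else k: v
            for k, v in state_dict.items()}

# Fixed version: mutates the shared dict in place, so the parent's
# destination reflects the change without relying on the return value.
def remove_prefix_fixed(state_dict, prefix):
    internal = prefix + 'model.'
    for k in list(state_dict):
        if k.startswith(internal):
            state_dict[prefix + k[len(internal):]] = state_dict.pop(k)

def parent_state_dict(hook):
    # Simulate the distillation algorithm collecting keys from two submodels.
    destination = {'student.model.weight': 1, 'teacher.model.weight': 2}
    # The parent ignores any return value, exactly as nn.Module.state_dict
    # does when it recurses into its submodules.
    hook(destination, 'student.')
    hook(destination, 'teacher.')
    return destination
```

With the in-place version, `parent_state_dict(remove_prefix_fixed)` yields keys `student.weight` and `teacher.weight`; with the buggy version the original `*.model.*` keys survive unchanged because the returned dict is discarded.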
`load_checkpoint`

When loading a checkpoint of a knowledge distillation algorithm model whose student and teacher are both `TimmClassifier`, the `_add_state_dict_prefix` function of the `TimmClassifier` class is used as a hook to modify the `state_dict` for each submodel. While processing the student submodel, `_add_state_dict_prefix` also deletes all keys belonging to the `teacher` submodel: for a key that does not contain the current prefix, the computed `new_key` equals the original key, so the unconditional deletion removes the entry it has just written. To solve this problem, we change `_add_state_dict_prefix` to delete a key only when it differs from its `new_key`.