Hi, thanks for releasing this awesome code! I am currently working on reproducing the Cityscapes results from the paper. The paper's description of the MTL update equation says that the task-specific subnetwork weights should be updated with the original learning rate, and that the shared weights should then be updated with the MGDA algorithm. However, I couldn't find the corresponding implementation in the code. There, both the shared weights and the task-specific weights seem to be updated together, by multiplying each task's loss by a weight factor determined by MGDA and backpropagating the weighted sum. Am I missing something here, or is this an implementation trick?
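To make the question concrete, here is a minimal pure-Python sketch (not the repository's code) contrasting the two update rules on toy gradient vectors. It assumes two tasks, plain SGD, and gradients already computed with respect to the shared parameters; the two-task weight uses the closed-form min-norm solution argmin over a in [0, 1] of ||a*g1 + (1-a)*g2||^2. All function names are hypothetical. Because each task-specific head appears only in its own loss, backpropagating sum_t a_t * L_t gives head t the gradient a_t * grad L_t. So, if my reading of the code is right, the shared step is identical in both schemes and the only difference is that each head's learning rate is effectively scaled by a_t.

```python
# Hypothetical sketch: compares "paper" vs. "weighted-loss" MGDA updates
# for two tasks using plain Python lists as parameter/gradient vectors.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def axpy(alpha, x, y):
    """Return y + alpha * x elementwise."""
    return [yi + alpha * xi for xi, yi in zip(x, y)]

def min_norm_two(g1, g2):
    """Two-task min-norm weight: argmin_a ||a*g1 + (1-a)*g2||^2, a in [0, 1]."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weight gives the same combination
    a = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, a))

def shared_step(shared, shared_grads, a, lr):
    # Step along the MGDA combination a*g1 + (1-a)*g2 of shared-parameter gradients.
    combo = axpy(1.0 - a, shared_grads[1], [a * x for x in shared_grads[0]])
    return axpy(-lr, combo, shared)

def paper_update(shared, heads, shared_grads, head_grads, lr):
    a = min_norm_two(*shared_grads)
    # Task-specific heads: plain SGD with each task's own gradient and the original lr.
    new_heads = [axpy(-lr, g, h) for h, g in zip(heads, head_grads)]
    return shared_step(shared, shared_grads, a, lr), new_heads, a

def weighted_loss_update(shared, heads, shared_grads, head_grads, lr):
    a = min_norm_two(*shared_grads)
    weights = [a, 1.0 - a]
    # Gradient of sum_t w_t * L_t w.r.t. head t is w_t * grad L_t (head t only
    # appears in L_t), so each head's step is scaled by its task weight.
    new_heads = [axpy(-lr * w, g, h) for h, g, w in zip(heads, head_grads, weights)]
    return shared_step(shared, shared_grads, a, lr), new_heads, a

if __name__ == "__main__":
    shared = [0.0, 0.0]
    heads = [[1.0], [1.0]]
    shared_grads = [[1.0, 0.0], [0.0, 3.0]]
    head_grads = [[2.0], [2.0]]
    print(paper_update(shared, heads, shared_grads, head_grads, 0.1))
    print(weighted_loss_update(shared, heads, shared_grads, head_grads, 0.1))
```

With these toy numbers the MGDA weight is about 0.9 and both schemes move the shared weights to about [-0.09, -0.03]. The paper-style update moves both heads to about 0.8, while the weighted-loss update moves them to about 0.82 and 0.98, i.e. the heads of down-weighted tasks learn more slowly.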