Hello,
I am confused about the shape of the offset tensor.
The paper mentions:
"The grid R defines the receptive field size and dilation. For example, R = {(−1,−1), (−1,0), ..., (0,1), (1,1)}.
In deformable convolution, the regular grid R is augmented with offsets {∆p_n | n = 1, ..., N}, where N = |R|.
The output offset fields have the same spatial resolution with the input feature map. The channel dimension 2N corresponds to N 2D offsets."
So I think the shape of the offset field should be [2*9, H, W] = [18, H, W] if a 3x3 kernel is used.
In your implementation, however, the shape of the offset seems to be [batch_size, 2*n_channels, H, W]?
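
To make my reading of the paper concrete, here is a minimal sketch of the shape I would expect. The helper names `regular_grid` and `offset_field_shape` are hypothetical (my own, not from this repository), and the leading batch dimension is my assumption, since the paper only describes the 2N channel dimension and the spatial resolution.

```python
from itertools import product


def regular_grid(kernel_h, kernel_w):
    """Build the sampling grid R as (dy, dx) pairs centered on zero.

    Assumes odd kernel sizes and dilation 1, as in the paper's 3x3 example.
    """
    ys = range(-(kernel_h // 2), kernel_h // 2 + 1)
    xs = range(-(kernel_w // 2), kernel_w // 2 + 1)
    return list(product(ys, xs))


def offset_field_shape(batch_size, kernel_h, kernel_w, height, width):
    """Expected offset shape under my reading: 2N channels, N = |R|."""
    n = len(regular_grid(kernel_h, kernel_w))  # N = |R|
    # Two values (one 2D offset) per grid position, at every spatial location.
    return (batch_size, 2 * n, height, width)


grid = regular_grid(3, 3)
print(len(grid))                            # 9
print(offset_field_shape(1, 3, 3, 32, 32))  # (1, 18, 32, 32)
```

Under this reading the channel count depends only on the kernel size, not on the number of input channels.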