Thanks for your code, but I have a question. In the original paper, theta represents all the parameters, including "E; b(2); b(4); cl(w1); cr(wn); W(2); W(4); W(l); W(r); W(sl); W(sr)", and all of these are learned during training. In your code, however, there are no "W(l); W(r); W(sl); W(sr)", and "cl(w1)" is always set to the zero vector because a biRNN is used. Is that a problem? Thanks!