Question about the input to the 5-layer stacked LSTM in the DRCN model #8

Open

chenmozxh opened this issue Nov 22, 2019 · 0 comments

chenmozxh commented Nov 22, 2019

My understanding of Equation 6 in the paper is that the input to layer l at time step t is the concatenation of (1) the hidden vector of layer l-1 at time t, (2) the attention vector from layer l-1, and (3) the input to layer l-1 at time t; these three concatenated together form the input to layer l at time t.
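As a sanity check, here is a minimal LaTeX rendering of that reading of Equation 6 (the symbols h, a, x for hidden state, attention vector, and layer input are my own notation, borrowed from the paper as I remember it):

```latex
% My reading of Eq. 6 (notation assumed): the dense recurrent connection
% h_t^l : hidden state, a_t^l : co-attention vector, x_t^l : layer input
h_t^{l} = \mathrm{BiLSTM}\left(x_t^{l},\, h_{t-1}^{l}\right), \qquad
x_t^{l} = \left[\, h_t^{l-1};\; a_t^{l-1};\; x_t^{l-1} \,\right]
```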
But the code does the following:

```python
for j in range(5):
    # run the j-th BiLSTM over the previous layer's hidden states only
    with tf.variable_scope(f'p_lstm_{i}{j}', reuse=None):
        p_state, _ = self.BiLSTM(tf.concat(p_state, axis=-1))
    with tf.variable_scope(f'p_lstm_{i}_{j}' + str(i), reuse=None):
        h_state, _ = self.BiLSTM(tf.concat(h_state, axis=-1))

    p_state = tf.concat(p_state, axis=-1)
    h_state = tf.concat(h_state, axis=-1)

    # attention: softmax over the cosine similarity between p_state and h_state
    cosine = tf.divide(tf.matmul(p_state, tf.matrix_transpose(h_state)),
                       (tf.norm(p_state, axis=-1, keep_dims=True) * tf.norm(h_state, axis=-1, keep_dims=True)))
    att_matrix = tf.nn.softmax(cosine)
    p_attention = tf.matmul(att_matrix, h_state)
    h_attention = tf.matmul(att_matrix, p_state)

    # DenseNet-style concatenation: p / h accumulate everything,
    # but the next iteration feeds p_state / h_state, not p / h
    p = tf.concat((p, p_state, p_attention), axis=-1)
    h = tf.concat((h, h_state, h_attention), axis=-1)
```

So it seems the input to layer j should be p, not p_state.
I'm not sure whether my understanding is correct.
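Purely for illustration, here is a minimal self-contained sketch of the reading of Equation 6 I describe above, i.e. feeding the densely concatenated vector p / h (previous input + hidden states + attention) into the next BiLSTM layer instead of only p_state / h_state. This is not the repo's code: the tf.keras layers, the function name dense_coattentive_block, and the layer sizes are all my own placeholders.

```python
import tensorflow as tf

def dense_coattentive_block(p, h, num_layers=5, units=100):
    """One densely connected co-attentive BiLSTM stack (sketch, not the repo's code)."""
    for j in range(num_layers):
        bilstm_p = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True))
        bilstm_h = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True))

        # feed the dense vectors p / h themselves, not just the previous hidden states
        p_state = bilstm_p(p)
        h_state = bilstm_h(h)

        # co-attention: softmax over the cosine similarity between the two sequences
        p_norm = tf.math.l2_normalize(p_state, axis=-1)
        h_norm = tf.math.l2_normalize(h_state, axis=-1)
        att_matrix = tf.nn.softmax(tf.matmul(p_norm, h_norm, transpose_b=True), axis=-1)
        p_attention = tf.matmul(att_matrix, h_state)
        h_attention = tf.matmul(att_matrix, p_state, transpose_a=True)

        # dense connection: carry the previous layer's input forward as well
        p = tf.concat([p, p_state, p_attention], axis=-1)
        h = tf.concat([h, h_state, h_attention], axis=-1)
    return p, h

# usage: a batch of 2 sentence pairs, 30 tokens each, 100-dim embeddings
p_emb = tf.random.normal([2, 30, 100])
h_emb = tf.random.normal([2, 30, 100])
p_out, h_out = dense_coattentive_block(p_emb, h_emb)
```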

One more detail: is the output of each 5-layer stacked BiLSTM supposed to be concatenated with the original word/character embeddings before being passed to the next 5-layer stacked BiLSTM? Figure 1 of the paper draws it that way, but the text doesn't seem to mention this point.
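If that reading of Figure 1 is right, then (continuing the hypothetical sketch above, with p_emb / h_emb the original embeddings) the outer loop would look roughly like this:

```python
# Hypothetical outer loop: re-attach the original embeddings between stacked blocks,
# which is how I read Figure 1 of the paper.
p, h = p_emb, h_emb
for i in range(4):                        # four blocks of 5 BiLSTM layers each
    p, h = dense_coattentive_block(p, h)  # the 5-layer stack sketched above
    p = tf.concat([p_emb, p], axis=-1)    # concat the raw embeddings back in
    h = tf.concat([h_emb, h], axis=-1)
```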
There is also a pooling structure in the paper, after the four 5-layer BiLSTMs: if the output is (30, 100) (30 tokens, each a 100-dimensional vector), column-wise max-pooling is applied to get 100-dimensional p and q vectors, which are then concatenated as in Equation 7 and fed through 3 dense layers.
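For concreteness, a minimal sketch of how I understand that step (the exact interaction features in Equation 7 are from my memory of the paper, and the dense-layer sizes and two-class output are placeholders):

```python
# p_out, h_out: [batch, 30, d] outputs of the last 5-layer block (see sketch above)
p_vec = tf.reduce_max(p_out, axis=1)   # column-wise max-pooling over the 30 tokens -> [batch, d]
q_vec = tf.reduce_max(h_out, axis=1)   # [batch, d]

# Eq. 7 as I recall it: v = [p; q; p + q; p - q; |p - q|]
v = tf.concat([p_vec, q_vec, p_vec + q_vec, p_vec - q_vec, tf.abs(p_vec - q_vec)],
              axis=-1)

# 3 dense layers (hidden sizes and class count are assumptions)
x = tf.keras.layers.Dense(256, activation='relu')(v)
x = tf.keras.layers.Dense(256, activation='relu')(x)
logits = tf.keras.layers.Dense(2)(x)
```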
