<Gated Feedback Recurrent Neural Networks>
1. This paper introduces the gated-feedback RNN (GF-RNN), which is inspired by the clockwork RNN (CW-RNN). The CW-RNN lets the i-th module operate at the rate of $2^{i-1}$, where $i$ is a positive integer, meaning the module is updated only when $t \bmod 2^{i-1} = 0$. This makes each module operate at a different rate. In addition, they precisely defined the connectivity pattern between modules by allowing the i-th module to be affected by the j-th module only when $j > i$. (I think this is because whenever the j-th module is updated, we can be sure the i-th module is updated as well.) A small sketch of this update schedule is shown right below. Similar to the CW-RNN, the authors partition the hidden units into multiple modules, where each module corresponds to a different layer in a stack of recurrent layers, and they let each module operate at a different timescale by hierarchically stacking them. Each module is fully connected to all the other modules across the stack and to itself. The recurrent connection between two modules, instead, is gated by a logistic unit which is computed based on the current input and the previous states of the hidden layers. This is called a global reset gate.
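Below is a minimal Python sketch of the clockwork update schedule just described (which modules fire at which time steps); the module count and the range of time steps are illustrative choices, not values from the paper.

```python
# Minimal sketch of the CW-RNN update schedule: module i (1-indexed) has
# period 2**(i-1) and is updated only at time steps where t % 2**(i-1) == 0.
num_modules = 4                                   # illustrative choice
periods = [2 ** i for i in range(num_modules)]    # [1, 2, 4, 8]

for t in range(1, 9):
    active = [i + 1 for i, p in enumerate(periods) if t % p == 0]
    print(f"t={t}: updated modules {active}")
# Module 1 updates at every step (finest timescale); module 4 only when t is a
# multiple of 8 (coarsest timescale). Whenever module j fires, every module i < j
# fires too, which is why letting module i be affected by module j (j > i) is safe.
```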
The network is as follows (see the figure in the paper); the bullets in that figure correspond to global reset gates, which are computed as:

$$g^{i \rightarrow j} = \sigma\left(\mathbf{w}_g^{i \rightarrow j} \mathbf{h}_t^{j-1} + \mathbf{u}_g^{i \rightarrow j} \mathbf{h}_{t-1}^{*}\right)$$

Here $\mathbf{h}_{t-1}^{*}$ is the concatenation of all the hidden states from time step $t-1$, and $\mathbf{h}_t^{j-1}$ is the hidden state of the layer below at the current time step ($\mathbf{x}_t$ when $j = 1$). The superscript $i \rightarrow j$ means the transition from layer $i$ to layer $j$. $\mathbf{w}_g^{i \rightarrow j}$ and $\mathbf{u}_g^{i \rightarrow j}$ are weight vectors. In other words, the signal from $\mathbf{h}_{t-1}^{i}$ to $\mathbf{h}_t^{j}$ is controlled by a single scalar $g^{i \rightarrow j}$.
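A small NumPy sketch of how a single global reset gate could be computed and applied, following the formula above; the layer sizes, random weights, and variable names are my own illustrative assumptions, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim, num_layers = 8, 3                                        # illustrative sizes
h_prev = [np.random.randn(dim) for _ in range(num_layers)]    # h_{t-1}^i for each layer i
h_star = np.concatenate(h_prev)                               # h*_{t-1}: all previous hidden states
h_below = np.random.randn(dim)                                # h_t^{j-1} (or x_t when j = 1)

# One gate per (source layer i -> target layer j) pair; w_g and u_g are vectors,
# so each gate g^{i->j} is a single scalar.
w_g = np.random.randn(dim)
u_g = np.random.randn(num_layers * dim)
g_ij = sigmoid(w_g @ h_below + u_g @ h_star)

# The feedback signal from layer i to layer j is scaled by this scalar before
# entering layer j's update, e.g. g^{i->j} * U^{i->j} h_{t-1}^i.
U_ij = np.random.randn(dim, dim)
gated_feedback = g_ij * (U_ij @ h_prev[0])                    # feedback from layer i = 1
```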
2. We can see that the GF-RNN further allows information from the upper recurrent layers, corresponding to coarser timescales, to flow back into the lower recurrent layers, corresponding to finer timescales.
3. Recurrent Neural Network. The formula below shows how to compute hidden states:

$$\mathbf{h}_t = \phi\left(\mathbf{W}\mathbf{x}_t + \mathbf{U}\mathbf{h}_{t-1}\right)$$

And language modeling:

$$p(x_1, \dots, x_T) = p(x_1)\prod_{t=1}^{T-1} p(x_{t+1} \mid x_1, \dots, x_t)$$

And we train an RNN to model this distribution by letting it predict $x_{t+1}$ given $x_1, \dots, x_t$.
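A minimal NumPy sketch of this hidden-state recurrence on a toy sequence; the sizes and random weights are illustrative assumptions.

```python
import numpy as np

input_dim, hidden_dim = 4, 8                    # illustrative sizes
W = np.random.randn(hidden_dim, input_dim)
U = np.random.randn(hidden_dim, hidden_dim)

def rnn_step(x_t, h_prev):
    # h_t = tanh(W x_t + U h_{t-1})
    return np.tanh(W @ x_t + U @ h_prev)

h = np.zeros(hidden_dim)
for x_t in np.random.randn(5, input_dim):       # a short toy input sequence
    h = rnn_step(x_t, h)
# For language modeling, a softmax over the vocabulary would be applied to an
# affine transform of h at each step to predict the next symbol.
```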
4. Long Short Term Memory
An LSTM unit consists of a memory cell $c_t^j$, an input gate $i_t^j$, a forget gate $f_t^j$, and an output gate $o_t^j$, where the superscript $j$ means the content of the $j$-th LSTM unit at time step $t$ (i.e., the $j$-th dimension of the corresponding vector). The formulas (standard formulation, written in vector form with $\odot$ denoting element-wise multiplication) are as follows:

$$\mathbf{i}_t = \sigma\left(\mathbf{W}_i \mathbf{x}_t + \mathbf{U}_i \mathbf{h}_{t-1}\right), \qquad \mathbf{f}_t = \sigma\left(\mathbf{W}_f \mathbf{x}_t + \mathbf{U}_f \mathbf{h}_{t-1}\right)$$
$$\tilde{\mathbf{c}}_t = \tanh\left(\mathbf{W}_c \mathbf{x}_t + \mathbf{U}_c \mathbf{h}_{t-1}\right), \qquad \mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t$$
$$\mathbf{o}_t = \sigma\left(\mathbf{W}_o \mathbf{x}_t + \mathbf{U}_o \mathbf{h}_{t-1}\right), \qquad \mathbf{h}_t = \mathbf{o}_t \odot \tanh\left(\mathbf{c}_t\right)$$
The input and forget gates control how much new content should be memorized and how much old content should be forgotten, respectively. The output gate controls the degree to which the memory content is exposed.
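A minimal NumPy sketch of one LSTM step matching the description above (no biases or peephole connections); the sizes, names, and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

input_dim, hidden_dim = 4, 8                                      # illustrative sizes
Wi, Wf, Wo, Wc = (np.random.randn(hidden_dim, input_dim) for _ in range(4))
Ui, Uf, Uo, Uc = (np.random.randn(hidden_dim, hidden_dim) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    i_t = sigmoid(Wi @ x_t + Ui @ h_prev)       # input gate: how much new content to write
    f_t = sigmoid(Wf @ x_t + Uf @ h_prev)       # forget gate: how much old content to keep
    o_t = sigmoid(Wo @ x_t + Uo @ h_prev)       # output gate: how much memory to expose
    c_tilde = np.tanh(Wc @ x_t + Uc @ h_prev)   # candidate memory content
    c_t = f_t * c_prev + i_t * c_tilde          # blend old and new memory
    h_t = o_t * np.tanh(c_t)                    # exposed hidden state
    return h_t, c_t

h, c = lstm_step(np.random.randn(input_dim), np.zeros(hidden_dim), np.zeros(hidden_dim))
```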
5. Gated Recurrent Unit
$\mathbf{h}_{t-1}$ and $\tilde{\mathbf{h}}_t$ respectively correspond to the previous memory content and the new candidate memory content. The update gate $\mathbf{z}_t$ controls how much of the previous memory content is to be forgotten and how much of the new memory content is to be added:

$$\mathbf{h}_t = (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t$$

The reset gate $\mathbf{r}_t$ decides how much of the previous hidden state to ignore when computing the candidate $\tilde{\mathbf{h}}_t$.
The reset mechanism helps the GRU to use the model capacity efficiently by allowing it to reset whenever the detected feature is not necessary anymore.
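A minimal NumPy sketch of one GRU step matching the description above (no biases); the sizes, names, and random weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

input_dim, hidden_dim = 4, 8                                      # illustrative sizes
Wz, Wr, Wh = (np.random.randn(hidden_dim, input_dim) for _ in range(3))
Uz, Ur, Uh = (np.random.randn(hidden_dim, hidden_dim) for _ in range(3))

def gru_step(x_t, h_prev):
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev)                # update gate
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev)                # reset gate
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev))    # candidate, previous state partially reset
    return (1.0 - z_t) * h_prev + z_t * h_tilde          # interpolate old and new content

h = gru_step(np.random.randn(input_dim), np.zeros(hidden_dim))
```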
6. In the GF-RNN, the gates themselves are computed as usual; what changes is the computation of the new candidate content, which now sums the gated feedback from all $L$ layers. For the GRU, the candidate for the $j$-th layer becomes

$$\tilde{\mathbf{h}}_t^{j} = \tanh\left(\mathbf{W}^{j-1 \rightarrow j}\mathbf{h}_t^{j-1} + \mathbf{r}_t^{j} \odot \sum_{i=1}^{L} g^{i \rightarrow j}\,\mathbf{U}^{i \rightarrow j}\mathbf{h}_{t-1}^{i}\right)$$

and for the LSTM the new memory content $\tilde{\mathbf{c}}_t^{j}$ is modified in the same way (without the reset-gate term): the usual single recurrent term $\mathbf{U}\mathbf{h}_{t-1}$ is replaced by $\sum_{i=1}^{L} g^{i \rightarrow j}\,\mathbf{U}^{i \rightarrow j}\mathbf{h}_{t-1}^{i}$.
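A rough NumPy sketch of this gated-feedback update applied to a stack of plain tanh recurrent layers (the LSTM/GRU variants gate the same summed feedback term inside their candidate content); all sizes, names, and random weights are illustrative assumptions, and the input is assumed to have the same dimensionality as the hidden layers for simplicity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim, L = 8, 3                                                             # illustrative sizes
W  = [np.random.randn(dim, dim) for _ in range(L)]                        # W^{j-1 -> j}
U  = [[np.random.randn(dim, dim) for _ in range(L)] for _ in range(L)]    # U^{i -> j}
wg = [[np.random.randn(dim) for _ in range(L)] for _ in range(L)]         # w_g^{i -> j}
ug = [[np.random.randn(L * dim) for _ in range(L)] for _ in range(L)]     # u_g^{i -> j}

def gf_step(x_t, h_prev):
    """One GF-RNN time step; h_prev is the list of h_{t-1}^i, one per layer."""
    h_star = np.concatenate(h_prev)        # concatenation of all previous hidden states
    h_new, below = [], x_t                 # 'below' is h_t^{j-1}, or x_t for the first layer
    for j in range(L):
        feedback = np.zeros(dim)
        for i in range(L):
            g = sigmoid(wg[i][j] @ below + ug[i][j] @ h_star)    # scalar gate g^{i -> j}
            feedback += g * (U[i][j] @ h_prev[i])                # gated feedback from layer i
        h_j = np.tanh(W[j] @ below + feedback)
        h_new.append(h_j)
        below = h_j
    return h_new

h0 = [np.zeros(dim) for _ in range(L)]
h1 = gf_step(np.random.randn(dim), h0)
```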
7. There are two experiments, Language Modeling and Python Program Evaluation. See the paper for more details.