<Recurrent neural network based language model>
1. Define the language model: the article linked below introduces the definition of a language model, which is P(S), the probability of a whole sentence S = w_1 w_2 ... w_m. From basic probability theory (the chain rule):
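$$P(S) = P(w_1 w_2 \dots w_m) = \prod_{i=1}^{m} P(w_i \mid w_1, \dots, w_{i-1})$$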
A naive Bayes assumption ignores the context entirely, which means:
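$$P(S) \approx \prod_{i=1}^{m} P(w_i)$$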
This is called a unigram model. Similarly, if the context is one previous word, it is a bigram; if the context is two previous words, it is a trigram; in general we have the n-gram model, as written out below.
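For example, the bigram and general n-gram factorizations are:

$$P(S) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-1}) \qquad \text{(bigram)}$$
$$P(S) \approx \prod_{i=1}^{m} P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \qquad \text{(n-gram)}$$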
This is a great article introducing language models: 漫谈 Language Model (1): 原理篇 (roughly, "A Casual Talk on Language Models (1): Principles").
2. This paper introduces the RNN language model (RNNLM), usually called a simple recurrent neural network or Elman network. x(t) is the input layer, s(t) is the hidden layer (or state), y(t) is the output layer, and w(t) is the current word; they are all vectors. The plus sign in the first equation denotes concatenation, since w(t) and s(t-1) have different dimensions. f and g denote the sigmoid and softmax functions, respectively. So y(t) is a probability distribution over the next word.
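Written out, the equations are (following the paper's notation; u_{ji} and v_{kj} are the input-to-hidden and hidden-to-output weights):

$$x(t) = w(t) + s(t-1)$$
$$s_j(t) = f\Big(\sum_i x_i(t)\, u_{ji}\Big)$$
$$y_k(t) = g\Big(\sum_j s_j(t)\, v_{kj}\Big)$$
$$f(z) = \frac{1}{1+e^{-z}}, \qquad g(z_m) = \frac{e^{z_m}}{\sum_k e^{z_k}}$$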
And the error function is simple: error(t) = desired(t) - y(t), where desired(t) is a one-hot vector encoding the ground-truth next word.
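A minimal NumPy sketch of one forward step and the output error, just to make the shapes concrete (the variable names U, V_out and the helper functions are mine, not from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def rnnlm_step(w_t, s_prev, U, V_out):
    """One forward step of the Elman-style RNNLM described above.

    w_t    : one-hot vector of the current word, shape (V,)
    s_prev : previous hidden state s(t-1), shape (H,)
    U      : input-to-hidden weights, shape (H, V + H)
    V_out  : hidden-to-output weights, shape (V, H)
    """
    x_t = np.concatenate([w_t, s_prev])   # x(t) = w(t) + s(t-1), "+" meaning concatenation
    s_t = sigmoid(U @ x_t)                # s(t): new hidden state
    y_t = softmax(V_out @ s_t)            # y(t): distribution over the next word
    return s_t, y_t

def output_error(desired_t, y_t):
    # error(t) = desired(t) - y(t), with desired(t) one-hot for the true next word
    return desired_t - y_t
```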
3. For rare words, the paper merges every word that occurs less often than a threshold into a single rare token, and defines:
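$$P(w(t+1) \mid w(t), s(t-1)) = \begin{cases} y_{rare}(t) / C_{rare} & \text{if } w(t+1) \text{ is rare} \\ y_{w(t+1)}(t) & \text{otherwise} \end{cases}$$

where C_rare is the number of words merged into the rare token.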
So if w(t+1) is rare, the probability mass of the rare token is spread uniformly over all rare words; otherwise we read the probability of w(t+1) directly from the corresponding entry of y(t).
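A tiny sketch of that lookup (the helper name and its arguments are hypothetical, not from the paper):

```python
def next_word_prob(y_t, word_idx, rare_idx, is_rare, c_rare):
    """Probability of a candidate next word under the rare-word scheme above.

    y_t      : output distribution y(t) over the merged vocabulary
    word_idx : index of the candidate word in y_t
    rare_idx : index of the merged 'rare' token
    is_rare  : True if the candidate word was merged into the rare token
    c_rare   : number of distinct words merged into the rare token
    """
    if is_rare:
        return y_t[rare_idx] / c_rare   # rare words share the rare-token mass uniformly
    return y_t[word_idx]
```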