Institute of Cognitive Science,
49090 Osnabrück, Germany
Deep Learning Models for Long-Distance Dependencies
Memory is the ability to store, retain, and retrieve information on demand. It also underpins reasoning, attention, and imagination. Cognitive memory can be divided into short-term and long-term memory.
In my project, I am investigating the ability of deep learning models to learn long-distance dependencies. This ability is important for many fields, such as speech recognition, speech translation, video understanding, and stock prediction.
For instance, when summarizing a book, the summary would be more accurate if we could feed the whole book into the network instead of only a few pages or paragraphs. Similarly, in computer vision, understanding a video requires remembering the sequence of actions taking place. In addition, a long-range memory would let models exploit long sequences of data. For example, when predicting stock market prices, being able to use the complete history of the market gives a substantial advantage for accurate prediction.
For a network to be able to handle long sequences, it should meet the following criteria:
• Handle variable-length sequences
• Track long-term dependencies
• Maintain information about order
• Share parameters across the sequence
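These criteria can be illustrated with a minimal recurrent cell in plain NumPy (a sketch with made-up names and sizes, not code from the project): the same weights are reused at every time step, the hidden state carries order information, and the loop runs for any sequence length.

```python
import numpy as np

# Minimal recurrent cell: the SAME weights (W_h, W_x) are applied at every
# time step (parameter sharing), the hidden state h accumulates information
# in order, and the loop handles sequences of any length.
rng = np.random.default_rng(0)
hidden, feat = 4, 3                     # illustrative sizes
W_h = rng.normal(size=(hidden, hidden)) * 0.1
W_x = rng.normal(size=(hidden, feat)) * 0.1

def run_rnn(sequence):
    h = np.zeros(hidden)
    for x_t in sequence:                # variable length: loop as long as needed
        h = np.tanh(W_h @ h + W_x @ x_t)
    return h

short = rng.normal(size=(5, feat))      # 5-step sequence
long_ = rng.normal(size=(50, feat))     # 50-step sequence, same parameters
print(run_rnn(short).shape, run_rnn(long_).shape)
```

Note how a single fixed set of parameters processes both the 5-step and the 50-step input, producing a fixed-size state in each case.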
For a long time, Recurrent Neural Networks (RNNs) dominated the field. Unlike feed-forward networks (ANNs) and CNNs, RNNs can model sequence data. However, RNN variants, including LSTMs and GRUs, face two major problems: 1) they are difficult to train, and 2) they suffer from vanishing and exploding gradients, which restrict them from learning long-range dependencies.
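The vanishing-gradient problem can be seen in a small numerical sketch (illustrative numbers, not tied to any particular model): backpropagation through time multiplies one Jacobian per time step, so if the recurrent matrix's largest singular value is below 1, the gradient norm decays exponentially with sequence length.

```python
import numpy as np

# Backpropagation through time multiplies one Jacobian per time step.
# If the recurrent matrix's largest singular value is below 1, the
# gradient norm shrinks exponentially with sequence length (vanishing
# gradients); above 1, it grows exponentially (exploding gradients).
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))
W *= 0.9 / np.linalg.norm(W, 2)       # force spectral norm to 0.9 (< 1)

def backprop_norm(steps):
    g = np.ones(8)                    # stand-in for an upstream gradient
    for _ in range(steps):
        g = W.T @ g                   # linearized backward step
    return np.linalg.norm(g)

norms = [backprop_norm(t) for t in (10, 100)]
print(norms)                          # the 100-step norm is vastly smaller
```

This is why signals from 100 steps back contribute almost nothing to the weight update, and hence why plain RNNs struggle with long-range dependencies.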
To address these problems, a new neural network architecture called the Transformer was introduced. Transformers discard recurrence and instead use a self-attention mechanism. They are the current state of the art in most NLP tasks, such as translation, language modelling, and sentiment analysis. However, the main limitation of Transformers is their complexity: the self-attention component of the original Transformer has O(n²) time and memory complexity, where n is the input sequence length. Much work has been done to reduce this to linear or O(n log n) complexity. This is still an active research area, to which I hope to contribute.
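The source of the O(n²) cost can be made concrete with a stripped-down self-attention sketch (for brevity, this assumes identity query/key/value projections rather than learned ones): the score matrix compares every token with every other token, so it has n × n entries.

```python
import numpy as np

def self_attention(X):
    """Single-head self-attention without learned projections (a sketch).

    The score matrix Q @ K.T has shape (n, n): every token attends to
    every other token, which is where the O(n^2) time and memory cost
    in sequence length n comes from.
    """
    Q = K = V = X                          # assumption: identity projections
    d = X.shape[-1]
    scores = Q @ K.T / np.sqrt(d)          # shape (n, n): quadratic in n
    # Numerically stable softmax over each row.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                     # shape (n, d)

X = np.random.default_rng(2).normal(size=(6, 4))   # n=6 tokens, d=4 features
out = self_attention(X)
print(out.shape)                           # intermediate scores were (6, 6)
```

Doubling the sequence length quadruples the score matrix, which is exactly what the linear and O(n log n) attention variants try to avoid.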
Mohamad Ballout, Mohamad Tuqan, Daniel Asmar, Elie Shammas, George Sakr (2020) The Benefits of Synthetic Data for Action Categorization. International Joint Conference on Neural Networks (IJCNN), 1-8
(not RTG related)