Transformers for Natural Language Processing (Deep learning for NLP)
Natural Language Processing (NLP) is the field of computer science that uses computers to process natural human language, both spoken and written. The field is broadly divided into two subdomains: Natural Language Understanding and Natural Language Generation. Transformers are the family of deep learning models that currently hold state-of-the-art results on most NLP tasks and benchmarks. To understand transformers better, we should first look at their predecessors: RNNs, Seq2Seq models, and models with an attention mechanism. Let's take a deep dive into natural language processing using transformers.
RNN - LSTM and GRU
RNNs (Recurrent Neural Networks) are neural networks that can process sequential or time-series data. They are widely used in natural language processing because sentences are sequential: each word relates to the words around it. In an RNN, the connections between nodes form a directed graph along a temporal sequence, which allows the network to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs use their internal state (memory) to process variable-length input sequences.
Types of RNN based on architecture (a minimal usage sketch follows this list):
- LSTM - Long Short-Term Memory
- GRU - Gated Recurrent Unit
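Here is a minimal PyTorch sketch of an LSTM reading a batch of embedded sequences, with a GRU as a drop-in alternative; the layer sizes are illustrative assumptions, not recommendations:

```python
import torch
import torch.nn as nn

# Illustrative sizes: 32-dim token embeddings, 64-dim hidden state.
embedding_dim, hidden_dim = 32, 64
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)

batch = torch.randn(8, 20, embedding_dim)   # 8 sequences, 20 timesteps each
outputs, (h_n, c_n) = lstm(batch)           # outputs: hidden state at every step
print(outputs.shape)                        # torch.Size([8, 20, 64])
print(h_n.shape)                            # final hidden state: [1, 8, 64]

# GRU is a drop-in alternative with a single hidden state (no cell state).
gru = nn.GRU(input_size=embedding_dim, hidden_size=hidden_dim, batch_first=True)
outputs, h_n = gru(batch)
```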
Sequence to Sequence Model
Seq2Seq models consist of an Encoder and a Decoder. The Encoder takes the input sequence and maps it into a higher-dimensional space (an n-dimensional vector); the Decoder takes that vector from the Encoder and produces the output sequence. A basic choice for the Encoder and the Decoder of a Seq2Seq model is a single LSTM for each.
For example, take two translators who each know only two languages: their own mother tongue and a shared imaginary language. Since both know the imaginary language but not each other's mother tongue, we can translate by first translating the first speaker's language into the common imaginary language (encoding), and then translating from the common language into the second speaker's language (decoding).
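In code, the idea looks roughly like the following minimal PyTorch sketch. The vocabulary sizes and dimensions are illustrative assumptions, and a real system would add teacher forcing, padding, and a decoding strategy on top:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """A minimal sketch of the Seq2Seq idea: one LSTM encodes the source
    sequence into a fixed vector (its final hidden state), and a second
    LSTM decodes the target sequence from that vector."""

    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder: compress the whole source sequence into (h, c).
        _, (h, c) = self.encoder(self.src_emb(src_ids))
        # Decoder: start from the encoder's state and produce the target.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), (h, c))
        return self.out(dec_out)   # logits over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (4, 15))   # 4 source sentences, 15 tokens
tgt = torch.randint(0, 1200, (4, 12))   # 4 target sentences, 12 tokens
logits = model(src, tgt)                # shape: [4, 12, 1200]
```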
Attention Mechanism
The attention mechanism was introduced to let the model look at the important parts of a sequence at each step, assigning weights while considering the whole sequence. It is similar to reading a long paragraph: we keep some important keywords from the top of the paragraph in mind in order to understand the context at the bottom of the paragraph.
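The scaled dot-product form later used by transformers makes this concrete (the original Seq2Seq attention used a slightly different, additive scoring function, but the idea is the same): each query position scores every key, the scores become weights via softmax, and the output is a weighted average of the values. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Score every query against every key, scale by sqrt(d_k) for stability.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # [batch, q_len, k_len]
    weights = F.softmax(scores, dim=-1)             # each row sums to 1
    return weights @ v, weights                     # weighted sum of values

q = torch.randn(1, 5, 64)        # 5 query positions
k = v = torch.randn(1, 20, 64)   # attending over the same 20-step sequence
context, weights = scaled_dot_product_attention(q, k, v)
print(context.shape)             # torch.Size([1, 5, 64])
```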
Transformer architecture
Figure: the Transformer architecture diagram from "Attention Is All You Need" (Vaswani et al., 2017).
Like an LSTM-based Seq2Seq model, the Transformer is an architecture for transforming one sequence into another with the help of two parts (an Encoder and a Decoder), but it does not use any RNN (LSTM or GRU). RNNs were previously one of the best ways to capture temporal dependencies in a sequence, but an architecture built only on attention mechanisms has shown improved results on translation and other NLP tasks.
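PyTorch ships a built-in encoder-decoder transformer, which gives a feel for the interface. This minimal sketch omits the token embeddings, positional encodings, and masks a real model needs:

```python
import torch
import torch.nn as nn

# Small illustrative configuration: d_model=64, 4 attention heads.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(8, 20, 64)   # 8 source sequences, 20 positions each
tgt = torch.randn(8, 12, 64)   # 8 target sequences, 12 positions each
out = model(src, tgt)          # [8, 12, 64] -- no recurrence anywhere
```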
BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) marked a milestone in applying the transformer architecture to most natural language processing tasks.
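The easiest way to try a pretrained BERT today is the Hugging Face transformers package (assuming it is installed via pip install transformers); here it fills in a masked word:

```python
from transformers import pipeline

# Download a pretrained BERT checkpoint and run masked-word prediction.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for prediction in fill_mask("The goal of NLP is to [MASK] human language."):
    print(prediction["token_str"], round(prediction["score"], 3))
```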
Popular Transformers
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- GPT2: Language Models Are Unsupervised Multitask Learners
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- GPT3: Language Models Are Few-Shot Learners
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
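Most of the models above are published as pretrained checkpoints on the Hugging Face Hub and can be loaded with the same two lines; "roberta-base" below is just one commonly used checkpoint id:

```python
from transformers import AutoTokenizer, AutoModel

# Auto* classes pick the right architecture from the checkpoint name.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

inputs = tokenizer("Transformers are the state of the art in NLP.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)   # [1, seq_len, 768] contextual vectors
```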
Applications of NLP
- Text classification - email spam detection (a minimal sketch follows this list)
- Named Entity Recognition - extracting names, locations, quantities, etc., as in Google Search
- Question answering - Responses for Virtual Assistants like Siri, Google Assistant
- Speech recognition and synthesis - Speech to text and Text to Speech
- Topic modelling - Smart Tagging, Clustering
- Machine translation - Google Translate
- Language Modelling - BERT for Google Search
- Automatic image captioning - Image understanding and question answering
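As a small taste of the first application, here is text classification via the Hugging Face pipeline API; it uses the library's default sentiment model as a stand-in, since spam detection proper would need a fine-tuned checkpoint of your own:

```python
from transformers import pipeline

# Downloads a default pretrained sentiment classifier on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Congratulations! You have won a free prize."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```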