Diatoz

August 21, 2021

Transformers for Natural Language Processing (Deep learning for NLP)

Natural Language Processing (NLP) is the field of computer science that uses computers to process spoken and written human language. The field is broadly divided into two subdomains: Natural Language Understanding and Natural Language Generation. Transformers are deep learning models that achieve state-of-the-art results on most NLP tasks and benchmarks. To understand transformers better, it helps to know their predecessors: RNNs, Seq2Seq models, and models with an attention mechanism. Let's take a deep dive into natural language processing with transformers.

RNN - LSTM AND GRU

A Recurrent Neural Network (RNN) is a kind of neural network that processes sequential or time-series data. RNNs are widely used in natural language processing because a sentence is a sequence in which each word relates to the other words. In an RNN, connections between nodes form a directed graph along a temporal sequence, which allows the network to exhibit temporal dynamic behavior. Derived from feedforward neural networks, RNNs can use their internal state (memory) to process variable-length input sequences.
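The recurrence described above can be sketched in a few lines of plain Python. This is a minimal illustration, assuming a simple tanh RNN cell with random, untrained weights; the names `rnn_step`, `run_rnn`, `W_xh`, `W_hh` and `b_h` are chosen for this example and do not come from any library.

```python
import math
import random

def rnn_step(x, h_prev, W_xh, W_hh, b_h):
    """One recurrence step: h_t = tanh(W_xh * x_t + W_hh * h_prev + b_h)."""
    h_new = []
    for i in range(len(b_h)):
        total = b_h[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][j] * h_prev[j] for j in range(len(h_prev)))
        h_new.append(math.tanh(total))
    return h_new

def run_rnn(inputs, hidden_size, seed=0):
    """Run the cell over a sequence of any length and return every hidden state."""
    rng = random.Random(seed)
    input_size = len(inputs[0])
    # Random, untrained weights (an assumption made only for illustration).
    W_xh = [[rng.uniform(-0.5, 0.5) for _ in range(input_size)] for _ in range(hidden_size)]
    W_hh = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_size)] for _ in range(hidden_size)]
    b_h = [0.0] * hidden_size
    h = [0.0] * hidden_size  # the internal state (memory) starts empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b_h)  # the same weights are reused at every step
        states.append(h)
    return states

# The same network handles sequences of different lengths.
short_states = run_rnn([[1.0, 0.0], [0.0, 1.0]], hidden_size=3)
long_states = run_rnn([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]], hidden_size=3)
```

Because the hidden state is carried from one step to the next, each state depends on all earlier inputs, which is what lets the network handle inputs of variable length.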

Types of RNN based on architecture:

LSTM - Long Short-Term Memory
GRU - Gated Recurrent Unit

Sequence to Sequence Model

Seq2Seq models consist of an Encoder and a Decoder. The Encoder takes the input sequence and maps it into a higher-dimensional space (an n-dimensional vector). The Decoder takes that vector from the Encoder and produces the output sequence. A basic choice for the Encoder and the Decoder of a Seq2Seq model is a single LSTM for each of them.

For example, imagine two translators who each know only two languages: their own mother tongue and a shared imaginary language. Because they do not know each other's mother tongue, they translate by first converting the first translator's language into the shared imaginary language (encoding) and then converting the shared language into the second translator's language (decoding).
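The encoder-decoder flow can be sketched as follows. This is only an illustration under stated assumptions: it uses simple tanh RNN cells instead of LSTMs to stay short, the weights are random and untrained (so the output tokens are meaningless), and every name here (`encode`, `decode`, `rand_matrix`, the vocabulary sizes, `START`, `END`) is hypothetical.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def encode(token_ids, embed, W_x, W_h):
    """Fold the whole input sequence into one fixed-size context vector."""
    h = [0.0] * len(W_h)
    for t in token_ids:
        a = matvec(W_x, embed[t])
        b = matvec(W_h, h)
        h = [math.tanh(p + q) for p, q in zip(a, b)]
    return h

def decode(context, embed, W_x, W_h, W_out, start_id, end_id, max_len=10):
    """Greedily emit target tokens, starting from the encoder's context vector."""
    h = context
    token = start_id
    output = []
    for _ in range(max_len):
        a = matvec(W_x, embed[token])
        b = matvec(W_h, h)
        h = [math.tanh(p + q) for p, q in zip(a, b)]
        scores = matvec(W_out, h)  # one score per target-vocabulary token
        token = max(range(len(scores)), key=scores.__getitem__)
        if token == end_id:
            break
        output.append(token)
    return output

rng = random.Random(0)
src_vocab, tgt_vocab, embed_dim, hidden = 6, 5, 3, 4  # toy sizes, assumed
START, END = 0, 1  # hypothetical special token ids in the target vocabulary
src_embed = rand_matrix(rng, src_vocab, embed_dim)
tgt_embed = rand_matrix(rng, tgt_vocab, embed_dim)
enc_W_x, enc_W_h = rand_matrix(rng, hidden, embed_dim), rand_matrix(rng, hidden, hidden)
dec_W_x, dec_W_h = rand_matrix(rng, hidden, embed_dim), rand_matrix(rng, hidden, hidden)
W_out = rand_matrix(rng, tgt_vocab, hidden)

context = encode([2, 4, 3], src_embed, enc_W_x, enc_W_h)  # the "imaginary language"
translation = decode(context, tgt_embed, dec_W_x, dec_W_h, W_out, START, END)
```

The single `context` vector plays the role of the shared imaginary language in the analogy: it is the only information that passes from the encoder to the decoder.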

Attention Mechanism

The attention mechanism was introduced so that, at each step, the model can look at the whole sequence and give more weight to its important parts. Similarly, when reading a long paragraph, we have to keep in mind important keywords from the top of the paragraph in order to understand the context at the bottom.
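As a rough sketch of this idea, the snippet below scores every position of a sequence against a query, turns the scores into weights with a softmax, and returns the weighted average. It assumes plain dot-product scoring, which is one of several possible scoring functions; the function names and toy vectors are made up for this example.

```python
import math

def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Weight every state by its dot-product similarity to the query."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * state[i] for w, state in zip(weights, states)) for i in range(dim)]
    return weights, context

# Three toy "word" vectors; the query is most similar to the first one.
states = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [2.0, 0.0]
weights, context = attend(query, states)
```

Here the first state receives the largest weight and the second the smallest, so the resulting context vector is dominated by the positions most relevant to the query.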

Transformer architecture

Figure: Transformer architecture diagram, from "Attention Is All You Need"

Like LSTM-based Seq2Seq models, the Transformer is an architecture for transforming one sequence into another with the help of two parts (an Encoder and a Decoder), but it does not use any RNN (LSTM or GRU). RNNs were long one of the best ways to capture temporal dependencies in a sequence. However, an architecture built only on attention mechanisms has improved results on translation and other NLP tasks.
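The core operation of the Transformer is scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, in which every position attends to every other position without any recurrence. The sketch below is a minimal single-head version, assuming random, untrained projection matrices; it leaves out multi-head attention, positional encodings, residual connections and the feedforward layers of the full architecture, and all names and sizes are illustrative.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows of X."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)  # similarity of every position to every other
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return weights, matmul(weights, V)

rng = random.Random(0)
d_model, d_k = 4, 3  # toy sizes, assumed
# Five token vectors standing in for an embedded sentence.
X = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(5)]
W_q, W_k, W_v = ([[rng.uniform(-1, 1) for _ in range(d_k)] for _ in range(d_model)] for _ in range(3))
weights, output = self_attention(X, W_q, W_k, W_v)
```

Each row of `weights` sums to 1 and says how much one token draws on every token in the sequence. All rows are computed independently, so the sequence can be processed in parallel rather than step by step as in an RNN.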

BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) marked a milestone in applying the transformer architecture to most natural language processing tasks.

Popular Transformers

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
GPT-2: Language Models Are Unsupervised Multitask Learners
XLNet: Generalized Autoregressive Pretraining for Language Understanding
RoBERTa: A Robustly Optimized BERT Pretraining Approach
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding
T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
GPT-3: Language Models Are Few-Shot Learners
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
DeBERTa: Decoding-enhanced BERT with Disentangled Attention

Applications of NLP

Text classification - Email Spam detection
Named Entity Recognition - Identifying names, locations, quantities, etc. in Google Search queries
Question answering - Responses for Virtual Assistants like Siri, Google Assistant
Speech recognition and synthesis - Speech to text and Text to Speech
Topic modelling - Smart Tagging, Clustering
Machine translation - Google Translate
Language Modelling - BERT for Google Search
Automatic image captioning - Image understanding and question answering
