- What is Machine Learning?
- What is Natural Language Processing?
- Brief insights on how natural language is studied.
- Why Machine Learning over rule-based methods?
- Understanding the NLP Pipeline!
Preprocessing
■ Cleaning
➢ Regular Expression
■ Chunking
➢ Paragraph Detection
➢ Sentence Boundary Detection
➢ Sentence to Words
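The preprocessing steps above can be sketched with regular expressions alone. This is a minimal illustration, not the code from the demo: the function names (`clean`, `split_paragraphs`, `split_sentences`, `tokenize`) and the splitting rules are assumptions chosen for brevity, and the naive sentence rule will break on abbreviations such as "Dr.".

```python
import re

def clean(text):
    # Collapse runs of spaces/tabs; newlines are kept so paragraphs survive.
    return re.sub(r"[ \t]+", " ", text).strip()

def split_paragraphs(text):
    # Assumption: one or more blank lines mark a paragraph boundary.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(paragraph):
    # Naive rule: '.', '!' or '?' followed by whitespace and an uppercase letter.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z])", paragraph)
    return [s.strip() for s in parts if s.strip()]

def tokenize(sentence):
    # Words are runs of letters/digits, optionally with an apostrophe suffix.
    return re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", sentence)

doc = "NLP  is fun.   It has many steps!\n\nDoes it work? Yes."
paragraphs = split_paragraphs(clean(doc))
sentences = [split_sentences(p) for p in paragraphs]
tokens = [tokenize(s) for p in sentences for s in p]
```

Libraries such as NLTK and spaCy ship trained sentence splitters and tokenizers that handle the edge cases this sketch ignores.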
Feature Engineering for ML techniques
■ Grammar
■ Words: Meanings, Synonyms, Antonyms, Part Of Speech (Verb, Adverb, Cardinal) etc.
■ Named Entity Relation, Dependency Parsing, Coreference Resolution etc.
■ Word Normalization: Lemmatization and Stemming
■ Keyword recognition
■ WordNet, Synsets, Stanford Core NLP Parser, spaCy, NLTK
Vector representation and Word Embeddings
■ Why is a vector or embedding representation required?
■ Bag of Words, n-gram Model
■ Skip-gram model
■ Count Vectorizer
■ Term Frequency-Inverse Document Frequency (TF-IDF) Vectorizer
■ Hashing Vectorizer
■ Automatic Feature selection and vector representation in DL techniques
⟶ Word2Vec
⟶ GloVe
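To make the count-based representations above concrete, here is a small pure-Python sketch of bag of words, n-grams and TF-IDF. The toy corpus, the helper names and the plain `tf * log(N / df)` weighting are assumptions made for illustration; library vectorizers typically use smoothed and normalized variants, so their numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (illustrative only).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

def bag_of_words(tokens):
    # One count per vocabulary word, in vocabulary order.
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def ngrams(tokens, n):
    # All contiguous windows of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(tokens):
    # Unsmoothed variant: term frequency times log(N / document frequency).
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)
        vec.append(tf * math.log(n_docs / df))
    return vec

bow = [bag_of_words(t) for t in tokenized]
weights = [tfidf(t) for t in tokenized]
bigrams = ngrams(tokenized[0], 2)
```

In the first document "the" occurs twice but appears in two of the three documents, so its TF-IDF weight ends up lower than that of "cat", which occurs once but only in that document. This is the intuition behind preferring TF-IDF over raw counts.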
How the scikit-learn (sklearn) library comes to the rescue!
Data and ML kickstart!
■ Corpus
■ Training, Testing and Validation Phase
■ K-fold Cross Validation
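As a rough sketch of how K-fold cross validation partitions a corpus, the hypothetical helper below yields train and validation indices for each fold. It assumes contiguous, unshuffled folds; in practice the data is usually shuffled first, and scikit-learn provides a `KFold` class for this.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop

folds = list(k_fold_indices(10, 3))
```

Every sample lands in exactly one validation fold, so each model is scored on data it did not see during training.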
Training the Machine Learning Model
■ Evaluation Metrics
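Three common evaluation metrics for a binary classifier are precision, recall and F1. The sketch below computes them from scratch; the function name and the toy labels are illustrative assumptions, not results from the demo.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```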
How to improve your model's performance?
■ Error Analysis
- Demo and code walkthrough
Build an Automatic Question Answering (AQuA) system that works as a quick document analyzer, returning relevant answers to questions about a given document.
Two simple examples:
■ Given a document about Roger Federer, the system, when asked about his last Wimbledon Championship title, will answer “2017, against Marin Cilic.”
■ Given a Wiki document about Google, the system, when asked about the founder(s), will answer “Larry Page and Sergey Brin.”
AQuA can be trained and applied across a range of domains and applications, and can save much of the time otherwise spent reading a long document.
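As a toy baseline for the idea, and not the AQuA system itself, the sketch below returns the sentence of a document that shares the most words with the question. The function names, the overlap scoring and the three-sentence sample document are all assumptions made for illustration.

```python
import re

def word_set(text):
    # Lowercased alphanumeric tokens, as a set for overlap counting.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_sentence(document, question):
    # Split on sentence-ending punctuation, then score by shared words.
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    q_words = word_set(question)
    return max(sentences, key=lambda s: len(word_set(s) & q_words))

# Hand-written sample text, not an actual Wiki document.
document = (
    "Google is a technology company. "
    "Google was founded by Larry Page and Sergey Brin. "
    "Its headquarters are in Mountain View."
)
answer = best_sentence(document, "Who founded Google?")
```

Asking "Who is the founder of Google?" instead would no longer match "founded" exactly, which is where word normalization (stemming or lemmatization) and learned models come in.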
- Questions and Answers (15 mins)