NLP

Fast.ai Course v3 Lesson 4 Notes

Sentiment analysis (IMDB). Transfer learning in NLP. Training an NLP model from scratch is difficult because the model has to learn both how to speak the language and some knowledge of the world. Instead, start with a pre-trained model: a language model, i.e. a model that learns to predict the next word of a sentence. This lets us benefit from a model pre-trained on a much bigger dataset, e.g. Wikipedia. No preset label is needed: this is self-supervised learning (a term from Yann LeCun). Labels still exist in this kind of classification problem, but they are not created by humans; instead, they are built into the dataset.
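To make the "labels are built into the dataset" point concrete, here is a minimal sketch of how next-word training pairs can be derived from raw text. It is not the fastai implementation: the function name `next_word_pairs`, the whitespace tokenization, and the `context_size` parameter are assumptions made purely for illustration.

```python
def next_word_pairs(text, context_size=3):
    """Build (context, next_word) training pairs from raw text.

    The "label" of each pair is simply the word that follows the
    context in the text, so no human annotation is required.
    Tokenization by whitespace is a simplifying assumption.
    """
    tokens = text.lower().split()
    pairs = []
    for i in range(1, len(tokens)):
        # Use up to `context_size` preceding words as the input.
        context = tokens[max(0, i - context_size):i]
        pairs.append((context, tokens[i]))
    return pairs


pairs = next_word_pairs("the movie was really good")
for context, target in pairs:
    print(context, "->", target)
```

Every sentence in the corpus yields several such pairs, which is why a large unlabeled dataset like Wikipedia is enough to pre-train a language model.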

Next Word Prediction using Katz Backoff Model - Part 3: Prediction Model Implementation

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application that predicts the next word of a user's text input. In Part 1, we analysed the training dataset and found some characteristics that can be exploited in the implementation. In Part 2, we discussed the Good-Turing smoothing estimate and the Katz backoff model that power our text prediction application.

Next Word Prediction using Katz Backoff Model - Part 2: N-gram model, Katz Backoff, and Good-Turing Discounting

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application that predicts the next word of a user's text input. In Part 1, we analysed the data and found that many uncommon words and word combinations (2- and 3-grams) can be removed from the corpora to reduce memory usage and speed up model building.
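The pruning idea mentioned above can be sketched as follows. This is an illustration only and not the code from the series: the helper names `count_ngrams` and `prune_rare`, the toy sentence, and the `min_count=2` threshold are all assumptions, since the excerpt does not state the cutoff that was actually used.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prune_rare(counts, min_count=2):
    """Drop n-grams seen fewer than `min_count` times (assumed threshold)."""
    return Counter({gram: c for gram, c in counts.items() if c >= min_count})


tokens = "i like tea and i like coffee and i like tea".split()
bigrams = count_ngrams(tokens, 2)
kept = prune_rare(bigrams, min_count=2)

print(len(bigrams), "distinct bigrams before pruning")
print(len(kept), "distinct bigrams after pruning")
```

On a real corpus the table of distinct n-grams is dominated by entries that occur only once, so a low threshold can shrink it considerably, at the cost of losing those rare continuations.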

Next Word Prediction using Katz Backoff Model - Part 1: The Data Analysis

Executive Summary The Capstone Project of the Data Science Specialization on Coursera, offered by Johns Hopkins University, is to build an NLP application that predicts the next word of a user's text input. This report discusses the nature of the project and data, the model and algorithm powering the application, and the implementation of the application. Part 1 focuses on the analysis of the datasets provided, which will guide the implementation of the actual text prediction program.