Next Word Prediction using Katz Backoff Model - Part 2: N-gram model, Katz Backoff, and Good-Turing Discounting

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application, which should predict the next word of a user text input. In Part 1, we have analysed the data and found that there are a lot of uncommon words and word combinations (2- and 3-grams) can be removed from the corpora, in order to reduce memory usage and speed up the model building time.