Coursera Data Science Specialization Capstone Project
This is a summary of the work for the capstone project of the Coursera Data Science Specialization. The purpose of the project is to build a model that predicts the “next” word from the several words that immediately precede it. This series divides the discussion into three parts. The first part looks at the characteristics of the available text material, such as $N$-gram frequencies and text lengths, which form the basis of the model. The second part discusses the algorithms that serve as the building blocks of the prediction model, including Katz’s backoff model and Good-Turing discounting. The final part covers the implementation of the algorithms and other details of building the prediction application.
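To make the task concrete before the detailed posts, here is a minimal sketch of next-word prediction from raw $N$-gram counts. It is not the model built in this series: it simply picks the most frequent follower of a context, with no Katz backoff or Good-Turing discounting. The toy corpus and the function names (`ngrams`, `build_counts`, `predict_next`) are assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_counts(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Naive tokenization (lowercase + whitespace split), for illustration only.
        tokens = sentence.lower().split()
        for gram in ngrams(tokens, n):
            counts[gram[:-1]][gram[-1]] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent follower of the context, or None if unseen."""
    followers = counts.get(tuple(context))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, invented for this example.
toy_corpus = [
    "i like green tea",
    "i like green apples",
    "i like green tea a lot",
]
trigram_counts = build_counts(toy_corpus, 3)
print(predict_next(trigram_counts, ["like", "green"]))
```

A context that never appears in the corpus returns `None` here. Handling such unseen contexts gracefully is the gap that backoff and discounting are meant to fill.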