Coursera Capstone Project

Coursera Data Science Specialization Capstone Project

This is a summary of the work for the capstone project of the Coursera Data Science Specialization. The purpose of the project is to build a model that predicts the “next” word from the several words that immediately precede it. In this series, we divide the discussion into three parts. The first part looks at the characteristics of the available text material, such as N-gram frequency and text length, which will be the basis of the model.

Next Word Prediction using Katz Backoff Model - Part 3: Prediction Model Implementation

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application that predicts the next word of a user's text input. In Part 1, we analysed the training dataset and found some characteristics that can be exploited in the implementation. In Part 2, we discussed the Good-Turing smoothing estimate and the Katz backoff model that power our text prediction application.

Next Word Prediction using Katz Backoff Model - Part 2: N-gram model, Katz Backoff, and Good-Turing Discounting

Executive Summary The Capstone Project of the Johns Hopkins Data Science Specialization is to build an NLP application that predicts the next word of a user's text input. In Part 1, we analysed the data and found that many uncommon words and word combinations (2- and 3-grams) can be removed from the corpora to reduce memory usage and speed up model building.

Next Word Prediction using Katz Backoff Model - Part 1: The Data Analysis

Executive Summary The Capstone Project of the Data Science Specialization offered by Johns Hopkins University on Coursera is to build an NLP application that predicts the next word of a user's text input. This report discusses the nature of the project and data, the model and algorithm powering the application, and the implementation of the application. Part 1 focuses on the analysis of the datasets provided, which will guide the implementation of the actual text prediction program.