Gradient Descent

Logistic Regression 2 – Cost Function, Gradient Descent and Other Optimization Algorithms

We discussed the basic ideas of logistic regression in the previous post. The purpose of logistic regression is to find the optimal decision boundary, one that separates data with different categorical target feature values into different classes. We also introduced the logistic function, or sigmoid function, as the regression model used to find that boundary. Now let's take a look at how to achieve it.

Cost Function and Gradient Descent for Logistic Regression

We can still use gradient descent to train the logistic regression model.
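As a preview of that idea, here is a minimal NumPy sketch: the sigmoid model with a cross-entropy cost, trained by batch gradient descent. The function names and the toy data are illustrative, not taken from the post itself.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_cost(w, X, y):
    # Average negative log-likelihood of the sigmoid model.
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    # X includes a leading column of ones for the intercept w_0.
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)  # gradient of the cross-entropy cost
        w -= alpha * grad
    return w

# Toy 1-D data: class 1 tends to have larger x.
X = np.array([[1, 0.5], [1, 1.0], [1, 3.0], [1, 4.0]])
y = np.array([0, 0, 1, 1])
w = gradient_descent(X, y)
print(w, cross_entropy_cost(w, X, y))
```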

Linear Regression 4 - Learning Rate and Initial Weight

Choosing Learning Rate

We introduced an important parameter, the learning rate \(\alpha\), in Linear Regression 2 – Gradient Descent, without discussing how to choose its value. In fact, the choice of learning rate significantly affects the performance of the algorithm: it determines the convergence speed of gradient descent, that is, the number of iterations needed to reach the minimum. The figures below, which we call learning graphs, show how different learning rates impact the speed of the algorithm.
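A quick way to see this effect yourself is to plot the cost after each iteration for a few values of \(\alpha\). The sketch below uses made-up data and arbitrary learning rates; it produces learning graphs of the kind described here.

```python
import numpy as np
import matplotlib.pyplot as plt

# Toy data for the model M_w(x) = w_0 + w_1 * x.
X = np.column_stack([np.ones(20), np.linspace(0, 1, 20)])
y = 2.0 + 3.0 * X[:, 1]

def cost_history(alpha, iterations=50):
    # Record the half mean squared error cost before each gradient step.
    w = np.zeros(2)
    history = []
    for _ in range(iterations):
        err = X @ w - y
        history.append(np.mean(err ** 2) / 2)
        w -= alpha * X.T @ err / len(y)
    return history

for alpha in [0.01, 0.1, 0.5]:  # illustrative values only
    plt.plot(cost_history(alpha), label=f"alpha = {alpha}")
plt.xlabel("iteration")
plt.ylabel("cost")
plt.legend()
plt.show()
```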

Linear Regression 2 - Gradient Descent

Why We Need Gradient Descent

In the previous article, Linear Regression 1 – Simple Linear Regression and Cost Function, we introduced the concept of simple linear regression: finding a regression line model $$M_w(x) = w_0 + w_1x_1$$ so that the prediction \(M_w(x)\) is as close as possible to the \(y\) of our training data \((x, y)\). To find the best-fit regression line, we are actually searching for the optimal combination of the weight parameters \(w_0\) and \(w_1\), minimizing the errors between the predictions and the actual values of the target feature \(y\).
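To make that concrete, here is a minimal gradient-descent sketch for the two weights, with made-up training data; the cost being minimized is the usual half mean squared error, consistent with the series.

```python
import numpy as np

# Toy training data roughly following y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

w0, w1 = 0.0, 0.0  # initial weights
alpha = 0.05       # learning rate (illustrative value)
for _ in range(2000):
    pred = w0 + w1 * x  # M_w(x) = w_0 + w_1 * x
    err = pred - y
    # Partial derivatives of the half mean squared error cost.
    w0 -= alpha * err.mean()
    w1 -= alpha * (err * x).mean()

print(w0, w1)  # should approach roughly (1, 2)
```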