Choosing Learning Rate

We introduced an important parameter, the learning rate $$\alpha$$, in Linear Regression 2 – Gradient Descent without discussing how to choose its value. In fact, the choice of learning rate significantly affects the performance of the algorithm: it determines the convergence speed of gradient descent, that is, the number of iterations needed to reach the minimum. The figures below, which we call learning graphs, show how different learning rates impact the speed of the algorithm.
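To make this concrete, here is a minimal sketch of batch gradient descent on a tiny made-up dataset, recording the cost at every iteration so the learning graph can be plotted or inspected. The data values, the candidate learning rates, and the $$\frac{1}{2m}$$ mean-squared-error cost are assumptions chosen for illustration, not values taken from the article.

```python
import numpy as np

# Toy data following roughly y = 2x + 1 (made-up values for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])
m = len(x)

def cost(w0, w1):
    """Mean squared error cost J(w0, w1), assuming the 1/(2m) convention."""
    errors = (w0 + w1 * x) - y
    return np.sum(errors ** 2) / (2 * m)

def gradient_descent(alpha, iterations=50):
    """Run batch gradient descent and record the cost after each iteration."""
    w0, w1 = 0.0, 0.0
    history = []
    for _ in range(iterations):
        errors = (w0 + w1 * x) - y
        grad_w0 = np.sum(errors) / m       # partial derivative w.r.t. w0
        grad_w1 = np.sum(errors * x) / m   # partial derivative w.r.t. w1
        w0 -= alpha * grad_w0              # simultaneous update of both weights
        w1 -= alpha * grad_w1
        history.append(cost(w0, w1))
    return history

# Compare how quickly the cost drops for different (assumed) learning rates.
for alpha in (0.001, 0.01, 0.1):
    history = gradient_descent(alpha)
    print(f"alpha={alpha}: cost after 50 iterations = {history[-1]:.4f}")
```

The printed costs behave like the learning graphs described above: a very small $$\alpha$$ makes the cost decrease slowly, a moderate $$\alpha$$ makes it drop quickly, and an $$\alpha$$ that is too large can overshoot the minimum so the cost grows instead of shrinking.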
Why We Need Gradient Descent

In the previous article, Linear Regression 1 – Simple Linear Regression and Cost Function, we introduced the concept of simple linear regression, which is essentially finding a regression line model $$M_w(x) = w_0 + w_1x_1$$ so that the prediction $$M_w(x)$$ is as close as possible to the $$y$$ of our training data $$(x, y)$$. To find the best-fit regression line, we are actually looking for the optimal combination of the weight parameters $$w_0$$ and $$w_1$$, the one that minimizes the errors between the predictions and the actual values of the target feature $$y$$.
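As a quick illustration of what "as close as possible" means, the sketch below evaluates the model $$M_w(x) = w_0 + w_1x$$ for two candidate weight pairs on a tiny made-up dataset and compares their costs; the data and the $$\frac{1}{2m}$$ mean-squared-error form of the cost are assumptions for illustration.

```python
import numpy as np

# Tiny made-up training set (x, y) for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.9, 5.1, 7.0, 9.2])

def predict(w0, w1, x):
    """The simple linear regression model M_w(x) = w0 + w1 * x."""
    return w0 + w1 * x

def mse_cost(w0, w1, x, y):
    """Mean squared error between predictions and targets,
    assuming the 1/(2m) convention for the cost function."""
    errors = predict(w0, w1, x) - y
    return np.sum(errors ** 2) / (2 * len(x))

# Two candidate weight combinations: the one with the lower cost fits better.
print(mse_cost(0.0, 1.0, x, y))   # a poor fit, large cost
print(mse_cost(1.0, 2.0, x, y))   # a much better fit, small cost
```

Gradient descent automates this search: instead of trying weight combinations by hand, it repeatedly adjusts $$w_0$$ and $$w_1$$ in the direction that reduces the cost.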