Simple linear regression can only model the relationship between the target feature and a single descriptive feature, which is rarely sufficient in real life. For example, the dataset of our toy example is now expanded to four features, including the target feature *Rental Price*:

Size | Rental Price | Floor | Number of Bedrooms |
---|---|---|---|
350 | 1840 | 15 | 1 |
410 | 1682 | 3 | 2 |
430 | 1555 | 7 | 1 |
550 | 2609 | 4 | 2 |
… | … | … | … |

Generalizing simple linear regression to multivariate linear regression is straightforward. We just need to add a weight for each additional feature to the model:

\[
h_{\mathbf{w}}(\mathbf{x}) = w_0 x_0 + w_1 x_1 + \dots + w_n x_n = \sum_{j=0}^{n} w_j x_j = \mathbf{w}^T \mathbf{x}
\]

where \(w_j\) is the weight of feature \(j\) and \(x_0\) is always equal to 1. Both \(\mathbf{w}\) and \(\mathbf{x}\) are \((n+1)\)-dimensional real-valued vectors.
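As a concrete sketch, the prediction is just a dot product once a constant 1 is prepended for \(x_0\). The feature values below come from the first row of the toy table; the weights are made-up numbers purely for illustration, not fitted values:

```python
import numpy as np

# One apartment from the toy table: Size=350, Floor=15, Bedrooms=1.
# Prepend x0 = 1 so the bias weight w0 is handled uniformly.
x = np.array([1.0, 350.0, 15.0, 1.0])

# Hypothetical weights for illustration only (not learned from data).
w = np.array([500.0, 3.0, 10.0, 80.0])

# h_w(x) = w^T x
prediction = w @ x
print(prediction)  # 500 + 3*350 + 10*15 + 80*1 = 1780.0
```

Writing the bias as \(w_0 x_0\) with \(x_0 = 1\) lets the whole model collapse into a single vector product, which is why the convention is used.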

## Gradient Descent for Multiple Variables

The cost function requires only a slight change for the new regression model:

\[
L(\mathbf{w}) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_{\mathbf{w}}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2
\]

where \(m\) is the number of training examples, and the gradient descent update now becomes:

\[
w_j \leftarrow w_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_{\mathbf{w}}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)}
\]

for \(j = 0\) to \(n\), where \(\alpha\) is the learning rate and all weights are updated simultaneously.
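A minimal batch gradient descent sketch on the four rows of the toy table follows. The learning rate and iteration count are arbitrary choices for illustration, and the features are standardized first so that a single learning rate works for all columns (an assumption of this sketch, not something prescribed above):

```python
import numpy as np

# Toy data from the table: columns are [Size, Floor, Bedrooms].
X = np.array([[350.0, 15.0, 1.0],
              [410.0,  3.0, 2.0],
              [430.0,  7.0, 1.0],
              [550.0,  4.0, 2.0]])
y = np.array([1840.0, 1682.0, 1555.0, 2609.0])  # Rental Price

# Standardize each feature to zero mean and unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

m = len(y)
X = np.column_stack([np.ones(m), X])  # prepend x0 = 1 for the bias
w = np.zeros(X.shape[1])              # initial weights (all zero)
alpha = 0.1                           # learning rate (arbitrary here)

for _ in range(1000):
    error = X @ w - y                 # h_w(x^(i)) - y^(i) for all i
    w -= alpha * (X.T @ error) / m    # simultaneous update of every w_j

print(w[0])  # with zero-mean features, w0 converges to mean(y) = 1921.5
```

Note that the whole update is vectorized: `X.T @ error` computes the sum \(\sum_i (h_{\mathbf{w}}(\mathbf{x}^{(i)}) - y^{(i)})\, x_j^{(i)}\) for every \(j\) at once, so all weights are updated simultaneously as the algorithm requires.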

## How Should We Start?

At this point, we have introduced the basics of linear regression in this and the previous posts. The main idea is, first, to define a cost function that measures the error and, second, to iteratively update the weights \(w_j\) using gradient descent until the optimum is reached or some other stopping criterion is met. You may now start testing linear regression on your own dataset, and you may immediately run into two questions: how should the initial weights be chosen, and what learning rate should be used? We shall find out how to start in the next article.