# Logistic Regression 1 – Classification, Logistic Regression and Sigmoid Function

In a previous series of posts we discussed [simple and multivariate linear regression](), which can be used to predict target features with continuous values. There are also prediction problems with categorical target features, where we want to train a model so that we can use it to predict the class of unseen data. Logistic regression is one such model.

## Classification Problem

Imagine we have a tumor dataset that contains the *Size* of each tumor and whether the tumor is *Malignant* or not. Assume that the class of a tumor can be identified by its size, so that it is malignant if its size is bigger than a certain value. In this example, an instance is a positive case if the tumor is malignant. We would like to find a model that classifies the tumors according to their size, as shown in the figure below:

Since we are trying to classify the tumor by its size, it would be ideal if we could have a model like this:

\[
M_{\mathbf{w}}(\mathbf{x}) =
\begin{cases}
1 & \text{if } \mathbf{w}\cdot\mathbf{x} \ge 0 \\
0 & \text{otherwise}
\end{cases}
\]

where \(\mathbf{x}\) is the input feature vector, in this case the size of the tumor. We can then use the model to separate the different classes of tumor:
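As a minimal sketch, this hard-threshold model takes only a few lines of Python. The weight values here are hypothetical, chosen so that the decision boundary sits at a tumor size of 5:

```python
def hard_threshold(w, x):
    # Predict 1 (malignant) if w . x >= 0, otherwise 0 (benign)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

# Hypothetical weights: x = [1, size], so the boundary is at size = 5
w = [-5.0, 1.0]
print(hard_threshold(w, [1.0, 7.0]))  # size 7 -> 1 (malignant)
print(hard_threshold(w, [1.0, 3.0]))  # size 3 -> 0 (benign)
```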

## Logistic Function

One problem with this model is finding the weights \(\mathbf{w}\): the *decision boundary*, as shown in the previous figure, is discontinuous, so we cannot use gradient descent to improve the model. We want a function that is continuous, so that we can use the same optimization method as in linear regression, yet behaves similarly to the previous model in separating the two classes. The **logistic function**, or **sigmoid function**, serves this purpose. The sigmoid function \(g(x)\) is given as:

\[
g(x) = \frac{1}{1 + e^{-x}}
\]

If we plot the sigmoid function of \(x\), we can see that it behaves very similarly to the previous model, but transitions smoothly between 0 and 1. The use of the logistic function as the regression model gives **logistic regression** its name.
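As a quick sanity check, we can implement the sigmoid and evaluate it at a few points to see this behavior:

```python
import math

def sigmoid(x):
    # Logistic (sigmoid) function: g(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))    # 0.5, the midpoint between the two classes
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```

Far from zero the output saturates near 0 or 1, like the hard threshold, but the transition in between is smooth and differentiable.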

Remember that our goal is to find the weight vector \(\mathbf{w}\) so that, given the descriptive features \(\mathbf{x}\) of an instance, we can classify whether it belongs to the class of interest. We now replace \(x\) in the sigmoid function with \(\mathbf{w}\cdot\mathbf{x}\), and the model represents the probability of the positive class given a particular input \(\mathbf{w}\cdot\mathbf{x}\):

\[
M_{\mathbf{w}}(\mathbf{x}) = g(\mathbf{w}\cdot\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}\cdot\mathbf{x}}}
\]

For example, given a sample instance with \(x_0=1\) and \(x_1\) equal to the tumor size, \(M_{\mathbf{w}}(\mathbf{x}) = 0.7\) means the probability of the tumor being malignant is 70%.
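A minimal sketch of this probability model in Python follows; the weight values are made up for illustration, since finding real weights requires training the model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    # M_w(x) = g(w . x): probability that the instance is positive (malignant)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical weights; input is [x0, x1] = [1, tumor_size]
w = [-4.0, 0.8]
x = [1.0, 6.0]              # a tumor of size 6
print(predict_proba(w, x))  # w . x = 0.8, so roughly 0.69
```

Note that the output is a probability, not a class label; to get a label we would typically predict positive when the probability exceeds 0.5, which happens exactly when \(\mathbf{w}\cdot\mathbf{x} \ge 0\).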

Equipped with the above model, we can try to predict the class of a given instance. To get there, we first need to train the model. Moreover, in many real-life cases the classes cannot be separated by a straight line, as in the tumor example above. We shall discuss these two topics in the coming posts.