Linear Regression in Python
Linear regression, as its name suggests, predicts the value of the outcome variable y by finding a linear relationship between the input x and the output y. It is a simple approach to supervised learning and is especially useful for predicting a quantitative response. Depending on the number of predictors x, we can classify linear regression models into two types: simple linear regression (one predictor) and multiple linear regression (two or more predictors).
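To make the idea of a "linear relationship" concrete, here is a minimal sketch that recovers an intercept and two slopes by ordinary least squares using only NumPy. The data is synthetic and the coefficients (2, 3 and -1) are made up for illustration; none of it comes from the admissions dataset.

```python
import numpy as np

# Synthetic data for illustration only: y = 2 + 3*x1 - 1*x2, with no noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = 2.0 + 3.0 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so the intercept is estimated along with the slopes.
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: find the coefficients that minimize ||A @ coef - y||^2.
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

print("intercept:", intercept)
print("slopes:", slopes)
```

With two predictors this is a multiple linear regression; keeping a single column in X would make it a simple linear regression.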
For this tutorial, we are going to use multiple linear regression to predict the chance of graduate school admissions. The dataset can be downloaded here.
We can import all the required libraries and our dataset:
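A minimal sketch of this step might look like the following. The column names and the four rows of values are hypothetical placeholders, not the real dataset. In practice we would pass the path of the downloaded CSV file to pd.read_csv instead of the in-memory text used here.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the downloaded file: made-up columns and values.
csv_text = """GRE Score,TOEFL Score,CGPA,Chance of Admit
320,110,8.5,0.70
330,115,9.0,0.85
310,105,8.0,0.60
300,100,7.5,0.50
"""

# Replace io.StringIO(csv_text) with the path to the real CSV file.
df = pd.read_csv(io.StringIO(csv_text))

# Separate the predictors from the response we want to predict.
target = "Chance of Admit"  # assumed column name
X = df.drop(columns=[target])
y = df[target]

print(X.shape, y.shape)
```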
Using the LinearRegression class from sklearn (found in sklearn.linear_model), we can fit a linear regression model to our data.
train_test_split() is an especially useful function. It splits a given dataset into random train and test subsets. The argument test_size, when given as a float, varies between 0.0 and 1.0 and represents the proportion of the dataset to include in the test split. Another argument, random_state, sets the seed used by the random number generator, so the same split can be reproduced on every run.
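The behaviour of these two arguments can be sketched on a small toy array. The values test_size=0.2 and random_state=42 are assumptions chosen for illustration, not necessarily the ones used in this tutorial.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Splitting again with the same seed yields exactly the same subsets.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(len(X_train), len(X_test))
print(np.array_equal(X_test, X_test2))
```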
Let’s fit the linear regression model to our data:
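As a rough sketch of the fitting step, the example below uses synthetic data in place of the admissions dataset, so that it runs on its own. The coefficients, noise level, and split settings are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 200 samples, 3 predictors.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
true_coef = np.array([0.3, 0.1, 0.2])  # made-up coefficients
y = 0.2 + X @ true_coef + rng.normal(0, 0.01, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit on the training subset only, then predict on the held-out test subset.
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

Fitting on the training subset alone keeps the test subset unseen, so the predictions in y_pred give an honest picture of how the model behaves on new data.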
We can evaluate our model by calculating the mean squared error of the model.
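Our mean squared error is computed to be 0.069, which is slightly less than 10% of the mean of the actual values, 0.724. Note that the MSE is expressed in squared units of the response, so the root mean squared error (its square root) is the quantity that is directly comparable to the mean.
This suggests that our linear regression model is reasonably accurate and can make good predictions!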
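The evaluation described above can be sketched as follows. The actual and predicted values here are made-up placeholders, not the tutorial's results. They only show how the MSE, its square root, and the ratio to the mean of the actual values are computed.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted values, for illustration only.
y_true = np.array([0.70, 0.80, 0.60, 0.90])
y_pred = np.array([0.65, 0.85, 0.60, 0.80])

# Mean squared error: the average of the squared prediction errors.
mse = mean_squared_error(y_true, y_pred)

# The root mean squared error is in the same units as the response.
rmse = np.sqrt(mse)

# Express the typical error as a fraction of the mean actual value.
relative_rmse = rmse / y_true.mean()

print("MSE:", mse)
print("RMSE:", rmse)
print("RMSE / mean:", relative_rmse)
```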