K Nearest Neighbor Regression with Python

K Nearest Neighbor Regression (KNN) works in much the same way as KNN for classification. The difference lies in the characteristics of the dependent variable. With classification KNN the dependent variable is categorical. WIth regression KNN the dependent variable is continuous. Both involve the use neighboring examples to predict the class or value of other examples.

This post will provide an example of KNN regression using the turnout dataset from the pydataset module. Our purpose will be to predict the age of a voter through the use of other variables in the dataset. Below is some initial code.

from pydataset import data
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

We now need to setup our data. We need to upload our actual dataset. Then we need to separate the independnet and dependent variables. Once this is done we need to create our train and test sets using the tarin test spli t funvtion. Below is the code to accmplouh each of these steps.

df=data("turnout")
X=df[['age','income','vote']]
y=df['educate']
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.3,random_state=0)

We are now ready to train our model. We need to call the function we will use and determine the size of K, which will be 11 in our case. Then we need to train our model and then predict with it. Lastly, we will print out the mean squared error. This value is useful for comparing models but does not have much value by itself. The MSE is calculated by comparing the actual test set with the predicted test data. The code is below

clf=KNeighborsRegressor(11)
clf.fit(X_train,y_train)
y_pred=clf.predict(X_test)
print(mean_squared_error(y_test,y_pred))
9.239

If we were to continue with model development we may look for ways to improve our MAE through different nethods such as regular linear regression. However, for our purposes this is adequate.

Conclusison

This post provides an example of regression with KNN in Python. This tool is a practical and simple way to make numeric predictions that can be accurate at times.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.