K Nearest Neighbor Regression (KNN) works in much the same way as KNN for classification. The difference lies in the characteristics of the dependent variable. With classification KNN the dependent variable is categorical. WIth regression KNN the dependent variable is continuous. Both involve the use neighboring examples to predict the class or value of other examples.
This post will provide an example of KNN regression using the turnout dataset from the pydataset module. Our purpose will be to predict the age of a voter through the use of other variables in the dataset. Below is some initial code.
from pydataset import data import pandas as pd from sklearn.model_selection import train_test_split from sklearn.neighbors import KNeighborsRegressor from sklearn.metrics import mean_squared_error
We now need to setup our data. We need to upload our actual dataset. Then we need to separate the independnet and dependent variables. Once this is done we need to create our train and test sets using the tarin test spli t funvtion. Below is the code to accmplouh each of these steps.
df=data("turnout") X=df[['age','income','vote']] y=df['educate'] X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.3,random_state=0)
We are now ready to train our model. We need to call the function we will use and determine the size of K, which will be 11 in our case. Then we need to train our model and then predict with it. Lastly, we will print out the mean squared error. This value is useful for comparing models but does not have much value by itself. The MSE is calculated by comparing the actual test set with the predicted test data. The code is below
clf=KNeighborsRegressor(11) clf.fit(X_train,y_train) y_pred=clf.predict(X_test) print(mean_squared_error(y_test,y_pred))
If we were to continue with model development we may look for ways to improve our MAE through different nethods such as regular linear regression. However, for our purposes this is adequate.
This post provides an example of regression with KNN in Python. This tool is a practical and simple way to make numeric predictions that can be accurate at times.