Support Vector Machines Regression with Python

This post provides an example of how to do regression with support vector machines (SVM). SVM is a flexible algorithm that allows for the development of non-linear models. This is particularly useful for messy data that does not follow a clear linear pattern.

The steps that we will use are listed below.

  1. Data Preparation
  2. Model Development

We will use two different estimators in our analysis: LinearSVR and SVR. LinearSVR is restricted to a linear kernel, while SVR supports non-linear kernels and uses the RBF kernel by default. The two also rely on different underlying solvers, so their results can differ slightly even when both use a linear kernel.

Data Preparation

We are going to use the OFP dataset available in the pydataset module. This dataset was used previously for classification with SVM on this site. This time we want to predict family income (faminc), which is a continuous variable. Below is some initial code.

import numpy as np
import pandas as pd
from pydataset import data
from sklearn import svm
from sklearn import model_selection
from sklearn.metrics import mean_squared_error as mse

We now need to load our dataset and remove any missing values.
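The loading step can be sketched as follows. This is only an illustration: it assumes the data would normally come from pydataset's `data('OFP')` call, and to stay runnable without that package it uses a small made-up stand-in frame whose column names mirror a few OFP columns. The values are invented.

```python
import numpy as np
import pandas as pd

# With pydataset installed, the real data would be loaded with:
#   from pydataset import data
#   df = data('OFP')
# The tiny frame below is a hypothetical stand-in with invented values.
df = pd.DataFrame({
    "age": [6.9, 7.4, np.nan, 7.6],
    "school": [6, 10, 10, 3],
    "faminc": [2.88, 2.75, 0.65, np.nan],
    "sex": ["male", "female", "female", "male"],
})

# Drop every row that contains at least one missing value
df = df.dropna()
print(df.shape)
```

Dropping rows is the simplest way to handle missing values; it is reasonable when only a small share of rows is affected.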


As in the previous post, we need to change the text variables into dummy variables and we also need to scale the data. The code below creates the dummy variables, removes variables that are not needed, and also scales the data.

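# Each block builds dummy columns from one text variable, keeps one of them
# under a clearer name, and drops the other. The source column names (black,
# sex, employed, maried, privins) are assumed from the OFP data.
df=pd.concat([df, pd.get_dummies(df['black'], dtype=float)], axis=1)
df=df.rename(index=str, columns={"yes": "black_person"})
df=df.drop('no', axis=1)

df=pd.concat([df, pd.get_dummies(df['sex'], dtype=float)], axis=1)
df=df.rename(index=str, columns={"male": "Male"})
df=df.drop('female', axis=1)

df=pd.concat([df, pd.get_dummies(df['employed'], dtype=float)], axis=1)
df=df.rename(index=str, columns={"yes": "job"})
df=df.drop('no', axis=1)

df=pd.concat([df, pd.get_dummies(df['maried'], dtype=float)], axis=1)
df=df.rename(index=str, columns={"no": "single"})
df=df.drop('yes', axis=1)

df=pd.concat([df, pd.get_dummies(df['privins'], dtype=float)], axis=1)
df=df.rename(index=str, columns={"yes": "insured"})
df=df.drop('no', axis=1)

# Keep only the numeric columns (this removes the original text variables),
# then min-max scale every column to the range [0, 1]
df=df.select_dtypes(include='number')
df = (df - df.min()) / (df.max() - df.min())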


We now need to set up our datasets. The X dataset will contain the independent variables while the y dataset will contain the dependent variable.
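A minimal sketch of this split, assuming the target column is named `faminc` as in the OFP data. The frame here is a made-up stand-in for the scaled data, not the real values.

```python
import pandas as pd

# Hypothetical stand-in for the scaled frame; values are invented
df = pd.DataFrame({
    "age": [0.0, 0.5, 1.0],
    "school": [0.2, 1.0, 0.0],
    "Male": [1.0, 0.0, 1.0],
    "faminc": [0.1, 0.9, 0.4],
})

# Independent variables: every column except the target
X = df.drop("faminc", axis=1)
# Dependent variable: family income
y = df["faminc"]
```

Dropping the target from X matters: leaving it in would let the model predict the outcome from itself.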


We can now move to model development.

Model Development

We now need to create our train and test sets for our X and y datasets. We will do a 70/30 split of the data. Below is the code.
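One way to produce the 70/30 split is scikit-learn's `train_test_split`. The sketch below uses made-up stand-in arrays, and the `random_state` value is an arbitrary choice added here for reproducibility.

```python
import numpy as np
from sklearn import model_selection

# Made-up stand-in data: 10 observations, 2 predictors
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# test_size=0.3 holds out 30% of the rows for testing;
# random_state is an arbitrary seed so the split is repeatable
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(len(X_train), len(X_test))
```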


Next, we will create our two models with the code below.
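A sketch of creating the two estimators with their default settings. The variable names are hypothetical, and which estimator the post treats as the first model is not stated, so the order here is arbitrary.

```python
from sklearn import svm

# Default settings for both estimators; no tuning is attempted here
svr_model = svm.SVR()            # kernel-based SVR, RBF kernel by default
linear_model = svm.LinearSVR()   # SVR restricted to a linear kernel

print(svr_model.kernel)
```

Both objects are untrained at this point; nothing is learned until `.fit()` is called.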


We will now run our first model and assess the results. Our metric is the mean squared error (MSE). Generally, the lower the number the better. We will use the .fit() method to train the model and the .predict() method to test the model.
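The fit, predict, and score sequence might look like the sketch below. It runs on synthetic data generated here for illustration, so the printed error has no relation to the figures reported in this post.

```python
import numpy as np
from sklearn import svm, model_selection
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data: 200 rows, 3 predictors, roughly linear target
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 3))
y = X @ np.array([0.5, 0.2, 0.3]) + rng.normal(0, 0.05, size=200)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = svm.SVR()
model.fit(X_train, y_train)      # train on the 70% split
pred = model.predict(X_test)     # predict for the held-out 30%

# Mean squared error between the true and predicted test values
test_mse = mean_squared_error(y_test, pred)
print(test_mse)
```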


The MSE was 0.27. This number means little on its own and is only useful for comparison. Therefore, the second model will be judged as better or worse depending on whether its MSE is lower or higher than 0.27. Below are the results of the second model.
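To see why an MSE value is only meaningful in comparison, it helps to compute one by hand. The sketch below uses invented numbers and shows that rescaling the target by 10 multiplies the MSE by 100, even though the predictions are equally good in relative terms.

```python
import numpy as np

# Invented true and predicted values
y_true = np.array([0.2, 0.5, 0.9])
y_pred = np.array([0.3, 0.4, 0.7])

# MSE is the average of the squared prediction errors
mse_value = np.mean((y_true - y_pred) ** 2)

# The same errors on a target measured in units 10 times larger
mse_rescaled = np.mean((10 * y_true - 10 * y_pred) ** 2)

print(mse_value, mse_rescaled)
```

Because the value depends on the units of the target, two MSE figures are only comparable when they come from the same target variable and the same test set.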


We can see that the MSE for our second model is 0.34, which is greater than the MSE for the first model. This indicates that the first model is superior based on the current results and parameter settings.
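The comparison can also be done in one pass by fitting both estimators on the same split and keeping the one with the lower test MSE. This is a sketch on synthetic data generated here; the `max_iter` and `random_state` settings are choices made for this example, not settings taken from the post.

```python
import numpy as np
from sklearn import svm, model_selection
from sklearn.metrics import mean_squared_error

# Synthetic stand-in data with a mildly non-linear target
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 3))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.05, size=200)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "SVR": svm.SVR(),
    "LinearSVR": svm.LinearSVR(max_iter=10000, random_state=0),
}

# Fit each model on the same training data and score it on the same test data
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = mean_squared_error(y_test, model.predict(X_test))

# The model with the lowest test MSE is preferred under these settings
best = min(scores, key=scores.get)
print(scores, best)
```

A different split or different parameter settings could change which model comes out ahead, so a result like this is best read as provisional.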


This post provided an example of how to use SVM for regression.

