Elastic net regression combines the strengths of ridge and lasso regression in one algorithm. With elastic net, the algorithm can remove weak variables altogether, as lasso does, or shrink their coefficients toward zero, as ridge does. All three algorithms are examples of regularized regression.

This post will provide an example of elastic net regression in Python. Below are the steps of the analysis.

- Data preparation
- Baseline model development
- Elastic net model development

To accomplish this, we will use the Fair dataset from the pydataset library. Our goal will be to predict marriage satisfaction based on the other independent variables. Below is some initial code to begin the analysis.

from pydataset import data

import numpy as np

import pandas as pd

pd.set_option('display.max_rows', 5000)

pd.set_option('display.max_columns', 5000)

pd.set_option('display.width', 10000)

from sklearn.model_selection import GridSearchCV

from sklearn.linear_model import ElasticNet

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error

**Data Preparation**

We will now load our data. The only preparation that we need to do is convert the factor variables to dummy variables. Then we will make our X and y datasets. Below is the code.

df=pd.DataFrame(data('Fair'))

df.loc[df.sex== 'male', 'sex'] = 0

df.loc[df.sex== 'female','sex'] = 1

df['sex'] = df['sex'].astype(int)

df.loc[df.child== 'no', 'child'] = 0

df.loc[df.child== 'yes','child'] = 1

df['child'] = df['child'].astype(int)

X=df[['religious','age','sex','ym','education','occupation','nbaffairs']]

y=df['rate']
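The same 0/1 encoding can also be written with a dictionary and `map`, which avoids assigning integers into a string column one level at a time. This is only a sketch on a small made-up frame; the `toy` data and the `encoded` name are assumptions for illustration, not part of the Fair dataset.

```python
import pandas as pd

# Toy stand-in for the two factor columns of the Fair data (values are made up).
toy = pd.DataFrame({
    "sex": ["male", "female", "female"],
    "child": ["no", "yes", "no"],
    "age": [32.0, 27.0, 22.0],
})

# Map each two-level factor to 0/1 with an explicit dictionary.
encoded = toy.assign(
    sex=toy["sex"].map({"male": 0, "female": 1}),
    child=toy["child"].map({"no": 0, "yes": 1}),
)

print(encoded.dtypes)
print(encoded)
```

Because every level is listed in the dictionary, an unexpected label would show up as a missing value rather than silently staying a string.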

We can now proceed to creating the baseline model.

**Baseline Model**

This model is a basic regression model for the purpose of comparison. We will instantiate our regression model, use the fit command and finally calculate the mean squared error of the data. The code is below.

regression=LinearRegression()

regression.fit(X,y)

first_model=(mean_squared_error(y_true=y,y_pred=regression.predict(X)))

print(first_model)

1.0498738644696668

This mean squared error of 1.05 is our benchmark for determining whether the elastic net model is better or worse. Below are the coefficients of this first model. We use a for loop to go through the model and the zip function to pair each coefficient with its feature name.

coef_dict_baseline = {}

for coef, feat in zip(regression.coef_,X.columns):

    coef_dict_baseline[feat] = coef

coef_dict_baseline

Out[63]:

{'religious': 0.04235281110639178,

'age': -0.009059645428673819,

'sex': 0.08882013337087094,

'ym': -0.030458802565476516,

'education': 0.06810255742293699,

'occupation': -0.005979506852998164,

'nbaffairs': -0.07882571247653956}
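The benchmark MSE above is an in-sample figure: the model is scored on the same rows it was fitted on. The grid search later in this post reports a cross-validated error instead, so a cross-validated baseline may be the fairer comparison. The sketch below shows how both numbers could be computed; it uses synthetic data from `make_regression` as a stand-in (an assumption, so the numbers will not match the Fair results), and the `X_demo`, `y_demo`, `train_mse`, and `cv_mse` names are illustrative.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; not the Fair dataset).
X_demo, y_demo = make_regression(
    n_samples=300, n_features=7, noise=10.0, random_state=0
)

model = LinearRegression()

# In-sample MSE: the model is scored on the rows it was fitted on.
train_mse = mean_squared_error(y_demo, model.fit(X_demo, y_demo).predict(X_demo))

# 10-fold cross-validated MSE. scikit-learn returns negated errors for this
# scorer, so the sign is flipped to get an ordinary MSE.
cv_scores = cross_val_score(
    model, X_demo, y_demo, scoring="neg_mean_squared_error", cv=10
)
cv_mse = -cv_scores.mean()

print(f"in-sample MSE:  {train_mse:.2f}")
print(f"10-fold CV MSE: {cv_mse:.2f}")
```

The cross-validated figure is usually somewhat higher than the in-sample one, since each fold is scored on rows the model did not see.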

We will now move to making the elastic net model.

**Elastic Net Model**

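Elastic net, just like ridge and lasso regression, requires normalized data. Here this is requested through the normalize argument of the ElasticNet function. Note that this argument was deprecated in scikit-learn 1.0 and removed in 1.2, so the code as written needs an older release. The second thing we need to do is create our grid. This is the same grid as we created for ridge and lasso in prior posts. The only thing that is new is the l1_ratio argument.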

When l1_ratio is set to 0, the penalty is the same as in ridge regression. When l1_ratio is set to 1, it is the lasso penalty. Values between 0 and 1 blend the two. Therefore, in our grid, we need to try several values of this argument. Below is the code.
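To make the role of l1_ratio concrete, the sketch below checks that `ElasticNet` with `l1_ratio=1.0` gives the same coefficients as `Lasso` at the same alpha, and then prints how many coefficients are exactly zero at a few ratios. The data is synthetic (an assumption, not the Fair dataset), and the alpha of 1.0 is chosen only so the effect is visible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic stand-in data: only 3 of the 10 features carry signal.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

alpha = 1.0

# scikit-learn documents the elastic net objective as:
# 1/(2n) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1
#                       + 0.5 * alpha * (1 - l1_ratio) * ||w||^2
# With l1_ratio=1.0 the squared (ridge) term vanishes, leaving the lasso penalty.
lasso = Lasso(alpha=alpha).fit(X_demo, y_demo)
enet_as_lasso = ElasticNet(alpha=alpha, l1_ratio=1.0).fit(X_demo, y_demo)
same_as_lasso = np.allclose(lasso.coef_, enet_as_lasso.coef_)
print("matches Lasso at l1_ratio=1.0:", same_as_lasso)

# Count exact zeros and total coefficient size at several mixing ratios.
for ratio in [0.2, 0.5, 0.8, 1.0]:
    coef = ElasticNet(alpha=alpha, l1_ratio=ratio).fit(X_demo, y_demo).coef_
    n_zero = int(np.sum(coef == 0))
    print(f"l1_ratio={ratio}: zeros={n_zero}, sum|coef|={np.abs(coef).sum():.2f}")
```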

elastic=ElasticNet(normalize=True)

search=GridSearchCV(estimator=elastic,param_grid={'alpha':np.logspace(-5,2,8),'l1_ratio':[.2,.4,.6,.8]},scoring='neg_mean_squared_error',n_jobs=1,refit=True,cv=10)
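On scikit-learn 1.2 or later, where `normalize` is no longer accepted, one possible substitute is to put a `StandardScaler` in front of the model in a pipeline and search over the pipeline. This is a sketch under assumptions, not a drop-in replacement: `StandardScaler` scales by standard deviation whereas the old `normalize=True` scaled by the l2 norm, so the best alpha will generally differ from the one reported in this post. The data is synthetic and the `pipe` and `search_demo` names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; not the Fair dataset).
X_demo, y_demo = make_regression(
    n_samples=200, n_features=7, noise=10.0, random_state=0
)

# Scaling happens inside the pipeline, so it is re-fitted on each training fold.
pipe = make_pipeline(StandardScaler(), ElasticNet(max_iter=10000))

# make_pipeline names each step after its lowercased class, hence "elasticnet__".
param_grid = {
    "elasticnet__alpha": np.logspace(-5, 2, 8),
    "elasticnet__l1_ratio": [0.2, 0.4, 0.6, 0.8],
}

search_demo = GridSearchCV(
    pipe, param_grid, scoring="neg_mean_squared_error", cv=10
)
search_demo.fit(X_demo, y_demo)

print(search_demo.best_params_)
# best_score_ is a negated MSE, so flip the sign to report it.
print(-search_demo.best_score_)
```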

We will now fit our model and display the best parameters and the best results we can get with that setup.

search.fit(X,y)

search.best_params_

Out[73]: {'alpha': 0.001, 'l1_ratio': 0.8}

abs(search.best_score_)

Out[74]: 1.0816514028705004

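The best hyperparameters were an alpha of 0.001 and an l1_ratio of 0.8. With these settings we got a cross-validated MSE of 1.08, which is above the baseline MSE of 1.05, so elastic net appears to do slightly worse than linear regression. Keep in mind that the baseline figure was computed on the training data while 1.08 is a cross-validated score, so the two are not strictly comparable. For clarity, we will set our hyperparameters close to the recommended values (the code below uses an l1_ratio of 0.75 rather than 0.8) and compute the MSE on the full data.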

elastic=ElasticNet(normalize=True,alpha=0.001,l1_ratio=0.75)

elastic.fit(X,y)

second_model=(mean_squared_error(y_true=y,y_pred=elastic.predict(X)))

print(second_model)

1.0566430678343806

Now the two values are about the same (1.06 versus 1.05). Below are the coefficients.

coef_dict_elastic = {}

for coef, feat in zip(elastic.coef_,X.columns):

    coef_dict_elastic[feat] = coef

coef_dict_elastic

Out[76]:

{'religious': 0.01947541724957858,

'age': -0.008630896492807691,

'sex': 0.018116464568090795,

'ym': -0.024224831274512956,

'education': 0.04429085595448633,

'occupation': -0.0,

'nbaffairs': -0.06679513627963515}

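The coefficients keep the same signs, although several are shrunk toward zero (sex drops from about 0.089 to 0.018, religious from 0.042 to 0.019). Notice that occupation was completely removed from the model in the elastic net version. This means the algorithm judged that this variable contributed too little to justify a nonzero coefficient. Ordinary least squares regression cannot do this.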
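Comparing two dictionaries by eye gets harder as the number of features grows. One way to line the models up is to put both sets of coefficients in a single DataFrame and flag the ones elastic net set to zero. The sketch below uses synthetic data with made-up column names `x0` to `x6` and an alpha of 1.0 picked only for illustration; none of these come from the analysis above.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, LinearRegression

# Synthetic stand-in data (an assumption; not the Fair dataset).
X_arr, y_demo = make_regression(
    n_samples=200, n_features=7, n_informative=4, noise=10.0, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(7)])

ols = LinearRegression().fit(X_demo, y_demo)
enet = ElasticNet(alpha=1.0, l1_ratio=0.8).fit(X_demo, y_demo)

# One row per feature, one column per model.
coefs = pd.DataFrame(
    {"linear": ols.coef_, "elastic_net": enet.coef_},
    index=X_demo.columns,
)

# Flag features whose elastic net coefficient is exactly zero.
coefs["dropped"] = coefs["elastic_net"] == 0

print(coefs.round(3))
```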

**Conclusion**

This post provided an example of elastic net regression. Through the l1_ratio argument, elastic net lets you tune the balance between ridge and lasso regression characteristics. This flexibility is what gives elastic net its power.