# Simple Linear Regression Analysis

Simple linear regression analysis is a technique that is used to model the dependency of one dependent variable upon one independent variable. This relationship between these two variables is explained by an equation.

When regression is employed normally the data points are graphed on a scatterplot. Next, the computer draws what is called the “best-fitting” line. The line is the best fit because it reduces the amount of error between actual values and predicted values in the model. The official name of the model is the least square model in that it is the model with the least amount of error. As such, it is the best model for predicting future values

It is important to remember that one of the great enemies of statistics is explaining error or residual. In general, any particular data point that is not the mean is said to have some error in it. For example, if the average is 5 and one of the data points is three 5 -3 = 2 or an error of 2. Statistics often want to explain this error. What is causing this variation from the mean is a common question.

There are two ways that simple regression deals with error

1. The error cannot be explained. This is known as unexplained variation.
2. The error can be explained. This is known as explained variation.

When these two values are added together you get the total variation which is also known as the “sum of squares for error.”

Another important term to be familiar with is the standard error of estimate. The standard error of estimate is a measurement of the standard deviation of the observed dependent variables values from predicted values of the dependent variable. Remember that there is always a slight difference between observed and predicted values and the model wants to explain as much of this as possible.

In general, the smaller the standard error the better because this indicates that there is not much difference between observed data points and predicted data points. In other words, the model fits the data very well.

Another name for the explained variation is the coefficient of determination. The coefficient of determination is the amount of variation that is explained by the regression line and the independent variable. Another name for this value is the r². The coefficient of determination is standardized to have a value between 0 to 1 or 0% to 100%.

The higher your r² the better your model is at explaining the dependent variable. However, there are a lot of qualifiers to this statement that goes beyond this post.

Here are the assumptions of simple regression

• Linearity–The mean of each error is zero
• Independence of error terms–The errors are independent of each other
• Normality of error terms–The error of each variable is normally distributed
• Homoscedasticity–The variance of the error for the value of each variable is the same

There are many ways to check all of this in SPSS which is beyond this post.

Below is an example of simple regression using data from a previous post

You want to know how strong is the relationship of the exam grade on the number of words in the students’ essay. The data is below

1                             79                           147
2                             76                           143
3                             78                           147
4                             84                           168
5                             90                           206
6                             83                           155
7                             93                           192
8                             94                           211
9                             97                           209
10                          85                           187
11                          88                           200
12                          82                           150

Step 1: Find the Slope (The computer does this for you)
slope = 3.74

Step 2: Find the mean of X (exam grade) and Y (words on the essay) (Computer does this for you)
X (Exam grade) = 85.75        Y (Words on Essay) = 176.25

Step 3: Compute the intercept of the simple linear regression (computer does this)
-145.27

Step 4: Create linear regression equation (you do this)
Y (words on essay) = 3.74*(exam grade) – 145.27
NOTE: you can use this equation to predict the number of words on the essay if you know the exam grade or to predict the exam grade if you know how many words they wrote in the essay. It is simple algebra.

Step 5: Calculate Coefficient of Determination r² (computer does this for you)
r² = 0.85
The coefficient of determination explains 85% of the variation in the number of words on the essay. In other words, exam grades strongly predict how many words a student will write in their essay.