
Drag, Pan, & Zoom Elements with D3.js

Mouse events can be combined to create rather complex interactions with d3.js, such as dragging, panning, and zooming. These interactions are handled with tools called behaviors.

In this post, we will look at these three behaviors in two examples.

  • Dragging
  • Panning and zooming

Dragging

Dragging allows the user to move an element around on the screen. We are going to make three circles of different colors that we can move around as we wish within the element. We start by setting the width and height of the svg element as well as the radius of the circles we will make (line 7). Next, we create our svg by appending it to the body element. We also set a black line around the element so that the user knows where the borders are (lines 8-14).

The next part involves setting the colors for the circles and then creating the circles and setting all of their attributes (lines 21-30). Setting the drag behavior comes later; we use the .drag() and .on() methods to create this behavior, and the .call() method connects the behavior to our circles variable.

The last part is the onDrag function. This function retrieves the position of the moving element and transforms the element within the svg element (lines 36-46). This involves using an if statement as well as setting attributes. If this sounds confusing, below is the code followed by a visual of what the code does.

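The original code screenshot is not recoverable, but a minimal sketch of a comparable drag behavior might look like the following (D3 v4 syntax; the sizes, colors, and the clamping used in place of the if statement are assumptions):

<script src="https://d3js.org/d3.v4.min.js"></script>
<script>
var w = 300, h = 300, r = 25;

// svg element with a black border so the user can see the edges
var svg = d3.select('body').append('svg')
    .attr('width', w)
    .attr('height', h)
    .style('border', '1px solid black');

// three circles in different colors
var circles = svg.selectAll('circle')
    .data(['red', 'green', 'blue'])
    .enter()
    .append('circle')
    .attr('cx', function (d, i) { return (i + 1) * w / 4; })
    .attr('cy', h / 2)
    .attr('r', r)
    .attr('fill', function (d) { return d; });

// the drag behavior: onDrag runs every time the mouse moves while dragging
circles.call(d3.drag().on('drag', onDrag));

function onDrag(d) {
    // clamp the center so the circle can never leave the element
    var x = Math.max(r, Math.min(w - r, d3.event.x));
    var y = Math.max(r, Math.min(h - r, d3.event.y));
    d3.select(this).attr('cx', x).attr('cy', y);
}
</script>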

If you look carefully you will notice I can never move the circles beyond the border. This is because the border represents the edge of the element. This is important because you can limit how far an element can travel by controlling the size of the element's space.

Panning and Zooming

Panning allows you to move all the visuals around at once inside an element. Zooming allows you to expand or contract what you see. Most of this code is an extension of what we did in the previous example. The new additions are explained below.

  1. A variable called zoomAction sets the zoom behavior by determining the scale of the zoom and setting the .on() method (lines 9-11).
  2. We add the .call() method to the svg variable as well as .append('g') so that this behavior can be used (lines 20-21).
  3. The dragAction variable is created to allow us to pan or move the entire element around. This same variable is placed inside a .call() method for the circles variable that was created earlier (lines 40-46).
  4. Lines 48-60 update the position of the element by making two functions. The onDrag function deals with panning and the onZoom function deals with zooming.

Below is the code and a visual of what it does.
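Again the original screenshot is not recoverable; a minimal sketch of the zoom portion (D3 v4 syntax; the names zoomAction and onZoom follow the text, the rest is an assumption) could look like this:

// assumes d3 v4 is loaded as in the previous sketch
var w = 300, h = 300;

var zoomAction = d3.zoom()
    .scaleExtent([0.5, 4])      // how far the user may zoom out and in
    .on('zoom', onZoom);

var svg = d3.select('body').append('svg')
    .attr('width', w)
    .attr('height', h)
    .style('border', '1px solid black')
    .call(zoomAction)
    .append('g');               // the group that is actually transformed

function onZoom() {
    // apply the translation and scale computed by the zoom behavior
    svg.attr('transform', d3.event.transform);
}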

You can clearly see that we can move the circles individually or as a group. In addition, you can also see how we could zoom in and out. Unlike the first example, this example allows you to leave the border. This is probably due to the zoom capability.

Conclusion

The behaviors shared here provide additional tools that you can use as you design visuals using D3.js. There are other more practical ways to use these tools as we shall see.


Intro to Interactivity with D3.js

D3.js provides many ways in which the user can interact with visual data. Interaction with a visual can help the user better understand the nature and characteristics of the data, which can lead to insights. In this post, we will look at three basic examples of interactivity involving mouse events.

Mouse events are actions taken by the browser in response to some action by the mouse. The handler for mouse events is primarily the .on() method. The three examples of mouse events in this post are listed below.

  • Tracking the mouse’s position
  • Highlighting an element based on mouse position
  • Communicating to the user when they have clicked on an element

Tracking the Mouse's Position

The code for tracking the mouse's position is rather simple. What is new is that we need to create a variable that appends a text element to the svg element. When we do this we need to indicate the position and size of the text as well.

Next, we need to use the .on() method on the svg variable we created. Inside this method is the type of behavior to monitor, which in this case is the movement of the mouse. We then create a simple way for the browser to display the x, y coordinates. Below is the code followed by the actual visual.

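The original screenshot is gone; a minimal sketch of the idea (D3 v4 syntax; sizes and positions are assumptions) is below:

// assumes d3 v4 is loaded
var svg = d3.select('body').append('svg')
    .attr('width', 300)
    .attr('height', 300)
    .style('border', '1px solid black');

// the text element that will display the coordinates
var label = svg.append('text')
    .attr('x', 10)
    .attr('y', 20)
    .style('font-size', '14px');

svg.on('mousemove', function () {
    var pos = d3.mouse(this);   // [x, y] relative to the svg element
    label.text('x: ' + Math.round(pos[0]) + ', y: ' + Math.round(pos[1]));
});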

You can see that as the mouse moves, the x, y coordinates move as well. The browser is watching the movement of the mouse and communicating this through the changes in the coordinates.

Highlighting an Element Based on Mouse Position

This example allows an element to change color when the mouse comes in contact with it. To do this we need to create some data that contains the radius of four circles along with their x, y positions (line 13).

Next, we use the .selectAll() method to select all circles, load the data, enter the data, append the circles, set the color of the circles to green, and create a function that sets the position of the circles (lines 15-26).

Lastly, we will use the .on() function twice: once for when the mouse touches a circle and once for when the mouse leaves it. When the mouse touches a circle, the circle will turn black. When the mouse leaves a circle, the circle will return to its original green (lines 27-32). Below is the code followed by the visual.

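A minimal sketch of the highlighting idea (D3 v4 syntax; the radii and positions are assumptions):

// assumes d3 v4 is loaded
var svg = d3.select('body').append('svg')
    .attr('width', 300)
    .attr('height', 300);

var data = [15, 25, 20, 30];   // radii of four circles

var circles = svg.selectAll('circle')
    .data(data)
    .enter()
    .append('circle')
    .attr('r', function (d) { return d; })
    .attr('cx', function (d, i) { return (i + 1) * 60; })
    .attr('cy', 150)
    .attr('fill', 'green');

circles
    .on('mouseover', function () { d3.select(this).attr('fill', 'black'); })
    .on('mouseout', function () { d3.select(this).attr('fill', 'green'); });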

Indicating when a User Clicks on an Element

This example is an extension of the previous one. All the code is the same except you add the following at the bottom of the code right before the close of the script element.

.on('click', function (d, i) {
    alert(d + ' ' + i);
});

This .on() method has an alert inside the function. When this is used, it tells the user when they have clicked on an element, along with the radius of the circle and what position in the array the data comes from. Below is the visual of this code.

Conclusion

You can perhaps see the fun that is possible with interaction when using D3.js. There is much more that can be done in ways that are much more practical than what was shown here.

Tweening with D3.js

Tweening is a tool that allows you to tell D3.js how to calculate attributes during transitions without tracking keyframes. The problem with keyframe tracking is that it can develop performance issues if there is a lot of animation.

We are going to look at three examples of the use of tweening in this post. The examples are as follows.

  • Counting numbers animation
  • Changing font size animation
  • Spinning shape animation

Counting Numbers Animation

This simple animation involves using the .tween() method to count from 0 to 25. The other information in the code determines the position of the element, the font size, and the length of the animation.

In order to use the .tween() method you must make a function. You first give the function a name, followed by the arguments to be used. Inside the function we indicate what it should do using the .interpolateRound() method, which tells d3.js to count from 0 to 25. Below is the code followed by the animation.

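The screenshot is unrecoverable; a minimal sketch of a counting tween (D3 v4 syntax; position, size, and duration are assumptions):

// assumes d3 v4 is loaded
var counter = d3.select('body').append('svg')
    .attr('width', 300)
    .attr('height', 100)
    .append('text')
    .attr('x', 50)
    .attr('y', 60)
    .style('font-size', '40px')
    .text(0);

counter.transition()
    .duration(3000)
    .tween('text', function () {
        var interp = d3.interpolateRound(0, 25);   // integer interpolator
        var node = this;
        return function (t) { node.textContent = interp(t); };
    });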

You can see that the speed of the numbers is not constant. This is because we did not control for this.

Changing Font-Size Animation

The next example is more of the same. This time we simply make the size of a text element change. To do this you use the .text() method in your svg element. In addition, you now use the .styleTween() method. Inside this method we use the .interpolate() method and set arguments for the font size at the beginning and the end of the animation. Below is the code and the animation.
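A minimal sketch of the font-size tween (D3 v4 syntax; the text and sizes are assumptions):

// assumes d3 v4 is loaded
var label = d3.select('body').append('svg')
    .attr('width', 400)
    .attr('height', 100)
    .append('text')
    .attr('x', 20)
    .attr('y', 60)
    .text('Hello d3!');

label.transition()
    .duration(3000)
    .styleTween('font-size', function () {
        return d3.interpolate('10px', '60px');   // start and end sizes
    });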

Spinning Shape Animation

The last example is somewhat more complicated. It involves creating a shape that spins in place. To achieve this we do the following.

  1. Set the width and height of the element
  2. Set the svg element to the width and height
  3. Append a group element to the svg element
  4. Transform and translate the g element in order to move it
  5. Append a path to the g element
  6. Set the shape to a diamond using the .symbol(), .type(), and .size() methods
  7. Set the color of the shape using .style()
  8. Set the .each() method to follow the cycle function
  9. Create the cycle function
  10. Set the .transition() and .duration() methods
  11. Use the .attrTween() method with .interpolateString() to set the rotation of the spinning
  12. Finish with the .each() method

Below is the code followed by the animation.

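A minimal sketch of a spinning diamond (D3 v4 syntax; in v4 the repeat is done with .on('end', …) rather than a closing .each(), and the sizes and timing are assumptions):

// assumes d3 v4 is loaded
var w = 300, h = 300;

var g = d3.select('body').append('svg')
    .attr('width', w)
    .attr('height', h)
    .append('g')
    .attr('transform', 'translate(' + w / 2 + ',' + h / 2 + ')');

g.append('path')
    .attr('d', d3.symbol().type(d3.symbolDiamond).size(2000))
    .style('fill', 'steelblue')
    .each(cycle);

function cycle() {
    d3.select(this)
        .transition()
        .duration(2000)
        .attrTween('transform', function () {
            // rotate from 0 to 360 degrees around the shape's center
            return d3.interpolateString('rotate(0)', 'rotate(360)');
        })
        .on('end', cycle);   // start over, so the spin never stops
}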

This animation never stops because we are using a cycle.

Conclusion

Animations can be a lot of fun when using d3.js. The examples here may not be the most practical, but they provide you with an opportunity to look at the code and decide how you will want to use d3.js in the future.

Adding Labels to Graphs in D3.js

In this post, we will look at how to add the following to a bar graph using d3.js.

  • Labels
  • Margins
  • Axes

Before we begin, you need the initial code that has a bar graph already created (the code from the bar graph post elsewhere in this archive).


The first change is in lines 16-19. Here, we change the name of the variable and modify the type of element it creates.


Our next change begins at line 27 and continues until line 38. Here we make two changes. First, we make a variable called barGroup, which selects all the group elements of the variable g. We also use the data, enter, append, and attr methods. Starting at line 33 and continuing until line 38, we use the append method on our new variable barGroup to add rect elements as well as the color and size of each bar. Below is the code.


The last step for adding text appears in lines 42-50. First, we make a variable called textTranslator to move our text. Then we append the text to the barGroup variable. The color, font type, and font size are all set in the code below, followed by a visual of what our graph looks like now.


Margin

Margins serve to provide spacing in a graph. This is especially useful if you want to add axes. The changes in the code take place in lines 16-39 and include an extensive reworking of the code. In lines 16-20 we create several variables that are used for calculating the margins and the size and shape of the svg element. In lines 22-30 we set the attributes for the svg variable. In lines 32-34 we add a group element to hold the main parts of the graph. Lastly, in lines 36-40 we add a gray background for effect. Below is the code followed by our new graph.

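The original screenshots are gone; a minimal sketch of the margin convention (D3 v4 syntax; the specific numbers are assumptions):

// assumes d3 v4 is loaded
var margin = {top: 20, right: 20, bottom: 30, left: 40},
    totalWidth = 500,
    totalHeight = 400,
    graphWidth = totalWidth - margin.left - margin.right,
    graphHeight = totalHeight - margin.top - margin.bottom;

var svg = d3.select('body').append('svg')
    .attr('width', totalWidth)
    .attr('height', totalHeight);

// the group that holds the main parts of the graph
var mainGroup = svg.append('g')
    .attr('transform', 'translate(' + margin.left + ',' + margin.top + ')');

// a gray background for effect
mainGroup.append('rect')
    .attr('width', graphWidth)
    .attr('height', graphHeight)
    .attr('fill', '#eee');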

Axes

In order for this to work, we have to change the value of the variable maxValue to 150. This gives a little more space at the top of the graph. The code for the axis goes from line 74 to line 98.

  • Lines 74-77 create variables to set up the axis so that it is on the left
  • Lines 78-85 create two more variables that set the scale and the orientation of the axis
  • Lines 87-99 set the visual characteristics of the axis

Below is the code followed by the updated graph.

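A minimal sketch of a left axis, continuing from the margin sketch above (D3 v4 syntax; the domain uses the maxValue of 150 mentioned in the text, the rest is an assumption):

// continues from the margin sketch: mainGroup and graphHeight already exist
var yScale = d3.scaleLinear()
    .domain([0, 150])            // maxValue of 150
    .range([graphHeight, 0]);    // svg pixels run top-down

mainGroup.append('g')
    .call(d3.axisLeft(yScale));  // draw the axis on the left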

You can see the scale off to the left as planned.

Conclusion

Making bar graphs is a basic task in d3.js. Although the code can seem cumbersome to people who do not use JavaScript, the ability to design visuals like this often outweighs the challenges.

Defining Terms in Debates

Defining terms in debates is an important part of the process that can be tricky at times. In this post, we will look at three criteria to consider when dealing with terms in debates. Below are the three criteria.

  • When to define
  • What to define
  • How to define

When to Define

Definitions are almost always given at the beginning of the debate. This is because it helps to set limits on what is discussed. It also makes it clear what the issue and potential propositions are.

Some debates focus exclusively on defining terms. For example, with highly controversial ideas such as abortion or non-traditional marriage, the focus is often just on such definitions as when life begins or what marriage is. Defining terms helps to remove the fuzziness of the controversy and to focus on the exchange of ideas.

What to Define

It is not always clear what needs to be defined when starting a debate. Consider the following proposition of value.

Resolved: That playing video games is detrimental to the development of children

Here are just a few things that may need to be defined.

  • Video games: Does this refer to online, mobile, or console games? What about violent vs non-violent games? Do educational games also fall into this category?
  • Development: What kind of development? Is this referring to emotional, physical, social, or some other form of development?
  • Children: Is this referring only to small children (0-6), young children (7-12), or teenagers?

These are just some of the questions to consider when trying to determine what to define. Again, this is important because the affirmative may be arguing that video games are bad for small children but not for teenagers, while the negative may be preparing a debate for the opposite.

How to Define

There are several ways to define a term. Below are just a few examples of how to do this.

Common Usage

Common usage is the everyday meaning of the term. For example,

We define children as individuals who are under the age of 18

This is clear and simple.

Example

Example definitions give an example of the term to illustrate it as shown below.

An example of a video game would be PlayerUnknown's Battlegrounds

This provides a context for the type of video games the debate may focus on.

Operation

An operational definition is a working definition limited to the specific context. For example,

A video game for us is any game that is played on an electronic device

Few define video games like this, but this is an example.

Authority

An authority definition is one set by an expert.

According to Techopedia, a video game is…

An authority uses their experience and knowledge to establish what a term means, and this can be used by debaters.

Negation

Negation is defining a word by what it is not. For example,

When we speak of video games we are not talking about educational games such as Oregon Trail. Rather, we are speaking of violent games such as Grand Theft Auto

The contrast between the types of games here is what the debater is using to define their term.

Conclusion

Defining terms is part of debating. Debaters need to be trained to understand the importance of this so that they can enhance their communication and persuasion.

Making Bar Graphs with D3.js

This post will provide an example of how to make a basic bar graph using d3.js. Visualizing data is important, and developing bar graphs is one way to communicate information efficiently.

This post has the following steps.

  1. Initial Template
  2. Enter the data
  3. Setup for the bar graphs
  4. Svg element
  5. Positioning
  6. Make the bar graph

Initial Template

Below is what the initial code should look like.

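The original screenshot is gone; a minimal sketch of such a template (the d3 version is an assumption):

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <script src="https://d3js.org/d3.v4.min.js"></script>
  </head>
  <body>
    <script>
      // the graphing code goes here
    </script>
  </body>
</html>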

Entering the Data

For the data, we will hard-code it into the script using an array. This is not the most optimal way of doing this, but it is the simplest for a first-time experience. This code is placed inside the second script element.

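The original values are unrecoverable; an array of the same shape (the numbers are made-up placeholders):

var data = [5, 10, 13, 19, 21, 25, 22, 18, 15, 13];   // placeholder values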

The new code is in lines 10-11, saved as the variable data.

Setup for the Bars in the Graph

We are now going to create three variables. Each is explained below.

  • The barWidth variable indicates how wide the bars should be for the graph
  • The barPadding variable puts space between the bars in the graph. If this were set to 0 it would make a histogram
  • The maxValue variable scales the height of the bars relative to the largest observation in the array. This variable uses the .max() method to find the largest value.

Below is the code for these three variables.

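A sketch of these three variables (names follow the text; the values are assumptions):

var barWidth = 40;              // how wide each bar is
var barPadding = 5;             // space between bars; 0 would make a histogram
var maxValue = d3.max(data);    // largest observation, used to scale bar heights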

The new information was added in lines 13-14.

SVG Element

We can now begin to work with the svg element. We are going to create another variable called mainGroup. This will assign the svg element inside the body element using the .select() method. We will append the svg using .append and will set the width and height using .attr. Lastly, we will append a group element inside the svg so that all of our bars are inside the group that is inside the svg element.

The code is getting longer, so we will only show the new additions with a reference to the older code. Below is the new code in lines 16-19, directly under the maxValue variable.

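A sketch of the svg setup (the dimensions are assumptions):

var width = 500, height = 300;

var mainGroup = d3.select('body')
    .append('svg')
    .attr('width', width)
    .attr('height', height)
    .append('g');   // all of the bars will live inside this group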

Positioning

Next, we need to make three functions.

  • The first function will calculate the x location of the bar graph
  • The second function will calculate the y location of the bar graph
  • The last function will combine the work of the first two functions to place the bar in the proper x,y coordinate in the svg element.

Below is the code for the three functions. These are added in lines 21-25.
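A sketch of the three functions (the logic follows the description in the next paragraph; the exact details are assumptions):

function xloc(d, i) {
    // move right one bar width plus padding for each data point
    return i * (barWidth + barPadding);
}

function yloc(d) {
    // subtract the data point from maxValue so bars grow up from the bottom
    return maxValue - d;
}

function translator(d, i) {
    return 'translate(' + xloc(d, i) + ',' + yloc(d) + ')';
}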

The xloc function starts at the bottom left of the mainGroup element and adds the barWidth plus the barPadding to make the next bar. The yloc function starts at the top left and subtracts the given data point from maxValue to calculate the y position. Lastly, the translator combines the output of both the xloc and yloc functions to position each bar using the translate method.

Making the Graph

We can now make our graph. We will use our mainGroup variable with the .selectAll() method with the rect argument inside. Next, we use .data(data) to add the data, .enter() to update the element, and .append("rect") to add the rectangles. Lastly, we use .attr() to set the color, transformation, and height of the bars. Below is the code in lines 27-36, followed by the actual bar graph.

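A sketch of the bar-drawing step (the fill color is an assumption):

mainGroup.selectAll('rect')
    .data(data)
    .enter()
    .append('rect')
    .attr('fill', 'steelblue')
    .attr('transform', translator)
    .attr('width', barWidth)
    .attr('height', function (d) { return d; });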

The graph is complete, but you can see that there is a lot of work to be done in order to improve it. However, that will be done in a future post.

Quadratic Discriminant Analysis with Python

Quadratic discriminant analysis allows the classifier to assess non-linear relationships. This is of course something that linear discriminant analysis is not able to do. This post will go through the steps necessary to complete a QDA analysis using Python. The steps that will be conducted are as follows.

  1. Data preparation
  2. Model training
  3. Model testing

Our goal will be to predict the gender of examples in the “Wages1” dataset using the available independent variables.

Data Preparation

We will begin by loading the libraries we will need.

import pandas as pd
from pydataset import data
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis as QDA
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import (confusion_matrix,accuracy_score)
import seaborn as sns
from matplotlib.colors import ListedColormap

Next, we will load our data, “Wages1”, which comes from the “pydataset” library. After loading the data, we will use the .head() method to look at it briefly.

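A sketch of the loading step (output omitted):

df = data('Wages1')
df.head()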

We need to transform the variable ‘sex’, our dependent variable, into a dummy variable using numbers instead of text. We will use the .get_dummies() method to make the dummy variables and then add them to the dataset using the .concat() method.
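The original code block is missing; a minimal sketch (the column names follow the dataset):

dummy = pd.get_dummies(df['sex'])   # columns 'female' and 'male'
df = pd.concat([df, dummy], axis=1)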

In the code below we have the histograms for the continuous independent variables. We are using the .distplot() method from seaborn to make the histograms.

fig = plt.figure()
fig, axs = plt.subplots(figsize=(15, 10),ncols=3)
sns.set(font_scale=1.4)
sns.distplot(df['exper'],color='black',ax=axs[0])
sns.distplot(df['school'],color='black',ax=axs[1])
sns.distplot(df['wage'],color='black',ax=axs[2])


The variables look reasonably normal. Below are the proportions for the categorical dependent variable.

round(df.groupby('sex').count()/3294,2)
Out[247]: 
        exper  school  wage  female  male
sex
female   0.48    0.48  0.48    0.48  0.48
male     0.52    0.52  0.52    0.52  0.52

About half male and half female.

We will now make the correlation matrix.

corrmat=df.corr(method='pearson')
f,ax=plt.subplots(figsize=(12,12))
sns.set(font_scale=1.2)
sns.heatmap(round(corrmat,2),
            vmax=1.,square=True,
            cmap="gist_gray",annot=True)


There appear to be no major problems with the correlations. The last thing we will do is set up our train and test datasets.

X=df[['exper','school','wage']]
y=df['male']
X_train,X_test,y_train,y_test=train_test_split(X,y,
test_size=.2, random_state=50)

We can now move to model development.

Model Development

To create our model we will instantiate an instance of the quadratic discriminant analysis function and use the .fit() method.

qda_model=QDA()
qda_model.fit(X_train,y_train)

There are some descriptive statistics that we can pull from our model. For our purposes, we will look at the group means. Below are the group means.
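A sketch of how the means can be pulled from the fitted model (arranging them into a table is an assumption):

# rows follow qda_model.classes_: 0 = female, 1 = male
pd.DataFrame(qda_model.means_, columns=X_train.columns, index=['Female', 'Male'])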

        exper  school  wage
Female   7.73   11.84  5.14
Male     8.28   11.49  6.38

You can see from the table that men generally have more experience and higher wages, but slightly less education.

We will now use the qda_model we created to predict the classifications for the training set. This information will be used to make a confusion matrix.

y_pred = qda_model.predict(X_train)   # predictions for the training set
cm = confusion_matrix(y_train, y_pred)
f, ax = plt.subplots(figsize=(10,10))
sns.set(font_scale=3.4)
with sns.axes_style('white'):
    sns.heatmap(cm, cbar=False, square=True, annot=True, fmt='g',
                cmap=ListedColormap(['gray']), linewidths=2.5)


The information in the upper-left corner is the number of people who were female and correctly classified as female. The lower-right corner is for the men who were correctly classified as men. The upper-right corner is females who were classified as male. Lastly, the lower-left corner is males who were classified as females. Below is the actual accuracy of our model.

round(accuracy_score(y_train, y_pred),2)
Out[256]: 0.6

Sixty percent accuracy is not that great. However, we will now move to model testing.

Model Testing

Model testing involves using the .predict() method again but this time with the testing data. Below is the prediction with the confusion matrix.

y_pred = qda_model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
f, ax = plt.subplots(figsize=(10,10))
sns.set(font_scale=3.4)
with sns.axes_style('white'):
    sns.heatmap(cm, cbar=False, square=True, annot=True, fmt='g',
                cmap=ListedColormap(['gray']), linewidths=2.5)


The results seem similar. Below is the accuracy.

round(accuracy_score(y_test, y_pred),2)
Out[259]: 0.62

The results are about the same, so our model generalizes, even though it performs somewhat poorly.

Conclusion

This post provided an explanation of how to do a quadratic discriminant analysis using Python. This is just another potential tool that may be useful for the data scientist.

Data Exploration Case Study: Credit Default

Exploratory data analysis is the main task of a data scientist, with as much as 60% of their time devoted to it. As such, the majority of their time is spent on something that is rather boring compared to building models.

This post will provide a simple example of how to analyze a dataset from the website Kaggle. This dataset looks at who is likely to default on their credit. The following steps will be conducted in this analysis.

  1. Load the libraries and dataset
  2. Deal with missing data
  3. Some descriptive stats
  4. Normality check
  5. Model development

This is not an exhaustive analysis but rather a simple one for demonstration purposes. The dataset is available here.

Load Libraries and Data

Here are some packages we will need

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from sklearn import tree
from scipy import stats
from sklearn import metrics

You can load the data with the code below

df_train=pd.read_csv('/application_train.csv')

You can examine what variables are available with the code below. The output is not displayed here because it is rather long.

df_train.columns
df_train.head()

Missing Data

I prefer to deal with missing data first because missing values can cause errors throughout the analysis if they are not dealt with immediately. The code below calculates the percentage of missing data in each column.

total=df_train.isnull().sum().sort_values(ascending=False)
percent=(df_train.isnull().sum()/df_train.isnull().count()).sort_values(ascending=False)
missing_data=pd.concat([total,percent],axis=1,keys=['Total','Percent'])
missing_data.head()
 
                           Total   Percent
COMMONAREA_MEDI           214865  0.698723
COMMONAREA_AVG            214865  0.698723
COMMONAREA_MODE           214865  0.698723
NONLIVINGAPARTMENTS_MODE  213514  0.694330
NONLIVINGAPARTMENTS_MEDI  213514  0.694330

Only the first five values are printed. You can see that some variables have a large amount of missing data. As such, they are probably worthless for inclusion in additional analysis. The code below removes all variables with any missing data.

pct_null = df_train.isnull().sum() / len(df_train)
missing_features = pct_null[pct_null > 0.0].index
df_train.drop(missing_features, axis=1, inplace=True)

You can use the .head() function if you want to see how many variables are left.

Data Description & Visualization

For demonstration purposes, we will print descriptive stats and make visualizations of a few of the variables that are remaining.

round(df_train['AMT_CREDIT'].describe())
Out[8]: 
count     307511.0
mean      599026.0
std       402491.0
min        45000.0
25%       270000.0
50%       513531.0
75%       808650.0
max      4050000.0

sns.distplot(df_train['AMT_CREDIT'])


round(df_train['AMT_INCOME_TOTAL'].describe())
Out[10]: 
count       307511.0
mean        168798.0
std         237123.0
min          25650.0
25%         112500.0
50%         147150.0
75%         202500.0
max      117000000.0
sns.distplot(df_train['AMT_INCOME_TOTAL'])


I think you are getting the point. You can also look at categorical variables using the groupby() function.

We also need to address categorical variables by creating dummy variables. This is so that we can develop a model in the future. Below is the code for dealing with all the categorical variables and converting them to dummy variables.

df_train.groupby('NAME_CONTRACT_TYPE').count()
dummy=pd.get_dummies(df_train['NAME_CONTRACT_TYPE'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['NAME_CONTRACT_TYPE'],axis=1)

df_train.groupby('CODE_GENDER').count()
dummy=pd.get_dummies(df_train['CODE_GENDER'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['CODE_GENDER'],axis=1)

df_train.groupby('FLAG_OWN_CAR').count()
dummy=pd.get_dummies(df_train['FLAG_OWN_CAR'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['FLAG_OWN_CAR'],axis=1)

df_train.groupby('FLAG_OWN_REALTY').count()
dummy=pd.get_dummies(df_train['FLAG_OWN_REALTY'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['FLAG_OWN_REALTY'],axis=1)

df_train.groupby('NAME_INCOME_TYPE').count()
dummy=pd.get_dummies(df_train['NAME_INCOME_TYPE'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['NAME_INCOME_TYPE'],axis=1)

df_train.groupby('NAME_EDUCATION_TYPE').count()
dummy=pd.get_dummies(df_train['NAME_EDUCATION_TYPE'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['NAME_EDUCATION_TYPE'],axis=1)

df_train.groupby('NAME_FAMILY_STATUS').count()
dummy=pd.get_dummies(df_train['NAME_FAMILY_STATUS'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['NAME_FAMILY_STATUS'],axis=1)

df_train.groupby('NAME_HOUSING_TYPE').count()
dummy=pd.get_dummies(df_train['NAME_HOUSING_TYPE'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['NAME_HOUSING_TYPE'],axis=1)

df_train.groupby('ORGANIZATION_TYPE').count()
dummy=pd.get_dummies(df_train['ORGANIZATION_TYPE'])
df_train=pd.concat([df_train,dummy],axis=1)
df_train=df_train.drop(['ORGANIZATION_TYPE'],axis=1)
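Since the same three steps repeat for every column, the block above could also be written once as a loop (a sketch using the same columns):

for col in ['NAME_CONTRACT_TYPE','CODE_GENDER','FLAG_OWN_CAR','FLAG_OWN_REALTY',
            'NAME_INCOME_TYPE','NAME_EDUCATION_TYPE','NAME_FAMILY_STATUS',
            'NAME_HOUSING_TYPE','ORGANIZATION_TYPE']:
    dummy = pd.get_dummies(df_train[col])            # make the dummies
    df_train = pd.concat([df_train, dummy], axis=1)  # add them to the dataset
    df_train = df_train.drop([col], axis=1)          # drop the original column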

You have to be careful with this because now you have many variables that are not necessary. For every categorical variable you must remove at least one category in order for the model to work properly. Below we did this manually.

df_train=df_train.drop(['Revolving loans','F','XNA','N','Y','SK_ID_CURR','Student','Emergency','Lower secondary','Civil marriage','Municipal apartment'],axis=1)

Below are some boxplots with the target variable and other variables in the dataset.

f,ax=plt.subplots(figsize=(8,6))
fig=sns.boxplot(x=df_train['TARGET'],y=df_train['AMT_INCOME_TOTAL'])


There is a clear outlier there. Below is another boxplot with a different variable.

f,ax=plt.subplots(figsize=(8,6))
fig=sns.boxplot(x=df_train['TARGET'],y=df_train['CNT_CHILDREN'])


It appears several people have more than 10 children. This is probably a typo.

Below is a correlation matrix using a heatmap technique

corrmat=df_train.corr()
f,ax=plt.subplots(figsize=(12,9))
sns.heatmap(corrmat,vmax=.8,square=True)


The heatmap is nice, but it is hard to really appreciate what is happening. The code below will sort the correlations from weakest to strongest so that we can remove high correlations.

c = df_train.corr().abs()

s = c.unstack()
so = s.sort_values(kind="quicksort")
print(so.head())

FLAG_DOCUMENT_12  FLAG_MOBIL          0.000005
FLAG_MOBIL        FLAG_DOCUMENT_12    0.000005
Unknown           FLAG_MOBIL          0.000005
FLAG_MOBIL        Unknown             0.000005
Cash loans        FLAG_DOCUMENT_14    0.000005

The list is too long to show here, but the following variables were removed for having a high correlation with other variables.

df_train=df_train.drop(['WEEKDAY_APPR_PROCESS_START','FLAG_EMP_PHONE','REG_CITY_NOT_WORK_CITY','REGION_RATING_CLIENT','REG_REGION_NOT_WORK_REGION'],axis=1)

Below we check a few variables for homoscedasticity, linearity, and normality using plots and histograms.

sns.distplot(df_train['AMT_INCOME_TOTAL'],fit=norm)
fig=plt.figure()
res=stats.probplot(df_train['AMT_INCOME_TOTAL'],plot=plt)


This is not normal.

sns.distplot(df_train['AMT_CREDIT'],fit=norm)
fig=plt.figure()
res=stats.probplot(df_train['AMT_CREDIT'],plot=plt)


This is not normal either. We could do transformations, or we can make a non-linear model instead.

Model Development

Now comes the easy part. We will make a decision tree using only some of the variables to predict the target. In the code below we make our X and y datasets.

X=df_train[['Cash loans','DAYS_EMPLOYED','AMT_CREDIT','AMT_INCOME_TOTAL','CNT_CHILDREN','REGION_POPULATION_RELATIVE']]
y=df_train['TARGET']

The code below fits our model and makes the predictions.

clf=tree.DecisionTreeClassifier(min_samples_split=20)
clf=clf.fit(X,y)
y_pred=clf.predict(X)

Below is the confusion matrix followed by the accuracy.

print (pd.crosstab(y_pred,df_train['TARGET']))
TARGET       0      1
row_0                
0       280873  18493
1         1813   6332
accuracy_score(y_pred,df_train['TARGET'])
Out[47]: 0.933966589813047

Lastly, we can look at the precision, recall, and f1 score.

print(metrics.classification_report(y_pred,df_train['TARGET']))
              precision    recall  f1-score   support

           0       0.99      0.94      0.97    299366
           1       0.26      0.78      0.38      8145

   micro avg       0.93      0.93      0.93    307511
   macro avg       0.62      0.86      0.67    307511
weighted avg       0.97      0.93      0.95    307511

This model looks rather good in terms of accuracy on the training set. It is actually impressive that we could use so few variables from such a large dataset and achieve such a high degree of accuracy.

Conclusion

Data exploration and analysis is the primary task of a data scientist. This post was just an example of how this can be approached. Of course, there are many other creative ways to do this, but the simplistic nature of this analysis yielded strong results.

Teaching Materials

Regardless of the level at which a teacher teaches, you are always looking for materials and activities for your class. It is always a challenge to have new ideas and activities to support and help students because the world changes. This leads to a constant need to remove old, inefficient activities and bring in fresh new ones. The primary purpose of activities is to provide practical application of the skills taught in school.

The math teacher can naturally make their own math problems. However, this can quickly become tedious. One solution is to employ worksheets made by others that provide growth opportunities for the students without stressing out the teacher.

There are many great websites for this. For example, education.com provides many different types of worksheets to help students, including some great simple math worksheets for addition practice.


There are many more resources available at education.com as well as other sites. There is no purpose or benefit in reinventing the wheel. Incorporating the assignments of others is a great way to expand the resources you have available without the stress of developing them yourself.

Data Science Pipeline

One of the challenges of conducting a data analysis or any form of research is making decisions. You primarily have to decide two things.

  1. What to do
  2. When to do it

People who are familiar with statistics may know what to do but may struggle with timing or when to do it. Others who are weaker when it comes to numbers may not know what to do or when to do it. Generally, it is rare for someone to know when to do something but not know how to do it.

In this post, we will look at a process that can be used to perform an analysis in the context of data science. Keep in mind that this is just an example, and there are naturally many ways to perform an analysis. The purpose here is to provide some basic structure for people who are not sure of what to do and when. One caveat: this process is focused primarily on supervised learning, which has a clearer beginning, middle, and end in terms of the process.

Generally, there are three steps that probably always take place when conducting a data analysis and they are as follows.

  1. Data preparation (data munging)
  2. Model training
  3. Model testing

Of course, it is much more complicated than this, but this is the minimum. Within each of these steps there are several substeps. However, depending on the context, the substeps can be optional.

There is one pre-step that you have to consider. How you approach these three steps depends a great deal on the algorithm(s) you have in mind to use for developing different models. The assumptions and characteristics of one algorithm are different from another and shape how you prepare the data and develop models. With this in mind, we will go through each of these three steps.

Data Preparation

Data preparation involves several substeps. Some of these steps are necessary, but generally not all of them happen in every analysis. Below is a list of steps at this level.

  • Data munging
  • Scaling
  • Normality
  • Dimension reduction/feature extraction/feature selection
  • Train, test, validation split

Data munging is often the first step in data preparation and involves making sure your data is in a readable structure for your algorithm. This can involve changing the format of dates, removing punctuation/text, changing text into dummy variables or factors, combining tables, splitting tables, etc. This is probably the hardest and most unclear aspect of data science because the problems you face will be highly unique to the dataset you are working with.

Scaling involves making sure all the variables/features are on the same scale. This is important because most algorithms are sensitive to the scale of the variables/features. Scaling can be done through normalization or standardization. Normalization reduces the variables to a range of 0-1. Standardization converts the examples in the variable to their respective z-scores. Which one you use depends on the situation, but some form of scaling is normally expected.
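A minimal sketch of both options with scikit-learn (the toy array is a made-up placeholder):

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import numpy as np

X = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # placeholder data

X_norm = MinMaxScaler().fit_transform(X)    # normalization: each feature to the 0-1 range
X_std = StandardScaler().fit_transform(X)   # standardization: each feature to z-scores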

Checking normality is often an optional step because so many variables can be involved in a big data or data science project. However, when fewer variables are involved, checking for normality is doable with a few tests and some visualizations. If normality is violated, various transformations can be used to deal with this problem. Keep in mind that many machine learning algorithms are robust against the influence of non-normal data.

Dimension reduction involves reducing the number of variables that will be included in the final analysis. This is done through factor analysis or principal component analysis. This reduction in the number of variables is also an example of feature extraction. In some contexts, feature extraction is the goal in itself. Some algorithms make their own features, such as neural networks through the use of hidden layer(s).

Feature selection is the process of determining which variables to keep for future analysis. This can be done through the use of regularization or, in smaller datasets, with subset regression. Whether you extract or select features depends on the context.

After all this is accomplished, it is necessary to split the dataset. Traditionally, the data was split in two. This led to the development of a training set and a testing set. You trained the model on the training set and tested the performance on the test set.

However, now many analysts split the data into three parts to avoid overfitting the data to the test set. There is now a training set, a validation set, and a testing set. The validation set allows you to check the model's performance several times. Once you are satisfied, you use the test set once at the end.
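A minimal sketch of a three-way split using scikit-learn (X and y are your prepared features and target; the 60/20/20 ratio is an assumption):

from sklearn.model_selection import train_test_split

# first carve off 40%, then split that 40% evenly into validation and test
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.4, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=1)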

Once the data is prepared, which again is perhaps the most difficult part, it is time to train the model.

Model training

Model training involves several substeps as well

  1. Determine the metric(s) for success
  2. Creating a grid of several hyperparameter values
  3. Cross-validation
  4. Selection of the most appropriate hyperparameter values

The first thing you have to do, and this is probably required, is to determine how you will know if your model is performing well. This involves selecting a metric. It can be accuracy for classification, mean squared error for a regression model, or something else. What you pick depends on your goals. You use these metrics to determine the best algorithm and hyperparameter settings.

Most algorithms have some sort of hyperparameter(s). A hyperparameter is a value or estimate that the algorithm cannot learn and must be set by you. Since there is no way of knowing what values to select it is common practice to have several values tested and see which one is the best.

Cross-validation is another consideration. Using cross-validation allows you to stabilize the results by averaging the results of the model over several folds of the data if you are using k-folds cross-validation. This also helps to improve the hyperparameter search. There are several types of cross-validation, but k-folds is probably best initially.

The information for the metric, hyperparameters, and cross-validation is usually put into a grid that then runs the model. Whether you are using R or Python, the printout will tell you which combination of hyperparameters is the best based on the metric you determined.
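A minimal sketch of such a grid in scikit-learn (the model and candidate values are assumptions):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={'n_neighbors': [3, 5, 7, 9]},  # hyperparameter values to try
                    scoring='accuracy',                        # the metric chosen earlier
                    cv=5)                                      # 5-fold cross-validation
grid.fit(X_train, y_train)
print(grid.best_params_)                                       # the winning combination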

Validation test

When you know what your hyperparameters are, you can move your model to validation or straight to testing. If you are using a validation set, you assess your model's performance with this new data. If the results are satisfying based on your metric, you can move to testing. If not, you may move back and forth between training and the validation set, making the necessary adjustments.

Test set

The final step is testing the model. You want to use the testing dataset as little as possible. The purpose here is to see how your model generalizes to data it has not seen before. There is little turning back after this point as there is an intense danger of overfitting now. Therefore, make sure you are ready before playing with the test data.

Conclusion

This is just one approach to conducting data analysis. Keep in mind the need to prepare data, train your model, and test it. This is the big picture for a somewhat complex process.

Bagging Classification with Python

Bootstrap aggregation, aka bagging, is a technique used in machine learning that relies on resampling from the sample and running multiple models on the different samples. The mean or some other value is calculated from the results of each model. For example, if you are using decision trees, bagging would have you run the model several times with several different subsamples to help deal with variance in the statistics.

Bagging is an excellent tool for algorithms that are considered weaker or more susceptible to variance, such as decision trees or KNN. In this post, we will use bagging to develop a model that determines whether or not people voted, using the turnout dataset. These results will then be compared to a model that was developed in a traditional way.

We will use the turnout dataset available in the pydataset module. Below is some initial code.

from pydataset import data
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report

We will load our dataset. Then we will separate the independent and dependent variables from each other and create our train and test sets. The code is below.

df=data("turnout")
X=df[['age','educate','income']]
y=df['vote']
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.3,random_state=0)

We can now prepare to run our model. We need to first set up the bagging function. There are several arguments that need to be set. The max_samples argument determines the largest proportion of the dataset to use in resampling. The max_features argument is the maximum number of features to use in a sample. Lastly, n_estimators determines the number of subsamples to draw. The code is as follows.

h=BaggingClassifier(KNeighborsClassifier(n_neighbors=7),max_samples=0.7,max_features=0.7,n_estimators=1000)

Basically, what we told Python was to use up to 70% of the samples, 70% of the features, and make 1,000 different KNN models that use seven neighbors to classify. Now we run the model with the .fit() function, make a prediction with the .predict() function, and check the accuracy with the classification_report() function.

h.fit(X_train,y_train)
y_pred=h.predict(X_test)
print(classification_report(y_test,y_pred))


This looks okay. Below are the results when you do a traditional model without bagging.

X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=.3,random_state=0)
clf=KNeighborsClassifier(7)
clf.fit(X_train,y_train)
y_pred=clf.predict(X_test)
print(classification_report(y_test,y_pred))


The improvement is not much. However, this depends on the purpose and scale of your project. A small improvement can mean millions in the right context, such as for large companies like Google that deal with billions of people per day.

Conclusion

This post provides an example of the use of bagging in the context of classification. Bagging provides a way to improve your model through the use of resampling.

Support Vector Machines Classification with Python

Support vector machines (SVM) are used to fit non-linear models. The details are complex, but to put it simply, SVM tries to create the largest boundaries possible between the various groups it identifies in the sample. The mathematics behind this is complex, especially if you are unaware of what a vector is as defined in algebra.

This post will provide an example of SVM using Python broken into the following steps.

  1. Data preparation
  2. Model Development

We will use two different kernels in our analysis: the linear kernel and the rbf kernel. The difference in terms of kernels has to do with how the boundaries between the different groups are made.

Data Preparation

We are going to use the OFP dataset available in the pydataset module. We want to predict if someone is single or not. Below is some initial code.

import numpy as np
import pandas as pd
from pydataset import data
from sklearn import svm
from sklearn.metrics import classification_report
from sklearn import model_selection

We now need to load our dataset and remove any missing values.

df=pd.DataFrame(data('OFP'))
df=df.dropna()
df.head()


Looking at the dataset we need to do something with the variables that have text. We will create dummy variables for all except region and hlth. The code is below.

dummy=pd.get_dummies(df['black'])
df=pd.concat([df,dummy],axis=1)
df=df.rename(index=str, columns={"yes": "black_person"})
df=df.drop('no', axis=1)

dummy=pd.get_dummies(df['sex'])
df=pd.concat([df,dummy],axis=1)
df=df.rename(index=str, columns={"male": "Male"})
df=df.drop('female', axis=1)

dummy=pd.get_dummies(df['employed'])
df=pd.concat([df,dummy],axis=1)
df=df.rename(index=str, columns={"yes": "job"})
df=df.drop('no', axis=1)

dummy=pd.get_dummies(df['maried'])
df=pd.concat([df,dummy],axis=1)
df=df.rename(index=str, columns={"no": "single"})
df=df.drop('yes', axis=1)

dummy=pd.get_dummies(df['privins'])
df=pd.concat([df,dummy],axis=1)
df=df.rename(index=str, columns={"yes": "insured"})
df=df.drop('no', axis=1)

For each variable, we did the following

  1. Created a dummy in the dummy dataset
  2. Combined the dummy variable with our df dataset
  3. Renamed the dummy variable based on yes or no
  4. Dropped the other dummy variable from the dataset. Python creates two dummies instead of one.

If you look at the dataset now you will see a lot of variables that are not necessary. Below is the code to remove the information we do not need.

df=df.drop(['black','sex','maried','employed','privins','medicaid','region','hlth'],axis=1)
df.head()


This is much cleaner. Now we need to scale the data. This is because SVM is sensitive to scale. The code for doing this is below.

df = (df - df.min()) / (df.max() - df.min())
df.head()


We can now create our dataset with the independent variables and a separate dataset with our dependent variable. The code is as follows.

X=df[['ofp','ofnp','opp','opnp','emr','hosp','numchron','adldiff','age','school','faminc','black_person','Male','job','insured']]
y=df['single']

We can now move to model development.

Model Development

We need to make our test and train sets first. We will use a 70/30 split.

X_train,X_test,y_train,y_test=model_selection.train_test_split(X,y,test_size=.3,random_state=1)

Now, we need to create the models or the hypothesis we want to test. We will create two hypotheses. The first model is using a linear kernel and the second is one using the rbf kernel. For each of these kernels, there are hyperparameters that need to be set which you will see in the code below.

h1=svm.LinearSVC(C=1)
h2=svm.SVC(kernel='rbf',degree=3,gamma=0.001,C=1.0)

The details about the hyperparameters are beyond the scope of this post. After fitting each model and predicting with the test set, we can look at the results for the first model.
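The fitting code itself was in a screenshot; a minimal sketch of the fit-and-evaluate pattern (the same applies to h2):

h1.fit(X_train, y_train)
y_pred = h1.predict(X_test)
print(pd.crosstab(y_test, y_pred))              # breakdown of the predictions
print(classification_report(y_test, y_pred))    # precision, recall, f1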


The overall accuracy is 73%. The crosstab() function provides a breakdown of the results, and the classification_report() function provides other metrics related to classification. In this situation, 0 means not single (married) while 1 means single. Below are the results for model 2.


You can see the results are similar, with the first model having a slight edge. The second model really struggles with predicting people who are actually single. You can see that the recall in particular is really poor.

Conclusion

This post showed how to use SVM in Python. How this algorithm works can be somewhat confusing. However, it can be powerful if used appropriately.

Multiple Regression in Python

In this post, we will go through the process of setting up a regression model with a training and testing set using Python. We will use the insurance dataset from Kaggle. Our goal will be to predict charges. In this analysis, the following steps will be performed.

  1. Data preparation
  2. Model training
  3. Model testing

Data Preparation

Below is a list of the modules we will need in order to complete the analysis

import matplotlib.pyplot as plt
import pandas as pd
from sklearn import linear_model,model_selection, feature_selection,preprocessing
import statsmodels.formula.api as sm
from statsmodels.tools.eval_measures import mse
from statsmodels.tools.tools import add_constant
from sklearn.metrics import mean_squared_error

After you download the dataset you need to load it and take a look at it. You will use the .read_csv() function from pandas to load the data and the .head() function to look at the data. Below is the code and the output.

insure=pd.read_csv('YOUR LOCATION HERE')
insure.head()


We need to create some dummy variables for sex, smoker, and region. We will address that in a moment; right now we will look at descriptive stats for our continuous variables. We will use the .describe() function for descriptive stats and the .corr() function to find the correlations.
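The original code was in a screenshot; a minimal sketch (the column names follow the Kaggle insurance data; output omitted):

insure.describe()                                    # descriptive stats
insure[['age','bmi','children','charges']].corr()    # correlations of the numeric columns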


The descriptives are left for your own interpretation. As for the correlations, they are generally weak which is an indication that regression may be appropriate.

As mentioned earlier, we need to make dummy variables for sex, smoker, and region in order to do the regression analysis. To complete this we need to do the following.

  1. Use the pd.get_dummies function from pandas to create the dummy
  2. Save the dummy variable in an object called ‘dummy’
  3. Use the pd.concat function to add our new dummy variable to our ‘insure’ dataset
  4. Repeat this three times

Below is the code for doing this

dummy=pd.get_dummies(insure['sex'])
insure=pd.concat([insure,dummy],axis=1)
dummy=pd.get_dummies(insure['smoker'])
insure=pd.concat([insure,dummy],axis=1)
dummy=pd.get_dummies(insure['region'])
insure=pd.concat([insure,dummy],axis=1)
insure.head()


The .get_dummies() function requires the name of the dataframe and, in the brackets, the name of the variable to convert. The .concat() function requires the names of the two datasets to combine as well as the axis on which to perform the concatenation.

We now need to remove the original text variables from the dataset. In addition, we need to remove the y variable “charges” because this is the dependent variable.

y = insure.charges
insure=insure.drop(['sex', 'smoker','region','charges'], axis=1)

We can now move to model development.

Model Training

Our train and test sets are made with the model_selection.train_test_split() function. We will do an 80-20 split of the data. Below is the code.

X_train, X_test, y_train, y_test = model_selection.train_test_split(insure, y, test_size=0.2)

In this single line of code, we create a train and test set of our independent variables and our dependent variable.

We can now run our regression analysis. This requires the use of the .OLS() function from the statsmodels module. Below is the code.

answer=sm.OLS(y_train, add_constant(X_train)).fit()

In the code above inside the parentheses, we put the dependent variable(y_train) and the independent variables (X_train). However, we had to use the function add_constant to get the intercept for the output. All of this information is then used inside the .fit() function to fit a model.

To see the output you need to use the .summary() function as shown below.

answer.summary()


The assumption is that you know regression but are reading this post to learn Python. Therefore, we will not go into great detail about the results. The r-squared is strong; however, region and gender are not statistically significant.

We will now move to model testing.

Model Testing

Our goal here is to take the model that we developed and see how it does on other data. First, we need to predict values with the model we made with the new data. This is shown in the code below

ypred=answer.predict(add_constant(X_test))

We use the .predict() function for this action and we use the X_test data as well. With this information, we will calculate the mean squared error. This metric is useful for comparing models. We only made one model so it is not that useful in this situation. Below is the code and results.

print(mse(ypred,y_test))
33678660.23480476

For our final trick, we will make a scatterplot with the predicted and actual values of the test set. In addition, we will calculate the correlation of the predicted values and the test set values. This is an alternative metric for assessing a model.
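The original code was a screenshot; a sketch that matches the description below (two lines for the plot, two for the correlation matrix):

plt.scatter(ypred, y_test)   # predicted vs actual values
plt.show()
pred = pd.concat([ypred, y_test], axis=1)
pred.corr()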


You can see the first two lines are for making the plot. Lines 3-4 make the correlation matrix and involve the .concat() function. The correlation is high at 0.86, which indicates the model is good at accurately predicting the values. This is confirmed by the scatterplot, which is almost a straight line.

Conclusion

In this post we learned how to do a regression analysis in Python. We prepared the data, developed a model, and tested the model with an evaluation of it.

Working with a Dataframe in Python

In this post, we will learn to do some basic exploration of a dataframe in Python. Some of the tasks we will complete include the following…

  • Import data
  • Examine data
  • Work with strings
  • Calculating descriptive statistics

Import Data 

First, you need data, therefore, we will use the Titanic dataset, which is readily available on the internet. We will need to use the pd.read_csv() function from the pandas package. This means that we must also import pandas. Below is the code.

import pandas as pd
df=pd.read_csv('FILE LOCATION HERE')

In the code above we imported pandas as pd so we can use the functions within it. Next, we create an object called ‘df’. Inside this object, we used the pd.read_csv() function to read our file into the system. The location of the file needs to be typed in quotes inside the parentheses. Having completed this we can now examine the data.

Data Examination

Now we want to get an idea of the size of our dataset and any problems with missing data. To determine the size we use the .shape function as shown below.

df.shape
Out[33]: (891, 12)

Results indicate that we have 891 rows and 12 columns/variables. You can view the whole dataset by typing the name of the dataframe “df” and pressing enter. If you do this you may notice there are a lot of NaN values in the “Cabin” variable. To determine exactly how many, we can use .isnull() in combination with .value_counts().

df['Cabin'].isnull().value_counts()
Out[36]: 
True     687
False    204
Name: Cabin, dtype: int64

The code starts with the name of the dataframe. In the brackets, you put the name of the variable. After that, you put the functions you are using. Keep in mind that the order of the functions matters. You can see we have 687 missing examples. For a categorical variable, you can also see how many examples are part of each category, as shown below.

df['Embarked'].value_counts()
Out[39]: 
S    644
C    168
Q     77
Name: Embarked, dtype: int64

This time we used our ‘Embarked’ variable. However, we need to address missing values before we can continue. To deal with this we will use the .dropna() function on the dataset. Then we will check the size of the dataframe again with the .shape function.

df=df.dropna(how='any')
df.shape
Out[40]: (183, 12)

You can see our dataframe is much smaller, going from 891 examples to 183. We can now move to other operations, such as dealing with strings.

Working with Strings

What you do with strings really depends on your goals. We are going to look at extraction, subsetting, and determining the length of strings. Our first step will be to extract the last name of the first five people. We will do this with the code below.

df['Name'][0:5].str.extract('(\w+)')
Out[44]: 
1 Cumings
3 Futrelle
6 McCarthy
10 Sandstrom
11 Bonnell
Name: Name, dtype: object

As you can see we got the last names of the first five examples. We did this by using the following format…

dataframe_name['Variable Name'].str.function('argument')

.str is an accessor for dealing with strings in dataframes. The .extract() function does what its name implies.

If you want, you can even determine how many letters each name is. We will do this with the .str and .len() function on the first five names in the dataframe.

df['Name'][0:5].str.len()
Out[64]: 
1 51
3 44
6 23
10 31
11 24
Name: Name, dtype: int64
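
The .str accessor has other helpers as well. As one more small sketch, the standard .str.contains() function flags names that include a given pattern; the exact counts will depend on your data.

df['Name'].str.contains('Miss')[0:5]            # True/False for the first five names
df['Name'].str.contains('Miss').value_counts()  # how many names contain 'Miss'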

Hopefully, the code is becoming easier to read and understand.

Calculating Descriptive Statistics

We can also calculate some descriptive statistics. We will do this for the “Fare” variable. The code is repetitive in that only the function changes, so we will run all of them at once. Below we are calculating the mean, maximum, minimum, and standard deviation for the price of a fare on the Titanic.

df['Fare'].mean()
Out[77]: 78.68246885245901

df['Fare'].max()
Out[78]: 512.32920000000001

df['Fare'].min()
Out[79]: 0.0

df['Fare'].std()
Out[80]: 76.34784270040574
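
If you would rather have most of these summaries in a single step, pandas also provides the .describe() function, as sketched below.

df['Fare'].describe()   # count, mean, std, min, quartiles, and max in one call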

Conclusion

This post provided you with some ways in which you can maneuver around a dataframe in Python.

Numpy Arrays in Python

In this post, we are going to explore arrays created by the numpy package in Python. Understanding how arrays are created and manipulated is useful when you need to perform complex coding and/or analysis. In particular, we will address the following.

  1. Creating and exploring arrays
  2. Math with arrays
  3. Manipulating arrays

Creating and Exploring an Array

Creating an array is simple. You need to import the numpy package and then use the np.array function to create the array. Below is the code.

import numpy as np
example=np.array([[1,2,3,4,5],[6,7,8,9,10]])

Making an array requires the use of square brackets. If you want multiple rows, then you must use inner square brackets. In the example above I made a two-dimensional array in which each row has its own set of brackets.

Also, notice that we imported numpy as np. This is shorthand so that we do not have to type the word numpy but only np. In addition, we have now created an array with ten data points spread across two rows.

There are several functions you can use to get an idea of the size of a data set. Below is a list with the function and explanation.

  • .ndim = the number of dimensions
  • .shape = the number of rows and columns
  • .size = the number of individual data points
  • .dtype.name = the data type

Below is code that uses all four of these functions with our array.

example.ndim
Out[78]: 2

example.shape
Out[79]: (2, 5)

example.size
Out[80]: 10

example.dtype.name
Out[81]: 'int64'

You can see we have 2 dimensions. The .shape attribute tells us we have 2 rows with 5 examples in each one. The .size attribute tells us we have 10 total examples (5 * 2). Lastly, the .dtype.name attribute tells us that this is an integer data type.

Math with Arrays

Common mathematical operations can be performed on arrays element by element. Below are examples of addition, subtraction, multiplication, and comparison.

example=np.array([[1,2,3,4,5],[6,7,8,9,10]])
example+2
Out[83]: 
array([[ 3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12]])

example-2
Out[84]: 
array([[-1,  0,  1,  2,  3],
       [ 4,  5,  6,  7,  8]])

example*2
Out[85]: 
array([[ 2,  4,  6,  8, 10],
       [12, 14, 16, 18, 20]])

example<3
Out[86]: 
array([[ True,  True, False, False, False],
       [False, False, False, False, False]], dtype=bool)

Each number inside the example array was manipulated as indicated. For example, when we typed example + 2, all the values in the array increased by 2. Lastly, example < 3 tells Python to look inside the array and mark every value less than 3 as True and everything else as False.
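
The True/False array from a comparison can also be used to pull out the matching values directly, a technique called boolean indexing. Below is a minimal sketch.

example[example < 3]   # returns array([1, 2]), only the values less than 3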

Manipulating Arrays

There are also several ways you can manipulate or access data inside an array. For example, you can pull a particular element in an array by doing the following.

example[0,0]
Out[92]: 1

The information in the brackets tells Python to access the first row and the first value in that row. Recall that Python starts counting from 0. You can also access a range of values using the colon, as shown below.

example=np.array([[1,2,3,4,5],[6,7,8,9,10]]) 
example[:,2:4]
Out[96]: 
array([[3, 4],
[8, 9]])

In this example, the colon before the comma means take all rows. After the comma we have 2:4, which means take the 3rd and 4th columns but not the 5th.
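
A few more slices may make the pattern clearer; each line below is a small sketch of the same bracket notation.

example[0, :]     # all of the first row: array([1, 2, 3, 4, 5])
example[1, 1:3]   # 2nd and 3rd values of the second row: array([7, 8])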

It is also possible to turn a multidimensional array into a single dimension with the .ravel() function and to transpose with the .transpose() function. Below is the code for each.

example=np.array([[1,2,3,4,5],[6,7,8,9,10]]) 
example.ravel()
Out[97]: array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

example.transpose()
Out[98]: 
array([[ 1,  6],
       [ 2,  7],
       [ 3,  8],
       [ 4,  9],
       [ 5, 10]])

You can see the .ravel() function made a one-dimensional array. The .transpose() function flipped the array so that the two rows became two columns with five values each.
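
Closely related to .ravel() and .transpose() is the .reshape() function, which lets you pick the new dimensions yourself as long as the total number of values stays the same. A quick sketch:

example.reshape(5, 2)   # the same ten values rearranged into 5 rows and 2 columns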

Conclusion

We now have a basic understanding of how numpy arrays work in Python. As mentioned before, this is valuable information to understand when wrestling with different data science questions.

Lists in Python

Lists allow you to organize information. In the real world, we make lists all the time to keep track of things. This same concept applies in Python when making lists. A list is a sequence of stored data. By sequence, it is meant a data structure that allows multiple items to exist in a single storage unit. By making lists we are explaining to the computer how to store the data in the computer’s memory.

In this post, we will learn the following about lists:

  • How to make a list
  • Accessing items in a list
  • Looping through a list
  • Modifying a list

Making a List

Making a list is not difficult at all. To make one you first create a variable name followed by the equal sign and then place your content inside square brackets. Below is an example of two different lists.

numList=[1,2,3,4,5]
alphaList=['a','b','c','d','e']
print(numList,alphaList)
[1, 2, 3, 4, 5] ['a', 'b', 'c', 'd', 'e']

Above we made two lists, a numeric and a character list. We then printed both. In general, you want your list to have similar items, such as all numbers or all characters. This makes it easier to recall what is in them than if you mixed them. However, Python can handle mixed lists as well.

Access a List

Accessing individual items in a list is the same as for a string. Just employ brackets with the index that you want. Below are some examples.

numList=[1,2,3,4,5]
alphaList=['a','b','c','d','e']

numList[0]
Out[255]: 1

numList[0:3]
Out[256]: [1, 2, 3]

alphaList[0]
Out[257]: 'a'

alphaList[0:3]
Out[258]: ['a', 'b', 'c']

numList[0] gives us the first value in the list, while numList[0:3] gives us the first three values. This is repeated with alphaList as well.
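
You can also count from the end of a list with negative indexes, which is handy when you do not know how long a list is. A quick sketch:

numList[-1]    # last item: 5
numList[-2:]   # last two items: [4, 5]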

Looping through a List

A list can be looped through as well. Below is a simple example.

for item in numList :
    print(item)


for item in alphaList :
    print(item)


1
2
3
4
5
a
b
c
d
e

By making the two for loops above we are able to print all of the items inside each list.

Modifying a List

There are several functions for modifying lists. Below are a few.

The append() function adds a new item to the end of the list.

numList.append(9)
print(numList)
alphaList.append('h')
print(alphaList)

[1, 2, 3, 4, 5, 9]
['a', 'b', 'c', 'd', 'e', 'h']

You can see our lists now have one new member each at the end.

You can also remove the last member of a list with the pop() function.

numList.pop()
print(numList)
alphaList.pop()
print(alphaList)
[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']

By using the pop() function we have returned our lists back to their original size.

Another trick is to merge lists together with the extend() function. For this, we will merge each list with itself. This will cause the list to have duplicates of all of its original values.

numList.extend(numList)
print(numList)
alphaList.extend(alphaList)
print(alphaList)

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e']

All the values in each list have been duplicated. Finally, you can sort a list using the sort() function.

numList.sort()
print(numList)
alphaList.sort()
print(alphaList)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
['a', 'a', 'b', 'b', 'c', 'c', 'd', 'd', 'e', 'e']

Now all the numbers and letters are sorted.
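
Two small variations worth knowing: sort(reverse=True) sorts in descending order, while the built-in sorted() function returns a new sorted list and leaves the original untouched. A quick sketch with a fresh list:

tempList=[3,1,2]
tempList.sort(reverse=True)
print(tempList)          # [3, 2, 1]
print(sorted(tempList))  # [1, 2, 3]; tempList itself is unchanged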

Conclusion

There is much more that could be done with lists. However, the purpose here was just to cover some of the basic ways that lists can be used in Python.

Introducing Google Classroom

Google Classroom is yet another player in the learning management system industry. This platform provides most of the basics that are expected in an LMS. This post is not a critique of Google Classroom. Rather, the focus here is on how to use it. It is better for you to decide for yourself about the quality of Google Classroom.

In this post, we will learn how to set up a class in order to prepare the learning experience.

Before we begin, it is assumed that you have a Gmail account, as this is needed to access Google Classroom. In addition, this demonstration is from an individual account and not through the institutional account that a school would set up with Google if it adopted Google Classroom.

Creating a Google Class

Once you are logged in to your Gmail account you can access Google Classroom by clicking on the little gray squares in the upper right-hand corner of your browser. Doing so will show the following.

[screenshot: the Google apps menu, with Google Classroom in the bottom row]

In the example above, Google Classroom is the icon in the bottom row in the middle. When you click on it you will see the following.

[screenshot: the Google Classroom welcome screen with the plus sign]

You might see a screen before this asking if you are a student or a teacher. In the screen above, Google tells you where to click to make your first class. Therefore, click on the plus sign, click on “create class,” and you will see the following.

[screenshot: the consent checkbox for creating a class]

Click on the box which promises Google you will only use your classroom with adults. After this, you will see a dialog box where you can give your class a name as shown below.

[screenshot: the dialog box for naming the class]

Give your course a name and click “create”. Then you will see the following.

[screenshot: the new class page]

There is a lot of information here. The name of the class is at the top, followed by the name of the teacher below. In the middle of the page, you have something called the “stream”. This is where most of the action happens in terms of posting assignments, leading discussions, and making announcements. To the left are some options for dealing with the stream, a calendar, and a way to organize information in the stream by topic.

The topic feature is valuable because it allows you to organize information in a way similar to topics in Moodle. When creating an activity just be sure to assign it to a topic so students can see expectations for that week’s work. This will be explained more in the future.

One thing that was not mentioned was the tabs at the very top of the screen.

[screenshot: the tabs at the top of the class page]

We started in the “stream” tab. If you click on the “students” tab you will see the following.

[screenshot: the “students” tab]

The “invite students” button allows you to add students by typing their email addresses. To the left, you have the class code. This is the code people need in order to join your course.

If you click on the “about” tab you will see the following.

[screenshot: the “about” tab]

Here you can access the drive where all files are saved, the class calendar, your Google calendar, and even invite teachers. In the middle, you can edit the information about the course as well as additional materials that the students will need. This page is useful because it is not dynamic like the stream page. Posted files stay easy to find when using the “about” page.

Conclusion

Google Classroom is not extremely difficult to learn. You can set up a course with minimal computer knowledge in less than 20 minutes. The process shared here was simply the development of a course. In a future post, we will look at how to set up teaching activities and other components of a balanced learning experience.

Luther and Educational Reform

Martin Luther (1483-1546) is best known for his religious work as one of the main catalysts for the Protestant Reformation. However, Luther was also a powerful influence on education during his lifetime. This post will take a look at Luther’s early life and his contributions to education.

Early Life

Luther was born during the late 15th century. His father was a tough miner with a severe disciplinarian streak. You would think that this would be a disaster, but the harsh discipline gave Luther a toughness that would come in handy when standing alone for his beliefs.

Upon reaching adulthood, Luther studied law, as his father desired for him to become a lawyer. However, Luther decided instead to become a monk, much to the consternation of his father.

As a monk, Luther was a diligent student and studied for several additional degrees. Eventually, he was given an opportunity to visit Rome, which was the headquarters of his church. However, Luther saw things there that troubled him and in many ways laid the foundation for his doubt in the direction of his church.

Eventually, Luther had a serious issue with several church doctrines. This motivated him to nail his 95 theses onto the door of a church in 1517. This act was a challenge to defend the statements in the theses and was actually a common behavior among the scholarly community at the time.

For the next several years it was a back-and-forth intellectual battle with the church. A common pattern was that the church would use some sort of psychological torture, such as the eternal damnation of his soul, and Luther would ask for biblical evidence, which was normally not given. Finally, in 1521 at the Diet of Worms, Luther was forced to flee for his life, and the Protestant Reformation had in many ways begun.

Views on Education

Luther’s views on education would not be considered radical or innovative today, but they were during his lifetime. For our purposes, we will look at three tenets of Luther’s position on education.

  • People should be educated so they can read the scriptures
  • Men and women should receive an education
  • Education  should benefit the church and state

People Should be Educated so they Can Read the Scriptures

The thought that everyone should be educated was rather radical. By education, we mean developing literacy skills and not some form of vocational training. Education was primarily for those who needed it, which normally meant the clergy, merchants, and some of the nobility.

If everyone were able to read, it would significantly weaken the church’s position to control spiritual ideas and the state’s ability to maintain secular control, which is one reason why widespread literacy was uncommon. Luther’s call for universal education would not truly be repeated until Horace Mann and the common school movement.

The idea of universal literacy also held with it a sense of personal responsibility. No one could rely on another to understand scripture. Everyone needs to know how to read and interpret scripture for themselves.

Men and Women Should be Educated

The second point is related to the first. When Luther said that everyone should be educated, he truly meant everyone, men and women alike. A woman could not hide behind a man for her spiritual development but needed to read for herself.

Again, the idea of educating women was controversial at the time. The Greeks, for example, generally believed that educating women was embarrassing, although this view was not shared by all.

Women were not only educated for spiritual reasons but also so they could manage the household. Therefore, there was both a spiritual and a practical purpose to the education of women for Luther.

Education Benefits the Church and the State

As mentioned earlier, education had been neglected in part to maintain the power of the church and state. For Luther, however, educated citizens would be of greater benefit to the church and state.

The rationale is that the church would receive ministers, teachers, pastors, etc. and the state would receive future civil servants. Therefore, education would not tear down society but would rather build it up.

Conclusion

Luther was primarily a reformer but also was a powerful force in education. His plea for the development of education in Germany led to the construction of schools all over the Protestant controlled parts of Germany. His work was of such importance that he has been viewed as one of the leading educational reformers of the 16th century.

Education During the Reformation

By the 16th century, Europe was facing some major challenges to the established order of doing things. Some of the causes of the upheaval are less obvious than others.

For example, the invention of gunpowder made knights useless. This was significant because now any common soldier could be more efficient and useful in battle than a knight that took over ten years to train. This weakened the prestige of the nobility at least temporarily while adjustments were made within the second estate and led to a growth in the prestige of the third estate who were adept at using guns.

The church was also facing major issues. After it had held power for almost 1000 years, people began to chafe at the religious power of Europe. There was a revival in learning that was aggressively attacked by monks, who accused the study of biblical languages of being the source of all heresies.

The scholars of the day mocked religion as superstition. Furthermore, the church was accused of corruption and of abusing power. The scholars, or humanists, called for a return to the Greek and Roman classics, which represented the prevailing worldview before the ascension of Catholicism.

Out of the chaos sprang the Protestant Reformation, which rejected the teachings of the medieval church. The Protestants did not only have a different view on religion but also on how to educate, as we shall see.

Protestant Views of Education

A major tenet of Protestantism that influenced their view on education was the idea of personal responsibility. What this meant was that people needed to study for themselves and not just listen to the teacher. In a spiritual sense that meant reading the Bible for one’s self. In an educational sense, it meant confirming authority with personal observation and study.

Out of this first principle spring two other principles: education that matches an individual’s interests and the study of nature. Protestants believed that education should support the natural interests and abilities of a person rather than the interests of the church.

This was and still is a radical idea. Most education today is about the student adjusting themselves to various standards and benchmarks developed by the government. Protestants challenged this view and said education should match the talents of the child. If a child shows interest in woodworking teach this to him. If he shows interest in agriculture teach that to him.

To be fair, attempts have been made in education to “meet the needs” of the child and to differentiate instruction. However, these goals are made in order to take a previously determined curriculum and make it palatable to the student rather than designing something specifically for the individual student. The point is that a child is more than a cog in a machine to be trained as a screwdriver or hammer, but rather an individual whose value is priceless.

Protestants also supported the study of nature. By actually observing nature, people reduced a great deal of the superstition of the time. At one point, the religious power of Europe forbade the study of human anatomy by prohibiting autopsies. In addition, Galileo was in serious trouble for denying the geocentric model of the solar system. Such restrictions stalled science for years and were removed through Protestantism.

Conclusion

The destabilization of the Reformation marks a major break in history. With the decline of the church came the rise of the common man to a position of independent thought and action. These ideas of personal responsibility came from the growing influence of Protestants in the world.

Education in Ancient China

As one of the oldest civilizations in the world, China has a rich past when it comes to education. This post will explore education in Ancient China by providing a brief overview of the following topics.

  1. Background
  2. What was Taught
  3. How was it Taught
  4. The Organization of what was Taught
  5. The Evidence Students Provided of their Learning

Background

Ancient Chinese education is an interesting contrast. On the one hand, the Chinese were major innovators behind some of the greatest inventions of mankind, which include paper, printing, gunpowder, and the compass. On the other hand, Chinese education in the past was strongly collective in nature with heavy governmental control. There was extreme pressure to conform to ancient customs, and independent, deviant behavior was looked down upon. Despite this, there was still innovation.

Most communities had a primary school and most major cities had a college. Completing university study was a great way to achieve a government position in ancient China.

What Did they Teach

Ancient Chinese education focused almost exclusively on the Chinese Classics. By classics, it is meant mainly the writings of Confucius. Confucius emphasized strict obedience in a hierarchical setting. The order was loosely king, father, mother, then child. Deference to authority was the ultimate duty of everyone. There is little surprise that the government supported such an education that demanded obedience to it.

Another aspect of Confucius writings that was stressed was the Five Cardinal Virtues which were charity, justice, righteousness, sincerity, and conformity to tradition. This was the heart of the moral training that young people received. Even leaders needed to demonstrate these traits which limited abuses of power at times.

What China is also famous for in their ancient curriculum is what they did not teach. Supposedly, they did not cover in great detail geography, history, math, science, or language. The focus was almost exclusively on Confucius.

How Did they Teach

Ancient Chinese education was taught almost exclusively by rote memory. Students were expected to memorize large amounts of information. This contributed to a focus on the conservation of knowledge rather than the expansion of it. If something new or unusual happened, it was difficult to deal with, since there was no prior way already developed to address it.

How was Learning Organized

School began at around 6-7 years of age in the local school. After completing studies at the local school, some students went to the academy for additional studies. From the academy, some students would go to university with the hopes of completing their studies to obtain a government position.

Generally, the education was for male students, as it was considered shameful not to educate a boy. Girls often did not go to school and instead handled traditional roles in the home.

Evidence of Learning

Evidence of learning in the Chinese system was almost strictly through examinations. The examinations were exceedingly demanding and stressful. If a student was able to pass the gauntlet of rote memory exams, he would achieve his dream of completing college and joining the prestigious Imperial Academy as a Mandarin.

Conclusion

Education in Ancient China was focused on memorization, tradition, and examination. Even with this focus, Ancient China developed several inventions that have had a significant influence on the world. Explaining this would only lead to speculation, but what can be said is that progress happens whether it is encouraged or not.

Augmented Matrix for a System of Equations

Matrices are a common tool used in algebra. They provide a way to deal with equations that share variables. In this post, we learn some of the basics of developing matrices.

From Equation to Matrix

Using a matrix involves making sure that the same variables and constants are all in the same column in the matrix. This will allow you to do any elimination or substitution you may want to do in the future. Below is an example.

2x - y = -3        [ 2  -1 | -3 ]
3x + 3y = 6        [ 3   3 |  6 ]

Above we have a system of equations to the left and an augmented matrix to the right. If you look at the first column in the matrix it has the same values as the x variables in the system of equations (2 & 3). This is repeated for the y variable (-1 & 3) and the constant (-3 & 6).

The number of variables that can be included in a matrix is unlimited. Generally, when learning algebra, you will commonly see two- and three-variable matrices. The example above is a two-variable matrix; below is a three-variable matrix.

[image: a three-variable augmented matrix with columns for x, y, z, and the constants]

If you look closely you can see there is nothing new here except the z variable, which has its own column in the matrix.

Row Operations 

When a system of equations is in an augmented matrix we can perform calculations on the rows to achieve an answer. You can switch the order of rows as in the following.

[image: two rows of the matrix being switched]

You can multiply a row by a constant of your choice. Below we multiply all values in row 2 by 2. Notice the notation in the middle, as it indicates the action performed.

[image: row 2 multiplied by 2]

You can also add rows together. In the example below row 1 and row 2, are summed to create a new row 1.

[image: row 1 and row 2 summed to create a new row 1]

You can even multiply a row by a constant and then sum it with another row to make a new row. Below we multiply row 2 by 2 and then sum it with row 1 to make a new row 1.

[image: row 2 multiplied by 2 and then summed with row 1 to make a new row 1]

The purpose of row operations is to provide a way to solve a system of equations in a matrix. In addition, writing out the matrices provides a way to track the work that was done. It is easy to get confused even when the actual math is simple.

Conclusion

Systems of equations can be difficult to solve. However, the use of matrices can reduce the computational load needed to solve them. You do need to be careful with how you modify the rows and columns, and this is where careful use of row operations is beneficial.

System of Equations and Mixture Application

Solving a system of equations with a mixture application involves combining two or more quantities. The general setup for the equations is as follows:

Quantity * value = total

This template is used for each equation in the system. You simply read the problem and plug in the information. The examples in this post are primarily related to business, as this is one of the more practical applications of solving a system of equations for the average person. However, a system of equations for mixtures can also be used for determining solutions, though this is more common in chemistry.

Example 1: Making Food 

John wants to make 20 lbs of granola using nuts and raisins. His budget requires that the granola cost $3.80 per pound. Nuts are $4.50 per pound and raisins are $1.00 per pound. How many pounds of nuts and raisins can he use?

The first thing we need to do is determine what we know:

  • cost of the raisins
  • cost of the nuts
  • total cost of the granola
  • number of pounds of granola to make

Below is all of our information in a table

           Pounds  *  Price  =  Total
Nuts         n        4.50      4.5n
Raisins      r        1.00      r
Granola      20       3.80      3.8(20) = 76

What we need to know is how many pounds of nuts and raisins we can use so that the total price per pound is $3.80.

With this information, we can set up our system of equations. We take the pounds column to create the first equation and the total column to create the second equation.

n + r = 20
4.5n + r = 76

We will use elimination to solve this system. We will multiply the first equation by -1 and combine the two equations. Then we solve for n, as in the steps below.

-n - r = -20
4.5n + r = 76
-------------
3.5n = 56
n = 16

We know n = 16, meaning we can have 16 pounds of nuts. To determine the number of pounds of raisins we use our first equation in the system.

16 + r = 20
r = 4

You can check this yourself if you desire.
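
If you would rather let the computer check the arithmetic, numpy can solve the same system directly. This is an optional aside and not part of the original solution; np.linalg.solve() is a standard numpy function.

import numpy as np

# coefficients of n and r in: n + r = 20 and 4.5n + r = 76
coefficients = np.array([[1, 1], [4.5, 1]])
constants = np.array([20, 76])
np.linalg.solve(coefficients, constants)   # returns [16., 4.]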

Example 2: Interests

Below is an example that involves two loans with different interest rates. Our job will be to determine the principal amount of the loan.

Tom owes $43,080 on two student loans. The bank’s interest rate is 5.25% and the federal loan rate is 2.95%. The total amount of interest he paid last year was $1,752.45. What was the principal for each loan?

The first thing we need to do is determine what we know:

  • bank interest rate
  • Federal interest rate
  • time of repayment
  • Amount of loan
  • Interest paid so far

Below is all of our information in a table

           Principal  *  Rate    *  Time  =  Total
Bank           b         0.0525     1        0.0525b
Federal        f         0.0295     1        0.0295f
Total        43080                           1752.45

Below is our system of equations.

b + f = 43080
0.0525b + 0.0295f = 1752.45

To solve the system of equations we will use substitution. First, we solve the first equation for b, as shown below.

b = 43080 - f

We now substitute and solve.

0.0525(43080 - f) + 0.0295f = 1752.45
2261.70 - 0.0525f + 0.0295f = 1752.45
2261.70 - 0.023f = 1752.45
-0.023f = -509.25
f = 22141.30

We know the federal loan is $22,141.30. We can use this information to find the bank loan amount using the first equation.

b + 22141.30 = 43080
b = 20938.70

The bank loan was $20,938.70.

Conclusion

Hopefully, it is clear by now that solving a system of equations can have real-world significance. Applications of this concept can be useful in the context of business as shown here.

Education in Ancient India

In this post, we take a look at education in India in the ancient past. The sub-continent of India has one of the oldest civilizations in the world. Its culture has had a strong influence on both the East and the West.

Background

One unique characteristic of ancient education in India is the influence of religion. The effect of Hinduism is strong. The idea of the caste system is derived from Hinduism, with people being divided primarily into four groups.

  1. Brahmins: teachers/religious leaders
  2. Kshatriyas: soldiers/kings
  3. Vaisyas: farmers/merchants
  4. Sudras: slaves

This system was rigid. There was no moving between castes, and marriage between castes was generally forbidden. The Brahmins were the only teachers, as it was embarrassing to allow one’s children to be taught by another class. They received no salary but rather received gifts from their students.

What Did they Teach

The Brahmins served as the teachers and made it their life’s work to reinforce the caste system through education. All children were taught to understand the importance of this system as well as the role of the Brahmin at the top of it.

Other subjects taught at the elementary level included the 3 R’s. At the university level, the subjects included grammar, math, history, poetry, philosophy, law, medicine, and astronomy. Only the Brahmins completed formal university studies so that they could become teachers. Other classes might receive practical technical training to work in the government, serve in the military, or manage a business.

Something that was missing from education in ancient India was physical education. For whatever reason, this was not normally considered important and was rarely emphasized.

How Did they Teach

The teaching style was almost exclusively rote memorization. Students would daily recite mathematical tables and the alphabet. It would take a great deal of time to learn to read and write through this system.

There was also the assistance of an older student to help the younger ones learn. In a way, this could be considered a form of tutoring.

How was Learning Organized

School began at ages 6-7. The next stage of learning was university, 12 years later. Women did not go to school beyond the cultural training everyone received in early childhood.

Evidence of Learning

Learning mastery was demonstrated through the ability to memorize. Other forms of thought and effort were not the main criteria for demonstrating mastery.


Conclusion

Education in India served a purpose that is familiar in many parts of the world. That purpose was social stability. With the focus on the caste system above other forms of education, India was seeking stability before knowledge expansion and personal development. This can be debated in many ways, but what can be agreed upon is that the country is still mostly intact after several thousand years, and few civilizations can make such a claim even if their style of education is superior to India’s.

Solving a System of Equations with Direct Translation

In this post, we will look at two simple problems that require us to solve a system of equations. Recall that a system of equations involves two or more variables that must be solved. With each problem, we will use direct translation to set up the problem so that it can be solved.

Direct Translation 

Direct translation involves reading a problem and translating it into a system of equations. In order to do this, you must follow these steps:

  1. Determine what you want to know
  2. Assign variables to what you want to know
  3. Set up the system of equations
  4. Solve the system

Example 1

Below is an example followed by a step-by-step breakdown.

The sum of two numbers is zero. One number is 18 less than the other. Find the numbers.

Step 1: We want to know what the two numbers are

Step 2: n = first number & m =  second number

Step 3: Set up system

n + m = 0
n = m - 18

Solving this is simple. We know n = m - 18, so we plug this into the first equation, n + m = 0, and solve for m.

(m - 18) + m = 0
2m - 18 = 0
2m = 18
m = 9

Now that we know m, we can solve for n in the second equation.

n = m - 18
n = 9 - 18
n = -9

The answer is m = 9 and n = -9. If you add these together they come to zero, meeting the criteria of the problem.

Example 2

Below is a second example involving a decision for salary options.

Dan has been offered two options for his salary as a salesman. Option A would pay him $50,000 plus $30 for each sale he closes. Option B would pay him $35,000 plus $80 for each sale he closes. How many sales must he close before the salaries are equal?

Step 1: We want to know when the salaries are equal based on sales

Step 2: d =  Dan’s salary & s = number of sales

Step 3: Set up system

d = 50000 + 30s
d = 35000 + 80s

To solve this problem we can simply set the two expressions for d equal to each other, as shown below.

50000 + 30s = 35000 + 80s
15000 = 50s
s = 300

You can check whether this answer is correct yourself. In order for the two salaries to equal each other, Dan would need to make 300 sales. After 300 sales, option B is more lucrative. Deciding which salary option to take would probably depend on how many sales Dan expects to make in a year.
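
As a quick check, the break-even point is just the salary gap divided by the per-sale difference. Below is a small Python sketch (an aside, not part of the original solution).

sales = (50000 - 35000) / (80 - 30)
print(sales)                             # 300.0 sales to equalize
print(50000 + 30*300, 35000 + 80*300)    # both options pay 59000 at that point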

Conclusion

Algebraic concepts can move beyond theoretical ideas and rearranging numbers to practical applications. This post showed how even something as obscure as a system of equations can actually be used to make financial decisions.

Solving a System of Equations by Substitution and Elimination

A system of equations involves trying to solve for more than one variable. What this means is that a system of equations helps you to see how two different equations relate, or where they intersect if you were to graph them.

There are several different ways to solve a system of equations. In this post, we will solve systems using the substitution and elimination methods.

Substitution

Substitution involves choosing one of the two equations and solving for one variable. Once this is done, we substitute the resulting expression into the other equation. That second equation then has only one unknown variable, and solving it is basic algebra.

The explanation above is abstract, so here is a mathematical example.

[image: the substitution steps, which give x = 4]

We are not done. We now need to use our x value to find our y value. We will use the first equation and replace x to find y.

[image: substituting x = 4 into the first equation, which gives y = -1]

This means that our ordered pair is (4, -1) and this is the solution to the system. You can check this answer by plugging both numbers into the x and y variable in both equations.

Elimination

Elimination begins with two equations and two variables but eliminates one variable to leave one equation with one variable. This is done through the use of the addition property of equality, which states that when you add the same quantity to both sides of an equation you still have equality. For example, 2 = 2, and if I add 5 to both sides I get 7 = 7. The equality remains.

Therefore, we can change one equation using the addition property of equality until one of the variables has the same absolute value in both equations. Then we add across to eliminate one of the variables. If a variable is positive in one equation and negative in the other and has the same absolute value, the two will eliminate each other. Below is an example using the same system of equations as the previous example.

[image: the elimination steps, which give x = 4]

You can take the x value and plug it into either equation to find y. We already know y = -1 from the previous example, so we will skip this.

There are also times when you need to multiply both equations by a constant so that you can eliminate one of the variables.

[image: an elimination example requiring multiplication by constants, which gives x = 0]

We now replace x with 0 in the second equation.

[image: substituting x = 0, which gives y = -3]

Our ordered pair is (0, -3) which also means this is where the two lines intersect if they were graphed.
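
Whichever method you use by hand, you can verify an answer with numpy. Since the images with the original equations are unavailable, the system below is a hypothetical one chosen only because it shares the solution (0, -3).

import numpy as np

# hypothetical system with solution x = 0, y = -3:
#   x + y = -3
#   x - y =  3
A = np.array([[1, 1], [1, -1]])
b = np.array([-3, 3])
np.linalg.solve(A, b)   # returns [ 0., -3.]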

Conclusion

Solving a system of equations allows you to handle two (or more) variables simultaneously. In terms of which method to use, it really boils down to personal choice, as all methods work. Generally, the best method is the one with the least amount of calculation.

Writing Discussion & Conclusions in Research

The Discussion & Conclusion section of a research article/thesis/dissertation is probably the trickiest part of a project to write. Unlike the other parts of a paper, the Discussion & Conclusion is hard to plan in advance, as it depends on the results. In addition, since this is the end of the paper, the writer is often excited and wants to finish quickly, which can lead to superficial analysis.

This post will discuss common components of the Discussion & Conclusion section of a paper. Not all disciplines have all of these components nor do they use the same terms as the ones mentioned below.

Discussion

The discussion is often a summary of the findings of a paper. For a thesis/dissertation, you would provide the purpose of the study again, but you probably would not need to share this in a short article. In addition, you also provide highlights of what you learned, with interpretation. In the results section of a paper, you simply state the statistical results. In the discussion section, you can now explain what those results mean for the average person.

The ordering of the summary matters as well. Some recommend that you go from the most important finding to the least important. Personally, I prefer to share the findings in the order in which the research questions are presented. This maintains a cohesiveness across sections of a paper that a reader can appreciate. However, neither approach is superior to the other. Just remember to connect the findings with the purpose of the study, as this helps to tie the themes of the paper together.

What really makes this a discussion is to compare/contrast your results with the results of other studies and to explain why the results are similar and or different. You also can consider how your results extend the works of other writers. This takes a great deal of critical thinking and familiarity with the relevant literature.

Recommendation/Implications

The next component of this final section of the paper is either recommendations or implications but almost never both. Recommendations are practical ways to apply the results of this study through action. For example, if your study finds that sleeping 8 hours a night improves test scores then the recommendation would be that students should sleep 8 hours a night to improve their test scores. This is not an amazing insight but the recommendations must be grounded in the results and not just opinion.

Implications, on the other hand, explain why the results are important. Implications are often more theoretical in nature and lack the application of recommendations. Often implications are used when it is not possible to provide a strong recommendation.

The terms conclusion and implications are often used interchangeably in different disciplines and this is highly confusing. Therefore, keep in mind your own academic background when considering what these terms mean.

There is one type of recommendation that is almost always present in a study and that is recommendations for further study. This is self-explanatory but recommendations for further study are especially important if the results are preliminary in nature. A common way to recommend further studies is to deal with inconclusive results in the current study. In other words, if something weird happened in your current paper or if something surprised you this could be studied in the future. Another term for this is “suggestions for further research.”

Limitations

Limitations involve discussing some of the weaknesses of your paper. There is always some sort of weakness with the sampling method, statistical analysis, measurement, data collection, etc. This section is an opportunity to confess these problems in a transparent manner and to note issues that future researchers may want to control for.

Conclusion

Finally, the conclusion of the Discussion & Conclusion is where you try to summarize the results in a sentence or two and connect them with the purpose of the study. In other words, trying to shrink the study down to a one-liner. If this sounds repetitive it is and often the conclusion just repeats parts of the discussion.

Blog Conclusion

This post provides an overview of writing the final section of a research paper. The explanation here provides just one view on how to do this. Every discipline and every researcher has their own view on how to construct this section of a paper.

Common Problems with Research for Students

I have worked at supporting undergrad and graduate students with research projects for several years. This post covers what I consider to be the top reasons why students, and even the occasional faculty member, struggle to conduct research. The reasons are as follows:

  1. They don’t read
  2. No clue what a problem is
  3. No questions
  4. No clue how to measure
  5. No clue how to analyze
  6. No clue how to report

Lack of Reading

The first obstacle to conducting research is that students frequently do not read enough to conceptualize how research is done. Reading, not just anything but specifically research, allows a student to synthesize the vocabulary and format of research writing. You cannot do research unless you first read research. This axiom applies to all genres of writing.

A common complaint is the difficulty with understanding research articles. For whatever reason, the academic community has chosen to write research articles in an exceedingly dense and unclear manner. This is not going to change because one graduate student cannot understand what the experts are saying. Therefore, the only solution to understand research English is exposure to this form of communication.

Determining the Problem

If a student actually reads they often go to the extreme of trying to conduct Nobel Prize type research. In other words, their expectations are overinflated given what they know. What this means is that the problem they want to study is infeasible given the skillset they currently possess.

The opposite extreme is to find such a minute problem that nobody cares about it. Again, reading will help in avoiding these two pitfalls.

Another problem is not knowing exactly how to articulate a problem. A student will come to me with excellent examples of a problem, but they never abstract, or take a step away from the examples, to develop a researchable problem. There can be no progress without a clearly defined research problem.

Lack the Ability to Ask Questions about the Problem

If a student actually has a problem, they often never think of questions that they want to answer about it. Another extreme is asking questions they cannot answer. Without questions, you can never better understand your problem. Bad questions or no questions mean no answers.

Generally, there are three types of quantitative research questions, while qualitative research is more flexible. If a student does not know this, they have no clue how to even begin to explore their problem.

Issues with Measurement

Let’s say a student does know what their questions are; the next mystery for many is measuring the variables if the study is quantitative. This is where applying statistical knowledge, rather than simply taking quizzes and tests, comes into play. The typical student often does not understand how to operationalize their variables and determine what type of variables they will include in their study. If you don’t know how you will measure your variables, you cannot answer any questions about your problem.

Lost at the Analysis Stage

The measurement affects the analysis. I cannot tell you how many times a student or even a colleague wanted me to analyze their data without telling me what the research questions were. How can you find answers without questions? The type of measurement affects the potential ways of analyzing data. How you summarize categorical data is different from continuous data. Lacking this knowledge leads to inaction.

No Plan for the Write-Up

If a student makes it to this stage, congratulations are in order; however, many students have no idea what to report or how. This is because students lose track of the purpose of their study, which was to answer their research questions about the problem. Therefore, in the write-up, you present the answers systematically. First, you answer question 1, then 2, etc.

If necessary, you include visuals of the answers. Again, visuals are determined by the type of variable as well as the type of question. A top reason for article rejection is an unclear write-up. Therefore, great care is needed in order for this process to be successful.

Conclusion

Whenever I deal with research students I often walk through these six concepts. Most students never make it past the second or third concept. Perhaps the results will differ for others.

Successful research writing requires the ability to see the big picture and connect the various sections of a paper so that they present a cohesive whole. Too many students focus on the little details and forget the purpose of their study. Losing the main idea makes the details worthless.

If I left out any common problems with research please add them in the comments section.

Reading Comprehension Strategies

Students frequently struggle with understanding what they read. There can be many reasons for this, ranging from vocabulary issues to struggles with just sounding out the text. Another common problem, frequently seen among native speakers of a language, is that students just read without taking a moment to think about what they read. This lack of reflection and intellectual wrestling with the text can make it so that the student knows they read something but knows nothing about what they read.

In this post, we will look at several common strategies to support reading comprehension: walking a student through the text, asking questions, and developing relevance.

Walking a Student Through the Text

As students get older, there is a tendency for many teachers to ignore the need to guide students through a reading before the students read it. One way to improve reading comprehension is to go through the assigned reading and give the students an idea of what to expect from the text.

Doing this provides a framework within the student’s mind to which they can add the details as they do the reading. When walking through a text with students, the teacher can provide insights into important ideas, explain complex words, explain visuals, and give general ideas as to what is important.

Ask Questions

Asking questions either before or after a reading is another great way to support students’ understanding. Prior questions give an idea of what the students should be expected to know after reading. On the other hand, questions after the reading should aim to help students coalesce the ideas they were exposed to in the reading.

The types of questions are endless. The questions can be based on Bloom’s taxonomy in order to stimulate various thinking skills. Another skill is probing and soliciting responses from students by encouraging and asking reasonable follow-up questions.

Develop Relevance

Connecting what a student knows to what they do not know is known as relevance. If a teacher can stretch a student from what they know and use it to understand what is new, it will dramatically improve comprehension.

This is trickier than it sounds. It requires the teacher to have a firm grasp of the subject as well as the habits and knowledge of the students. Therefore, patience is required.

Conclusion

Reading is a skill that can improve a great deal through practice. However, mastery will require the knowledge and application of strategies. Without this next level of training, a student will often become more and more frustrated with reading challenging text.

Criticism of Grades

Grading has recently been under attack with people bringing strong criticism against the practice. Some schools have even stopped using grades altogether. In this post, we will look at problems with grading as well as alternatives.

It Depends on the Subject

The weakness of grading is often seen much more clearly in subjects that have a subjective nature to them, such as those from the social sciences and humanities (English, history, or music). Subjects from the hard sciences, such as biology, math, and engineering, are more objective in nature. If a student states that 2 + 2 = 5, there is little left to persuasion or critical thinking to influence the grade.

However, when it comes to judging thinking or a musical performance, it is much more difficult to assess without bringing in the subjectivity of opinion. This is not bad, as a teacher should be an expert in their domain, but it still brings an arbitrary unpredictability to the system of grading that is difficult to avoid.

Returning to the math problem, if a student states 2 + 2 = 4, this answer is always right whether the teacher likes the student or not. However, an excellent historical essay on slavery can be graded poorly if the history teacher has issues with the thesis of the student. To assess the essay requires subjective thought about the quality of the student’s writing, and subjectivity means that the assessment cannot be objective.

Obsession of Students

Many students become obsessed with, and almost worship, the grades they receive. This often means that the focus becomes more about getting an ‘A’ than about actually learning. The students then take no risks in their learning and conform strictly to the directions of the teacher. Mindless conformity is not a sign of future success.

There are many comments on the internet about the differences between ‘A’ and ‘C’ students: how ‘A’ students are conformists and ‘C’ students are innovators. The point is that the better the academic performance of a student, the better they are at obeying orders and not necessarily at thinking independently.

Alternatives to Grades

There are several alternatives to grading. One of the most common is pass/fail. Either the student passes the course or they do not. This is common at the tertiary level, especially in highly subjective courses such as writing a thesis or dissertation. In such cases, the student either meets the “mysterious” standard or they do not.

Another alternative has been the explosion in the use of gamification. As the student acquires badges, hit points, etc., it serves as evidence of learning. This idea is applied primarily at the K-12 level, but the concept of gamification seems to be used in almost all of the game apps available on cellphones as well as many websites.

Lastly, observation is another alternative. In this approach, the teacher makes weekly observations of each student. These observations are then used to provide feedback for the students. Although time-consuming, this is a way to support students without grades.

Conclusion

As long as there is education there must be some sort of way to determine if students are meeting expectations. Grades are the current standard. As with any system, grades have their strengths and weaknesses. With this in mind, it is the responsibility of teachers to always search for ways to improve how students are assessed.

Supporting ESL Student’s Writing

ESL students usually need to learn to write in their second language. This is especially true for those who have academic goals. Learning to write is difficult even in one’s mother tongue, let alone in a second language.

In this post, we will look at several practical ways to help students learn to write in their L2. Below are some useful strategies.

  • Build on what they know
  • Encourage coherency in writing
  • Encourage collaboration
  • Support Consistency

Build on Prior Knowledge

It is easier for most students to write about what they know rather than what they do not know. As such, it is better for a teacher to have students write about a familiar topic. This reduces the cognitive load on the students and allows them to focus more on their language issues.

In addition, building on prior knowledge is consistent with constructivism. Therefore, students are deepening their learning through using writing to express ideas and opinions.

Support Coherency 

Coherency has to do with whether the paragraph makes sense or not. In order to support this, the teacher needs to guide the students in developing main ideas and supporting details and illustrate how these concepts work together at the paragraph level. For more complex writing this involves how various paragraphs work together to support a thesis or purpose statement.

Students struggle tremendously with these big-picture ideas. This is in part due to the average student’s obsession with grammar. Grammar becomes critical after the student has ideas to share clearly, and never before that.

Encourage Collaboration

Students should work together to improve their writing. This can involve peer editing and/or brainstorming activities. These forms of collaboration give students different perspectives on their writing beyond just depending on the teacher.

Collaboration is also consistent with cooperative learning. In today’s marketplace, few people are granted the privilege of working exclusively alone on anything. In addition, working together can help the students to develop their English communication skills.

Consistency

Writing needs to be scheduled and to happen frequently in order to see progress at the ESL level. This is different from a native-speaking context, in which the students may have several large papers that they work on alone. In the ESL classroom, the students should write smaller and more frequent papers so the teacher can provide more feedback and scaffolding.

Small incremental growth should be the primary goal for ESL students. This should be combined with support from the teacher through a consistent commitment to writing.

Conclusion

Writing is a major component of academic life. Many ESL students are learning a second language to pursue academic goals. Therefore, it is important that teachers have ideas on how they can support ESL students in achieving the fluency they desire in their writing for further academic success.