Logistic Regression in R

In this post, we will conduct a logistic regression analysis. Logistic regression is used when you want to predict a categorical dependent variable using continuous or categorical independent variables. In our example, we want to predict Sex (male or female) using several continuous variables from the “survey” dataset in the “MASS” package.

?MASS::survey #explains the variables in the study

The first thing we need to do is remove the independent factor variables from our dataset, because the function that we will use for cross-validation does not accept factors. We will first use the “str” function to identify the factor variables and then remove them from the dataset. We also need to remove any examples that are missing data, so we use the “na.omit” function for this. Below is the code.
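The code for this step is not shown in the post, so here is a minimal sketch of what it could look like. It assumes that Sex is the only factor worth keeping (it is the outcome) and it selects the numeric columns programmatically; the name “numeric.cols” is made up for illustration and the original may have removed the columns one by one.

```r
library(MASS)   # ships with R and provides the survey data

data(survey)
str(survey)     # shows which columns are factors

# Keep the outcome plus every numeric column (an assumed selection rule)
numeric.cols <- names(survey)[sapply(survey, is.numeric)]
survey <- na.omit(survey[, c("Sex", numeric.cols)])

names(survey)
```

The remaining columns match the predictors that appear in the model output later in the post.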


We now need to check for collinearity using the “corrplot.mixed” function from the “corrplot” package.
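The plotting code is missing from the post. The sketch below is one way to get the same information: it computes the correlation matrix with base R and only draws the plot if the “corrplot” package happens to be installed. The object name “corr” is an assumption.

```r
library(MASS)

data(survey)
numeric.cols <- sapply(survey, is.numeric)

# Pairwise correlations among the numeric variables, complete rows only
corr <- cor(na.omit(survey[, numeric.cols]))
round(corr, 2)

# Optional plot, drawn only when corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot.mixed(corr)
}
```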



We have extreme correlation between “Wr.Hnd” and “NW.Hnd”. This makes sense because a person’s two hands are normally about the same size. Since this blog post is a demonstration of logistic regression, we will not worry about this too much.

We now need to divide our dataset into a train and a test set. We set the seed so that the split is reproducible. First we make a variable called “ind” that randomly assigns each row of survey a 1 (with probability 70%) or a 2 (with probability 30%). We then create the “train” dataset from all rows where “ind” is 1 and the “test” dataset from all rows where “ind” is 2. This means our data split is roughly 70% train and 30% test. Below is the code.

ind<-sample(2,nrow(survey),replace=T,prob = c(0.7,0.3))
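Only the sampling line survives in the post. A fuller sketch of the split might look like the following; the seed value 123 is an assumption, since the value the post used is not shown, and the column selection is inferred from the model output further down.

```r
library(MASS)

data(survey)
survey <- na.omit(survey[, c("Sex", "Wr.Hnd", "NW.Hnd", "Pulse", "Height", "Age")])

set.seed(123)  # assumed seed, not the post's
ind <- sample(2, nrow(survey), replace = TRUE, prob = c(0.7, 0.3))
train <- survey[ind == 1, ]  # rows labelled 1, roughly 70%
test  <- survey[ind == 2, ]  # rows labelled 2, roughly 30%

c(train = nrow(train), test = nrow(test))
```

Because each row is assigned independently, the split is only approximately 70/30.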

We now make our model. We use the “glm” function for logistic regression. We set the family argument to “binomial”. Next, we look at the results as well as the odds ratios.
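The fitting code itself is not shown, although the call is visible in the output. A minimal sketch, assuming the model object is named “fit” (the name used in the later “predict” calls) and using an assumed seed, so the numbers will differ from those printed in the post:

```r
library(MASS)

data(survey)
survey <- na.omit(survey[, c("Sex", "Wr.Hnd", "NW.Hnd", "Pulse", "Height", "Age")])
set.seed(123)  # assumed seed
ind <- sample(2, nrow(survey), replace = TRUE, prob = c(0.7, 0.3))
train <- survey[ind == 1, ]

# Sex has the levels Female and Male, so glm models the probability of Male
fit <- glm(Sex ~ ., family = binomial, data = train)
summary(fit)
exp(coef(fit))  # exponentiated coefficients are the odds ratios
```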

## Call:
## glm(formula = Sex ~ ., family = binomial, data = train)
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9875  -0.5466  -0.1395   0.3834   3.4443  
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -46.42175    8.74961  -5.306 1.12e-07 ***
## Wr.Hnd       -0.43499    0.66357  -0.656    0.512    
## NW.Hnd        1.05633    0.70034   1.508    0.131    
## Pulse        -0.02406    0.02356  -1.021    0.307    
## Height        0.21062    0.05208   4.044 5.26e-05 ***
## Age           0.00894    0.05368   0.167    0.868    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for binomial family taken to be 1)
##     Null deviance: 169.14  on 122  degrees of freedom
## Residual deviance:  81.15  on 117  degrees of freedom
## AIC: 93.15
## Number of Fisher Scoring iterations: 6
##  (Intercept)       Wr.Hnd       NW.Hnd        Pulse       Height 
## 6.907034e-21 6.472741e-01 2.875803e+00 9.762315e-01 1.234447e+00 
##          Age 
## 1.008980e+00

The results indicate that only height is useful in predicting whether someone is male or female. The second piece of output shows the odds ratios. An odds ratio tells how a one-unit increase in the independent variable multiplies the odds of being male in our model. For example, for every one-unit increase in height the odds of a particular example being male are multiplied by 1.23, an increase of about 23%.

We now need to see how well our model does on the train and test datasets. We first capture the probabilities and save them to the train dataset as “probs”. Next we create a “predict” variable and place the string “Female” in the same number of rows as are in the “train” dataset. Then we rewrite the “predict” variable by changing any example that has a probability above 0.5 to “Male”. Then we make a table of our results to see the number of correct predictions as well as the false positives and false negatives. Lastly, we calculate the accuracy rate. Below is the code.

train$probs<-predict(fit, type = 'response')
##          Female Male
##   Female     61    7
##   Male        7   48
## [1] 0.8861789
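Only the first line of this step is shown. The thresholding and accuracy calculation can be sketched on a tiny made-up example; the probabilities and labels below are hypothetical, and the variable is called “predicted” rather than “predict” to avoid masking the function of that name.

```r
# Hypothetical fitted probabilities of "Male" and the true labels
probs  <- c(0.92, 0.15, 0.61, 0.48, 0.07, 0.83)
actual <- factor(c("Male", "Female", "Female", "Male", "Female", "Male"),
                 levels = c("Female", "Male"))

# Start with "Female" everywhere, then switch to "Male" above 0.5
predicted <- rep("Female", length(probs))
predicted[probs > 0.5] <- "Male"
predicted <- factor(predicted, levels = c("Female", "Male"))

confusion <- table(predicted, actual)
confusion

# Correct predictions sit on the diagonal of the table
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy  # 4 of 6 correct
```

The same arithmetic applied to the table above gives (61 + 48) / 123, which is the 0.886 reported.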

Despite the weaknesses of the model with so many insignificant variables it is surprisingly accurate at 88.6%. Let’s see how well we do on the “test” dataset.

test$prob<-predict(fit,newdata = test, type = 'response')
##          Female Male
##   Female     17    3
##   Male        0   26
## [1] 0.9347826

As you can see, we do even better on the test set with an accuracy of 93.5%. Our model is looking pretty good and height is an excellent predictor of sex, which makes complete sense. However, in the next post we will use cross-validation and the ROC plot to further assess the quality of the model.


Teaching Vocabulary to ESL Students

Language acquisition requires the acquisition of thousands of words for fluent communication. This is a daunting task for even the most talented and eager student. Fortunately, there are some basic concepts to keep in mind when teaching students vocabulary. This post will share some suggestions for helping students to develop their vocabulary in the target language.

Learn Vocabulary in Context

A common technique for teaching vocabulary in language classrooms is out of context memorization. Students are given a long and often boring list of words to memorize. There is little immediate use of these words and they are quickly forgotten after the quiz.

Instead, it is better to teach new words within a framework in which they will be used. For example, students learn business terms through role play at a bank or store rather than through a stack of index cards. The context of the bank connects the words to a real-world setting, which is critical for retention in the long-term memory.

Reduce Reliance on Bilingual Dictionaries

This may come as a surprise. However, a bilingual dictionary provides the definition of a word but does not normally help with memorization and the future use of the word. If the goal is communication, then reliance on bilingual dictionaries will slow a student’s progress toward mastery.

Children learn language much faster due in part to the immense effort it takes to learn what new words mean without the easy answer of a dictionary. The effort leads to memorization, which allows for the use of the language. This serves as a valuable lesson for adults who prefer the easy route of bilingual dictionaries.

Set Aside Class Time to Deal with Vocabulary

The teacher should have a systematic plan for helping students to develop relevant vocabulary. This can be done through activities as well as the teaching of context clues. Vocabulary development needs to be intentional rather than left to chance.

However, there are also times where unplanned vocabulary teaching can take place. For example, while the students are reading together they become puzzled over a word you thought they knew (this is common). When this happens, a break with an explanation can be helpful. This is especially true if you let the students work together without dictionaries to try to determine the meaning of the word.


Vocabulary is a necessary element of language learning. It would be nice to ignore this but normally this is impossible. As such, teachers need to support students in their vocabulary development.

Probability, Odds, and Odds Ratio

In logistic regression, there are three terms that are used frequently but can be confusing if they are not thoroughly explained. These three terms are probability, odds, and odds ratio. In this post, we will look at these three terms and provide an explanation of them.


Probability is probably (no pun intended) the easiest of these three terms to understand. Probability is simply the likelihood that a certain event will happen. To calculate a probability in the traditional sense, you divide the number of outcomes that count as the event by the total number of possible outcomes.

Bayesian probability uses prior probabilities to develop a posterior probability based on new evidence. For example, at one point during Super Bowl LI the Atlanta Falcons had a 99.7% chance of winning. This was based on such factors as the number of points they were ahead and the time remaining. As these changed, so did the probability of them winning. Yet the Patriots still found a way to win with less than a 1% chance.

Bayesian probability was also used for predicting who would win the 2016 US presidential race. It is important to remember that probability is an expression of confidence and not a guarantee as we saw in both examples.


Odds are the expression of relative probabilities. Odds are calculated using the following equation

odds = probability of the event ⁄ (1 – probability of the event)

For example, at one point during Super Bowl LI the odds of the Atlanta Falcons winning were as follows

0.997 ⁄ (1 – 0.997) ≈ 332

This can be interpreted as the odds being 332 to 1! This means that Atlanta was 332 times more likely to win the Super Bowl than to lose it.
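The calculation above is easy to reproduce in R. The helper name “odds” is made up for this sketch.

```r
# Convert a probability to odds: p / (1 - p)
odds <- function(p) p / (1 - p)

odds(0.997)  # the Falcons example, roughly 332
odds(0.5)    # an even chance gives odds of exactly 1
```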

Odds are commonly used in gambling and this is probably (again no pun intended) where most of us have heard the term before. Odds are just an extension of probabilities and they are most commonly expressed as a ratio, such as four to one.

Odds Ratio

A ratio is the comparison of two numbers and indicates how many times one number contains or is contained in another. For example, if the ratio of boys to girls is 5 to 1, it means that there are five boys for every one girl.

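By extension, an odds ratio is the comparison of two different odds. For example, suppose the probability of Team A making the playoffs is 45% and the probability of Team B making the playoffs is 35%. Each probability is first converted to odds, and the odds ratio is then calculated as follows.

(0.45 ⁄ 0.55) ⁄ (0.35 ⁄ 0.65) ≈ 1.52

The odds of Team A making the playoffs are about 1.52 times the odds of Team B. Dividing the probabilities directly, 0.45 ⁄ 0.35 ≈ 1.29, gives a ratio of probabilities rather than an odds ratio.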
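Treating the 45% and 35% figures as probabilities, a short sketch shows the difference between a ratio of odds and a ratio of probabilities. The names “odds”, “p.a” and “p.b” are made up for illustration.

```r
odds <- function(p) p / (1 - p)

p.a <- 0.45  # Team A's probability of making the playoffs
p.b <- 0.35  # Team B's probability of making the playoffs

# Odds ratio: convert each probability to odds first, then divide
odds.ratio <- odds(p.a) / odds(p.b)
round(odds.ratio, 2)   # 1.52

# Ratio of the raw probabilities, shown for comparison
round(p.a / p.b, 2)    # 1.29
```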

The odds are themselves a ratio of two probabilities, that of the event happening and that of it not happening. Below, the probability of the Atlanta Falcons winning Super Bowl LI is divided by the probability of the New England Patriots winning, which reproduces the Falcons’ odds from earlier

0.997 ⁄ 0.003 ≈ 332

As such, the main difference between odds and an odds ratio is that an odds ratio is the ratio of two odds. As you can tell, there is a lot of confusion about this for the average person. However, understanding these terms is critical to the application of logistic regression.

Assessing Writing from a Discourse Perspective

Often, when teachers provide feedback on a student’s writing, they tend to focus on the grammatical/punctuation aspects of the paper. However, this often does not make a lasting impression and it can also cause students to freeze up when they need to write, as they become obsessed with the details of grammar rather than with the shaping of ideas.

Another approach to providing feedback to students is to analyze and assess their writing from the perspective of discourse. Discourse rules have to do with the overall structure of a paper. It is the big picture aspects of writing. Clear discourse can often help to overcome poor grammar/punctuation but excellent grammar/punctuation cannot overcome a poorly structured paper. This post will provide some of the components of discourse as they relate to writing a paper.

The Organizational Level

The highest and broadest level is the organizational level. At this level, you are looking to be sure that the students have included an introduction, body, and conclusion in their paper. This seems elementary but it is common for students to forget to include an introduction and/or a conclusion in their writing.

You also want to check that the introduction, body, and conclusion are in proportion to each  other based on how long the paper was intended to be. Often, students write short intros, have a long body section, and have little conclusion as they are exhausted from the writing.

At this point thorough reading is not taking place; rather, you are glancing to see if all the parts are there. You are also checking that the ideas in the introduction are present in the body and reiterated in the conclusion. Students frequently wander when writing as they do not plan what to say but rather wait and see what Google provides them.

The Section Level

At the section level, you are looking to make sure that the various parts that belong within the introduction, body, and conclusion are present. For the introduction, if it is a standard research paper, some of the things to look for include the following

  • background to the problem
  • problem statement
  • objectives
  • significance statement

For the body section, things to look for include

  • Discussion of first objective
  • Discussion of second objective
  • etc

For the conclusion, it is more fluid in how this can be done but you can look for the following

  • Summary of the introduction
  • Main point of each objective
  • Concluding remark(s)

First, you are checking that these components are there. Second, you are checking for clarity. Normally, if the problem and objectives are unclear the entire paper is doomed to incomprehensibility.

However, bad grammar is not the reason that problems and objectives are unclear. Instead, it may be that the problem is too broad, cannot be dealt with in the space provided, etc. Objectives normally have the same problem but can also be unrelated to the problem.

Sometimes the problem and objectives are too narrowly defined in terms of the expertise of the student. As such, it is highly subjective in terms of what works, but the comments given to the student need to be substantive and not just something as vague as “look at this a second time.”

If you cannot give substantive feedback it is normally better to ignore whatever weakness you found until you can articulate it clearly. If this is not possible it’s better to remain silent.

The body section must address all objectives mentioned in the introduction. Otherwise, the reader will become confused as promises made in the introduction were never fulfilled in the body.

The conclusion is more art than science. However, there should be an emphasis on what has been covered as well as on what this means for the reader.

The Paragraph Level

At the paragraph level, you are looking for two things in every paragraph

  • main idea
  • supporting details

Every paragraph should have one main idea, which summarizes the point of the paragraph. The main idea is always singular. If there is more than one main idea then the student should develop a second paragraph for the second main idea.

In addition, the supporting details in the paragraph should be on topic with the main idea. Often, students will have inconsistencies between the main idea and the supporting details. This can be solved by doing one of the following

  • Change the main idea to be consistent with the supporting details
  • Change the supporting details to be consistent with the main idea

At the paragraph level, you are also assessing that the individual paragraphs are supporting the objective of the section. This again has to do with focusing on a singular thought in a particular section and within each paragraph. Students love to wander when writing as stated previously. Writing is about breaking down a problem into smaller and smaller pieces through explanation.


The assessment of the discourse of a paper should come before the grammatical marking of it. When ideas flow, the grammatical issues are often harder to notice. It is the shaping of discourse that engages the thinking and improves the writing of a student in ways that grammatical comments can never achieve.

Best Subset Regression in R

In this post, we will take a look at best subset regression. Best subset regression fits a model for all possible feature or variable combinations and the decision for the most appropriate model is made by the analyst based on judgment or some statistical criteria.

Best subset regression is an alternative to both forward and backward stepwise regression. Forward stepwise selection adds one variable at a time based on the lowest residual sum of squares until no more variables continue to lower the residual sum of squares. Backward stepwise regression starts with all variables in the model and removes variables one at a time. The concern with stepwise methods is that they can produce biased regression coefficients, conflicting models, and inaccurate confidence intervals.

Best subset regression bypasses these weaknesses of stepwise models by creating all possible models and then allowing you to assess which variables should be included in your final model. The one drawback to best subset is that a large number of variables means a large number of potential models, which can make it difficult to make a decision among several choices.
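To make the idea of fitting every possible model concrete, here is a brute-force sketch in base R. It uses the built-in “mtcars” data as a stand-in, with four hand-picked candidate predictors, and ranks the models by BIC; none of this comes from the post, which relies on the “leaps” package instead.

```r
# Stand-in example: predict mpg in mtcars from four candidate predictors
candidates <- c("wt", "hp", "qsec", "drat")

# Every non-empty combination of the candidates
subsets <- unlist(
  lapply(seq_along(candidates),
         function(k) combn(candidates, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset and record its BIC
results <- data.frame(
  predictors = sapply(subsets, paste, collapse = " + "),
  bic = sapply(subsets, function(vars) {
    BIC(lm(reformulate(vars, response = "mpg"), data = mtcars))
  })
)

nrow(results)                         # 2^4 - 1 = 15 models
results[order(results$bic), ][1:3, ]  # the three models with the lowest BIC
```

With p candidate predictors there are 2^p - 1 models, which is why the number of choices grows so quickly.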

In this post, we will use the “Fair” dataset from the “Ecdat” package to predict marital satisfaction based on age, Sex, the presence of children, years married, religiosity, education, occupation, and number of affairs in the past year. Below is some initial code.


We begin our analysis by building the initial model with all variables in it. Below is the code

## Call:
## lm(formula = rate ~ ., data = Fair)
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2049 -0.6661  0.2298  0.7705  2.2292 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.522875   0.358793   9.819  < 2e-16 ***
## sexmale     -0.062281   0.099952  -0.623  0.53346    
## age         -0.009683   0.007548  -1.283  0.20005    
## ym          -0.019978   0.013887  -1.439  0.15079    
## childyes    -0.206976   0.116227  -1.781  0.07546 .  
## religious    0.042142   0.037705   1.118  0.26416    
## education    0.068874   0.021153   3.256  0.00119 ** 
## occupation  -0.015606   0.029602  -0.527  0.59825    
## nbaffairs   -0.078812   0.013286  -5.932 5.09e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 1.03 on 592 degrees of freedom
## Multiple R-squared:  0.1405, Adjusted R-squared:  0.1289 
## F-statistic:  12.1 on 8 and 592 DF,  p-value: 4.487e-16

The initial results are already interesting even though the R-squared is low. Couples with children report less marital satisfaction than couples without children when controlling for the other factors; this is the largest coefficient in magnitude, although it is only marginally significant (p = 0.075). In addition, more education is associated with higher marital satisfaction. Lastly, as the number of affairs increases, marital satisfaction decreases. Keep in mind that the “rate” variable goes from 1 to 5, with one meaning a terrible marriage and five a great one. The intercept of 3.52 is the predicted rating when every predictor is zero or at its reference level.

We will now create our subset models. Below is the code.


In the code above we created the sub models using the “regsubsets” function from the “leaps” package and saved them in the variable called “sub.fit”. We then saved the summary of “sub.fit” in the variable “best.summary”. We will use the “best.summary” and “sub.fit” variables several times to determine which model to use.

There are many different ways to assess the model. We will use the following statistical methods that come with the results from the “regsubsets” function.
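The code this paragraph refers to is missing from the post. A sketch of what it might have looked like follows, using the variable names given in the text. It runs only if both the “leaps” and “Ecdat” packages are installed, and the two-panel plot layout and axis labels are assumptions.

```r
if (requireNamespace("leaps", quietly = TRUE) &&
    requireNamespace("Ecdat", quietly = TRUE)) {
  data(Fair, package = "Ecdat")

  # One best model per size, from 1 up to 8 predictors
  sub.fit <- leaps::regsubsets(rate ~ ., data = Fair)
  best.summary <- summary(sub.fit)

  print(which.min(best.summary$cp))   # model size with the lowest Cp
  print(which.min(best.summary$bic))  # model size with the lowest BIC

  # Left: Cp by model size. Right: which variables each model contains.
  par(mfrow = c(1, 2))
  plot(best.summary$cp, xlab = "Number of features", ylab = "Cp")
  plot(sub.fit, scale = "Cp")
}
```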

  • Mallows’ Cp
  • Bayesian Information Criterion

We will make two charts for each of the criteria above. The plot to the left will show how many features to include in the model. The plot to the right will show which variables to include. It is important to note that for both of these methods, the lower the score the better the model. Below is the code for Mallows’ Cp.

plot(sub.fit,scale = "Cp")


The plot on the left suggests that a four feature model is the most appropriate. However, this chart does not tell us which four features. The chart on the right is read in reverse order: the high numbers are at the bottom and the low numbers are at the top when looking at the y-axis. Knowing this, we can conclude that the most appropriate variables to include in the model are age, children presence, education, and number of affairs. Below are the results using the Bayesian Information Criterion.

plot(sub.fit,scale = "bic")


These results indicate that a three feature model is appropriate. The variables or features are years married, education, and number of affairs. Presence of children was not considered beneficial. Since our original model and Mallows’ Cp both suggested that presence of children matters, we will include it for now.

Below is the code for the model based on the subset regression.

## Call:
## lm(formula = rate ~ age + child + education + nbaffairs, data = Fair)
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2172 -0.7256  0.1675  0.7856  2.2713 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.861154   0.307280  12.566  < 2e-16 ***
## age         -0.017440   0.005057  -3.449 0.000603 ***
## childyes    -0.261398   0.103155  -2.534 0.011531 *  
## education    0.058637   0.017697   3.313 0.000978 ***
## nbaffairs   -0.084973   0.012830  -6.623 7.87e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Residual standard error: 1.029 on 596 degrees of freedom
## Multiple R-squared:  0.1352, Adjusted R-squared:  0.1294 
## F-statistic: 23.29 on 4 and 596 DF,  p-value: < 2.2e-16

The results look ok. The older a person is, the less satisfied they are with their marriage. If children are present the marriage is less satisfying. The more educated a person is, the more satisfied they are. Lastly, a higher number of affairs indicates less marital satisfaction. However, before we get excited we need to check for collinearity and homoscedasticity. Below is the code

##       age     child education nbaffairs 
##  1.249430  1.228733  1.023722  1.014338

There are no issues with collinearity, since only VIF values above 5 or 10 would indicate a problem. Let’s check for homoscedasticity.
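The post does not show how the VIF values were produced; a helper such as “vif” from the “car” package is a likely candidate, but that is a guess. As a sketch of what the statistic measures, the code below computes it by hand on stand-in predictors from the built-in “mtcars” data: each predictor is regressed on the others, and the VIF is 1 / (1 - R-squared) of that regression.

```r
predictors <- c("wt", "hp", "qsec", "drat")  # stand-in numeric predictors

vif.manual <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  # R-squared from regressing this predictor on all the others
  r2 <- summary(lm(reformulate(others, response = v), data = mtcars))$r.squared
  1 / (1 - r2)
})

round(vif.manual, 2)
vif.manual[vif.manual > 5]  # predictors that would deserve a closer look
```

This hand calculation applies to numeric predictors; factors with several levels need the generalized version that dedicated packages provide.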



The normal Q-Q plot and the residuals vs leverage plot can be used for locating outliers. The residuals vs fitted and the scale-location plots do not look good, as there appears to be a pattern in the dispersion, which indicates heteroscedasticity. To confirm this we will use the Breusch-Pagan test from the “lmtest” package. Below is the code

##  studentized Breusch-Pagan test
## data:  fit2
## BP = 16.238, df = 4, p-value = 0.002716

There you have it. Our model violates the assumption of homoscedasticity. However, this model was developed for demonstration purposes to provide an example of subset regression.
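For readers curious about what the studentized Breusch-Pagan test computes, here is a hand-rolled sketch on a stand-in model fitted to the built-in “mtcars” data. The squared residuals are regressed on the same predictors, and the statistic is the sample size times the R-squared of that auxiliary regression, compared against a chi-squared distribution. The object names are made up, and the post itself uses the “lmtest” package rather than this calculation.

```r
fit.demo <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in model

# Auxiliary regression of the squared residuals on the same predictors
sq.resid <- resid(fit.demo)^2
aux <- lm(sq.resid ~ wt + hp, data = mtcars)

bp.stat <- nrow(mtcars) * summary(aux)$r.squared  # n times R-squared
bp.df   <- length(coef(aux)) - 1                  # number of predictors
p.value <- pchisq(bp.stat, df = bp.df, lower.tail = FALSE)

c(BP = bp.stat, df = bp.df, p.value = p.value)
```

A small p-value, like the 0.0027 reported above, is evidence against constant error variance.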

Writing Techniques for the ESL Classroom

In-class writing is common in many ESL contexts. This post will provide several different ways that teachers can get their students writing in an ESL classroom.


Perhaps the simplest way to get ESL students writing is to have them imitate what is read to them. This allows the students to learn the conventions of writing in the target language.

This is usually done through some form of dictation. The teacher reads a few words or reads slowly. This provides students with time to write down what they heard.

The actual marking of such an activity would involve the use of rubrics or some sort of count system for the number of words the student was able to write down. Often, spelling and punctuation are not considered major factors in the grade because of the rushed nature of the writing.

Controlled and Guided

Controlled writing involves having students modify an existing writing sample. For example, changing all the verbs in a paragraph from past to present. This will often require them to change not just the verbs but other aspects of the writing as well.

Guided writing involves having the students respond to some sort of question or stimulus. For example, the students may watch a video and then be asked to write about it and/or answer questions. They may also try to rewrite something that they heard at normal speed.


The most common form of self-writing is the writing of a journal. The writing is only intended for the student. Even note-taking is considered a form of self-writing even though it is not normally comprehensible to others.

Self-writing, particularly journals, can be useful in developing reflective thinking in students in general even with the language barriers of writing in another language.

Display  and Real Writing

Display writing is writing that is primarily intended for the teacher, who already knows the answer that the student is addressing. Examples of this type of writing include essays and other writing for the purpose of a summative assessment. The student is literally displaying what they already know.

Real writing is writing in which the reader does not know the answer that the student is addressing. As such, one of the main differences between display and real writing is the knowledge that the audience of the writing has.


When working with students it is important to provide them with learning experiences that stimulate the growth and development that they need. Understanding the various forms of writing that can happen in an ESL classroom can provide teachers with ideas on how to help their students.

Data Wrangling in R

Collecting and preparing data for analysis is the primary job of a data scientist. This process is called data wrangling. In this post, we will look at an example of data wrangling using a simple artificial data set. You can create the table below in R or Excel. If you created it in Excel, just save it as a csv and load it into R. Below is the initial code

library(readr)
apple <- read_csv("~/Desktop/apple.csv")
## # A tibble: 10 × 2
##        weight      location
##         <chr>         <chr>
## 1         3.2        Europe
## 2       4.2kg       europee
## 3      1.3 kg          U.S.
## 4  7200 grams           USA
## 5          42 United States
## 6         2.3       europee
## 7       2.1kg        Europe
## 8       3.1kg           USA
## 9  2700 grams          U.S.
## 10         24 United States

This is a small dataset with the columns “weight” and “location”. Here are some of the problems

  • Weights are in different units
  • Weights are written in different ways
  • Location is not consistent

In order to have any success with data wrangling you need to state specifically what it is you want to do. Here are our goals for this project

  • Convert the “weight” variable to a numerical variable instead of character
  • Remove the text and have only numbers in the “weight” variable
  • Change weights in grams to kilograms
  • Convert the “location” variable to a factor variable instead of character
  • Have consistent spelling for Europe and United States in the “location” variable

We will begin with the “weight” variable. We want to convert it to a numerical variable and remove any non-numerical text. Below is the code for this

corrected.weight<-as.numeric(gsub(pattern = "[[:alpha:]]","",apple$weight))
##  [1]    3.2    4.2    1.3 7200.0   42.0    2.3    2.1    3.1 2700.0   24.0

Here is what we did.

  1. We created a variable called “corrected.weight”
  2. We used the function “as.numeric”, which converts whatever results from the code inside it to a numerical variable.
  3. Inside “as.numeric” we used the “gsub” function, which allows us to substitute one value for another.
  4. Inside “gsub” we set the pattern argument to “[[:alpha:]]” and the replacement to “”. This told R to look for any lowercase or uppercase letters and replace them with nothing, in other words to remove them. This all pertains to the “weight” variable in the apple dataframe.

We now need to convert the weights in grams to kilograms so that everything is the same unit. Below is the code

gram.error<-grep(pattern = "[[:digit:]]{4}",apple$weight)
corrected.weight[gram.error]<-corrected.weight[gram.error]/1000
##  [1]  3.2  4.2  1.3  7.2 42.0  2.3  2.1  3.1  2.7 24.0

Here is what we did

  1. We created a variable called “gram.error”
  2. We used the “grep” function to search the “weight” variable in the apple data frame for entries that contain four digits in a row; this is what the “[[:digit:]]{4}” argument means. We do not change any values yet, we just store the positions of the matches in “gram.error”.
  3. Once this information is stored in “gram.error” we use it to subset the “corrected.weight” variable.
  4. We tell R to take every value in “corrected.weight” that is flagged in “gram.error”, divide it by 1000, and save the result back into “corrected.weight”. Dividing by 1000 converts the value from grams to kilograms.
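As an alternative sketch, the gram entries can be flagged by their unit text rather than by the number of digits, which avoids misclassifying a four-digit weight that is already in kilograms. This is a swapped-in technique, not the post’s code; the “weight” vector below retypes the column from the table above and “in.grams” is a made-up name.

```r
weight <- c("3.2", "4.2kg", "1.3 kg", "7200 grams", "42",
            "2.3", "2.1kg", "3.1kg", "2700 grams", "24")

# Flag entries by their unit rather than by digit count
in.grams <- grepl("gram", weight, ignore.case = TRUE)

# Strip everything except digits and the decimal point, then convert
corrected.weight <- as.numeric(gsub("[^0-9.]", "", weight))
corrected.weight[in.grams] <- corrected.weight[in.grams] / 1000

corrected.weight
```

Entries without any unit, such as 42 and 24, are left untouched by either approach.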

We have completed the transformation of the “weight” and will move to dealing with the problems with the “location” variable in the “apple” dataframe. To do this we will first deal with the issues related to the values that relate to Europe and then we will deal with values related to United States. Below is the code.

europe<-agrep(pattern = "europe",apple$location,ignore.case = T,max.distance = list(insertion=c(1),deletions=c(2)))
america<-agrep(pattern = "us",apple$location,ignore.case = T,max.distance = list(insertion=c(0),deletions=c(2),substitutions=0))
corrected.location<-gsub(pattern = "United States","US",corrected.location)
##  [1] "europe" "europe" "US"     "US"     "US"     "europe" "europe"
##  [8] "US"     "US"     "US"

The code is a little complicated to explain, but in short we used the “agrep” function to tell R to search the “location” variable for values similar to our term “europe”. The other arguments set how many insertions and deletions are tolerated, so that near misses such as “europee” still count as matches. This process is repeated for the term “us”. We then store the location variable from the “apple” dataframe in a new variable called “corrected.location” and use the two objects we made, “europe” and “america”, to overwrite the matching entries. Finally, we deal with “United States” using the “gsub” function.

We are almost done. Now we combine our two variables “corrected.weight” and “corrected.location” into a new data.frame. The code is below
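Not every line of this step appears in the post, and fuzzy matching can be hard to reason about. The sketch below reaches the same cleaned values with explicit regular expressions instead of “agrep”; it is a swapped-in technique rather than the post’s method, and the “location” vector retypes the column from the table at the top.

```r
location <- c("Europe", "europee", "U.S.", "USA", "United States",
              "europee", "Europe", "USA", "U.S.", "United States")

corrected.location <- location

# Anything starting with "europ", in any letter case, becomes "europe"
is.europe <- grepl("^europ", location, ignore.case = TRUE)
corrected.location[is.europe] <- "europe"

# "U.S.", "USA" and "United States" all become "US"
is.us <- grepl("^u\\.?s|^united states", location, ignore.case = TRUE)
corrected.location[is.us] <- "US"

corrected.location
```

Explicit patterns are less forgiving of unexpected misspellings than fuzzy matching, so it is worth checking for entries that neither pattern caught.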

##    weight location
## 1     3.2   europe
## 2     4.2   europe
## 3     1.3       US
## 4     7.2       US
## 5    42.0       US
## 6     2.3   europe
## 7     2.1   europe
## 8     3.1       US
## 9     2.7       US
## 10   24.0       US

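If you use the “str” function on the “cleaned.apple” dataframe you will see that “location” was automatically converted to a factor. Note that this automatic conversion only happens in R versions before 4.0.0; in newer versions, pass “stringsAsFactors = TRUE” to “data.frame” or convert the column with “factor”.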
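The code that builds the final data frame is not shown. A minimal sketch follows, with the cleaned values typed in directly so that it runs on its own; the explicit “stringsAsFactors = TRUE” is an addition so that the result is a factor regardless of R version.

```r
corrected.weight   <- c(3.2, 4.2, 1.3, 7.2, 42, 2.3, 2.1, 3.1, 2.7, 24)
corrected.location <- c("europe", "europe", "US", "US", "US",
                        "europe", "europe", "US", "US", "US")

cleaned.apple <- data.frame(weight = corrected.weight,
                            location = corrected.location,
                            stringsAsFactors = TRUE)  # makes location a factor

str(cleaned.apple)
```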

This looks much better especially if you compare it to the original dataframe that is printed at the top of this post.

Understanding ESL Writing Patterns Across Cultures

When people are learning English they will almost always bring their own way of communicating with them when they speak or write in English. However, for native speakers of English the written communication style of ESL students can be bewildering even if it is grammatically sound.

This phenomenon of the L1 influencing the writing style of the L2 is known as contrastive rhetoric. This post will provide examples from different cultures in terms of how they approach writing in English and compare it to how a native-speaking person from a Western country writes to show the differences.

The Native English Speaker Writing Example

Below is a simple paragraph written by a Native English speaking person.

Exercise is good for a person for several reasons. For example, exercise helps to strengthen the body. As a person moves he or she is utilizing their muscles, which promotes maintenance and potentially growth of the muscle. Second, exercise helps to remove waste from the body. Strenuous exercise causes people to sweat and breathe deeply and this increases the removal of harmful elements from the body. Lastly, exercise makes people feel good. Exercise encourages the release of various hormones that make a person feel better. Therefore, people should exercise in order to enjoy these clear benefits.

The writing style of an English speaker is usually highly linear in nature. In the example above, the first sentence is clearly the main idea or the point. Right from the beginning the English writer shares with you where they stand on the subject. There is little mystery or suspense as to what will be talked about.

The rest of the paragraph consists of supporting details for the main idea. The supporting details start with the discourse markers “for example”, “second”, and “lastly”. Everything in the paragraph is laid out in a clear, step-by-step manner, as this is important for English speakers.

Unfortunately, this style of writing is what many ESL students from other cultures are compared to. The next example is written in perfect “English”; however, the style of communication is not linear.

Eastern Writing Style

According to Robert Kaplan, people from Eastern countries write in a circular, indirect manner. This means that Eastern writing lacks the direct point or main idea of Western writing and also lacks the clearly structured supporting details. Below is the same paragraph as the one in the English example but written in a more Eastern style.

As a person moves, he or she is utilizing their muscles, which promotes maintenance and potentially growth of the muscle. Strenuous exercise causes people to sweat and breathe deeply, and this increases the removal of harmful elements from the body. Exercise encourages the release of various hormones that make a person feel better.

The example is grammatically sound, but for a native English speaker there are several problems with the writing.

  1. There is no main idea. The entire paragraph is missing a point. The writer lays down claims in support of a point but never actually tells you what the point is. Native speakers want a succinct summary of the point when information is shared with them. Eastern writers prefer an indirect or implied main idea because being too direct is considered rude. In addition, if you are too clear in an Eastern context, it is hard to evade and prevaricate if someone is offended by what is said.
  2. The discourse markers are missing. There is no mention of “for example” or “second”. Discourse markers give a paragraph a strong sense of linear direction. The native English speaker can tell where they are in a train of thought when these markers are there. When they are missing, the English reader is left wondering when the experience is going to be over.
  3. There are no transition sentences. In the native English-speaking example, every discourse marker served as the first word in a transition sentence, which moves the reader from one supporting detail to the next. The Eastern example has only details, without any guidepost from one detail to the other. If a paragraph is really long, this can become overwhelming for the native English speaker.

The example is highly fluent, and this kind of writing is common in many English-speaking countries outside the West. Even with excellent knowledge of the language, discourse skills affect the ability to communicate.


My students have shared with me that English writing is clear and easy to understand but too direct in nature. The complaint of teachers, in contrast, is that ESL students’ writing is unclear and indirect.

This is not a matter of right and wrong but of differences in how to communicate when writing. A student who is aware of how they communicate can make adjustments so that whoever they are communicating with can understand them. The goal should not be to change students so that they act a certain way all the time but to make them aware of their assumptions so they can adjust depending on the situation.

Principal Component Analysis in R

This post will demonstrate the use of principal component analysis (PCA). PCA is useful for several reasons. First, it allows you to place your examples into groups, similar to linear discriminant analysis, but you do not need to know beforehand what the groups are. Second, PCA is used for the purpose of dimension reduction. For example, if you have 50 variables, PCA can allow you to reduce this number while retaining a chosen threshold of variance. If you are working with a large dataset, this can greatly reduce the computational time and general complexity of your models.

Keep in mind that there really is not a dependent variable, as this is unsupervised learning. What you are trying to see is how different examples can be mapped in space based on whatever independent variables are used. For our example, we will use the “Carseats” dataset from the “ISLR” package. Our goal is to understand the relationship among the variables when examining the shelf location of the car seat. Below is the initial code to begin the analysis
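The setup code did not survive in this copy of the post, so the lines below are only a minimal sketch of what loading the data could look like. It assumes the “ISLR” and “ggplot2” packages are already installed; “ggplot2” is loaded here only because it is used for the plot later in the post.

```r
# Assumption: ISLR and ggplot2 are installed, e.g. install.packages(c("ISLR", "ggplot2"))
library(ISLR)
library(ggplot2)

# Load the Carseats data explicitly from the ISLR package
data(Carseats, package = "ISLR")

# Quick look at the variable types (numeric vs. factor)
str(Carseats)
```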


We first need to rearrange the data and remove the variables we are not going to use in the analysis. Below is the code.


Here is what we did

  1. We made a copy of the “Carseats” data called “Carseats1”
  2. We rearranged the order of the variables so that the factor variables are at the end. This will make sense later
  3. We removed the “Urban” and “US” variables from the table as they will not be a part of our analysis

We will now do the PCA. We need to scale and center our data; otherwise, the larger numbers will have a much stronger influence on the results than the smaller numbers. Fortunately, the “prcomp” function has a “scale.” and a “center” argument. We will also use only the first 7 columns for the analysis, as “ShelveLoc” is a factor and cannot be used in the PCA itself. If we hadn’t moved “ShelveLoc” to the end of the dataframe, it would cause some headache when selecting the columns. Below is the code.
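The three steps can be sketched as follows. The original code is missing from this copy, so the exact column order is an assumption; it is chosen so that the factor “ShelveLoc” ends up behind the numeric columns.

```r
# Assumption: ISLR is installed; column positions refer to the stock Carseats data
library(ISLR)
data(Carseats, package = "ISLR")

# 1. Work on a copy so the original data stay untouched
Carseats1 <- Carseats

# 2. Reorder the columns: numeric variables first, factors (ShelveLoc, Urban, US) last
Carseats1 <- Carseats1[, c(1, 2, 3, 4, 5, 6, 8, 9, 7, 10, 11)]

# 3. Drop the two factors that are not part of the analysis
Carseats1$Urban <- NULL
Carseats1$US <- NULL

names(Carseats1)
```

With this ordering, columns 1 to 7 are all numeric and “ShelveLoc” is the last column.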

Carseats.pca<-prcomp(Carseats1[,1:7],scale. = TRUE,center = TRUE)
summary(Carseats.pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4    PC5     PC6     PC7
## Standard deviation     1.3315 1.1907 1.0743 0.9893 0.9260 0.80506 0.41320
## Proportion of Variance 0.2533 0.2026 0.1649 0.1398 0.1225 0.09259 0.02439
## Cumulative Proportion  0.2533 0.4558 0.6207 0.7605 0.8830 0.97561 1.00000

The summary of “Carseats.pca” tells us how much of the variance each component explains. Keep in mind that the number of components is equal to the number of variables. The “proportion of variance” tells us the contribution each component makes, and the “cumulative proportion” is the running total of those contributions.

If your goal is dimension reduction, then the number of components to keep depends on the threshold you set. For example, if you need around 90% of the variance, you would keep the first 5 components (about 88%). If you need 95% or more of the variance, you would keep the first six. To actually use the components, you would take the “Carseats.pca$x” data and move it to your data frame.

Keep in mind that the actual components have no conceptual meaning. Each one is a numerical representation of a combination of several variables that were reduced using PCA to fewer variables, such as going from 7 variables to 5 variables.

This means that PCA is great for reducing variables for prediction purposes but is much harder to use in explanatory studies unless you can explain what the new components represent.

For our purposes, we will keep 5 components. This means that we have reduced our dimensions from 7 to 5 while still keeping almost 90% of the variance. Graphing our results is tricky because we have 5 dimensions but the human mind can only conceptualize 3 at best and normally 2. As such, we will plot the first two components and label them by shelf location using ggplot2. Below is the code
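As a rough sketch of that last step, the code below picks the number of components from a variance threshold and attaches the retained scores to a data frame. The names “threshold”, “n.keep”, and “Carseats.reduced” are illustrative and not from the original post, and the column selection is the assumed ordering (numeric columns first, “ShelveLoc” last).

```r
# Assumption: ISLR is installed; column order mirrors the rearrangement described earlier
library(ISLR)
data(Carseats, package = "ISLR")
Carseats1 <- Carseats[, c(1:6, 8, 9, 7)]
Carseats.pca <- prcomp(Carseats1[, 1:7], scale. = TRUE, center = TRUE)

# Proportion of variance per component and its running total
var.explained <- Carseats.pca$sdev^2 / sum(Carseats.pca$sdev^2)
cum.var <- cumsum(var.explained)

# Keep the smallest number of components that reaches the threshold
threshold <- 0.95
n.keep <- which(cum.var >= threshold)[1]

# Move the retained component scores into a data frame next to the factor
scores <- as.data.frame(Carseats.pca$x[, 1:n.keep, drop = FALSE])
Carseats.reduced <- cbind(scores, ShelveLoc = Carseats1$ShelveLoc)
head(Carseats.reduced)
```

Changing “threshold” is the only adjustment needed to keep more or fewer components.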
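The plotting code is missing from this copy of the post. One way such a plot could be produced is sketched below; the object names “scores” and “pca.plot” and the axis labels are illustrative assumptions, not the author’s original code.

```r
# Assumption: ISLR and ggplot2 are installed; column order mirrors the earlier rearrangement
library(ISLR)
library(ggplot2)
data(Carseats, package = "ISLR")
Carseats1 <- Carseats[, c(1:6, 8, 9, 7)]
Carseats.pca <- prcomp(Carseats1[, 1:7], scale. = TRUE, center = TRUE)

# First two component scores plus the shelf location used for colouring
scores <- data.frame(Carseats.pca$x[, 1:2], ShelveLoc = Carseats1$ShelveLoc)

pca.plot <- ggplot(scores, aes(x = PC1, y = PC2, colour = ShelveLoc)) +
  geom_point() +
  labs(x = "First principal component", y = "Second principal component")
print(pca.plot)
```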



From the plot you can see there is little separation when using the first two components of the PCA. This makes sense, as we can only graph two components, so we are missing a lot of the variance. However, for demonstration purposes the analysis is complete.

Writing as a Process or Product

In writing pedagogy, there are at least two major ways of seeing writing. These two approaches see writing as a process or as a product. This post will explain each along with some of the drawbacks of both.

Writing as a Product

Writing as a product entailed the teacher setting forth standards in terms of rhetoric, vocabulary use, organization, etc. The students were given several different examples that could be used as models from which to base their own paper.

The teacher may be available for one-on-one support, but this was not necessarily embedded in the learning experience. In addition, the teacher was probably only going to see the final draft.

For immature writers, this is an intimidating learning experience. To be required to develop a paper with only out-of-context examples from former students is difficult to deal with. In addition, without prior feedback in terms of progress, students have no idea if they are meeting expectations. The teacher is also clueless as to student progress, and this means that both students and teachers can be “surprised” by poorly written papers and failing students.

The lack of communication while writing can encourage students to try to overcome their weaknesses through plagiarism. This is especially true for ESL students, who lack mastery of the language while also often having different perspectives on what academic dishonesty is.

Another problem is that ‘A’ students will simply copy the examples the teacher provided and just put their own topic and words in them. This leads to an excellent yet mechanical paper that does not allow the students to develop as writers. In other words, the product approach provides too much support for strong students and not enough support for weak ones.

Writing as a Process

In writing as a process, the teacher supports the student through several revisions of a paper. The teacher provides support for the development of ideas, organization, coherency, and other aspects of writing. All this is done through the teacher providing feedback to the student as well as dealing with any questions and/or concerns the student may have with their paper.

This style of teaching writing helps students to understand what kind of writer they are. Students are often so focused on completing writing assignments that they never learn what their tendencies and habits as a writer are. Understanding their own strengths and weaknesses can help them to develop compensatory strategies to complete assignments. This kind of self-discovery can happen through one-on-one conferences with the teacher.

Of course, such personal attention takes a great deal of time. However, even brief 5-minute conferences with students can reap huge rewards in their writing. It also saves time at the end when marking because you as the teacher are already familiar with what the students are writing about, and the check of the final papers is just to see if the students have revised their paper according to the advice you gave.

The process perspective gives each student individual attention to grow as an individual. ‘A’ students get what they need, as do weaker students. Everyone is compared to their own progress as a writer.


Generally, the process approach is more appropriate for teaching writing. The exceptions are when the students are unusually competent or are already familiar with your expectations from prior writing experiences.

Discourse Markers and ESL

Discourse markers are used in writing to help organize ideas. They are often those “little words” that native speakers use effortlessly as they communicate but are misunderstood by ESL speakers. This post will provide examples of various discourse markers.

Logical Sequence

Logical sequence discourse markers are used to place ideas in an order that is comprehensible to the listener/reader. They can be summative, for concluding a longer section, or resultative, which indicates the effect of something.

Examples of summative discourse markers include

  • overall, to summarize, therefore, so far

An example of a summative discourse marker is below. The bold word is the marker.

Smoking causes cancer. Studies show that people who smoke have higher rates of cancer of the lung, esophagus, and larynx. Therefore, it is dangerous to smoke.

The paragraph is clear. The marker “Therefore” is summarizing what was said in the prior two sentences.

Examples of resultative discourse markers include the following

  • so, consequently, therefore, as a result

An example of a resultative discourse marker is below. The bold word is the marker.

Bob smoked cigarettes for 20 years. As a result, he developed lung cancer.

Again, the second sentence with the marker “As a result” explains the consequence of smoking for 20 years.


Contrastive markers are words that indicate that the next idea is the opposite of the previous idea. There are three ways that this can be done. Replacive markers share an alternative idea, and antithetic markers share ideas in opposition to the previous one. Lastly, concessive markers share unexpected information given the context.

Below are several words and/or phrases that are replacive markers

  • alternatively, on the other hand, rather

Below is an example of a replacive contrast marker used in a short paragraph. The bold word is the replacive marker.

Smoking is a deadly lifestyle choice. This bad habit has killed millions of people. On the other hand, a vegetarian lifestyle has been found to be beneficial to the health of many people.

Antithetic markers include the following

  • conversely, instead, by contrast

Below is an example of an antithetic marker used in a paragraph

A long and healthy life is unusual for those who choose to smoke. Instead, people who smoke live lives that are shorter and more full of disease and sickness.

Concessive markers include some of the words below

  • in spite of, nevertheless, anyway, anyhow

Below is an example of a concessive marker used in a paragraph

Bob smoked for 20 years. In spite of this, he was an elite athlete and had perfect health.


Discourse markers play a critical role in communicating the finer points of ideas that are used in communication. Understanding how these words are used can help ESL students in comprehending what they hear and read.

Developing a Data Analysis Plan

It is extremely common for beginners and perhaps even experienced researchers to lose track of what they are trying to achieve when completing a research project. The open nature of research allows for a multitude of equally acceptable ways to complete a project. This leads to an inability to make decisions and/or stay on course when doing research.

One way to reduce or eliminate this roadblock to decision making and focus in research is to develop a plan. In this post we will look at one version of a data analysis plan.

Data Analysis Plan

A data analysis plan includes many features of a research project in it with a particular emphasis on mapping out how research questions will be answered and what is necessary to answer the question. Below is a sample template of the analysis plan.


The majority of this diagram should be familiar to anyone who has ever done research. At the top, you state the problem, which is the overall focus of the paper. Next comes the purpose, which is the over-arching goal of a research project.

After the purpose come the research questions. The research questions are questions about the problem that are answerable. People struggle with developing clear and answerable research questions. It is critical that research questions are written in a way that they can be answered and that the questions are clearly derived from the problem. Poor questions mean poor or even no answers.

After the research questions, it is important to know what variables are available for the entire study and specifically what variables can be used to answer each research question. Lastly, you must indicate what analysis or visual you will develop in order to answer your research questions about your problem. This requires you to know how you will answer your research questions.


Below is an example of a completed analysis plan for a simple undergraduate-level research paper


In the example above, the student wants to understand the perceptions of university students about the cafeteria food quality and their satisfaction with the university. There were four research questions: a demographic descriptive question, a descriptive question about the two main variables, a comparison question, and lastly a relationship question.

The variables available for answering the questions are listed off to the left side. Under that, the student indicates the variables needed to answer each question. For example, the demographic variables of sex, class level, and major are needed to answer the question about the demographic profile.

The last section is the analysis. For the demographic profile the student found the percentage of the population in each sub group of the demographic variables.


A data analysis plan provides an excellent way to determine what needs to be done to complete a study. It also helps a researcher to clearly understand what they are trying to do and provides a visual for those with whom the researcher wants to communicate about the progress of a study.

Developing Purpose to Improve Reading Comprehension

Many of us are familiar with the experience of being able to read almost anything but perhaps not being able to understand what it is that we read. As the ability to sound out words becomes automatic there is not always a corresponding increase in being able to comprehend text.

It is common, especially in school, for students to be required to read something without much explanation. For more mature readers, what is often needed is a sense of purpose for reading. In this post, we will look at ways to develop a sense of purpose in reading.

Purpose Provides Motivation

Students who know why they are reading know what they are looking for while reading. The natural result of this is that students are less likely to get distracted by information that is not useful for them.

For example, suppose the teacher tells their students to “read the passage, identify all of the animals in it, and be ready to share tomorrow.” Students know what they are supposed to do (identify all animals in the passage) and why they need to do it (share tomorrow). The clear directions prevent students from getting distracted by other information in the reading.

Providing purpose doesn’t necessarily require that the students love and enjoy the rationale, but it is helpful if a teacher can provide a purpose that is motivating.

Different Ways to Instill Purpose

In addition to the example above there are several quick ways to provide purpose.

  • Provide a vocabulary list-Having the students search for the meaning of specific words provides a clear sense of purpose and provides a context in which the words appear naturally. However, students often get bogged down in the minutiae of the definitions and completely miss the overall meaning of the reading passage. This approach is great for beginning and low intermediate readers.
  • Identify the main ideas in the reading-This is a great way to get students to see the “big picture” of a reading. It is especially useful for short to moderately long readings such as articles and perhaps chapters, and it is useful for intermediate to advanced readers in particular.
  • Let students develop their own questions about the text-By far my favorite strategy. Students will initially skim the passage to get an idea of what it is about. After this, they develop several questions about the passage that they want to find the answers to. While reading the passage, the students answer their own questions. This approach provides opportunities for metacognition as well as for developing autonomous learning skills. This strategy is for advanced readers who are comfortable with vocabulary and summarizing text.


Students, like most people, need a raison de faire (reason to do) something. The teacher can provide this, which has benefits. Another approach would be to allow the students to develop their own purpose. How this is done depends on the philosophy of the teacher as well as the abilities and tendencies of the students.

Linear Discriminant Analysis in R

In this post we will look at an example of linear discriminant analysis (LDA). LDA is used to develop a statistical model that classifies examples in a dataset. In the example in this post, we will use the “Star” dataset from the “Ecdat” package. What we will do is try to predict the type of class the students learned in (regular, small, regular with aide) using their math scores, reading scores, and the teaching experience of the teacher. Below is the initial code
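The initial code is missing from this copy of the post, so the following is only a minimal sketch of the setup. It assumes the “Ecdat” package is installed; “MASS” is loaded because it provides the “lda” function used later.

```r
# Assumption: Ecdat is installed, e.g. install.packages("Ecdat"); MASS ships with R
library(Ecdat)
library(MASS)

# Load the Star data explicitly from the Ecdat package
data(Star, package = "Ecdat")
dim(Star)
```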


We first need to examine the data by using the “str” function

## 'data.frame':    5748 obs. of  8 variables:
##  $ tmathssk: int  473 536 463 559 489 454 423 500 439 528 ...
##  $ treadssk: int  447 450 439 448 447 431 395 451 478 455 ...
##  $ classk  : Factor w/ 3 levels "regular","small.class",..: 2 2 3 1 2 1 3 1 2 2 ...
##  $ totexpk : int  7 21 0 16 5 8 17 3 11 10 ...
##  $ sex     : Factor w/ 2 levels "girl","boy": 1 1 2 2 2 2 1 1 1 1 ...
##  $ freelunk: Factor w/ 2 levels "no","yes": 1 1 2 1 2 2 2 1 1 1 ...
##  $ race    : Factor w/ 3 levels "white","black",..: 1 2 2 1 1 1 2 1 2 1 ...
##  $ schidkn : int  63 20 19 69 79 5 16 56 11 66 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:5850] 1 4 6 7 8 9 10 15 16 17 ...
##   .. ..- attr(*, "names")= chr [1:5850] "1" "4" "6" "7" ...

We will use the following variables

  • dependent variable = classk (class type)
  • independent variable = tmathssk (Math score)
  • independent variable = treadssk (Reading score)
  • independent variable = totexpk (Teaching experience)

We now need to examine the data visually by looking at histograms for our independent variables and a table for our dependent variable
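The plotting code is not preserved here. A minimal sketch using base R graphics, assuming the “Star” data from “Ecdat”, could look like this; the object name “class.props” is illustrative.

```r
# Assumption: Ecdat is installed
library(Ecdat)
data(Star, package = "Ecdat")

# Histograms for the three independent variables
hist(Star$tmathssk, main = "Math score")
hist(Star$treadssk, main = "Reading score")
hist(Star$totexpk, main = "Teaching experience")

# Share of examples in each class type (the dependent variable)
class.props <- prop.table(table(Star$classk))
class.props
```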







##           regular       small.class regular.with.aide 
##         0.3479471         0.3014962         0.3505567

The data mostly looks good. The results of the “prop.table” function will help us when we develop our training and testing datasets. The only problem is with the “totexpk” variable. It is not anywhere near normally distributed. To deal with this, we will use the square root of teaching experience. Below is the code
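The transformation code is missing from this copy. A minimal sketch is given below; the names “star.sqrt” and “totexpk.sqrt” are taken from the output and the model formula shown later in the post, but the exact steps are an assumption.

```r
# Assumption: Ecdat is installed
library(Ecdat)
data(Star, package = "Ecdat")

# Work on a copy and add the square root of teaching experience
star.sqrt <- Star
star.sqrt$totexpk.sqrt <- sqrt(star.sqrt$totexpk)

# Check the shape of the transformed variable
hist(star.sqrt$totexpk.sqrt, main = "Square root of teaching experience")
```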



Much better. We now need to check the correlation among the variables as well and we will use the code below.
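The correlation code is not preserved. The row and column names in the output below suggest that the three variables were first collected in a data frame, so a sketch along those lines might be as follows; “cor.star” and “cor.mat” are illustrative names.

```r
# Assumption: Ecdat is installed; star.sqrt is rebuilt here so the sketch runs on its own
library(Ecdat)
data(Star, package = "Ecdat")
star.sqrt <- Star
star.sqrt$totexpk.sqrt <- sqrt(star.sqrt$totexpk)

# Collect the three independent variables and compute pairwise correlations
cor.star <- data.frame(star.sqrt$tmathssk, star.sqrt$treadssk, star.sqrt$totexpk.sqrt)
cor.mat <- cor(cor.star)
cor.mat
```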

##                        star.sqrt.tmathssk star.sqrt.treadssk
## star.sqrt.tmathssk             1.00000000          0.7135489
## star.sqrt.treadssk             0.71354889          1.0000000
## star.sqrt.totexpk.sqrt         0.08647957          0.1045353
##                        star.sqrt.totexpk.sqrt
## star.sqrt.tmathssk                 0.08647957
## star.sqrt.treadssk                 0.10453533
## star.sqrt.totexpk.sqrt             1.00000000

None of the correlations are too bad. We can now develop our model using linear discriminant analysis. First, we need to scale our scores because the test scores and the teaching experience are measured differently. Then, we need to divide our data into a train and test set, as this will allow us to determine the accuracy of the model. Below is the code.


Now we develop our model. In the code below, the “prior” argument indicates what we expect the probabilities to be. In our data the distribution of the three class types is about the same, which means that the a priori probability is 1/3 for each class type.
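The scaling and splitting code is missing from this copy. The sketch below is one possible version. Splitting by row position at row 4,000 is an assumption; it is used here only because it leaves 1,748 test rows, which matches the total of the classification table shown later in the post.

```r
# Assumption: Ecdat is installed; the split point of 4000 rows is a guess, not the author's code
library(Ecdat)
data(Star, package = "Ecdat")
star.sqrt <- Star
star.sqrt$totexpk.sqrt <- sqrt(star.sqrt$totexpk)

# Scale the three predictors to mean 0 and standard deviation 1
for (v in c("tmathssk", "treadssk", "totexpk.sqrt")) {
  star.sqrt[[v]] <- as.numeric(scale(star.sqrt[[v]]))
}

# Split into train and test sets by row position
train.star <- star.sqrt[1:4000, ]
test.star <- star.sqrt[4001:nrow(star.sqrt), ]
```

A random split with “sample” would be the more common choice; the positional split is shown only because it is consistent with the numbers in this post.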

train.lda<-lda(classk~tmathssk+treadssk+totexpk.sqrt, data = train.star, prior = c(1,1,1)/3)
train.lda
## Call:
## lda(classk ~ tmathssk + treadssk + totexpk.sqrt, data = train.star, 
##     prior = c(1, 1, 1)/3)
## Prior probabilities of groups:
##           regular       small.class regular.with.aide 
##         0.3333333         0.3333333         0.3333333 
## Group means:
##                      tmathssk    treadssk totexpk.sqrt
## regular           -0.04237438 -0.05258944  -0.05082862
## small.class        0.13465218  0.11021666  -0.02100859
## regular.with.aide -0.05129083 -0.01665593   0.09068835
## Coefficients of linear discriminants:
##                      LD1         LD2
## tmathssk      0.89656393 -0.04972956
## treadssk      0.04337953  0.56721196
## totexpk.sqrt -0.49061950  0.80051026
## Proportion of trace:
##    LD1    LD2 
## 0.7261 0.2739

The printout is mostly readable. At the top is the actual code used to develop the model, followed by the prior probabilities of each group. The next section shares the means of the groups. The coefficients of linear discriminants are the values used to classify each example. The coefficients are similar to regression coefficients. The computer places each example in both equations and probabilities are calculated. Whichever class has the highest probability is the winner. In addition, the larger the absolute value of a coefficient, the more weight it has. For example, “tmathssk” is the most influential on LD1 with a coefficient of 0.90.

The proportion of trace is similar to the proportion of variance in principal component analysis. It shows how much of the separation between the groups each discriminant function accounts for. Here LD1 accounts for about 73% and LD2 for about 27%.

Now we will take the trained model and see how it does with the test set. We create a new object called “predict.lda” using our “train.lda” model and the test data called “test.star”

predict.lda<-predict(train.lda,newdata = test.star)

We can use the “table” function to see how well our model has done. We can do this because we actually know what class our data is beforehand because we divided the dataset. What we need to do is compare this to what our model predicted. Therefore, we compare the “classk” variable of our “test.star” dataset with the “class” predicted by the “predict.lda” model.
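The call that produced the table is missing here. The comparison can be sketched as below; the data preparation repeats the assumed steps (square root, scaling, split at row 4,000) so the sketch runs on its own, and “conf.mat” is an illustrative name.

```r
# Assumption: Ecdat is installed; preprocessing and split mirror the assumed steps, not the author's exact code
library(Ecdat)
library(MASS)
data(Star, package = "Ecdat")
star.sqrt <- Star
star.sqrt$totexpk.sqrt <- sqrt(star.sqrt$totexpk)
for (v in c("tmathssk", "treadssk", "totexpk.sqrt")) {
  star.sqrt[[v]] <- as.numeric(scale(star.sqrt[[v]]))
}
train.star <- star.sqrt[1:4000, ]
test.star <- star.sqrt[4001:nrow(star.sqrt), ]

# Fit on the training data with equal priors, then predict the test data
train.lda <- lda(classk ~ tmathssk + treadssk + totexpk.sqrt,
                 data = train.star, prior = c(1, 1, 1) / 3)
predict.lda <- predict(train.lda, newdata = test.star)

# Rows are the actual classes, columns are the predicted classes
conf.mat <- table(test.star$classk, predict.lda$class)
conf.mat
```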

##                     regular small.class regular.with.aide
##   regular               155         182               249
##   small.class           145         198               174
##   regular.with.aide     172         204               269

The results are pretty bad. For example, in the first row called “regular” we have 155 examples that were classified as “regular” and predicted as “regular” by the model. In the next column, there are 182 examples that were classified as “regular” but predicted as “small.class”, etc. To find out how well our model did, you add together the examples along the diagonal from top left to bottom right and divide by the total number of examples. Below is the code
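The original call is not preserved, but the calculation can be sketched directly from the table above. The matrix is typed in by hand here so that the sketch runs without any packages; “conf.mat” and “accuracy” are illustrative names.

```r
# The classification table from the post, entered by hand (rows = actual, columns = predicted)
classes <- c("regular", "small.class", "regular.with.aide")
conf.mat <- matrix(c(155, 182, 249,
                     145, 198, 174,
                     172, 204, 269),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(actual = classes, predicted = classes))

# Correct predictions sit on the diagonal; divide by the total number of examples
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
accuracy
```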

## [1] 0.3558352

Only 36% accurate, which is terrible but ok for a demonstration of linear discriminant analysis. Since we only have two functions, or two dimensions, we can plot our model. Below I provide a visual of the first 50 examples classified by the predict.lda model.



The first function, which is the vertical line, doesn’t seem to discriminate anything, as it is off to the side and not separating any of the data. However, the second function, which is the horizontal one, does a good job of dividing the “regular.with.aide” from the “small.class”. Yet, there are problems with distinguishing the class “regular” from either of the other two groups. In order to improve our model, we need additional independent variables to help distinguish the groups in the dependent variable.

Factors that Affect Pronunciation

Understanding and teaching pronunciation has been controversial in TESOL for many years. At one time, pronunciation was taught in a highly bottom-up, behavioristic manner. Students were drilled until they had the appropriate “accent” (American, British, Australian, etc.). To be understood meant capturing one of the established accents.

Now there is more of an emphasis on top-down features such as stress, tone, and rhythm. There is also an emphasis on being more non-directive and focusing not on the sounds being generated by the student but on the comprehensibility of what they say.

This post will explain several common factors that influence pronunciation. These common factors include

  • Motivation & Attitude
  • Age & Exposure
  • Native language
  • Natural ability

Motivation & Language Ego

For many people, it’s hard to get something done when they don’t care. Excellent pronunciation is often affected by motivation. If the student does not care, they will probably not improve much. This is particularly true when the student reaches a level where people can understand them. Once they are comprehensible, many students lose interest in further pronunciation development.

Fortunately, a teacher can use various strategies to motivate students to focus on improving their pronunciation. Creating relevance is one way in which students’ intrinsic motivation can be developed.

Attitude is closely related to motivation. If the students have negative views of the target language and are worried that learning the target language is a cultural threat, this will make language acquisition difficult. Students need to understand that language learning does involve learning the culture of the target language.

Age & Exposure

Younger students, especially 1-12 years of age, have the best chance at developing native-like pronunciation. If the student is older, they will almost always retain an “accent.” However, fluency and accuracy can reach the same levels regardless of the initial age at which language study began.

Exposure is closely related to age. The more authentic experiences that a student has with the language, the better their pronunciation normally is. The quality of the exposure depends on the naturalness of the setting and the actual engagement of the student in hearing and interacting with the language.

For example, an ESL student who lives in America will probably have much more exposure to the actual use of English than someone in China. This in turn will impact their pronunciation.

Native Language

The similarities between the mother tongue and the target language can influence pronunciation. For example, it is much easier to move from Spanish to English pronunciation than from Chinese to English.

For the teacher, understanding the sound systems of your students’ languages can help a great deal in helping them with difficulties in pronunciation.

Innate Ability

Lastly, some just get it while others don’t. Different students have varying ability to pick up the sounds of another language. A way around this is helping students to know their own strengths and weaknesses. This will allow them to develop strategies to improve.


Whatever your position on pronunciation, there are ways to improve your students’ pronunciation if you are familiar with what influences it. The examples in this post provide some basic insight into what affects it.

Tips for Developing Techniques for ESL Students

Technique development is the actual practice of TESOL. All of the ideas expressed in approaches and methods are just ideas. The development of a technique is the application of knowledge in a way that benefits the students. This post will provide ideas and guidelines on developing speaking and listening techniques.

Techniques should Encourage Intrinsic Motivation

When developing techniques for your students, the techniques need to consider the goals, abilities, and interests of the students whenever possible. If the students are older adults who want to develop conversational skills, heavy stress on reading would be demotivating. This is because reading was not one of the students’ goals.

When techniques do not align with student goals there is a loss of relevance, which is highly demotivating. Of course, as the teacher, you do not always give them what they want, but general practice suggests some sort of dialog over the direction of the techniques.

Techniques should be Authentic

The point here is closely related to the first one on motivation. Techniques should generally be as authentic as possible. If you have a choice between a real text and a textbook, it is usually better to go with the real-world text.

Realistic techniques provide a context in which students can apply their skills in a setting that is similar to the world but within the safety of a classroom.

Techniques should Develop Skills through Integration and Isolation

When developing techniques there should be a blend of techniques that develop skills in an integrated manner, such as listening and speaking or some other combination. There should also be an equal focus on techniques that develop one skill, such as writing.

The reason for this is so that the students develop balanced skills. Skill-integrated techniques are highly realistic but students can use one skill to compensate for weaknesses in others. For example, a talker just keeps on talking without ever really listening.

When skills are worked on in isolation, deficiencies can be clearly identified and addressed. Doing this will only help the students in integrated situations.

Encourage Strategy Development

Through techniques, students need to develop their ability to learn autonomously. This can be done by having students practice learning strategies you have shown them in the past. Examples include using context clues, finding main ideas, distinguishing facts from opinions, etc.

The development of skills takes a well planned approach to how you will teach and provide students with the support to succeed.


Understanding some of the criteria that can be used in creating techniques for the ESL classroom is beneficial for teachers. The ideas presented here provide some basic guidance for enabling technique development.

Generalized Additive Models in R

In this post, we will learn how to create a generalized additive model (GAM). GAMs are non-parametric generalized linear models. This means that the linear predictor of the model uses smooth functions of the predictor variables. As such, you do not need to specify the functional relationship between the response and the continuous variables. This allows you to explore the data for potential relationships that can be more rigorously tested with other statistical models.

In our example, we will use the “Auto” dataset from the “ISLR” package and use the variables “mpg”, “displacement”, “horsepower”, and “weight” to predict “acceleration”. We will also use the “mgcv” package. Below is some initial code to begin the analysis.
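The original setup code is not shown here, so the following is only a minimal sketch of what it could look like, assuming the “mgcv” and “ISLR” packages are already installed. The name “auto_vars” is an illustrative helper for inspecting the variables used in this post, not something the analysis depends on.

```r
# Load the packages named in the post (assumes both are installed)
library(mgcv)   # provides gam(), plot.gam(), and vis.gam()
library(ISLR)   # provides the Auto dataset
data(Auto)

# Illustrative helper: look only at the variables used in this analysis
auto_vars <- Auto[, c("acceleration", "mpg", "displacement", "horsepower", "weight")]
str(auto_vars)
```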


We will now make the model. We want to understand the response of “acceleration” to the explanatory variables “mpg”, “displacement”, “horsepower”, and “weight”. After setting the model we will examine the summary. Below is the code.
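A sketch of the model code, reconstructed from the formula printed in the summary output. The object name “model1” is taken from the AIC table later in the post, and the default settings of “gam” are an assumption.

```r
library(mgcv)   # gam()
library(ISLR)   # Auto dataset

# Each predictor is wrapped in s() so that gam() fits a smooth function of it
model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight),
              data = Auto)
summary(model1)
```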

## Family: gaussian 
## Link function: identity 
## Formula:
## acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight)
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.54133    0.07205   215.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Approximate significance of smooth terms:
##                   edf Ref.df      F  p-value    
## s(mpg)          6.382  7.515  3.479  0.00101 ** 
## s(displacement) 1.000  1.000 36.055 4.35e-09 ***
## s(horsepower)   4.883  6.006 70.187  < 2e-16 ***
## s(weight)       3.785  4.800 41.135  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-sq.(adj) =  0.733   Deviance explained = 74.4%
## GCV = 2.1276  Scale est. = 2.0351    n = 392

All of the explanatory variables are significant and the adjusted R-squared is .73, which is excellent. “edf” stands for “effective degrees of freedom”. This modified version of the degrees of freedom is due to the smoothing process in the model. An edf of 1, as with “displacement”, corresponds to a straight line, while larger values indicate more curvature. GCV stands for generalized cross validation and this number is useful when comparing models. The model with the lowest number is the better model.

We can also examine the model visually by using the “plot” function. This will allow us to examine if the curvature fitted by the smoothing process was useful or not for each variable. Below is the code.
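A minimal sketch of the plotting step, assuming the model from the summary above is refit as “model1”. The “pages = 1” argument is our choice here so that all four smooths appear on one page.

```r
library(mgcv)
library(ISLR)

model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight),
              data = Auto)

# One panel per smooth term; the dashed lines are the standard error bands
plot(model1, pages = 1, se = TRUE)
```

A nearly straight line in a panel suggests that the smooth added little beyond a linear term for that variable.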



We can also look at a 3D graph that includes the linear predictor as well as the two strongest predictors. This is done with the “vis.gam” function. Below is the code.
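A sketch of the “vis.gam” call. Choosing “horsepower” and “weight” for the “view” argument is an assumption based on their having the largest F values in the summary, and the viewing angle “theta” is arbitrary.

```r
library(mgcv)
library(ISLR)

model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight),
              data = Auto)

# Perspective plot of the linear predictor against two of the predictors;
# the remaining predictors are held at typical values by vis.gam()
vis.gam(model1, view = c("horsepower", "weight"), type = "link", theta = 40)
```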



If multiple models are developed, you can compare the GCV values to determine which model is the best. In addition, another way to compare models is with the “AIC” function. In the code below, we will create an additional model that includes “year”, compare the GCV scores, and calculate the AIC.
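A sketch of the second model and the GCV comparison, reconstructed from the formula in the output. The GCV score is read from the “gcv.ubre” component of each fitted object, which is printed under the label “GCV.Cp”.

```r
library(mgcv)
library(ISLR)

model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight),
              data = Auto)
# Same model with a smooth of "year" added
model2 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight) +
                s(year), data = Auto)
summary(model2)

# GCV score of each model; lower is better
model1$gcv.ubre
model2$gcv.ubre
```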

## Family: gaussian 
## Link function: identity 
## Formula:
## acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight) + 
##     s(year)
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.54133    0.07203   215.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Approximate significance of smooth terms:
##                   edf Ref.df      F p-value    
## s(mpg)          5.578  6.726  2.749  0.0106 *  
## s(displacement) 2.251  2.870 13.757 3.5e-08 ***
## s(horsepower)   4.936  6.054 66.476 < 2e-16 ***
## s(weight)       3.444  4.397 34.441 < 2e-16 ***
## s(year)         1.682  2.096  0.543  0.6064    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## R-sq.(adj) =  0.733   Deviance explained = 74.5%
## GCV = 2.1368  Scale est. = 2.0338    n = 392
#model1 GCV
##   GCV.Cp 
## 2.127589
#model2 GCV
##   GCV.Cp 
## 2.136797

As you can see, the second model has a higher GCV score when compared to the first model. This indicates that the first model is a better choice. This makes sense because in the second model the variable “year” is not significant. To confirm this we will calculate the AIC scores using the AIC function.
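A sketch of the AIC comparison, assuming both models are fit as described above. The name “aic_table” is illustrative.

```r
library(mgcv)
library(ISLR)

model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight),
              data = Auto)
model2 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight) +
                s(year), data = Auto)

# One row per model: effective degrees of freedom and AIC (lower is better)
aic_table <- AIC(model1, model2)
aic_table
```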

##              df      AIC
## model1 18.04952 1409.640
## model2 19.89068 1411.156

Again, you can see that model1 is better due to its fewer degrees of freedom and slightly lower AIC score.


GAMs are most commonly used for exploring potential relationships in your data. This is because they are difficult to interpret and summarize. Therefore, it is normally better to develop a generalized linear model rather than a GAM when the goal is to understand what the data is telling you.

Listening Techniques for the ESL Classroom

Listening is one of the four core skills of language acquisition along with reading, writing, and speaking. This post will explain several broad categories of listening that can happen within the ESL classroom.

Reactionary Listening

Reactionary listening involves having the students listen to an utterance and repeat it back to you as the teacher. The student is not generating any meaning. This can be useful for developing pronunciation when speaking.

Common techniques that utilize reactionary listening are drills and choral speaking. Both of these techniques are commonly associated with audiolingualism.

Responsive Listening

Responsive listening requires the student to create a reply to something that they heard. Not only does the student have to understand what was said but they must also be able to generate a meaningful reply. The response can be verbal, such as answering a question, or non-verbal, such as obeying a command.

Common techniques that are responsive in nature include anything that involves asking questions or obeying commands. As such, almost all methods and approaches have some aspect of responsive listening in them.

Discriminatory Listening

Discriminatory listening techniques involve listening that is selective. The listener needs to identify what is important from a dialog or monologue. The listener might need to identify the name of a person, the location of something, or develop the main idea of the recording.

Discriminatory listening is probably a universal technique used by almost everyone. It is also popular with English proficiency tests such as the IELTS.

Intensive Listening

Intensive listening is focused on breaking down what the student has heard into various aspects of grammar and speaking. Examples include intonation, stress, phonemes, contractions, etc.

This is more of an analytical approach to listening. In particular, using intensive listening techniques may be useful to help learners understand the nuances of the language.

Extensive Listening

Extensive listening is about listening to a monologue or dialog and developing an overall summary and comprehension of it. Examples of this could be having students listen to a clip from a documentary or a newscast.

Again, this is so common in language teaching that almost all styles incorporate this in one way or another.

Interactive Listening

Interactive listening is the mixing of all of the previously mentioned types of listening simultaneously. Examples include role plays, debates, and various other forms of group work.

All of the examples mentioned require repeating what others say (reactionary), replying to others’ comments (responsive), identifying main ideas (discriminatory & extensive), and perhaps some focus on intonation and stress (intensive). As such, interactive listening is the goal of listening in a second language.

Interactive listening is used by most methods, most notably communicative language teaching, which has had a huge influence on the last 40 years of TESOL.


The listening technique categories provided here give some insight into how one can organize various listening experiences in the classroom. What combination of techniques to employ depends on many different factors, but knowing what’s available empowers the teacher to determine what course of action to take.

Wire Framing with Moodle

Before teaching a Moodle course it is critical that a teacher design what they want to do. Many teachers believe that they begin the design process by going to Moodle and adding activities and other resources to their class. For someone who is thoroughly familiar with Moodle and has developed courses before, this might work. However, the majority of online teachers need to wire frame what they want their Moodle course to look like online.

Why Wire Frame a Moodle Course

In the world of web developers, a wire frame is a prototype of what a potential website will look like. The actual wire frame can be made on many different platforms, from Word and PowerPoint to just paper and pencil. Since Moodle is online, a Moodle course is in many ways a website, so wire framing applies to this context.

It doesn’t matter how you wire frame your Moodle course. What matters is that you actually do this. Designing what you want to see in your course helps you to make decisions much faster when you are actually adding activities and resources to your Moodle course. It also helps your Moodle support staff to help you if they have a picture of what you want rather than wild hand gestures and frustration.

Wire framing a course also reduces the cognitive load on the teacher. Instead of designing and building the course at the same time, wire framing splits this task into two steps, which are designing and then building. This prevents extreme frustration, as it is common for a teacher just to stare at the computer screen when trying to design and develop a Moodle course simultaneously.

You never see an architect making his plans while building the building. This would seem careless and even dangerous because the architect doesn’t even know what he wants while he is throwing around concrete and steel. The same analogy applies to designing Moodle courses. A teacher must know what they want, write it down, and then implement it by creating the course.

Another benefit of planning in Word is that it is easier to change things in Word when compared to Moodle. Moodle is amazing but it is not easy to use for those who are not tech-savvy. However, it’s easiest for most of us to copy, paste, and edit in Word.

One Way to Wire Frame a Moodle Course

When supporting teachers to wire frame a Moodle course, I always encourage them to start by developing the course in Microsoft Word. The reason is that the teacher is already familiar with Word and does not have to struggle to make decisions when using it. This helps them to focus on content and not on how to use the tool.

One of the easiest ways to wire frame a Moodle course is to take the default topics of a course such as General Information, Week 1, Week 2, etc. and copy these headings into Word, as shown below.

Screenshot from 2017-01-20 09-15-19.png

Now, all that is needed is to type in using bullets exactly what activities and resources you want in each section. It is also possible to add pictures and other content to the Word document that can be added to Moodle later.  Below is a preview of a generic Moodle sample course with the general info and week 1 of the course completed.

Screenshot from 2017-01-20 09-26-00.png

You can see for yourself how this class is developed. The General Info section has an image to serve as a welcome and includes the name of the course. Under this are the course outline and rubrics for the course. The information in the parentheses indicates what type of module it is.

For Week 1, there are several activities. There is a forum for introducing yourself and a page that shares the objectives of that week. Following this are the readings for the week, then a discussion forum, and lastly an assignment. This process repeats for however many weeks or topics you have in the course.

Depending on your need to plan, you can even plan other pages on the site besides the main page. For example, I can wire frame what I want my “Objectives” page to look like or even the discussion topics for my “Discussion” forum.

Of course, the ideas for all these activities come from the course outline or syllabus that was developed first. In other words, before we even wire frame we have some sort of curriculum document with what the course needs to cover.


The example above is an extremely simple way of utilizing the power of wire framing. With this template, you can confidently go to Moodle and find the different modules to make your class come to life. Trying to conceptualize this in your head is possible but much more difficult. As such, thorough planning is a hallmark of learning.


Generalized Linear Models in R

Generalized linear models (GLM) are another way to approach linear regression. The advantage of GLM is that it allows the error to follow many different distributions rather than only the normal distribution, which is an assumption of traditional linear regression.

Often GLM is used for response or dependent variables that are binary or represent count data. This post will provide a brief explanation of GLM as well as provide an example.

Key Information

There are three important components to a GLM and they are

  • Error structure
  • Linear predictor
  • Link function

The error structure is the type of distribution you will use in generating the model. There are many different distributions in statistical modeling such as binomial, Gaussian, Poisson, etc. Each distribution comes with certain assumptions that govern its use.

The linear predictor is the sum of the effects of the independent variables. Lastly, the link function determines the relationship between the linear predictor and the mean of the dependent variable. There are many different link functions and the best link function is the one that reduces the residual deviance the most.
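To make the idea of a link function concrete, here is a small sketch using the logit link, which is the default for the binomial family in R. The names “logit”, “inv_logit”, “eta”, and “mu” are illustrative only and do not come from the analysis in this post.

```r
# The binomial family object carries its link function and the inverse of it
logit     <- binomial()$linkfun   # mean (a probability) -> linear predictor
inv_logit <- binomial()$linkinv   # linear predictor -> mean (a probability)

eta <- c(-2, 0, 2)     # example values of the linear predictor
mu  <- inv_logit(eta)  # corresponding probabilities, all between 0 and 1
mu

# Applying the link to the means recovers the linear predictor
logit(mu)
```

The linear predictor can be any real number, while the mean of a binary response must stay between 0 and 1. The link function is what connects the two scales.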

In our example, we will try to predict if a house will have air conditioning based on the interaction between the number of bedrooms and bathrooms, the number of stories, and the price of the house. To do this, we will use the “Housing” dataset from the “Ecdat” package. Below is some initial code to get started.
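The setup code is not shown, so this is only a sketch of what it might look like, assuming the “Ecdat” package is installed. The name “housing_vars” is an illustrative helper for inspecting the variables used in the first model.

```r
library(Ecdat)   # provides the Housing dataset (assumes the package is installed)
data(Housing)

# Illustrative helper: the response and the predictors of the first model
housing_vars <- Housing[, c("airco", "bedrooms", "bathrms", "stories", "price")]
str(housing_vars)
```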


The dependent variable “airco” in the “Housing” dataset is binary. This calls for us to use a GLM. To do this we will use the “glm” function in R. Furthermore, in our example, we want to determine if there is an interaction between the number of bedrooms and bathrooms. Interaction means that the influence of the two independent variables (bathrooms and bedrooms) on the dependent variable (airco) is not additive, which means that the combined effect of the independent variables is different than if you just added them together. In the formula, the “*” between the two variables fits both main effects plus their interaction. Below is the code for the model followed by a summary of the results.

model <- glm(Housing$airco ~ Housing$bedrooms * Housing$bathrms + Housing$stories + Housing$price, family = binomial)
summary(model)
## Call:
## glm(formula = Housing$airco ~ Housing$bedrooms * Housing$bathrms + 
##     Housing$stories + Housing$price, family = binomial)
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7069  -0.7540  -0.5321   0.8073   2.4217  
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                      -6.441e+00  1.391e+00  -4.632 3.63e-06
## Housing$bedrooms                  8.041e-01  4.353e-01   1.847   0.0647
## Housing$bathrms                   1.753e+00  1.040e+00   1.685   0.0919
## Housing$stories                   3.209e-01  1.344e-01   2.388   0.0170
## Housing$price                     4.268e-05  5.567e-06   7.667 1.76e-14
## Housing$bedrooms:Housing$bathrms -6.585e-01  3.031e-01  -2.173   0.0298
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for binomial family taken to be 1)
##     Null deviance: 681.92  on 545  degrees of freedom
## Residual deviance: 549.75  on 540  degrees of freedom
## AIC: 561.75
## Number of Fisher Scoring iterations: 4

To check how good our model is, we need to check for overdispersion as well as compare this model to other potential models. Overdispersion is a measure of whether there is too much variability in the model. It is calculated by dividing the residual deviance by the residual degrees of freedom. Below is the solution for this.

## [1] 1.018056

Our answer is about 1.02, which is pretty good because the cutoff point is 1, so we are really close.

Now we will make several models and compare their results.
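As a sketch, the calculation can be reproduced from the numbers in the summary above. The names “resid_dev”, “resid_df”, and “dispersion” are illustrative. With the fitted object available, the same ratio could be obtained from the model itself, as noted in the comment.

```r
# Values copied from the summary output of the first model
resid_dev <- 549.75   # residual deviance
resid_df  <- 540      # residual degrees of freedom

dispersion <- resid_dev / resid_df
dispersion

# With the fitted glm object, the equivalent would be:
# deviance(model) / df.residual(model)
```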

Model 2

#add recroom and garagepl
model2 <- glm(Housing$airco ~ Housing$bedrooms * Housing$bathrms + Housing$stories + Housing$price + Housing$recroom + Housing$garagepl, family = binomial)
summary(model2)
## Call:
## glm(formula = Housing$airco ~ Housing$bedrooms * Housing$bathrms + 
##     Housing$stories + Housing$price + Housing$recroom + Housing$garagepl, 
##     family = binomial)
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6733  -0.7522  -0.5287   0.8035   2.4239  
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                      -6.369e+00  1.401e+00  -4.545 5.51e-06
## Housing$bedrooms                  7.830e-01  4.391e-01   1.783   0.0745
## Housing$bathrms                   1.702e+00  1.047e+00   1.626   0.1039
## Housing$stories                   3.286e-01  1.378e-01   2.384   0.0171
## Housing$price                     4.204e-05  6.015e-06   6.989 2.77e-12
## Housing$recroomyes                1.229e-01  2.683e-01   0.458   0.6470
## Housing$garagepl                  2.555e-03  1.308e-01   0.020   0.9844
## Housing$bedrooms:Housing$bathrms -6.430e-01  3.054e-01  -2.106   0.0352
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for binomial family taken to be 1)
##     Null deviance: 681.92  on 545  degrees of freedom
## Residual deviance: 549.54  on 538  degrees of freedom
## AIC: 565.54
## Number of Fisher Scoring iterations: 4
#overdispersion calculation
model2$deviance / model2$df.residual
## [1] 1.02145

Model 3

#add fullbase
model3 <- glm(Housing$airco ~ Housing$bedrooms * Housing$bathrms + Housing$stories + Housing$price + Housing$recroom + Housing$fullbase + Housing$garagepl, family = binomial)
summary(model3)
## Call:
## glm(formula = Housing$airco ~ Housing$bedrooms * Housing$bathrms + 
##     Housing$stories + Housing$price + Housing$recroom + Housing$fullbase + 
##     Housing$garagepl, family = binomial)
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6629  -0.7436  -0.5295   0.8056   2.4477  
## Coefficients:
##                                    Estimate Std. Error z value Pr(>|z|)
## (Intercept)                      -6.424e+00  1.409e+00  -4.559 5.14e-06
## Housing$bedrooms                  8.131e-01  4.462e-01   1.822   0.0684
## Housing$bathrms                   1.764e+00  1.061e+00   1.662   0.0965
## Housing$stories                   3.083e-01  1.481e-01   2.082   0.0374
## Housing$price                     4.241e-05  6.106e-06   6.945 3.78e-12
## Housing$recroomyes                1.592e-01  2.860e-01   0.557   0.5778
## Housing$fullbaseyes              -9.523e-02  2.545e-01  -0.374   0.7083
## Housing$garagepl                 -1.394e-03  1.313e-01  -0.011   0.9915
## Housing$bedrooms:Housing$bathrms -6.611e-01  3.095e-01  -2.136   0.0327
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Dispersion parameter for binomial family taken to be 1)
##     Null deviance: 681.92  on 545  degrees of freedom
## Residual deviance: 549.40  on 537  degrees of freedom
## AIC: 567.4
## Number of Fisher Scoring iterations: 4
#overdispersion calculation
model3$deviance / model3$df.residual
## [1] 1.023091

Now we can assess the models by using the “anova” function with the “test” argument set to “Chi” for the chi-square test.

anova(model, model2, model3, test = "Chi")
## Analysis of Deviance Table
## Model 1: Housing$airco ~ Housing$bedrooms * Housing$bathrms + Housing$stories + 
##     Housing$price
## Model 2: Housing$airco ~ Housing$bedrooms * Housing$bathrms + Housing$stories + 
##     Housing$price + Housing$recroom + Housing$garagepl
## Model 3: Housing$airco ~ Housing$bedrooms * Housing$bathrms + Housing$stories + 
##     Housing$price + Housing$recroom + Housing$fullbase + Housing$garagepl
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       540     549.75                     
## 2       538     549.54  2  0.20917   0.9007
## 3       537     549.40  1  0.14064   0.7076

The results of the anova indicate that the models are all essentially the same, as there is no statistically significant difference between them. One remaining criterion for selecting a model is the measure of overdispersion. The first model has the lowest overdispersion (and also the lowest AIC in the summaries above), so it is the best by this criterion. Therefore, determining if a house has air conditioning depends on examining the number of bedrooms and bathrooms simultaneously as well as the number of stories and the price of the house.
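As a cross-check, the p-values in the deviance table and the AIC values in the three summaries can be reproduced from the printed numbers alone. This is only a sketch: the variable names are illustrative, and the AIC shortcut relies on the fact that, for a binary (0/1) response such as “airco”, the AIC reported by “glm” equals the residual deviance plus twice the number of estimated coefficients.

```r
# Residual deviances and coefficient counts copied from the three summaries
resid_dev <- c(model = 549.75, model2 = 549.54, model3 = 549.40)
n_coef    <- c(6, 8, 9)

# AIC for a binary response: residual deviance + 2 * number of coefficients
aic <- resid_dev + 2 * n_coef
aic

# Chi-square tests on the drop in deviance between successive models
p_2_vs_1 <- pchisq(0.20917, df = 2, lower.tail = FALSE)
p_3_vs_2 <- pchisq(0.14064, df = 1, lower.tail = FALSE)
round(c(p_2_vs_1, p_3_vs_2), 4)
```

Both comparisons give large p-values, so the extra variables do not improve the fit, and the smallest AIC belongs to the first model.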


This post explained how to use and interpret GLM in R. GLM is used primarily for fitting data to distributions that are not normal.

Common Challenges with Listening for ESL Students

Listening is always a challenge as students acquire any language. Both teachers and students know that it takes time to develop comprehension when listening to a second language.

This post will explain some of the common obstacles to listening for ESL students. Generally, some common roadblocks include the following.

  • Slang
  • Contractions
  • Rate of Delivery
  • Emphasis in speech
  • Clustering
  • Repetition
  • Interaction


Slang

Slang or colloquial language is a major pain for language learners. There are so many ways that we communicate in English that do not match the prescribed “textbook” way. This can leave ESL learners completely lost as to what is going on.

A simple example would be to say “what’s up”. Even the most austere English teacher knows what this means but this is in no way formal English. For someone new to English it would be confusing at least initially.


Contractions

Contractions are a unique form of slang or colloquialism that is more readily accepted as standard English. A challenge with contractions is their omission of information. With this missing information there can be confusion.

An example would be “don’t” or “shouldn’t”. Other more complicated contractions can include “djeetyet” for “did you eat yet”. These common phrases leave out or do not pronounce important information.

Rate of Delivery 

When listening to someone in a second language it always seems too fast. The speed at which we speak our own language is always too swift for someone learning it.

Pausing at times during the delivery is one way to allow comprehension without actually slowing the speed at which one speaks. The main way to overcome this is to learn to listen faster, if this makes any sense.

Emphasis in Speech

In many languages there are complex rules for understanding which vowels to stress, which do not make sense to a non-native speaker. In fact, native speakers do not always agree on the vowels to stress. English speakers have been arguing over how to pronounce potato and tomato for ages.

Another aspect is intonation. The inflection in many languages can change when asking a question, making a statement, or expressing boredom, anger, or some other emotion. These little nuances of language are difficult to replicate and understand.


Clustering

Clustering is the ability to break language down into phrases. This helps in capturing the core of a language and is not easy to do. Language learners normally try to remember everything, which leads to them understanding nothing.

For the teacher, this means helping the students determine what is essential information and what is not. This takes practice and demonstrations of what is considered critical and not in listening comprehension.


Repetition

Repetition is closely related to clustering and involves the redundant use of words and phrases. Constantly re-sharing the same information can become confusing for students. An example would be someone saying “you know” and “you see what I’m saying.” This information is not critical to understanding most conversations and can throw off the comprehension of a language learner.


Interaction

Interaction has to do with a language learner understanding how to negotiate a conversation. This means being able to participate in a discussion, ask questions, and provide feedback.

The ultimate goal of listening is to speak. Developing  interactive skills is yet another challenge to listening as students must develop participatory skills.


The challenges mentioned here are intended to help teachers identify what may be impeding their students from growing in their ability to listen. Naturally, this is not an exhaustive list but serves as a brief survey.


Types of Oral Language

Within communication and language teaching there are actually many different forms or types of oral language. Understanding this is beneficial if a teacher is trying to support students to develop their listening skills. This post will provide examples of several oral language forms.


Monologue

A monologue is the use of language without any verbal feedback from others. There are two types of monologue, which are planned and unplanned. Planned monologues include such examples as speeches, sermons, and verbatim reading.

When a monologue is planned there is little repetition of the ideas and themes of the subject. This makes it very difficult for ESL students to follow and comprehend the information. ESL students need to hear the content several times to better understand what is being discussed.

Unplanned monologues are more improvisational in nature. Examples can include classroom lectures and one-sided conversations. There is usually more repetition in unplanned monologues which is beneficial. However, the stop and start of unplanned monologues can be confusing at times as well.


Dialogue

A dialogue is the use of oral language involving two or more people. Within dialogues there are two main sub-categories, which are interpersonal and transactional. Interpersonal dialogues encourage the development of personal relationships. Dialogues that involve asking people how they are or talking over dinner may fall in this category.

Transactional dialogue is dialogue for sharing factual information. An example might be if someone you do not know asks you “where is the bathroom.” Such a question is not for developing relationships but rather for seeking information.

Both interpersonal and transactional dialogues can be either familiar or unfamiliar. Familiarity has to do with how well the people speaking know each other. The more familiar the people talking are the more assumptions  and hidden meanings they bring to the discussion. For example, people who work at the same company in the same department use all types of acronyms to communicate with each other that outsiders do not understand.

When two people are unfamiliar with each other, effort must be  made to provide information explicitly to avoid confusion. This carries over when a native speaker speaks in a familiar manner to ESL students. The style of communication  is inappropriate because of the lack of familiarity of the ESL students with the language.


The boundary between monologue and dialogue is much clearer than the boundaries between the other categories mentioned, such as planned/unplanned, interpersonal/transactional, and familiar/unfamiliar. In general, the ideas presented here represent a continuum and not either-or propositions.


Proportion Test in R

Proportions are a fraction or “portion” of a total amount. For example, if there are ten men and ten women in a room, the proportion of men in the room is 50% (10 / 20). There are times when doing an analysis that you want to evaluate proportions in your data rather than individual measurements such as the mean, correlation, standard deviation, etc.

In this post we will learn how to do a test of proportions using R. We will use the dataset “Default”, which is found in the “ISLR” package. We will compare the proportion of those who are students in the dataset to a theoretical value. We will calculate the results using the z-test and the binomial exact test. Below is some initial code to get started.
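The setup code is not shown, so here is a minimal sketch of what it might be, assuming the “ISLR” package is installed.

```r
library(ISLR)   # provides the Default dataset (assumes the package is installed)
data(Default)

# "student" is a factor with the levels "No" and "Yes"
str(Default$student)
```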


We first need to determine the actual number of students that are in the sample. This is calculated below using the “table” function.
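A sketch of the counting step; the name “student_counts” is illustrative.

```r
library(ISLR)

# Count how many people in the sample are and are not students
student_counts <- table(Default$student)
student_counts
```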

##   No  Yes 
## 7056 2944

We have 2944 students in the sample and 7056 people who are not students. We now need to determine how many people are in the sample. We can do this by summing the results from the table. Below is the code.
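A sketch of the summing step; the name “n_total” is illustrative.

```r
library(ISLR)

# Adding the two counts in the table gives the total sample size
n_total <- sum(table(Default$student))
n_total
```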

## [1] 10000

There are 10000 people in the sample. To determine the proportion of students we take 2944 / 10000, which equals 0.2944 or 29.44%. Below is the code to calculate this.

table(Default$student) / sum(table(Default$student))
##     No    Yes 
## 0.7056 0.2944

The proportion test is used to compare a particular value with a theoretical value. For our example, the particular value we have is that 29.44% of the people were students. We want to compare this value with a theoretical value of 50%. Before we do so, it is better to state specifically what our hypotheses are.

NULL = The proportion of students in the population is 50%, and the 29.44% found in the sample differs from it only by chance.

ALTERNATIVE = The proportion of students in the population is NOT 50%.

Below is the code to complete the z-test.

prop.test(2944,n = 10000, p = 0.5, alternative = "two.sided", correct = FALSE)
##  1-sample proportions test without continuity correction
## data:  2944 out of 10000, null probability 0.5
## X-squared = 1690.9, df = 1, p-value < 2.2e-16
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
##  0.2855473 0.3034106
## sample estimates:
##      p 
## 0.2944

Here is what the code means.

  1. prop.test is the function used
  2. The first value, 2944, is the number of students in the sample
  3. n = is the sample size
  4. p = 0.5 is the theoretical proportion
  5. alternative = “two.sided” means we want a two-tailed test
  6. correct = FALSE means we do not want the continuity correction applied to the test. The correction is useful for small sample sizes but not for our sample of 10000

The p-value is essentially zero. This means that we reject the null hypothesis and conclude that the proportion of students in our sample is different from a theoretical proportion of 50% in the population.
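To see where the test statistic comes from, the z-test can also be computed by hand. This is a sketch with illustrative variable names; the squared z value should match the X-squared value that “prop.test” reports when no continuity correction is applied.

```r
x  <- 2944    # number of students in the sample
n  <- 10000   # sample size
p0 <- 0.5     # theoretical proportion under the null hypothesis

p_hat <- x / n                                    # sample proportion
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # z statistic
z^2                                               # compare with X-squared above

# Two-sided p-value from the normal distribution (effectively zero here)
p_value <- 2 * pnorm(-abs(z))
p_value
```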

Below is the same analysis using the binomial exact test.

binom.test(2944, n = 10000, p = 0.5)
##  Exact binomial test
## data:  2944 and 10000
## number of successes = 2944, number of trials = 10000, p-value <
## 2.2e-16
## alternative hypothesis: true probability of success is not equal to 0.5
## 95 percent confidence interval:
##  0.2854779 0.3034419
## sample estimates:
## probability of success 
##                 0.2944

The results are essentially the same. Whether to use “prop.test” or “binom.test” is debated among statisticians. The purpose here was to provide an example of the use of both.

Learning Styles and Strategies

All students have distinct traits in terms of how they learn and what they do to ensure that they learn. These two broad categories of how a student learns and what they do to learn are known as learning styles and learning strategies.

This post will explain what learning styles and learning strategies are.

Learning Styles

Learning styles are consistent traits that are long-lasting over time. Examples include the various learning styles identified by Howard Gardner, such as auditory, kinesthetic, or musical learners. An auditory learner prefers to learn through hearing things.

Learning styles are also associated with personality. For example, introverts prefer quiet time and less social interaction when compared to extroverts. This personality trait of introversion may, but does not necessarily, affect an introvert’s ability to learn while working in small groups.

Learning Strategies

Strategies are specific methods a student uses to master and apply information. Examples include asking friends for help, repeating information to oneself, rephrasing, and using context clues to determine the meaning of unknown words.

Strategies are much more unpredictable and flexible than styles are. Students can acquire strategies through practice and exposure. In addition, it is common to use several strategies simultaneously to learn and use information.

Successful Students

Successful students understand what their style and strategies are. Furthermore, they can use these tendencies in learning and acquiring knowledge to achieve goals. For example, an introvert who knows they prefer to be alone and not work in groups will know when there are times when this natural tendency must be resisted.

The key to understanding one’s styles and strategies is self-awareness. A teacher can support a student in understanding what their style and strategies are through the use of various informal checklists and psychological tests.

A teacher can also support students in developing a balanced set of strategies through compensatory activities. These are activities that force students to use strategies in which they are weak. For example, auditory learners can be asked to learn through kinesthetic means. This helps students to acquire skills that may be highly beneficial in their learning in the future.

Helping students to develop compensatory skills requires that the teacher know and understand the strengths and weaknesses of their students. This naturally takes time and implies that compensatory activities should not take place at the beginning of a semester, nor should they be pre-planned into a unit plan before meeting students.


Strategies can play a powerful role in information processing. As such, students need to be aware of how they learn and what they do to learn. The teacher can provide support in this by helping students to figure out who they are as a learner.

Theoretical Distribution and R

This post will explore an example of testing if a dataset fits a specific theoretical distribution. This is a very important aspect of statistical modeling as it allows us to understand the normality of the data and the appropriate steps needed to prepare for analysis.

In our example, we will use the “Auto” dataset from the “ISLR” package. We will check if the horsepower of the cars in the dataset is normally distributed or not. Below is some initial code to begin the process.


Determining if a dataset is normally distributed is simple in R. This is normally done visually by making a Quantile-Quantile plot (Q-Q plot). It involves using two functions, “qqnorm” and “qqline”. Below is the code for the Q-Q plot.



We now need to add the Q-Q line to see how our distribution lines up with the theoretical normal one. Below is the code. Note that we have to repeat the code above in order to get the completed plot.

qqline(Auto$horsepower, distribution = qnorm, probs=c(.25,.75))


The “qqline” function needs the data you want to test as well as the distribution and the probabilities. The distribution we want is normal, which is indicated by the argument “qnorm”. The probs argument gives the two quantiles through which the line is drawn. The default values are .25 and .75. The resulting graph indicates that “horsepower” in the “Auto” dataset is not normally distributed. There are particular problems with the lower and upper values.
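The two plotting steps can be sketched together as follows. Because the “ISLR” package may not be installed, this sketch uses the built-in “mtcars” horsepower column as a stand-in, which is my assumption and not the data used in this post. Swapping in Auto$horsepower should give the plot discussed above.

```r
# Stand-in data: mtcars ships with base R. Replace with Auto$horsepower
# if the ISLR package is available.
hp <- mtcars$hp

# Step 1: plot sample quantiles against theoretical normal quantiles.
# qqnorm also returns the plotted coordinates invisibly.
qq <- qqnorm(hp, main = "Normal Q-Q Plot of Horsepower")

# Step 2: add the reference line through the first and third quartiles
qqline(hp, distribution = qnorm, probs = c(0.25, 0.75))
```

Points that bend away from the line at either end suggest that the tails differ from those of a normal distribution.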

We can confirm our suspicion by running a statistical test. The Anderson-Darling test from the “nortest” package will allow us to test whether our data is normally distributed or not. The code is below

##  Anderson-Darling normality test
## data:  Auto$horsepower
## A = 12.675, p-value < 2.2e-16

From the results, we can conclude that the data is not normally distributed. This could mean that we may need to use non-parametric tools for statistical analysis.
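To give a feel for what the Anderson-Darling statistic measures, here is a minimal base-R sketch that computes the statistic A by hand. It computes the statistic only and no p-value, so it is not a replacement for the “nortest” function. The helper name ad_statistic and the simulated samples are my own assumptions for illustration.

```r
# Anderson-Darling statistic computed by hand (statistic only, no p-value)
ad_statistic <- function(x) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  z <- (x - mean(x)) / sd(x)                # standardize with sample mean and sd
  log_p  <- pnorm(z, log.p = TRUE)          # log F(z_i)
  log_1p <- pnorm(rev(z), lower.tail = FALSE, log.p = TRUE)  # log(1 - F(z_(n+1-i)))
  i <- seq_len(n)
  -n - mean((2 * i - 1) * (log_p + log_1p))
}

set.seed(1)
normal_sample <- rnorm(392, mean = 100, sd = 40)  # simulated, roughly bell-shaped
skewed_sample <- rexp(392, rate = 1 / 100)        # simulated, strongly right-skewed

a_normal <- ad_statistic(normal_sample)
a_skewed <- ad_statistic(skewed_sample)
print(c(normal = a_normal, skewed = a_skewed))
```

The statistic stays small for the bell-shaped sample and becomes large for the skewed one. A large value is what leads to a tiny p-value like the one reported above.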

We can further explore our distribution in terms of its skew and kurtosis. Skew measures how far to the left or right the data leans and kurtosis measures how peaked or flat the data is. This is done with the “fBasics” package and the functions “skewness” and “kurtosis”.

First we will deal with skewness. Below is the code for calculating skewness.

## [1] 1.079019
## attr(,"method")
## [1] "moment"

We now need to determine if this value of skewness is significantly different from zero. This is done with a simple t-test. We must calculate the t-value before calculating the probability. The standard error of the skew is approximated as the square root of six divided by the sample size, and the t-value is the skew divided by this standard error. The code is below.

## [1] 8.721607
## attr(,"method")
## [1] "moment"

Now we take the t-value for horsepower and plug it into the “pt” function (t probability) along with the degrees of freedom (sample size – 1 = 391). We then subtract the result from 1 to get the upper-tail probability. Below is the code.

## [1] 0
## attr(,"method")
## [1] "moment"

The value of zero means that we reject the null hypothesis that the skew is not significantly different from zero and conclude that the skew is different from zero. However, the value of the skew was only about 1.1, which is not an extreme departure from normality.

We will now repeat this process for the kurtosis. The only difference is that the standard error uses 24 instead of six, that is, the square root of 24 divided by the sample size, as in the example below.

## [1] 0.6541069
## attr(,"method")
## [1] "excess"
## [1] 2.643542
## attr(,"method")
## [1] "excess"
## [1] 0.004267199
## attr(,"method")
## [1] "excess"

Again the p-value (about 0.004) is well below 0.05, which means that the kurtosis is significantly different from zero. With an excess kurtosis of 0.65 (t-value of 2.64), this is not that bad. However, when both skew and kurtosis are non-normal, it explains why our overall distribution was not normal either.
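The t-values and probabilities reported for skewness and kurtosis can be reproduced from the printed statistics alone, without the “fBasics” package. The sketch below does this with a small helper; the name moment_t_test is hypothetical, and the sample size of 392 is inferred from the 391 degrees of freedom mentioned earlier.

```r
# Hypothetical helper: t-value and upper-tail p-value for a moment statistic.
# k is 6 for skewness and 24 for excess kurtosis (large-sample approximations).
moment_t_test <- function(value, n, k) {
  se      <- sqrt(k / n)                    # approximate standard error
  t_value <- value / se                     # test statistic
  p_value <- 1 - pt(t_value, df = n - 1)    # upper-tail probability
  c(se = se, t = t_value, p = p_value)
}

n <- 392  # sample size implied by the 391 degrees of freedom in the text

skew_result <- moment_t_test(1.079019,  n, k = 6)   # skewness printed above
kurt_result <- moment_t_test(0.6541069, n, k = 24)  # excess kurtosis printed above

print(round(rbind(skewness = skew_result, kurtosis = kurt_result), 6))
```

These are one-sided probabilities. Doubling them gives the two-sided version, which leads to the same conclusion in both cases.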


This post provided insights into assessing the normality of a dataset. Visual inspection can take place using Q-Q plots. Statistical inspection can be done through hypothesis testing along with checking skew and kurtosis.

Overcoming Plagiarism in an ESL Context

Academic dishonesty in the form of plagiarism is a common occurrence in academia. Generally, most students know that cheating is inappropriate on exams and what they are really doing is hoping that they are not caught.

However, plagiarism is a much stickier and more subjective offense for many students. This holds especially true for ESL students. Writing in a second language is difficult for everybody regardless of one’s background. As such, students often succumb to the temptation of plagiarism to complete writing assignments.

Many ideas are being used to reduce plagiarism. Software such as Turnitin does work, but it leads to an environment of mistrust and an arms race between students and teachers. Other measures should be considered for dealing with plagiarism.

This post will explain how seeing writing from the perspective of a process rather than a product can reduce the chances of plagiarism in the ESL context.

Writing as a Product

In writing pedagogy the two most common views on writing are writing as a product and writing as a process. Product writing views writing as the submission of a writing assignment that meets a certain standard, is grammatically near perfection, and highly structured. Students are given examples of excellence and are expected to emulate them.

Holding to this view is fine but it can contribute to plagiarism in many ways.

  • Students cannot meet the expectation for grammatical perfection. This encourages them to copy excellently written English from Google into their papers.
  • Focus on grammar leads to over-correction of the final paper. The overwhelming red pen marks from the teacher on the paper can stifle students’ desire to write for fear of additional correction.
  • The teacher often provides little guidance beyond providing examples. Without daily, constant feedback, students have no idea what to do and rely on Google.
  • People who write in a second language often struggle to structure their thoughts because we all think more shallowly in a second language with reduced vocabulary. Therefore, an ESL paper is always messier because of the difficulty of executing complex cognitive processes in a second language.

These pressures can contribute to a negative classroom environment in which students do not really want to write but only to survive the course, whatever it takes. Native speakers may manage in such an environment, but it is really hard for ESL students to succeed.

Writing as a Process

The other view of writing is writing as a process. This approach sees writing as the teacher providing constant one-on-one guidance through the writing process. Students begin to learn how they write and develop an understanding of the advantages of rewriting and revisions. Teacher and peer feedback are utilized throughout the various drafts of the paper.

The view of writing as a process has the following advantages for avoiding plagiarism.

  • Grammar is slowly fixed over time through feedback from the teacher. This allows the students to make corrections before the final submission.
  • Any instances of plagiarism can be caught before final submission. Many teachers do not give credit for rough drafts. Therefore, plagiarism in a rough draft normally does not affect the final grade.
  • The teacher can coach the students on how to reword plagiarized statements and also how to give appropriate credit through using APA.
  • The de-emphasis on perfection allows the student to grow and mature as a writer with the constant support of the teacher and peers.
  • Guiding the students’ thought process is especially critical across cultures, as communication styles vary widely across the world. Learning to write for a Western academic audience requires training in how Western academics think and communicate. This cannot be picked up alone and is another reason why plagiarism is tempting: the stolen idea is already communicated appropriately.

In a writing-as-a-process environment, the students and teacher work together to develop papers that meet standards in the students’ own words. It takes much more time and effort, but it can reduce the temptation of just copying from whatever Google offers.


Grammar plays a role in writing, but the shaping of ideas and their communication is of utmost concern for many in TESOL. The analogy I use is that grammar is like the paint on the walls of a house or the tile on the floor. It makes the house look nice but is not absolutely necessary. The ideas and thoughts of a paper are like the foundation, walls, and roof. Nobody wants to live in a house that lacks tile or is not painted, but you cannot live in a house that does not have walls and a roof.

The stress on native-like communication stresses out ESL students to the point of not even trying to write at times. With a change in view on the writing experience from product to process, this can be alleviated. We should only ask our students to do what we are able to do. If we cannot write in a second language in a fluent manner, how can we ask them?

Academic Dishonesty and Cultural Difference

Academic dishonesty, which includes plagiarism and cheating, is a problem that most teachers have dealt with in their career. Students sometimes succumb to the temptation of finding ways to excel or just survive a course by doing things that are highly questionable. This post will attempt to deal with some of the issues related to academic dishonesty. In particular, we will look at how perceptions of academic dishonesty vary across contexts.

Cultural Variation

This may be frustrating to many, but there is little agreement in terms of what academic dishonesty is once one leaves their own cultural context. In the West, people often believe that a person can create and “own” an idea, that people should “know” their stuff, and that “credit” should be given when using other people’s ideas. These foundational assumptions shape how teachers and students view using others’ ideas and using the answers of friends to complete assignments.

However, in other cultures there is more of an “ends justify the means” approach. This manifests itself in using ideas without giving credit, because ideas belong to nobody, and in having friends “help” you to complete an assignment or quiz because they know the answer and you do not. If the situation were reversed, you would give them the answer. Therefore, in many contexts it doesn’t matter how the assignment or quiz is completed as long as it is done.

This has a parallel in many situations. Suppose you are working on a project for your boss and get stuck. Would it be deceptive to ask a colleague for help to get the project done? Most of us have done this at one time or another. The problem is that this is almost always frowned upon during an assignment or assessment in the world of academics.

The purpose here is not to judge one side or the other but rather to allow people to identify the assumptions they have about academic dishonesty so that they avoid jumping to conclusions when confronted with this by people who are not from the same part of the world as them.

Our views on academic dishonesty are shaped in the context we grow up in

Clear Communication

One way to deal with the misunderstandings of academic dishonesty across cultures is for the teacher to clearly define what academic dishonesty is to them. This means providing examples and explaining how this violates the norms of academia. In the context of academia, academic dishonesty in the forms of cheating and plagiarism is completely unacceptable.

One strategy that I have used to explain academic dishonesty is to compare it to something that is totally culturally repulsive locally. For example, I have compared plagiarism to wearing your shoes in someone’s house in Asia (a major no-no in most parts). Students never understand what plagiarism is when it is defined abstractly and in isolation (or so they say). However, when plagiarism is compared to wearing your shoes in someone’s house, they begin to see how much academics hate this behavior. They also realize how they need to adjust their behavior for the context they are in.

By presenting a cultural argument against plagiarism and cheating rather than a moral one, students are able to understand how in the context of school this is not acceptable. Outside of school there are normally different norms of acceptable behavior.


The steps to take with people who share the same background are naturally different from the suggestions provided here. The primary point to remember is that academic dishonesty is not seen the same way by everyone. This requires that the teacher communicate what they mean when referring to it and provide a relevant example of academic dishonesty so the students can understand.

Probability Distribution and Graphs in R

In this post, we will use probability distributions and ggplot2 in R to solve a hypothetical example. This provides a practical example of the use of R in everyday life through the integration of several statistical and coding skills. Below is the scenario.

At a busing company the average number of stops for a bus is 81 with a standard deviation of 7.9. The data is normally distributed. Knowing this, complete the following.

  • Calculate the interval value using the 68-95-99.7 rule
  • Calculate the density curve
  • Graph the normal curve
  • Evaluate the probability of a bus having fewer than 65 stops
  • Evaluate the probability of a bus having more than 93 stops

Calculate the Interval Value

Our first step is to calculate the interval value. This is the range within which 99.7% of the values fall. Doing this requires knowing the mean and the standard deviation, then subtracting three standard deviations from the mean and adding three standard deviations to it. Below is the code for this.

## [1] 104.7
## [1] 57.3

The values above mean that we can set our interval between 55 and 110, with 100 buses in the data. Below is the code to set the interval.

interval<-seq(55,110, length=100) #length here represents 100 fictitious buses
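For reference, the bounds of 104.7 and 57.3 come directly from the 68-95-99.7 rule. The sketch below reproduces them along with the one and two standard deviation bands; the variable names are my own.

```r
# Bounds implied by the 68-95-99.7 rule for mean 81 and sd 7.9
mean_stops <- 81
sd_stops   <- 7.9

k        <- 1:3                              # number of standard deviations
lower    <- mean_stops - k * sd_stops
upper    <- mean_stops + k * sd_stops
coverage <- pnorm(k) - pnorm(-k)             # share of values inside each band

print(data.frame(k, lower, upper, coverage = round(coverage, 4)))
```

The last row gives the 57.3 to 104.7 range, which is then widened slightly to the round numbers 55 and 110 for plotting.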

Density Curve

The next step is to calculate the density curve. This is done with our knowledge of the interval, mean, and standard deviation. We also need to use the “dnorm” function. Below is the code for this.
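The listing for this step is not shown here, so the following is a minimal sketch of what it presumably looked like. It assumes the variable names “interval” and “densityCurve” that the plotting code uses.

```r
# Grid of stop counts (repeated here so the sketch is self-contained)
interval <- seq(55, 110, length.out = 100)

# Height of the normal density at each grid point
densityCurve <- dnorm(interval, mean = 81, sd = 7.9)

# The curve peaks at the grid point closest to the mean of 81
print(interval[which.max(densityCurve)])
```

Note that “dnorm” returns the height of the curve, not a probability. Probabilities come from the area under the curve, which is what “pnorm” calculates.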


We will now plot the normal curve of our data using ggplot. Before that, we need to put our “interval” and “densityCurve” variables in a dataframe. We will call the dataframe “normal” and then we will create the plot. Below is the code.

library(ggplot2)
normal<-data.frame(interval, densityCurve)
ggplot(normal, aes(interval, densityCurve))+geom_line()+ggtitle("Number of Stops for Buses")


Probability Calculation

We now want to determine the probability of a bus having fewer than 65 stops. To do this we use the “pnorm” function in R and include the value 65, along with the mean and standard deviation, and tell R we want the lower tail only. Below is the code for completing this.

pnorm(65,mean = 81,sd=7.9,lower.tail = TRUE)
## [1] 0.02141744

As you can see, at about 2% it would be unusual for a bus to have fewer than 65 stops. We can also plot this using ggplot. First, we need to compute a cumulative probability curve using the “pnorm” function. We then combine this with our “interval” variable in a dataframe and use this information to make a plot in ggplot2. Below is the code.

CumulativeProb<-pnorm(interval, mean=81,sd=7.9,lower.tail = TRUE)
pnormal<-data.frame(interval, CumulativeProb)
ggplot(pnormal, aes(interval, CumulativeProb))+geom_line()+ggtitle("Cumulative Density of Stops for Buses")
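The lower-tail probability can be cross-checked by standardizing the value of 65 into a z-score first. The sketch below does this and also combines both tails to show the share of buses expected to fall between 65 and 93 stops. The variable names are my own, and the in-between probability is an illustrative extra that is not part of the original scenario.

```r
mu    <- 81    # mean number of stops
sigma <- 7.9   # standard deviation

# Same probability two ways: directly, and through the z-score
z_65       <- (65 - mu) / sigma
p_direct   <- pnorm(65, mean = mu, sd = sigma)
p_standard <- pnorm(z_65)

# Share of buses expected to have between 65 and 93 stops
p_between <- pnorm(93, mean = mu, sd = sigma) - pnorm(65, mean = mu, sd = sigma)

print(c(z = z_65, direct = p_direct, standardized = p_standard, between = p_between))
```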


Second Probability Problem

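We will now calculate the probability of a bus having 93 or more stops. To make it more interesting we will create a plot that shades the area under the curve for 93 or more stops. The code is a little too complex to explain so just enjoy the visual.

pnorm(93,mean=81,sd=7.9,lower.tail = FALSE)
## [1] 0.06438284
library(ggplot2)
x<-seq(50,110,length=200) #assumed grid of stop counts, MyDF was not defined in the original
MyDF<-data.frame(x=x,y=dnorm(x,mean=81,sd=7.9))
p<-ggplot(MyDF,aes(x,y))+geom_line()+scale_x_continuous(limits = c(50, 110))+
        ggtitle("Probability of 93 Stops or More is 6.4%")
#polygon starts on the axis at 93, follows the curve, and drops back to the axis
shade <- rbind(c(93,0), subset(MyDF, x > 93), c(MyDF[nrow(MyDF), "x"], 0))
ytop<-dnorm(93,mean=81,sd=7.9) #height of the curve at 93 stops

p + geom_segment(aes(x=93,y=0,xend=93,yend=ytop)) +
        geom_polygon(data = shade, aes(x, y))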



A lot of work was done, but all in a practical manner. By looking at a realistic problem, we were able to calculate several different probabilities and graph them accordingly.

Dealing with Classroom Management

Classroom management is one of the most difficult aspects of teaching. Despite the difficulties of behavioral problems, there are several steps teachers can take to mitigate this problem. This post will provide some practical ways to reduce or even eliminate the headache of classroom management.

Deal with the Learning Space

The learning space is another name for the classroom that the teacher has authority over. If a teacher is fortunate enough to have their own classroom (this is not always the case), he or she may need to consider some of the following.

  • A clean, neat, visually appealing classroom helps in settling students.
  • The temperature should be moderate. Too cold or too hot leads to problems.
  • The acoustics of the classroom affect performance. If it’s hard to hear each other, it makes direct instruction impossible as well as any whole-class discussion. This includes noise coming from outside the classroom.

If the teacher does not have their own classroom, he or she may need to work with the administration or the teachers in whose classroom he or she teaches to deal with some of these issues.

Dealing with Seating Arrangements

There are essentially four seating arrangements in a classroom.

  • Rows
  • Full circle
  • Half circle
  • Groups

Each of these arrangements has its advantages and disadvantages. Rows are used for a teacher-centered classroom and lecture style. They are suited to individual work as well. However, rows limit interaction among students. Despite this, at the beginning of the year it may be better to start with rows until a teacher has a handle on the students.

Full/Half circles are great for whole-class discussion. Students are all able to make eye contact, and this helps with supporting a discussion. However, this also makes it hard to concentrate if there is some sort of assignment that needs to be completed. As such, the full/half circle approach is normally used for special occasions.

Groups are used for high interaction settings. In groups, students can work together on projects or support each other with regular assignments. Normally, groups lead to the largest number of management problems. As such, groups are great for teachers who have more experience with classroom management.

Dealing with Presence

Presence has to do with the voice and body language of a teacher. Learning to control the voice is a common problem for new teachers, and losing one’s voice happens frequently. The voice of a teacher must project without yelling, and this requires practice, which can be accelerated through taking voice lessons. Speaking must also be done at a reasonable rate. Too fast or too slow will make it hard to pay attention.

The body language of a teacher should project a sense of calm, confidence, and optimism. This can be done by moving about the room while teaching, feigning confidence even if the teacher doesn’t have it, and always maintaining composure no matter what the students do. A teacher losing control of their temper means the students have control, and they will enjoy laughing at the one who is supposed to be in control.


Teachers need to assert their authority as the leader of the classroom. This requires being organized and confident while having a sense of direction in where the lesson is going. This is not easy but is often necessary when dealing with students.