The Grader Report in Moodle VIDEO

This video explains the grader report view in Moodle.


Best Subset Regression in R

In this post, we will take a look at best subset regression. Best subset regression fits a model for all possible feature or variable combinations and the decision for the most appropriate model is made by the analyst based on judgment or some statistical criteria.

Best subset regression is an alternative to both Forward and Backward stepwise regression. Forward stepwise selection adds one variable at a time, choosing the variable that most lowers the residual sum of squares, until no additional variable lowers it further. Backward stepwise regression starts with all variables in the model and removes variables one at a time. The concern with stepwise methods is that they can produce biased regression coefficients, conflicting models, and inaccurate confidence intervals.

Best subset regression bypasses these weaknesses of stepwise models by creating all possible models and then allowing you to assess which variables should be included in your final model. The one drawback to best subset is that a large number of variables means a large number of potential models, which can make it difficult to choose among them.
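
To get a sense of the scale involved: with p candidate predictors, best subset regression fits 2^p possible models (every combination of variables, including the model with no predictors). A quick check in R for the eight predictors used in this post:

2^8
## [1] 256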

In this post, we will use the “Fair” dataset from the “Ecdat” package to predict marital satisfaction based on age, sex, the presence of children, years married, religiosity, education, occupation, and number of affairs in the past year. Below is some initial code.

library(leaps);library(Ecdat);library(car);library(lmtest)
data(Fair)

We begin our analysis by building the initial model with all variables in it. Below is the code.

fit<-lm(rate~.,Fair)
summary(fit)
## 
## Call:
## lm(formula = rate ~ ., data = Fair)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2049 -0.6661  0.2298  0.7705  2.2292 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.522875   0.358793   9.819  < 2e-16 ***
## sexmale     -0.062281   0.099952  -0.623  0.53346    
## age         -0.009683   0.007548  -1.283  0.20005    
## ym          -0.019978   0.013887  -1.439  0.15079    
## childyes    -0.206976   0.116227  -1.781  0.07546 .  
## religious    0.042142   0.037705   1.118  0.26416    
## education    0.068874   0.021153   3.256  0.00119 ** 
## occupation  -0.015606   0.029602  -0.527  0.59825    
## nbaffairs   -0.078812   0.013286  -5.932 5.09e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.03 on 592 degrees of freedom
## Multiple R-squared:  0.1405, Adjusted R-squared:  0.1289 
## F-statistic:  12.1 on 8 and 592 DF,  p-value: 4.487e-16

The initial results are already interesting even though the r-squared is low. When couples have children they have less marital satisfaction than couples without children when controlling for the other factors, and this is the strongest regression weight. In addition, more education is associated with an increase in marital satisfaction. Lastly, as the number of affairs increases there is a decrease in marital satisfaction. Keep in mind that the “rate” variable goes from 1 to 5, with one meaning a terrible marriage and five a great one. The intercept of 3.52 is the predicted marital satisfaction when the other variables are at zero or their reference levels.

We will now create our subset models. Below is the code.

sub.fit<-regsubsets(rate~.,Fair)
best.summary<-summary(sub.fit)

In the code above we created the sub-models using the “regsubsets” function from the “leaps” package and saved the result in the variable called “sub.fit”. We then saved the summary of “sub.fit” in the variable “best.summary”. We will use the “best.summary” and “sub.fit” variables several times to determine which model to use.

There are many different ways to assess the model. We will use the following statistical methods that come with the results from the “regsubsets” function.

  • Mallows’ Cp
  • Bayesian Information Criterion

We will make two charts for each of the criteria above. The plot on the left will show how many features to include in the model. The plot on the right will tell you which variables to include. It is important to note that for both of these methods, the lower the score the better the model. Below is the code for Mallows’ Cp.

par(mfrow=c(1,2))
plot(best.summary$cp)
plot(sub.fit,scale = "Cp")


The plot on the left suggests that a four-feature model is the most appropriate. However, this chart does not tell us which four features. The chart on the right is read in reverse order: the high numbers are at the bottom and the low numbers are at the top of the y-axis. Knowing this, we can conclude that the most appropriate variables to include in the model are age, presence of children, education, and number of affairs. Below are the results using the Bayesian Information Criterion.

par(mfrow=c(1,2))
plot(best.summary$bic)
plot(sub.fit,scale = "bic")


These results indicate that a three-feature model is appropriate. The features are years married, education, and number of affairs. Presence of children was not considered beneficial. Since our original model and Mallows’ Cp suggested that presence of children mattered, we will keep it for now.

Below is the code for the model based on the subset regression.

fit2<-lm(rate~age+child+education+nbaffairs,Fair)
summary(fit2)
## 
## Call:
## lm(formula = rate ~ age + child + education + nbaffairs, data = Fair)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2172 -0.7256  0.1675  0.7856  2.2713 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.861154   0.307280  12.566  < 2e-16 ***
## age         -0.017440   0.005057  -3.449 0.000603 ***
## childyes    -0.261398   0.103155  -2.534 0.011531 *  
## education    0.058637   0.017697   3.313 0.000978 ***
## nbaffairs   -0.084973   0.012830  -6.623 7.87e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.029 on 596 degrees of freedom
## Multiple R-squared:  0.1352, Adjusted R-squared:  0.1294 
## F-statistic: 23.29 on 4 and 596 DF,  p-value: < 2.2e-16

The results look OK. The older a person is, the less satisfied they are with their marriage. If children are present, the marriage is less satisfying. The more educated a person is, the more satisfied they are. Lastly, a higher number of affairs indicates less marital satisfaction. However, before we get excited we need to check for collinearity and homoscedasticity. Below is the code.

vif(fit2)
##       age     child education nbaffairs 
##  1.249430  1.228733  1.023722  1.014338

There are no issues with collinearity. VIF values above 5 or 10 would indicate a problem. Let’s check for homoscedasticity.

par(mfrow=c(2,2))
plot(fit2)


The normal Q-Q plot and the residuals vs leverage plot can be used for locating outliers. The residuals vs fitted and the scale-location plots do not look good, as there appears to be a pattern in the dispersion, which indicates heteroscedasticity. To confirm this we will use the Breusch-Pagan test from the “lmtest” package. Below is the code.

bptest(fit2)
## 
##  studentized Breusch-Pagan test
## 
## data:  fit2
## BP = 16.238, df = 4, p-value = 0.002716

There you have it. Our model violates the assumption of homoscedasticity. However, this model was developed for demonstration purposes, to provide an example of subset regression.


Writing Techniques for the ESL Classroom

In-class writing is common in many ESL contexts. This post will provide several different ways that teachers can get their students writing in an ESL classroom.

Imitation

Perhaps the simplest way to get ESL students writing is to have them imitate what is read to them. This allows the students to learn the conventions of writing in the target language.

This is usually done through some form of dictation. The teacher reads a few words at a time or reads slowly. This provides students with time to write down what they heard.

The actual marking of such an activity would involve the use of rubrics or some sort of count of the number of words the student was able to write down. Often, spelling and punctuation are not considered major factors in the grade because of the rushed nature of the writing.

Controlled and Guided

Controlled writing involves having students modify an existing writing sample, for example, changing all the verbs in a paragraph from past to present. This will often require them to change more than just the verbs but other aspects of the writing as well.

Guided writing involves having the students respond to some sort of question or stimulus. For example, the students may watch a video and then be asked to write about it or answer questions. They may also try to rewrite something that they heard at normal speed.

Self-Writing

The most common form of self-writing is the writing of a journal. The writing is only intended for the student. Even note-taking is considered a form of self-writing even though it is not normally comprehensible to others.

Self-writing, particularly journals, can be useful in developing reflective thinking in students, even with the barrier of writing in another language.

Display and Real Writing

Display writing is writing that is primarily intended for the teacher, who already knows the answer that the student is addressing. Examples of this type of writing include essays and other writing for the purpose of a summative assessment. The student is literally displaying what they already know.

Real writing is writing in which the reader does not know the answer to the question the student is addressing. As such, one of the main differences between display and real writing is the knowledge that the audience of the writing has.

Conclusion

When working with students it is important to provide them with learning experiences that stimulate the growth and development that they need. Understanding the various forms of writing that can happen in an ESL classroom can provide teachers with ideas on how to help their students.


Data Wrangling in R

Collecting and preparing data for analysis is the primary job of a data scientist. This process is called data wrangling. In this post, we will look at an example of data wrangling using a simple artificial dataset. You can create the table below in R or Excel. If you created it in Excel, just save it as a CSV and load it into R. Below is the initial code.

library(readr)
apple <- read_csv("~/Desktop/apple.csv")
## # A tibble: 10 × 2
##        weight      location
##         <chr>         <chr>
## 1         3.2        Europe
## 2       4.2kg       europee
## 3      1.3 kg          U.S.
## 4  7200 grams           USA
## 5          42 United States
## 6         2.3       europee
## 7       2.1kg        Europe
## 8       3.1kg           USA
## 9  2700 grams          U.S.
## 10         24 United States

This is a small dataset with the columns “weight” and “location”. Here are some of the problems:

  • Weights are in different units
  • Weights are written in different ways
  • Location is not consistent

In order to have any success with data wrangling you need to state specifically what it is you want to do. Here are our goals for this project:

  • Convert the “weight” variable to a numerical variable instead of character
  • Remove the text and have only numbers in the “weight” variable
  • Change weights in grams to kilograms
  • Convert the “location” variable to a factor variable instead of character
  • Have consistent spelling for Europe and United States in the “location” variable

We will begin with the “weight” variable. We want to convert it to a numerical variable and remove any non-numerical text. Below is the code for this.

corrected.weight<-as.numeric(gsub(pattern = "[[:alpha:]]","",apple$weight))
corrected.weight
##  [1]    3.2    4.2    1.3 7200.0   42.0    2.3    2.1    3.1 2700.0   24.0

Here is what we did.

  1. We created a variable called “corrected.weight”.
  2. We used the function “as.numeric”, which converts whatever is returned inside it into a numerical variable.
  3. Inside “as.numeric” we used the “gsub” function, which allows us to substitute one value for another.
  4. Inside “gsub” we set the pattern argument to “[[:alpha:]]” and the replacement to “”. This told R to look for any lower- or uppercase letters and replace them with nothing, i.e. remove them (see the toy example below). This all pertains to the “weight” variable in the apple dataframe.
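
As a small standalone illustration of what the pattern does, here is the same idea applied to a few made-up strings (not the actual dataset): the letters are stripped out and the remaining text is converted to numbers.

as.numeric(gsub(pattern = "[[:alpha:]]","",c("4.2kg","1.3 kg","7200 grams")))
## [1]    4.2    1.3 7200.0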

We now need to convert the weights in grams to kilograms so that everything is in the same unit. Below is the code.

gram.error<-grep(pattern = "[[:digit:]]{4}",apple$weight)
corrected.weight[gram.error]<-corrected.weight[gram.error]/1000
corrected.weight
##  [1]  3.2  4.2  1.3  7.2 42.0  2.3  2.1  3.1  2.7 24.0

Here is what we did

  1. We created a variable called “gram.error”.
  2. We used the “grep” function to search the “weight” variable in the apple data frame for input that is a digit and is 4 digits in length; this is what the “[[:digit:]]{4}” pattern means. We do not change any values yet; we just store the matching positions in “gram.error”.
  3. Once this information is stored in “gram.error” we use it as a subset for the “corrected.weight” variable.
  4. We tell R to save into the “corrected.weight” variable the values selected by “gram.error” divided by 1000. Dividing by 1000 converts the values from grams to kilograms.

We have completed the transformation of the “weight” variable and will now move on to the problems with the “location” variable in the “apple” dataframe. To do this we will first deal with the values that relate to Europe and then the values related to the United States. Below is the code.

europe<-agrep(pattern = "europe",apple$location,ignore.case = T,max.distance = list(insertion=c(1),deletions=c(2)))
america<-agrep(pattern = "us",apple$location,ignore.case = T,max.distance = list(insertion=c(0),deletions=c(2),substitutions=0))
corrected.location<-apple$location
corrected.location[europe]<-"europe"
corrected.location[america]<-"US"
corrected.location<-gsub(pattern = "United States","US",corrected.location)
corrected.location
##  [1] "europe" "europe" "US"     "US"     "US"     "europe" "europe"
##  [8] "US"     "US"     "US"

The code is a little complicated to explain, but in short we used the “agrep” function to tell R to search the “location” variable for values similar to our term “europe”. The other arguments allow a small number of insertions and deletions, so that values close to the term “europe” still count as matches. This process is repeated for the term “us”. We then store the location variable from the “apple” dataframe in a new variable called “corrected.location”. We then apply the two objects we made, “europe” and “america”, to the “corrected.location” variable. Lastly, we deal with “United States” using the “gsub” function.
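
If the “agrep” arguments are new to you, here is a minimal toy example with made-up values (not the “apple” data). By default “agrep” returns the indices of the elements that approximately match the pattern, which is why the result can be used to subset a vector.

agrep(pattern = "europe",c("Europe","europee","USA","U.S."),ignore.case = T,max.distance = 1)
## [1] 1 2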

We are almost done. Now we combine our two variables, “corrected.weight” and “corrected.location”, into a new data frame. The code is below.

cleaned.apple<-data.frame(corrected.weight,corrected.location)
names(cleaned.apple)<-c('weight','location')
cleaned.apple
##    weight location
## 1     3.2   europe
## 2     4.2   europe
## 3     1.3       US
## 4     7.2       US
## 5    42.0       US
## 6     2.3   europe
## 7     2.1   europe
## 8     3.1       US
## 9     2.7       US
## 10   24.0       US

If you use the “str” function on the “cleaned.apple” dataframe you will see that “location” was automatically converted to a factor.
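
A quick way to verify this is with “str”. One caveat: in newer versions of R (4.0 and later), “data.frame” no longer converts character columns to factors by default, so you may need to do the conversion yourself first, as in the sketch below.

cleaned.apple$location<-as.factor(cleaned.apple$location) #does nothing if the column is already a factor
str(cleaned.apple)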

This looks much better especially if you compare it to the original dataframe that is printed at the top of this post.


Understanding ESL Writing Patterns Across Cultures

When people are learning English they almost always bring how they communicate in their first language with them when they are speaking or writing in English. However, for native speakers of English, the written communication style of ESL students can be bewildering even if it is grammatically sound.

This phenomenon of the L1 influencing the writing style of the L2 is known as contrastive rhetoric. This post will provide examples from different cultures in terms of how they approach writing in English and compare it to how a native-speaking person from a Western country writes to show the differences.

The Native English Speaker Writing Example

Below is a simple paragraph written by a Native English speaking person.

Exercise is good for a person for several reasons. For example, exercise helps to strengthen the body. As a person moves he or she is utilizing their muscles, which promotes maintenance and potentially growth of the muscle. Second, exercise helps to remove waste from the body. Strenuous exercise causes people to sweat and breathe deeply, and this increases the removal of harmful elements from the body. Lastly, exercise makes people feel good. Exercise encourages the release of various hormones that make a person feel better. Therefore, people should exercise in order to enjoy these clear benefits.

The writing style of an English speaker is usually highly linear in nature. In the example above, the first sentence is clearly the main idea or the point. Right from the beginning the English writer shares with you where they stand on the subject. There is little mystery or suspense as to what will be talked about.

The rest of the paragraph consists of supporting details for the main idea. The supporting details start with the discourse markers “for example”, “second”, and “lastly”. Everything in the paragraph is laid out in a step-by-step manner that is highly clear, as this is important for English speakers.

Unfortunately, this style of writing is what many ESL students from other cultures are compared to. The next example has perfect “English”; however, the style of communication is not in this linear manner.

Eastern Writing Style

According to Robert Kaplan, people from Eastern countries write in a circular, indirect manner. This means that Eastern writing lacks the direct point or main idea of Western writing and also lacks the clearly structured supporting details. Below is the same paragraph as the one in the English example but written in a more Eastern style.

As a person moves he or she is utilizing their muscles, which promotes maintenance and potentially growth of the muscle. Strenuous exercise causes people to sweat and breathe deeply, and this increases the removal of harmful elements from the body. Exercise encourages the release of various hormones that make a person feel better.

The example is grammatically sound, but for a native English speaker there are several problems with the writing.

  1. There is no main idea. The entire paragraph is missing a point. The writer is laying down claims about their point but never actually tells you what the point is. Native speakers want a succinct summary of the point when information is shared with them. Eastern writers prefer an indirect or implied main idea because being too direct is considered rude. In addition, if you are too clear in an Eastern context it is hard to evade and prevaricate if someone is offended by what is said.
  2. The discourse markers are missing. There is no “for example” or “second” mentioned. Discourse markers give a paragraph a strong sense of linear direction. The native English speaker can tell where they are in a train of thought when these markers are there. When they are missing, the English reader wonders when the experience is going to be over.
  3. There are no transition sentences. In the native English-speaking example, every discourse marker served as the first word of a transition sentence that moved the reader from one supporting detail to the next. The Eastern example has only details, without any guideposts from one detail to the other. If a paragraph is really long this can become overwhelming for the native English speaker.

The example is highly fluent, and this kind of writing is common among many English speakers from countries outside the West. Even with excellent knowledge of the language, differences in discourse style affect the ability to communicate.

Conclusion

My students have shared with me that English writing is clear and easy to understand but too direct in nature, whereas the complaint of teachers is that ESL students’ writing is unclear and indirect.

This is not a matter of right or wrong but of differences in how to communicate when writing. A student who is aware of how they communicate can make adjustments so that whoever they are writing for can understand them. The goal should not be to change students but to make them aware of their assumptions so they can adjust depending on the situation.

Principal Component Analysis in R

This post will demonstrate the use of principal component analysis (PCA). PCA is useful for several reasons. First, it allows you to place your examples into groups similar to linear discriminant analysis, but you do not need to know beforehand what the groups are. Second, PCA is used for the purpose of dimension reduction. For example, if you have 50 variables PCA can reduce this number while retaining a certain threshold of variance. If you are working with a large dataset this can greatly reduce the computational time and general complexity of your models.

Keep in mind that there really is no dependent variable, as this is unsupervised learning. What you are trying to see is how different examples can be mapped in space based on whatever independent variables are used. For our example, we will use the “Carseats” dataset from the “ISLR” package. Our goal is to understand the relationships among the variables when examining the shelf location of the car seat. Below is the initial code to begin the analysis.

library(ggplot2)
library(ISLR)
data("Carseats")

We first need to rearrange the data and remove the variables we are not going to use in the analysis. Below is the code.

Carseats1<-Carseats
Carseats1<-Carseats1[,c(1,2,3,4,5,6,8,9,7,10,11)]
Carseats1$Urban<-NULL
Carseats1$US<-NULL

Here is what we did:

  1. We made a copy of the “Carseats” data called “Carseats1”.
  2. We rearranged the order of the variables so that the factor variables are at the end. This will make sense later.
  3. We removed the “Urban” and “US” variables from the table as they will not be a part of our analysis.

We will now do the PCA. We need to scale and center our data; otherwise the larger numbers will have a much stronger influence on the results than the smaller numbers. Fortunately, the “prcomp” function has a “scale.” and a “center” argument. We will also use only the first 7 columns for the analysis, as “ShelveLoc” is not useful for this analysis. If we hadn’t moved “ShelveLoc” to the end of the dataframe it would cause some headaches. Below is the code.

Carseats.pca<-prcomp(Carseats1[,1:7],scale. = T,center = T)
summary(Carseats.pca)
## Importance of components:
##                           PC1    PC2    PC3    PC4    PC5     PC6     PC7
## Standard deviation     1.3315 1.1907 1.0743 0.9893 0.9260 0.80506 0.41320
## Proportion of Variance 0.2533 0.2026 0.1649 0.1398 0.1225 0.09259 0.02439
## Cumulative Proportion  0.2533 0.4558 0.6207 0.7605 0.8830 0.97561 1.00000

The summary of “Carseats.pca” tells us how much of the variance each component explains. Keep in mind that the number of components is equal to the number of variables. The “Proportion of Variance” row tells us the contribution each component makes, and the “Cumulative Proportion” row gives the running total.

If your goal is dimension reduction, then the number of components to keep depends on the threshold you set. For example, if you need around 90% of the variance you would keep the first 5 components. If you need 95% or more of the variance you would keep the first six. To actually use the components you would take the “Carseats.pca$x” data and move it to your data frame, as in the sketch below.
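
A minimal sketch of that step, assuming we keep the first five components; the object name “reduced.data” is just for illustration.

reduced.data<-data.frame(Carseats.pca$x[,1:5],ShelveLoc=Carseats1$ShelveLoc) #first five component scores plus the grouping variable
head(reduced.data)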

Keep in mind that the actual components have no conceptual meaning; each is a numerical representation of a combination of several variables that were reduced using PCA, such as going from 7 variables to 5.

This means that PCA is great for reducing variables for prediction purposes but is much harder to use for explanatory studies unless you can explain what the new components represent.

For our purposes, we will keep 5 components. This means that we have reduced our dimensions from 7 to 5 while still keeping almost 90% of the variance. Graphing our results is tricky because we have 5 dimensions, but the human mind can only conceptualize 3 at best and normally 2. As such, we will plot the first two components and label them by shelf location using ggplot2. Below is the code.

scores<-as.data.frame(Carseats.pca$x)
pcaplot<-ggplot(scores,(aes(PC1,PC2,color=Carseats1$ShelveLoc)))+geom_point()
pcaplot


From the plot you can see there is little separation when using the first two components of the PCA analysis. This makes sense, as we can only graph two components, so we are missing a lot of the variance. However, for demonstration purposes the analysis is complete.


Writing as a Process or Product

In writing pedagogy, there are at least two major ways of seeing writing. These two approaches see writing as a process or as a product. This post will explain each along with some of the drawbacks of both.

Writing as a Product

Writing as a product entailed the teacher setting forth standards in terms of rhetoric, vocabulary use, organization, etc. The students were given several different examples that could be used as models on which to base their own paper.

The teacher may be available for one-on-one support, but this was not necessarily embedded in the learning experience. In addition, the teacher was probably only going to see the final draft.

For immature writers, this is an intimidating learning experience. Being required to develop a paper with only out-of-context examples from former students is difficult to deal with. In addition, without prior feedback on their progress, students have no idea if they are meeting expectations. The teacher is also clueless as to student progress, and this means that both students and teachers can be “surprised” by poorly written papers and failing students.

The lack of communication while writing can encourage students to try to overcome their weaknesses through plagiarism. This is especially true for ESL students, who lack mastery of the language while also often having different perspectives on what academic dishonesty is.

Another problem is that the ‘A’ students will simply copy the examples the teacher provided and just put their own topic and words into them. This leads to an excellent yet mechanical paper that does not allow the students to develop as writers. In other words, the product approach provides too much support for strong students and not enough support for weak ones.

Writing as a Process

In writing as a process, the teacher supports the student through several revisions of a paper. The teacher provides support for the development of ideas, organization, coherence, and other aspects of writing. All this is done through the teacher providing feedback to the student as well as dealing with any questions or concerns the student may have with their paper.

This style of teaching writing helps students to understand what kind of writer they are. Students are often so focused on completing writing assignments that they never learn what their tendencies and habits as a writer are. Understanding their own strengths and weaknesses can help them to develop compensatory strategies to complete assignments. This kind of self-discovery can happen through one-on-one conferences with the teacher.

Of course, such personal attention takes a great deal of time. However, even brief 5-minute conferences with students can reap huge rewards in their writing. It also saves time at the end when marking, because you as the teacher are already familiar with what the students are writing about, and the check of the final papers is just to see if the students have revised their paper according to the advice you gave.

The process perspective gives each student individual attention to grow as a writer. ‘A’ students get what they need, as do weaker students. Everyone is compared to their own progress as a writer.

Conclusion

Generally, the process approach is more appropriate for teaching writing. The exceptions are when the students are unusually competent or are already familiar with your expectations from prior writing experiences.


Discourse Markers and ESL

Discourse markers are used in writing to help organize ideas. They are often those “little words” that native speakers use effortlessly as they communicate but are misunderstood by ESL speakers. This post will provide examples of various discourse markers.

Logical Sequence

Logical sequence discourse markers are used to place ideas in an order that is comprehensible to the listener/reader. They can be summative for concluding a longer section or resultative which is used to indicate the effect of something.

Examples of summative discourse markers include:

  • overall, to summarize, therefore, so far

An example of a summative discourse marker is below. The bold word is the marker.

Smoking causes cancer. Studies show that people who smoke have higher rates of cancer of the lung, esophagus, and larynx. Therefore, it is dangerous to smoke.

The paragraph is clear. The marker “Therefore” is summarizing what was said in the prior two sentences.

Examples of resultative discourse markers include the following:

  • so, consequently, therefore, as a result

An example of a resultative discourse marker is below. The bold word is the marker.

Bob smoked cigarettes for 20 years. As a result, he developed lung cancer.

Again, the second sentence with the marker “As a result” explains the consequence of smoking for 20 years.

Contrastive

Contrastive markers are words that indicate that the next idea is the opposite of the previous idea. There are three ways that this can be done: replacive markers share an alternative idea, antithetic markers share ideas in opposition to the previous one, and concessive markers share unexpected information given the context.

Below are several words and phrases that are replacive markers:

  • alternatively, on the other hand, rather

Below is an example of a replacive contrast marker used in a short paragraph. The bold word is the replacive marker.

Smoking is a deadly lifestyle choice. This bad habit has killed millions of people. On the other hand, a vegetarian lifestyle has been found to be beneficial to the health of many people.

Antithetic markers include the following

  • conversely, instead, by contrast

Below is an example of an antithetic marker used in a paragraph.

A long and healthy life is unusual for those who choose to smoke. Instead, people who smoke live lives that are shorter and more full of disease and sickness.

Concessive markers include some of the words below:

  • In spite of, nevertheless, anyway, anyhow

Below is an example of a concessive marker used in a paragraph.

Bob smoked for 20 years. In spite of this, he was an elite athlete and had perfect health.

Conclusion

Discourse markers play a critical role in communicating the finer points of ideas that are used in communication. Understanding how these words are used can help ESL students comprehend what they hear and read.

Developing a Data Analysis Plan

It is extremely common for beginners, and perhaps even experienced researchers, to lose track of what they are trying to achieve when completing a research project. The open nature of research allows for a multitude of equally acceptable ways to complete a project. This leads to an inability to make decisions or stay on course when doing research.

One way to remove roadblocks to decision-making and focus in research is to develop a plan. In this post we will look at one version of a data analysis plan.

Data Analysis Plan

A data analysis plan includes many features of a research project, with a particular emphasis on mapping out how the research questions will be answered and what is necessary to answer them. Below is a sample template of an analysis plan.

[Image: analysis plan template]

The majority of this diagram should be familiar to anyone who has done research. At the top, you state the problem; this is the overall focus of the paper. Next comes the purpose; the purpose is the over-arching goal of the research project.

After the purpose come the research questions. The research questions are questions about the problem that are answerable. People struggle with developing clear and answerable research questions. It is critical that research questions are written in a way that they can be answered and that the questions are clearly derived from the problem. Poor questions mean poor or even no answers.

After the research questions, it is important to know what variables are available for the entire study and, specifically, which variables can be used to answer each research question. Lastly, you must indicate what analysis or visual you will develop in order to answer your research questions about your problem. This requires you to know how you will answer your research questions.

Example

Below is an example of a completed analysis plan for a simple undergraduate-level research paper.

[Image: example of a completed analysis plan]

In the example above, the student wants to understand the perceptions of university students about cafeteria food quality and their satisfaction with the university. There were four research questions: a demographic descriptive question, a descriptive question about the two main variables, a comparison question, and lastly a relationship question.

The variables available for answering the questions are listed on the left side. Under that, the student indicates the variables needed to answer each question. For example, the demographic variables of sex, class level, and major are needed to answer the question about the demographic profile.

The last section is the analysis. For the demographic profile the student found the percentage of the population in each sub group of the demographic variables.

Conclusion

A data analysis plan provides an excellent way to determine what needs to be done to complete a study. It also helps researchers to clearly understand what they are trying to do and provides a visual for those with whom they want to communicate about the progress of a study.


Developing Purpose to Improve Reading Comprehension

Many of us are familiar with the experience of being able to read almost anything but perhaps not being able to understand what it is that we read. As the ability to sound out words becomes automatic there is not always a corresponding increase in being able to comprehend text.

It is common, especially in school, for students to be required to read something without much explanation. For more mature readers, what is often needed is a sense of purpose for reading. In this post, we will look at ways to develop a sense of purpose in reading.

Purpose Provides Motivation

Students who know why they are reading know what they are looking for while reading. The natural result of this is that students are less likely to get distracted by information that is not useful to them.

For example, the teacher might tell their students to “read the passage, identify all of the animals in it, and be ready to share tomorrow.” Students know what they are supposed to do (identify all the animals in the passage) and why they need to do it (share tomorrow). The clear directions prevent students from getting distracted by other information in the reading.

Providing purpose doesn’t necessarily require that the students love and enjoy the rationale, but it is helpful if a teacher can provide a purpose that is motivating.

Different Ways to Instill Purpose

In addition to the example above there are several quick ways to provide purpose.

  • Provide a vocabulary list-Having the students search for the meaning of specific words provides a clear sense of purpose and a context in which the words appear naturally. However, students often get bogged down in the minutiae of the definitions and completely miss the overall meaning of the reading passage. This approach is great for beginning and low intermediate readers.
  • Identify the main ideas in the reading-This is a great way to get students to see the “big picture” of a reading. It is especially useful for short to moderately long readings such as articles and perhaps chapters, and useful for intermediate to advanced readers in particular.
  • Let students develop their own questions about the text-By far my favorite strategy. Students initially skim the passage to get an idea of what it is about. After this, they develop several questions about the passage that they want to find the answer to. While reading the passage, the students answer their own questions. This approach provides opportunities for metacognition as well as developing autonomous learning skills. This strategy is for advanced readers who are comfortable with vocabulary and summarizing text.

Conclusion

Students, like most people, need a raison de faire (reason to do) something. The teacher can provide this, which has benefits. Another approach would be to allow the students to develop their own purpose. How this is done depends on the philosophy of the teacher as well as the abilities and tendencies of the students.

Linear Discriminant Analysis in R

In this post we will look at an example of linear discriminant analysis (LDA). LDA is used to develop a statistical model that classifies examples in a dataset. In the example in this post, we will use the “Star” dataset from the “Ecdat” package. What we will do is try to predict the type of class the students learned in (regular, small, regular with aide) using their math scores, reading scores, and the teaching experience of the teacher. Below is the initial code.

library(Ecdat)
library(MASS)
data(Star)

We first need to examine the data by using the “str” function.

str(Star)
## 'data.frame':    5748 obs. of  8 variables:
##  $ tmathssk: int  473 536 463 559 489 454 423 500 439 528 ...
##  $ treadssk: int  447 450 439 448 447 431 395 451 478 455 ...
##  $ classk  : Factor w/ 3 levels "regular","small.class",..: 2 2 3 1 2 1 3 1 2 2 ...
##  $ totexpk : int  7 21 0 16 5 8 17 3 11 10 ...
##  $ sex     : Factor w/ 2 levels "girl","boy": 1 1 2 2 2 2 1 1 1 1 ...
##  $ freelunk: Factor w/ 2 levels "no","yes": 1 1 2 1 2 2 2 1 1 1 ...
##  $ race    : Factor w/ 3 levels "white","black",..: 1 2 2 1 1 1 2 1 2 1 ...
##  $ schidkn : int  63 20 19 69 79 5 16 56 11 66 ...
##  - attr(*, "na.action")=Class 'omit'  Named int [1:5850] 1 4 6 7 8 9 10 15 16 17 ...
##   .. ..- attr(*, "names")= chr [1:5850] "1" "4" "6" "7" ...

We will use the following variables

  • dependent variable = classk (class type)
  • independent variable = tmathssk (Math score)
  • independent variable = treadssk (Reading score)
  • independent variable = totexpk (Teaching experience)

We now need to examine the data visually by looking at histograms for our independent variables and a table for our dependent variable.

hist(Star$tmathssk)

025a4efb-21eb-42d8-8489-b4de4e225e8c.png

hist(Star$treadssk)

c25f67b0-ea43-4caa-91a6-2f165cd815a5.png

hist(Star$totexpk)

12ab9cc3-99d2-41c1-897d-20d5f66a8424

prop.table(table(Star$classk))
## 
##           regular       small.class regular.with.aide 
##         0.3479471         0.3014962         0.3505567

The data mostly look good. The results of the “prop.table” function will help us when we develop our training and testing datasets. The only problem is with the “totexpk” variable. It is not anywhere near normally distributed. To deal with this we will use the square root of teaching experience. Below is the code.

star.sqrt<-Star
star.sqrt$totexpk.sqrt<-sqrt(star.sqrt$totexpk)
hist(sqrt(star.sqrt$totexpk))

374c0dad-d9b4-4ba5-9bcb-d1f19895e060

Much better. We now also need to check the correlations among the variables, using the code below.

cor.star<-data.frame(star.sqrt$tmathssk,star.sqrt$treadssk,star.sqrt$totexpk.sqrt)
cor(cor.star)
##                        star.sqrt.tmathssk star.sqrt.treadssk
## star.sqrt.tmathssk             1.00000000          0.7135489
## star.sqrt.treadssk             0.71354889          1.0000000
## star.sqrt.totexpk.sqrt         0.08647957          0.1045353
##                        star.sqrt.totexpk.sqrt
## star.sqrt.tmathssk                 0.08647957
## star.sqrt.treadssk                 0.10453533
## star.sqrt.totexpk.sqrt             1.00000000

None of the correlations are too bad. We can now develop our model using linear discriminant analysis. First, we need to scale our scores because the test scores and the teaching experience are measured differently. Then, we need to divide our data into a train and test set, as this will allow us to determine the accuracy of the model. Below is the code.

star.sqrt$tmathssk<-scale(star.sqrt$tmathssk)
star.sqrt$treadssk<-scale(star.sqrt$treadssk)
star.sqrt$totexpk.sqrt<-scale(star.sqrt$totexpk.sqrt)
train.star<-star.sqrt[1:4000,]
test.star<-star.sqrt[4001:5748,]

Now we develop our model. In the code below, the “prior” argument indicates what we expect the probabilities to be. In our data the distribution of the three class types is about the same, which means that the a priori probability is 1/3 for each class type.

train.lda<-lda(classk~tmathssk+treadssk+totexpk.sqrt, data = 
train.star,prior=c(1,1,1)/3)
train.lda
## Call:
## lda(classk ~ tmathssk + treadssk + totexpk.sqrt, data = train.star, 
##     prior = c(1, 1, 1)/3)
## 
## Prior probabilities of groups:
##           regular       small.class regular.with.aide 
##         0.3333333         0.3333333         0.3333333 
## 
## Group means:
##                      tmathssk    treadssk totexpk.sqrt
## regular           -0.04237438 -0.05258944  -0.05082862
## small.class        0.13465218  0.11021666  -0.02100859
## regular.with.aide -0.05129083 -0.01665593   0.09068835
## 
## Coefficients of linear discriminants:
##                      LD1         LD2
## tmathssk      0.89656393 -0.04972956
## treadssk      0.04337953  0.56721196
## totexpk.sqrt -0.49061950  0.80051026
## 
## Proportion of trace:
##    LD1    LD2 
## 0.7261 0.2739

The printout is mostly readable. At the top is the actual code used to develop the model followed by the probabilities of each group. The next section shares the means of the groups. The coefficients of linear discriminants are the values used to classify each example. The coefficients are similar to regression coefficients. The computer places each example in both equations and probabilities are calculated. Whichever class has the highest probability is the winner. In addition, the higher the coefficient the more weight it has. For example, “tmathssk” is the most influential on LD1 with a coefficient of 0.89.
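
A quick way to see this is to inspect the posterior probabilities the model assigns to the training data; each row sums to 1 and the predicted class is the one with the largest value. The “train.pred” object name below is just for illustration.

train.pred<-predict(train.lda)
head(round(train.pred$posterior,3)) #probability of each class type for the first few examples
head(train.pred$class) #the class with the highest posterior probability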

The proportion of trace is similar to the proportion of variance explained in principal component analysis: it indicates how much of the separation between groups each discriminant function accounts for.

Now we will take the trained model and see how it does with the test set. We create a new object called “predict.lda” using our “train.lda” model and the test data called “test.star”.

predict.lda<-predict(train.lda,newdata = test.star)

We can use the “table” function to see how well our model has done. We can do this because we actually know the class of each example beforehand, since we divided the dataset ourselves. What we need to do is compare this to what our model predicted. Therefore, we compare the “classk” variable of our “test.star” dataset with the “class” predicted by the “predict.lda” model.

table(test.star$classk,predict.lda$class)
##                    
##                     regular small.class regular.with.aide
##   regular               155         182               249
##   small.class           145         198               174
##   regular.with.aide     172         204               269

The results are pretty bad. For example, in the first row, called “regular”, we have 155 examples that were classified as “regular” and predicted as “regular” by the model. In the next column, 182 examples were classified as “regular” but predicted as “small.class”, etc. To find out how well our model did, you add together the examples along the diagonal from left to right and divide by the total number of examples. Below is the code.

(155+198+269)/1748
## [1] 0.3558352

Only 36% accurate, terrible but OK for a demonstration of linear discriminant analysis. Since we only have two functions, or two dimensions, we can plot our model. Below I provide a visual of the first 50 examples classified by the predict.lda model.

plot(predict.lda$x[1:50,])
text(predict.lda$x[1:50,],labels=as.character(predict.lda$class[1:50]),col=as.numeric(predict.lda$class[1:50]))
abline(h=0,col="blue")
abline(v=0,col="blue")


The first function, which is the vertical line, doesn’t seem to discriminate anything, as it is off to the side and not separating any of the data. However, the second function, which is the horizontal one, does a good job of dividing the “regular.with.aide” from the “small.class”. Yet, there are problems with distinguishing the class “regular” from either of the other two groups. In order to improve our model we need additional independent variables to help distinguish the groups in the dependent variable.


Factors that Affect Pronunciation

Understanding and teaching pronunciation has been controversial in TESOL for many years. At one time, pronunciation was taught in a highly bottom-up, behavioristic manner. Students were drilled until they had the appropriate “accent” (American, British, Australian, etc.). To be understood meant capturing one of the established accents.

Now there is more of an emphasis on top-down features such as stress, tone, and rhythm. There is also an emphasis on being more non-directive, focusing not on the sounds being generated by the student but on the comprehensibility of what they say.

This post will explain several common factors that influence pronunciation. These common factors include:

  • Motivation & Attitude
  • Age & Exposure
  • Native language
  • Natural ability

Motivation & Attitude

For many people, it’s hard to get something done when they don’t care. Excellent pronunciation is often affected by motivation. If the student does not care, they will probably not improve much. This is particularly true when the student reaches a level where people can understand them. Once they are comprehensible, many students lose interest in further pronunciation development.

Fortunately, a teacher can use various strategies to motivate students to focus on improving their pronunciation. Creating relevance is one way in which students’ intrinsic motivation can be developed.

Attitude is closely related to motivation. If students have negative views of the target language and are worried that learning it is a cultural threat, this will make language acquisition difficult. Students need to understand that language learning does involve learning about the culture of the target language.

Age & Exposure

Younger students, especially those 1-12 years of age, have the best chance at developing native-like pronunciation. If the student is older they will almost always retain an “accent.” However, fluency and accuracy can reach the same levels regardless of the initial age at which language study began.

Exposure is closely related to age. The more authentic experiences that a student has with the language, the better their pronunciation normally is. The quality of the exposure depends on the naturalness of the setting and the actual engagement of the student in hearing and interacting with the language.

For example, an ESL student who lives in America will probably have much more exposure to the actual use of English than someone in China. This in turn will impact their pronunciation.

Native Language

The similarities between the mother tongue and the target language can influence pronunciation. For example, it is much easier to move from Spanish to English pronunciation than from Chinese to English.

For the teacher, understanding the sound systems of your students’ languages can help a great deal in addressing their difficulties in pronunciation.

Innate Ability

Lastly, some just get it while others don’t. Different students have varying ability to pick up the sounds of another language. A way around this is helping students to know their own strengths and weaknesses. This will allow them to develop strategies to improve.

Conclusion

Whatever your position on pronunciation, there are ways to improve your students’ pronunciation if you are familiar with what influences it. The examples in this post provide some basic insight into what affects it.


Tips for Developing Techniques for ESL Students

Technique development is the actual practice of TESOL. All of the ideas expressed in approaches and methods are just ideas. The development of a technique is the application of knowledge in a way that benefits the students. This post will provide ideas and guidelines for developing speaking and listening techniques.

Techniques should Encourage Intrinsic Motivation

When developing techniques for your students, the techniques need to consider the goals, abilities, and interests of the students whenever possible. If the students are older adults who want to develop conversational skills, a heavy stress on reading would be demotivating. This is because reading was not one of the students’ goals.

When techniques do not align with student goals there is a loss of relevance, which is highly demotivating. Of course, as the teacher, you do not always give them what they want, but general practice suggests some sort of dialog over the direction of the techniques.

Techniques should be Authentic

The point here is closely related to the first one on motivation. Techniques should generally be as authentic as possible. If you have a choice between a real-world text and a textbook, it is usually better to go with the real-world text.

Realistic techniques provide a context in which students can apply their skills in a setting that is similar to the world but within the safety of a classroom.

Techniques should Develop Skills through Integration and Isolation

When developing techniques there should be a blend of techniques that develop skills in an integrated manner, such as listening and speaking or some other combination. There should also be an equal focus on techniques that develop one skill, such as writing.

The reason for this is so that the students develop balanced skills. Skill-integrated techniques are highly realistic, but students can use one skill to compensate for weaknesses in others. For example, a talker just keeps on talking without ever really listening.

When skills are worked on in isolation, deficiencies can be clearly identified and worked on. Doing this will only help the students in integrated situations.

Encourage Strategy Development

Through techniques, students need to develop their ability to learn on their own autonomously. This can be done by having students practice learning strategies you have shown them in the past. Examples include using context clues, finding main ideas, distinguishing facts from opinions, etc.

The development of skills takes a well-planned approach to how you will teach and provide students with the support to succeed.

Conclusion

Understanding some of the criteria that can be used in creating techniques for the ESL classroom is beneficial for teachers. The ideas presented here provide some basic guidance for enabling technique development.

Generalized Additive Models in R

In this post, we will learn how to create a generalized additive model (GAM). GAMs are non-parametric generalized linear models. This means that the linear predictor of the model uses smooth functions of the predictor variables. As such, you do not need to specify the functional relationship between the response and the continuous variables. This allows you to explore the data for potential relationships that can be more rigorously tested with other statistical models.

In our example, we will use the “Auto” dataset from the “ISLR” package and use the variables “mpg”, “displacement”, “horsepower”, and “weight” to predict “acceleration”. We will also use the “mgcv” package. Below is some initial code to begin the analysis.

library(mgcv)
library(ISLR)
data(Auto)

We will now make the model. We want to understand the response of “acceleration” to the explanatory variables “mpg”, “displacement”, “horsepower”, and “weight”. After setting up the model we will examine the summary. Below is the code.

model1<-gam(acceleration~s(mpg)+s(displacement)+s(horsepower)+s(weight),data=Auto)
summary(model1)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.54133    0.07205   215.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                   edf Ref.df      F  p-value    
## s(mpg)          6.382  7.515  3.479  0.00101 ** 
## s(displacement) 1.000  1.000 36.055 4.35e-09 ***
## s(horsepower)   4.883  6.006 70.187  < 2e-16 ***
## s(weight)       3.785  4.800 41.135  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.733   Deviance explained = 74.4%
## GCV = 2.1276  Scale est. = 2.0351    n = 392

All of the explanatory variables are significant and the adjusted r-squared is .73, which is excellent. edf stands for “effective degrees of freedom”. This modified version of the degrees of freedom is due to the smoothing process in the model. GCV stands for generalized cross-validation, and this number is useful when comparing models. The model with the lowest number is the better model.

We can also examine the model visually by using the “plot” function. This will allow us to examine whether the curvature fitted by the smoothing process was useful or not for each variable. Below is the code.

plot(model1)


We can also look at a 3D graph that includes the linear predictor as well as the two strongest predictors. This is done with the “vis.gam” function. Below is the code.

vis.gam(model1)


If multiple models are developed, you can compare the GCV values to determine which model is the best. In addition, another way to compare models is with the “AIC” function. In the code below, we will create an additional model that includes “year”, compare the GCV scores, and calculate the AIC. Below is the code.

model2<-gam(acceleration~s(mpg)+s(displacement)+s(horsepower)+s(weight)+s(year),data=Auto)
summary(model2)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight) + 
##     s(year)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.54133    0.07203   215.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                   edf Ref.df      F p-value    
## s(mpg)          5.578  6.726  2.749  0.0106 *  
## s(displacement) 2.251  2.870 13.757 3.5e-08 ***
## s(horsepower)   4.936  6.054 66.476 < 2e-16 ***
## s(weight)       3.444  4.397 34.441 < 2e-16 ***
## s(year)         1.682  2.096  0.543  0.6064    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.733   Deviance explained = 74.5%
## GCV = 2.1368  Scale est. = 2.0338    n = 392
#model1 GCV
model1$gcv.ubre
##   GCV.Cp 
## 2.127589
#model2 GCV
model2$gcv.ubre
##   GCV.Cp 
## 2.136797

As you can see, the second model has a higher GCV score when compared to the first model. This indicates that the first model is a better choice. This makes sense because in the second model the variable “year” is not significant. To confirm this we will calculate the AIC scores using the AIC function.

AIC(model1,model2)
##              df      AIC
## model1 18.04952 1409.640
## model2 19.89068 1411.156

Again, you can see that model1 is better due to its fewer degrees of freedom and slightly lower AIC score.

Conclusion

Using GAMs is most common for exploring potential relationships in your data. This is because they are difficult to interpret and summarize. Therefore, it is often better to report a generalized linear model rather than a GAM, due to the difficulty of understanding what the data are telling you when using GAMs.


Listening Techniques for the ESL Classroom

Listening is one of the four core skills of language acquisition along with reading, writing, and speaking. This post will explain several broad categories of listening that can happen within the ESL classroom.

Reactionary Listening

Reactionary listening involves having the students listen to an utterance and repeat it back to you as the teacher. The student is not generating any meaning. This can be useful, perhaps, for developing pronunciation in speaking.

Common techniques that utilize reactionary listening are drills and choral speaking. Both of these techniques are commonly associated with audiolingualism.

Responsive Listening

Responsive listening requires the student to create a reply to something that they heard. Not only does the student have to understand what was said, but they must also be able to generate a meaningful reply. The response can be verbal, such as answering a question, or non-verbal, such as obeying a command.

Common techniques that are responsive in nature include anything that involves asking questions or obeying commands. As such, almost all methods and approaches have some aspect of responsive listening in them.

Discriminatory Listening

Discriminatory listening techniques involve listening that is selective. The listener needs to identify what is important from a dialog or monologue. The listener might need to identify the name of a person, the location of something, or the main idea of the recording.

Discriminatory listening is probably a universal technique used by almost everyone. It is also popular with English proficiency tests such as the IELTS.

Intensive Listening

Intensive listening is focused on breaking down what the student has heard into various aspects of grammar and speaking. Examples include intonation, stress, phonemes, contractions, etc.

This is more of an analytical approach to listening. In particular, using intensive listening techniques may be useful to help learners understand the nuances of the language.

Extensive Listening

Extensive listening is about listening to a monologue or dialog and developing an overall summary and comprehension of it. Examples of this could be having students listen to a clip from a documentary or a newscast.

Again, this is so common in language teaching that almost all styles incorporate this in one way or another.

Interactive Listening

Interactive listening is the mixing of all of the previously mentioned types of listening simultaneously. Examples include role plays, debates, and various other forms of group work.

All of the examples mentioned require repeating what others say (reactionary), replying to others’ comments (responsive), identifying main ideas (discriminatory & extensive), and perhaps some focus on intonation and stress (intensive). As such, interactive listening is the goal of listening in a second language.

Interactive listening is used by most methods, most notably communicative language teaching, which has had a huge influence on the last 40 years of TESOL.

Conclusion

The listening technique categories provided here give some insight into how one can organize various listening experiences in the classroom. What combination of techniques to employ depends on many different factors, but knowing what’s available empowers the teacher to determine what course of action to take.