Monthly Archives: February 2016

The Direct Method

In reaction to the grammar-translation approach that had been used for several centuries, many educators placed a new emphasis on oral communication skills. By the late 19th century, this shift had coalesced into the natural method, which focused primarily on oral skills.

Many methods are derived from the natural method. One of the most influential of these in language teaching was the direct method, which emerged in the late 19th century. In this post, we will examine the characteristics of the direct method as well as its impact on language teaching.

Traits of Direct Method

The direct method stressed the use of only the target language in the classroom. Instead of using the students’ native language, the teacher would demonstrate and use body language to express meaning. Due to this reliance on the target language, only common, everyday vocabulary was taught. As such, this method may not be appropriate for academic language learning.

Speaking and listening were the primary focus of the direct method. These skills were developed through a question-and-answer approach, which supported the development of communication skills while strengthening comprehension.

Correct grammar was also important as was pronunciation. Grammar was taught inductively with the teacher sharing examples that illustrated the principle of the grammar lesson.

Impact of the Direct Method

The direct method was highly successful in private language schools where motivated students came to learn a language. However, it never replicated this success in public schools. There are several reasons for this lack of broad-based success.

The direct method lacked any linguistic theory to support its principles. It was developed largely by amateurs who were unfamiliar with the details of language learning and tried to solve problems through common sense alone rather than common sense informed by research.

The direct method also requires native-speaking teachers, which is not always possible. For many teachers, the strict avoidance of the students’ native language was simply too cumbersome in practice.

With these and other concerns, the direct method was mostly abandoned by the 1920s in Europe. This method was never popular in the US.

Conclusion

The direct method was perhaps the first major fad method in language teaching. For over 100 years, language teaching moved from one method to another in search of the perfect method. As we will see in a future post, each method claimed to be an improvement on its predecessors. The reality is that there is no single best method but rather a collection of choices to be made depending on the situation one is facing.

Big Data & Data Mining

Dealing with large amounts of data has been a problem throughout most of human history. Ancient civilizations had to keep large numbers of clay tablets, papyri, steles, parchments, scrolls, etc., to keep track of all the details of an empire.

However, whenever it seemed as though there would be no way to hold any more information, a new technology would be developed to alleviate the problem. When people could no longer manage keeping records on stone, papyrus scrolls were invented. When scrolls were no longer practical, books were developed. When hand-copying books became too laborious, the printing press came along.

By the mid-20th century, there were concerns that libraries could not be built large enough to hold all of the books being published. With this problem came the solution of the computer. One computer could hold the information of several dozen, if not hundreds, of libraries.

Now even a single computer can no longer cope with all of the information constantly being generated on even a single subject. This has led to computers working together in networks to share the task of storing information. With data spread across several computers, analyzing it becomes much more challenging. It is now necessary to mine for useful information much as people mined for gold in the 19th century.

Big data is data that is too large to fit within the memory of a single computer. Analyzing data that is spread across a network of databases takes skills different from traditional statistical analysis. This post will explain some of the characteristics of big data as well as data mining.

Big Data Traits

The three main traits of big data are volume, velocity, and variety. Volume describes the size of big data, meaning data too big to fit on a single computer. Velocity is about how fast the data can be processed. Lastly, variety refers to the different types of data involved. Common sources of big data include the following:

  • Metadata from visual sources such as cameras
  • Data from sensors such as in medical equipment
  • Social media data such as information from Google, YouTube, or Facebook

Data Mining

Data mining is the process of discovering a model in a big dataset. Through the development of an algorithm, we can find specific information that helps us to answer our questions. Generally, there are two ways to mine data and these are extraction and summarization.

Extraction is the process of pulling specific information from a big dataset. For example, if we want all the addresses of people who bought a specific book from Amazon the result would be an extraction from a big data set.
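To make extraction concrete, here is a minimal base-R sketch using a hypothetical data frame of orders; the names, columns, and book titles are all made up for illustration:

```r
# Hypothetical order records standing in for a big dataset
orders <- data.frame(
  name    = c("Ann", "Bob", "Cat", "Dan"),
  book    = c("R Basics", "Statistics 101", "R Basics", "Gardening"),
  address = c("12 Oak St", "34 Elm St", "56 Pine St", "78 Ash St"),
  stringsAsFactors = FALSE
)

# Extraction: pull only the addresses of people who bought a specific book
buyers <- subset(orders, book == "R Basics", select = c(name, address))
buyers
```

On a real big dataset the same idea would be expressed as a distributed query, but the logic is identical: filter the records of interest and keep only the fields you need.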

Summarization is reducing a dataset in order to describe it. We might run a cluster analysis in which similar records are grouped by shared characteristics. For example, if we analyze all the books people ordered through Amazon last year, we might notice that one cluster of customers buys mostly religious books while another buys investment books.
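A small sketch of this kind of summarization in base R, using made-up purchase counts for six hypothetical customers:

```r
# Summarization: cluster hypothetical customers by how many religious
# vs. investment books they bought last year (numbers are invented)
purchases <- data.frame(
  religious  = c(9, 8, 10, 1, 0, 2),
  investment = c(1, 0, 2, 9, 10, 8)
)

set.seed(1)
fit <- kmeans(purchases, centers = 2)

fit$cluster  # which of the two clusters each customer falls into
fit$centers  # the summary: average purchases per cluster
```

The cluster centers are the "description" of the data: instead of six individual records, we report two typical buying profiles.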

Conclusion

Big data will only continue to get bigger. Currently, the response has been to simply use more computers and servers. As such, there is now a need for finding information across many computers and servers. This is the purpose of data mining: to find pertinent information that answers stakeholders’ questions.

Reform Movement in Language Teaching

By the late 19th century, there was a general push for making strong changes to how language was taught. There was a resurgence in linguistics and phonetics that served as a major influence on language teaching. This post will share some of the major reform factors of this time period.

International Phonetic Association

In the 1880s, the International Phonetic Association was founded. Not only did this organization develop the International Phonetic Alphabet, it also laid down several influential principles of language teaching. For example, the IPA believed that the focus of learning a language should be on the spoken language. This is another indication of the shift away from reading and writing.

The focus on spoken language also led to recommending the use of proper pronunciation and the use of conversation in the classroom. There was still a prescriptive emphasis in developing “proper” speaking skills, as though there were one standard for how to talk. This emphasis on verbal accuracy may have come from the stress on accuracy in the Grammar-Translation Method.

The IPA also encouraged the teaching of grammar inductively. This means to teach grammatical concepts through the use of examples or applications of the rules. From these examples, students would extract the rule for themselves. This is a much more engaging way to teach details such as rules in comparison to the standard deductive approach in which the rule is given followed by applications of it.

Other Reform Principles

There were several other significant reforms. One key idea was the need to teach language in a manner that moved from simple to complex. One has to wonder how language could ever have been taught without moving from simple to more complex content. However, this principle may simply have stated something that had previously been taken for granted.

Another reform idea was a focus on hearing the language before seeing it in writing. This is in contrast to the focus on text in the Grammar-Translation Method. Lastly, learning should happen in context. A focus on context became a major topic of controversy in education in general in the 20th century.

One last major reform that brought an end to the Grammar-Translation Method was the belief that translation should be avoided. Translation had been at the heart of language teaching up until this point. Such a stance may have been highly shocking for its time, as it pushed against a tradition dating back to the 16th century.

Conclusion

Change is a part of life. The reforms brought about in language teaching at the end of the 19th century were for the purpose of improving language teaching. The primary desire was not to throw away what had been done before. Rather, the goal was to further help in the improvement of language teaching.

Reaction Toward Grammar-Translation

By the mid 19th century, many language educators began to react negatively towards the grammar-translation method. This post will examine several concerns of the grammar-translation model and the proposed early solutions to these concerns.

The Problems

Among the problems people had with grammar-translation were the inability to communicate verbally and the lack of context. The lack of verbal communication was a major problem, particularly when grammar-translation was used to teach living languages such as English. For many, learning a living language involves learning to speak it, and the grammar-translation model does not provide this.

A closely related problem was a lack of context. A large part of communication is the setting in which it takes place. Another term for this is pragmatics. The setting along with body language (paralinguistic features) determines a large portion of understanding in communication. This is all ignored with the grammar-translation method as it is focused on text exclusively.

Proposed Solutions

Several 19th-century language teaching innovators offered answers to these problems. Prendergast was one of the first to notice how children learn language through context. He also found that children memorize commonly used phrases for future use. From these two observations, Prendergast proposed a structural approach to language learning in which the most basic units of a language are taught first, followed by more complex ideas.

Gouin also studied how children learn language. He proposed that language learning was easiest when language was used to accomplish a sequence of related events. For example, students might learn several phrases using the word door, such as “I walk toward the door” and “I stop at the door”. Students would then learn the verbs of such phrases, like “I walk” and “I stop”. This experience is repeated in several different ways in order to help the student understand what “walk” and “stop” mean.

Gouin also supported the use of paralinguistic features such as gesturing in order to help explain ideas in a conversation with students. This support of body language influenced several methods of teaching English.

Conclusion

The reformers of the 19th century noticed something about language that is obvious to us today: the need to learn to communicate verbally. This led to many proposed reforms. However, few have heard of these reforms, as they did not spread throughout the world of language teaching. This was due to the inferior means of communication available compared to today.

Though lacking recognition, the reforms suggested in the 19th century have become part of standard practice for many teachers today.

Grammar Translation Method

The grammar-translation method was developed through the teaching of Latin. This post will explain some of the traits of the grammar-translation model as well as reactions towards it.

Characteristics

The goal in grammar-translation is to learn to read and write another language for the sake of developing mental discipline. This is consistent with the perennialist worldview of education at the time. Learning a language is focused on the grammar rules used in manipulating the meaning of the text.

As such, listening and speaking are not a focus. This leads to the students’ native language being used as the mode of instruction, with the foreign language reserved for the texts being translated. A typical lesson involves copious amounts of translating with a goal of high accuracy.

Grammar was taught deductively which means that the teacher always explained the rules for the students who would then apply them. This is in contrast to discovery learning which relies on students learning principles of a lesson themselves.

Impact

Grammar-translation was essentially the first formalized way of teaching a language. Even today, this approach is used for the teaching of English as well as many “dead” languages such as Latin, Koine Greek, and Classical Hebrew.

The result of this approach was an endless amount of vocabulary without context, combined with an emphasis on memorizing. Many a pastor and theologian bemoans their days of studying biblical languages. This was partially due to how the languages were taught. Many programs require memorizing an extensive list of words and declensions even though dictionaries, lexicons, and concordances are readily available.

There are some advantages to this approach. For learning to communicate on an academic level via writing, this method is supreme. This makes sense, as the student does not have to develop speaking and listening skills. In addition, understanding the rules of a language provides insight into the how and why of using it.

The grammar-translation method was easy to administer for teachers while boring for students. For teachers who lack verbal ability, it allows them to provide some sort of understanding of the language to their students. This method is also beneficial to large classes where it is difficult to monitor behavior.

With time, language teaching was becoming more and more important. Combined with the growing dissatisfaction with grammar-translation, this produced a shift and pushback against the method.

The Influences of Latin in TESOL

There are probably many TESOL teachers who are unaware of the role Latin has played in shaping the world of TESOL today. Latin has had a tremendous influence on how language teaching has been shaped, as it was one of the first languages to be systematically taught on a large scale. As such, Latin provided the foundation for how language was taught for several hundred years.

Latin and its Role in Language Teaching

Speaking several languages was the norm for most of known history in most parts of the world, including Europe. However, with the dawn of empires such as those of the Greeks and Romans, there came a need for a dominant language over local languages.

The language of Rome was primarily Latin. As such, Latin spread throughout the Western world. What was unique was how long the Roman Empire lasted. For over 1,000 years, Latin was the language of education, business, and government. It was embedded in tradition and not just an outside language imposed on locals.

With the decline of the Roman Empire came a growth in the use of other languages in Europe such as English, French, Italian, etc. This contributed to Latin being taught as a subject because of the prominence it used to have. Change is difficult, and abandoning a language so ingrained in Western civilization was not easy for scholars.

Another reason that Latin was still taught after its decline was to strengthen the mind. Educators believed that the study of Latin would improve the intellectual prowess of students because of the challenge of learning it.

The Teaching of Latin

Latin was taught to young people through a focus on grammar rules, declensions, and the conjugation of verbs. Students also translated passages to and from Latin to develop writing skills.

A deductive approach was used in developing a knowledge of the grammar. Students were taught the rules of the grammar first and then provided with opportunities to apply them. There was no discovery or inductive approaches to learning.

Furthermore, students only learned to read and write Latin. This is partly because Latin had died as a spoken language. Therefore, there was no development of conversational skills or practical application.

Latin and Modern Language Teaching

The approach of Latin, with its focus on grammar and translation, was how other languages were first taught by the 19th century. Since there was no other example of how to approach language teaching, it only made sense to copy how Latin was taught. Everyone was focused on text but never on context.

People learned to communicate through text even though they were studying living languages. Every language was taught as a mental exercise rather than as a skill for practical use.

Conclusion

The teaching of Latin led directly to the development of the grammar-translation method. This method laid the foundation for reactionary methods that are a part of the field of TESOL.

Boosting in R

Boosting is a technique used to sort through many predictors in order to find the strongest by weighting them. To do this, you tell R to use a specific classifier such as a tree or regression model. R then makes multiple models or trees while trying to reduce the error in each model as much as possible. The weight of each predictor is based on the amount of error it reduces, averaged across the models.

We will now go through an example of boosting using the “College” dataset from the “ISLR” package.

Load Packages and Setup Training/Testing Sets

First, we will load the required packages and create the needed datasets. Below is the code for this.

library(ISLR); data("College"); library(ggplot2)
library(caret)
intrain <- createDataPartition(y = College$Grad.Rate, p = 0.7, list = FALSE)
trainingset <- College[intrain, ]
testingset <- College[-intrain, ]

Develop the Model

We will now create the model. We are going to use all of the variables in the dataset for this example to predict graduation rate. To use all available variables requires the use of the “.” symbol instead of listing every variable name in the model. Below is the code.

Model <- train(Grad.Rate ~ ., method = 'gbm', data = trainingset, verbose = FALSE)

The method we used is ‘gbm’, which stands for gradient boosted models. This means that we are using boosting with decision trees.

Once the model is created, you can check the results by using the following code.

summary(Model)

The output is as follows (for the first 5 variables only).

                    var    rel.inf
Outstate       Outstate 36.1745640
perc.alumni perc.alumni 14.0532312
Top25perc     Top25perc 13.0194117
Apps               Apps  5.7415103
F.Undergrad F.Undergrad  5.7016602

These results tell you which variables are most influential in predicting graduation rate. The strongest predictor was “Outstate”, the out-of-state tuition. This suggests that as out-of-state tuition increases, the graduation rate tends to increase as well. You can check this by running a correlation test between ‘Outstate’ and ‘Grad.Rate’.
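A quick sketch of that check, using the same dataset the model was trained on:

```r
# Verify the direction of the relationship suggested by boosting
library(ISLR)
data("College")
cor.test(College$Outstate, College$Grad.Rate)
```

A positive correlation estimate with a small p-value would support the boosting result that Outstate and graduation rate move together.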

The next two variables are “perc.alumni”, the percentage of alumni who donate, and “Top25perc”, the percentage of new students from the top 25% of their high school class. The more alumni who donate, the higher the grad rate, and the more students from the top 25%, the higher the grad rate.

A Visual

We will now make a plot comparing the predicted grad rate with the actual grad rate. Below is the code, followed by the plot.

qplot(predict(Model, testingset), Grad.Rate, data = testingset)

[Plot: predicted vs. actual graduation rate for the testing set]

The model looks sound based on the visual inspection.

Conclusion

Boosting is a useful way to find out which predictors are strongest. It is an excellent way to explore a model to determine this for future processing.

Theories on Motivation

Motivation is the desire a person has to do something. Many different theories have attempted to explain motivation. This post will look at some of the lesser-known theories that have helped shape views on motivation. In particular, we will look at the following theories.

  • Drive theory
  • Expectancy-Value theory
  • Self-Worth theory
  • Views on Control

Drive Theory

Drive theory is one of the simplest and earliest theories of motivation. In drive theory, there are three critical components.

  1. A need is noticed
  2. The need leads to a drive to do something
  3. The drive causes a behavior.

An example of this is someone who has a need for food. This leads to a desire to eat which culminates in the person finding food and actually eating.

The simplicity of the model was actually one of the criticisms of it. Many people felt there was more to motivation than just these three components.

Expectancy-Value Theory

Another influential theory of motivation is expectancy-value theory. This theory states that the amount of motivation a person has depends on what the person expects to get if they complete the activity. If the person highly values the expected reward, they will be highly motivated, and vice versa.

For example, a parent might promise a child a new bike if they learn how to ride on two wheels. If the child highly values the new bike, they will be highly motivated to learn because of the expected reward of a new bicycle.

Self-Worth Theory

Self-worth theory tries to explain motivation through how a person sees their own ability. If a person believes they have high ability, they will put forth high effort. This often leads to excellent results.

This means that the opposite is true as well. If a person believes they have low ability, they will not try hard and will produce poor results. The difference lies in how each person sees their ability level.

Control

This last point involves a collection of related terms on motivation. Control has to do with a person’s perspective that they have authority over what they do and what happens to them. Two common terms related to control are locus of control and learned helplessness.

Locus of control has to do with a person’s perception of the control they have over their decisions and life. People with an internal locus of control believe they have the authority. People with an external locus of control believe that others control their destiny.

Highly motivated people have an internal locus of control and tend to be more assertive than people with an external locus of control.

Learned helplessness is a person becoming convinced that they cannot do something. This is often a result of an external locus of control. Individuals who accept a learned helplessness viewpoint are characterized by a lack of motivation and assertiveness.

Conclusion

Motivation is a critical part of teaching. This post provided insights into some basic concepts found in the realm of motivation.

Rapid Instructional Design

Instructional design is a critical component of education particularly in the field of e-learning. Instructional design can be defined as the application of learning principles in order to support the learning of students. To put it simply, instructional design involves designing the teaching in a way that improves learning.

In this post, we will look at one example of instructional design: Dave Meier’s Rapid Instructional Design (RID).

Meier’s RID model uses learning techniques that speed up learning and includes a learning environment that emphasizes practice, feedback, and experience rather than presentations. RID is focused on active learning rather than the traditional model of passive learning through such examples as lecturing.

The RID model has the following four phases

  • Preparation
  • Presentation
  • Practice
  • Performance

Preparation

Preparation is about preparing the learner for learning. In this first step, the teacher shares the big picture of the learning experience. This includes stating the goals and benefits of the learning experience. Other activities at this step are arousing the interest of the learner in an appropriate manner and dealing with any potential problems that would impede the learning.

How this can be done varies. Often, beginning a lesson with a story or illustration can arouse interest. Dealing with problem students could be one way to deal with potential barriers to learning.

Presentation

At the presentation step, the learners are first exposed to the new knowledge and/or skill. Whereas traditional teaching focuses on content delivery, the RID model focuses on interactive activities and discovery learning.

A primary goal of RID is to incorporate real-world phenomena into the teaching. For example, do not only talk about math but develop lessons from the real world involving people and companies. This enhances relevancy.

Practice

Practice involves having the students use whatever they just learned. This is critical as this allows them to learn through trial-and-error. As they receive feedback on their progress the students develop mastery.

Practice is easy in such fields as math, science, and even music. For more abstract fields such as critical thinking, theology, and philosophy, practice takes place via discussion or through expressing ideas in writing. Demonstrating thought by communicating ideas verbally and in writing is a form of practice for more abstract subjects.

Performance

Performance is the application of the skill in a real-world setting. This is also known as an authentic assessment. How this is done is discipline specific.

In education, performance includes such activities as the student teaching phase of a new teacher. This allows the student to apply many of the skills they learned during their teacher training. In music, the recital serves as an excellent model of performance.

Conclusion

The RID model is just one of many ways to guide the learning of students. The value of this model is in the simplicity of its approach and its emphasis on active learning.

Random Forest in R

Random forest is a machine learning approach similar to decision trees. The main difference is that with random forest, each tree is built on a bootstrap sample of the data, and only a random subset of the variables is considered at each node. In addition, many different trees are made, and their results are averaged. This means that there is no individual tree to analyze but rather a ‘forest’ of trees.

The primary advantages of random forest are accuracy and the prevention of overfitting. In this post, we will look at an application of random forest in R. We will use the ‘College’ data from the ‘ISLR’ package to predict whether a college is public or private.

Preparing the Data

First, we need to split our data into a training and testing set as well as load the various packages that we need. We have run this code several times when using machine learning. Below is the code to complete this.

library(ggplot2);library(ISLR)
library(caret)
data("College")
forTrain<-createDataPartition(y=College$Private, p=0.7, list=FALSE)
trainingset<-College[forTrain, ]
testingset<-College[-forTrain, ]

Develop the Model

Next, we need to set up the model we want to run using random forest. The coding is similar to that used for regression. Below is the code.

Model1 <- train(Private ~ Grad.Rate + Outstate + Room.Board + Books + PhD + S.F.Ratio + Expend, data = trainingset, method = 'rf', prox = TRUE)

We are using 7 variables to predict whether a university is private or not. The method is ‘rf’ which stands for “Random Forest”. By now, I am assuming you can read code and understand what the model is trying to do. For a refresher on reading code for a model please click here.

Reading the Output

If you type “Model1” and press enter, you will receive the output for the random forest.

Random Forest 

545 samples
 17 predictors
  2 classes: 'No', 'Yes' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 545, 545, 545, 545, 545, 545, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD  
  2     0.8957658  0.7272629  0.01458794   0.03874834
  4     0.8969672  0.7320475  0.01394062   0.04050297
  7     0.8937115  0.7248174  0.01536274   0.04135164

Accuracy was used to select the optimal model using 
 the largest value.
The final value used for the model was mtry = 4.

Most of this is self-explanatory. The main focus is on the mtry, accuracy, and Kappa.

The output shows several different models that the computer generated. Each model reports its accuracy as well as its Kappa. The accuracy states how well the model predicted whether a university was public or private. The Kappa shares similar information but measures how well the model predicted while taking chance into account. As such, the Kappa should be lower than the accuracy.

At the bottom of the output, the computer tells us which mtry was best. For our example, the best mtry was 4. If you look closely, you will see that mtry = 4 also had the highest accuracy and Kappa.
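To make the relationship between accuracy and Kappa concrete, here is a small base-R sketch that computes both from a made-up 2 × 2 confusion matrix (the counts are invented for illustration):

```r
# Hypothetical confusion matrix: rows = predicted class, columns = actual class
cm <- matrix(c(40, 10,
                5, 45), nrow = 2, byrow = TRUE)
n <- sum(cm)

accuracy <- sum(diag(cm)) / n                      # observed agreement: 0.85
expected <- sum(rowSums(cm) * colSums(cm)) / n^2   # agreement expected by chance: 0.5
kappa    <- (accuracy - expected) / (1 - expected) # chance-corrected: 0.7

accuracy
kappa
```

Because Kappa subtracts out the agreement we would expect from guessing alone, it comes out lower than the raw accuracy, exactly as seen in the caret output above.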

Confusion Matrix for the Training Data

Below is the confusion matrix for the training data using the model developed by the random forest. As you can see, the results differ from the random forest output above. This is because the model is here predicting the same data it was trained on rather than bootstrapped resamples.

predNew <- predict(Model1, trainingset)
trainingset$predRight <- predNew == trainingset$Private
table(predNew, trainingset$Private)

predNew  No Yes
    No  149   0
    Yes   0 396

Results of the Testing Data

We will now use the testing data to check the accuracy of the model we developed on the training data. Below is the code, followed by the output.

pred <- predict(Model1, testingset)
testingset$predRight<-pred==testingset$Private
table(pred, testingset$Private)
pred   No Yes
  No   48  11
  Yes  15 158

For the most part, the model we developed to predict whether a university is private or not is accurate.
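One way to quantify “for the most part” is to compute the overall accuracy from the confusion table. The sketch below uses the counts shown above; your numbers will differ because the training/testing split is random:

```r
# Counts copied from the testing-data table above
# (diagonal cells are correct predictions)
tab <- matrix(c(48,  11,
                15, 158), nrow = 2, byrow = TRUE)

accuracy <- sum(diag(tab)) / sum(tab)
accuracy  # roughly 0.89
```

About 89% of the colleges in the testing set were classified correctly, which backs up the visual impression from the table.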

How Important is a Variable

You can calculate how important an individual variable is in the model by using the following code.

library(randomForest)
Model1RF <- randomForest(Private ~ Grad.Rate + Outstate + Room.Board + Books + PhD + S.F.Ratio + Expend, data = trainingset, importance = TRUE)
importance(Model1RF)

The output tells you how much the accuracy of the model is reduced if you remove the variable. As such, the higher the number the more valuable the variable is in improving accuracy.

Conclusion

This post exposed you to the basics of random forest. Random forest is a technique that develops a forest of decision trees through resampling. The results of all these trees are then averaged to give you an idea of which variables are most useful in prediction.

Getting and Keeping Student Attention

Getting students to focus and pay attention is a major problem in education. Fortunately, there are several strategies that a teacher can use to help students to pay attention. In this post, we will cover the following approaches for maintaining a student’s attention…

  • Indicate what is important
  • Increase intensity
  • Include novelty
  • Include movement

Importance

There are times when students are engaged but they don’t know what to do or what they are looking for. For example, a teacher may want students to summarize a paragraph. However, it is common for students to get focused on the details of the passage and never identify the main point.

To overcome this problem, a teacher may want to focus the students’ attention on questions that will guide them to summarizing the paragraph. The questions break down the task of summarizing into individual steps. Below is an example.

  1. What is the topic of the paragraph?
  2. What are some of the details the author includes in the paragraph?
  3. What is the main point of the paragraph?

The example above provides one way the task of summarizing can be broken down into several steps. This helps in focusing the students.

Raise the Intensity

Increasing the intensity has to do with the amount of stimulus a child receives while doing something. For example, if a child is struggling to write the letter ‘t’ you may have them say out loud how to write it before writing the letter. This exposes the child to new material both verbally and in a psychomotor way.

The goal of this approach is to engage more of the student’s senses in order to help them to pay attention.

Novelty

This approach is self-explanatory. Students pay attention much more closely to something they have not experienced before. The only limit to this approach is the imagination.

For example, if a teacher is teaching math to small children, they may choose to use manipulatives as a new way of reinforcing the content. Another option would be to incorporate simple word problems.  There is truly no limit in this strategy.

Movement

Movement can involve the students and/or the teacher moving around. When the students move, it can help break the monotony of having to sit still. Movement is even beneficial for adult students. A moving teacher, on the other hand, is a moving target the students can focus upon. For the sake of attention and classroom management, it is normally wise to avoid staying in one place too long when teaching children.

Conclusion

These ideas are some of the basics for increasing attention. Naturally, there are other ways to deal with this challenge. However a teacher chooses to deal with this problem, they need to determine whether their approach works for their students.

Type I and Type II Error

Hypothesis testing in statistics involves deciding whether to reject or not reject a null hypothesis. Two kinds of mistakes can occur when making this decision. A researcher can reject a null hypothesis when they should not, which is called a type I error. The other mistake is failing to reject a null hypothesis when they should have, which is a type II error. Both of these mistakes can seriously damage the interpretation of data.

An Example

The classic example that explains type I and type II errors is a courtroom. In a trial, the defendant is considered innocent until proven guilty. This presumption of innocence can be compared to the null hypothesis being true. The prosecutor's job is to present evidence that the defendant is guilty. This is the same as providing statistical evidence to reject the null hypothesis, which indicates that the null is not true and needs to be rejected.

There are four possible outcomes of our trial and our statistical test…

  1. The defendant is declared guilty when they really are guilty. That's a correct decision and is the same as rejecting the null hypothesis.
  2. The defendant is judged not guilty when they really are innocent. That's also a correct decision and is the same as not rejecting the null hypothesis.
  3. The defendant is convicted when they are actually innocent, which is wrong. This is the same as rejecting the null hypothesis when you should not, and is known as a type I error.
  4. The defendant is guilty but declared innocent, which is also incorrect. This is the same as not rejecting the null hypothesis when you should have, and is known as a type II error.

Important Notes

The probability of committing a type I error is the same as the alpha, or significance level, of a statistical test. Common values for alpha are 0.1, 0.05, and 0.01. This means that the likelihood of committing a type I error depends on the significance level that the researcher picks.
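The connection between alpha and the type I error rate can be demonstrated with a small simulation (not part of the original example). Here we repeatedly test a null hypothesis that is actually true:

```r
set.seed(42)
# 10,000 one-sample t-tests where the null (mean = 0) is true
pvals <- replicate(10000, t.test(rnorm(30))$p.value)
mean(pvals < 0.05)  # proportion of false rejections; close to alpha = 0.05
```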

The probability of committing a type II error is known as beta. Calculating beta is complicated because you need specific values in your null and alternative hypotheses, and it is not always possible to supply these. As such, researchers often do not focus on avoiding type II errors as much as they do on type I errors.

Another concern is that decreasing the risk of committing one type of error increases the risk of committing the other. This means that if you reduce the risk of a type I error, you increase the risk of committing a type II error.
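This tradeoff can be illustrated with R's built-in power.t.test function (the sample size and effect size below are hypothetical):

```r
# Holding everything else fixed, a stricter alpha lowers power,
# and lower power means a higher beta (type II error rate).
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.05)$power
power.t.test(n = 30, delta = 0.5, sd = 1, sig.level = 0.01)$power  # lower power, higher beta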

Conclusion

Incorrect judgment of a null hypothesis is an inherent risk in statistical analysis. As such, researchers need to be aware of these problems as they study data.

Decision Trees in R

Decision trees are useful for splitting data into smaller, distinct groups based on criteria you establish. This post will attempt to explain how to develop decision trees in R.

We are going to use the ‘Wage’ dataset found in the “ISLR” package (the variables used below, such as education, age, and wage, come from this dataset). Once you load the package, you need to split the data into a training and testing set, as shown in the code below. We want to divide the data based on education level, age, and wage.

library(ISLR); library(ggplot2); library(caret)
data("Wage")
inTrain<-createDataPartition(y=Wage$education, 
 p=0.7, list=FALSE)
trainingset <- Wage[inTrain, ]
testingset <- Wage[-inTrain, ]

Visualize the Data

We will now make a plot of the data with education as the groups and age and wage as the x and y variables. Below is the code followed by the plot. Please note that education is divided into 5 groups, as indicated in the chart.

qplot(age, wage, colour=education, data=trainingset)
[Plot: wage vs. age, colored by education level]
Create the Model

We are now going to develop the model for the decision tree. We will use age and wage to predict education as shown in the code below.

TreeModel<-train(education~age+wage, method='rpart', data=trainingset)

Create Visual of the Model

We now need to create a visual of the model. This requires the package called ‘rattle’, which you can install separately with install.packages("rattle"). After loading the package, the code below produces the tree diagram.

library(rattle)
fancyRpartPlot(TreeModel$finalModel)

[Diagram: decision tree produced by fancyRpartPlot]

Here is what the chart means

  1. At the top is node 1, labeled “HS Grad”. The decimals underneath are the proportions of the data that fall within each category. As the highest node, everything is classified as “HS Grad” until we begin to apply our criteria.
  2. Underneath node 1 is a decision about wage. If a person makes less than 112, you go to the left; if they make more, you go to the right.
  3. Node 2 indicates the proportion of the sample classified as “HS Grad” regardless of education. 14% of those with less than a HS diploma were classified as “HS Grad” based on wage, and 43% of those with a HS diploma were classified as “HS Grad”. The percentage underneath the decimals indicates the total share of the sample placed in this node, which was 57%.
  4. This process is repeated for each node until the data is divided as much as possible.

Predict

You can predict individual values in the dataset by using the ‘predict’ function with the test data as shown in the code below.

predict(TreeModel, newdata = testingset)
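To check how well these predictions line up with the actual education levels, you can cross-tabulate them, mirroring the confusion-matrix approach used earlier (a sketch using the objects created above):

```r
pred <- predict(TreeModel, newdata = testingset)
# Rows are predicted education levels, columns are the actual ones
table(pred, testingset$education)
```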

Conclusion

Decision trees are a unique feature of data analysis for determining how well data can be divided into subsets. They also provide a visual of how to move through data sequentially based on characteristics in the data.