APA Tables in R

Anybody who has ever written for academic purposes or in industry has had to deal with APA formatting. The rules and expectations seem endless and ever-changing. Even if you can maneuver through the endless list of rules, you still have to determine what to report, and how, when writing an article.

There is an R package that can at least take the mystery out of reporting ANOVA, correlation, and regression tables. This package is called “apaTables”. In this post, we will look at how to use this package to make tables that are formatted according to APA style.

We are going to create examples of ANOVA, correlation, and regression tables using the ‘mtcars’ dataset. Below is the initial code that we need to begin.

library(apaTables)
data("mtcars")

ANOVA

We will begin with the results of an ANOVA. For this to work, you have to use the “lm” function to create the model. If you are familiar with ANOVA and regression, this should not be surprising, as they find the same answer using different approaches. After the “lm” call, you use the “filename” argument and give the output a name in quotation marks. This file will be saved in your R working directory. You can also provide other information, such as the table number and confidence level, if you desire.

Our code produces two outputs: one printed to the R console and a second saved as a Word document. Below is the code.

apa.aov.table(lm(mpg ~ cyl, mtcars), filename = "Example1.doc", table.number = 1)
## 
## 
## Table 1 
## 
## ANOVA results using mpg as the dependent variable
##  
## 
##    Predictor      SS df      MS      F    p partial_eta2
##  (Intercept) 3429.84  1 3429.84 333.71 .000             
##          cyl  817.71  1  817.71  79.56 .000          .73
##        Error  308.33 30   10.28                         
##  CI_90_partial_eta2
##                    
##          [.56, .80]
##                    
## 
## Note: Values in square brackets indicate the bounds of the 90% confidence interval for partial eta-squared

Here is the Word document output.

1.png

Perhaps you are beginning to see the beauty of this package and its functions. The “apa.aov.table” function provides a nice table that requires no formatting by the researcher.

You can even make a table of the means and standard deviations for the ANOVA. This is similar to what you would get if you used the “aggregate” function. Below is the code.

apa.1way.table(cyl, mpg, mtcars, filename = "Example2.doc", table.number = 2)
## 
## 
## Table 2 
## 
## Descriptive statistics for mpg as a function of cyl.  
## 
##  cyl     M   SD
##    4 26.66 4.51
##    6 19.74 1.45
##    8 15.10 2.56
## 
## Note. M and SD represent mean and standard deviation, respectively.
## 

Here is what it looks like in Word.

1.png
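As a point of comparison, the same means and standard deviations can be computed with base R’s “aggregate” function, though without the APA formatting. Below is a sketch.

```r
# Base R sketch of the same descriptive statistics:
# mean (M) and standard deviation (SD) of mpg for each level of cyl
aggregate(mpg ~ cyl, data = mtcars,
          FUN = function(x) c(M = mean(x), SD = sd(x)))
```

The values match Table 2, but “apa.1way.table” saves you the work of rounding and formatting them.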

Correlation 

We will now look at an example of a correlation table. The function for this is “apa.cor.table”. It works best with only a few variables; otherwise, the table becomes bigger than a single sheet of paper. In addition, you will probably want to suppress the confidence intervals to save space. There are other arguments that you can explore on your own. Below is the code.

apa.cor.table(mtcars, filename = "Example3.doc", table.number = 3, show.conf.interval = FALSE)
## 
## 
## Table 3 
## 
## Means, standard deviations, and correlations
##  
## 
##   Variable M      SD     1      2      3      4      5      6      7     
##   1. mpg   20.09  6.03                                                   
##                                                                          
##   2. cyl   6.19   1.79   -.85**                                          
##                                                                          
##   3. disp  230.72 123.94 -.85** .90**                                    
##                                                                          
##   4. hp    146.69 68.56  -.78** .83**  .79**                             
##                                                                          
##   5. drat  3.60   0.53   .68**  -.70** -.71** -.45**                     
##                                                                          
##   6. wt    3.22   0.98   -.87** .78**  .89**  .66**  -.71**              
##                                                                          
##   7. qsec  17.85  1.79   .42*   -.59** -.43*  -.71** .09    -.17         
##                                                                          
##   8. vs    0.44   0.50   .66**  -.81** -.71** -.72** .44*   -.55** .74** 
##                                                                          
##   9. am    0.41   0.50   .60**  -.52** -.59** -.24   .71**  -.69** -.23  
##                                                                          
##   10. gear 3.69   0.74   .48**  -.49** -.56** -.13   .70**  -.58** -.21  
##                                                                          
##   11. carb 2.81   1.62   -.55** .53**  .39*   .75**  -.09   .43*   -.66**
##                                                                          
##   8      9     10 
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##                   
##   .17             
##                   
##   .21    .79**    
##                   
##   -.57** .06   .27
##                   
## 
## Note. * indicates p < .05; ** indicates p < .01.
## M and SD are used to represent mean and standard deviation, respectively.
## 

Here are the Word document results.

1.png

If you run this code at home and open the document in Word, you will not see variables 9 and 10 because the table is too big for a single page; I had to resize it manually. One way to get around this is to delete the M and SD columns and place those values as rows below the table.
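Another possible workaround, sketched below, is to pass only a subset of the variables so the table fits on a single page. The filename “Example6.doc” is just a placeholder, and the choice of five variables is arbitrary.

```r
# Sketch: correlation table for a handful of variables so it fits on one page
apa.cor.table(mtcars[, c("mpg", "cyl", "disp", "hp", "wt")],
              filename = "Example6.doc", table.number = 6,
              show.conf.interval = FALSE)
```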

Regression

Our final example is a regression table. The code is as follows.

apa.reg.table(lm(mpg ~ disp, mtcars), filename = "Example4", table.number = 4)
## 
## 
## Table 4 
## 
## Regression results using mpg as the criterion
##  
## 
##    Predictor       b       b_95%_CI  beta    beta_95%_CI sr2 sr2_95%_CI
##  (Intercept) 29.60** [27.09, 32.11]                                    
##         disp -0.04** [-0.05, -0.03] -0.85 [-1.05, -0.65] .72 [.51, .81]
##                                                                        
##                                                                        
##                                                                        
##       r             Fit
##                        
##  -.85**                
##             R2 = .718**
##         95% CI[.51,.81]
##                        
## 
## Note. * indicates p < .05; ** indicates p < .01.
## A significant b-weight indicates the beta-weight and semi-partial correlation are also significant.
## b represents unstandardized regression weights; beta indicates the standardized regression weights; 
## sr2 represents the semi-partial correlation squared; r represents the zero-order correlation.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## 

Here are the results in Word.

1.png

You can also make regression tables with multiple blocks or models. Below is an example.

apa.reg.table(lm(mpg ~ disp, mtcars), lm(mpg ~ disp + hp, mtcars), filename = "Example5", table.number = 5)
## 
## 
## Table 5 
## 
## Regression results using mpg as the criterion
##  
## 
##    Predictor       b       b_95%_CI  beta    beta_95%_CI sr2  sr2_95%_CI
##  (Intercept) 29.60** [27.09, 32.11]                                     
##         disp -0.04** [-0.05, -0.03] -0.85 [-1.05, -0.65] .72  [.51, .81]
##                                                                         
##                                                                         
##                                                                         
##  (Intercept) 30.74** [28.01, 33.46]                                     
##         disp -0.03** [-0.05, -0.02] -0.62 [-0.94, -0.31] .15  [.00, .29]
##           hp   -0.02  [-0.05, 0.00] -0.28  [-0.59, 0.03] .03 [-.03, .09]
##                                                                         
##                                                                         
##                                                                         
##       r             Fit        Difference
##                                          
##  -.85**                                  
##             R2 = .718**                  
##         95% CI[.51,.81]                  
##                                          
##                                          
##  -.85**                                  
##  -.78**                                  
##             R2 = .748**    Delta R2 = .03
##         95% CI[.54,.83] 95% CI[-.03, .09]
##                                          
## 
## Note. * indicates p < .05; ** indicates p < .01.
## A significant b-weight indicates the beta-weight and semi-partial correlation are also significant.
## b represents unstandardized regression weights; beta indicates the standardized regression weights; 
## sr2 represents the semi-partial correlation squared; r represents the zero-order correlation.
## Square brackets are used to enclose the lower and upper limits of a confidence interval.
## 

Here is the Word document version.

1.png

Conclusion

This is a real time saver for those of us who need to write and share statistical information.


Reading Comprehension Strategies

Students frequently struggle with understanding what they read. There can be many reasons for this, ranging from vocabulary issues to struggles with just sounding out the text. Another common problem, frequently seen among native speakers of a language, is that students just read without taking a moment to think about what they read. This lack of reflection and intellectual wrestling with the text can make it so that the student knows they read something but knows nothing about what they read.

In this post, we will look at several common strategies to support reading comprehension. These strategies include the following.

Walking a Student Through the Text

As students get older, there is a tendency for many teachers to ignore the need to guide students through a reading before the students read it. One way to improve reading comprehension is to go through the assigned reading and give the students an idea of what to expect from the text.

Doing this provides a framework in the student’s mind to which they can add details as they do the reading. When walking through a text with students, the teacher can provide insights into important ideas, explain complex words, explain visuals, and give a general sense of what is important.

Ask Questions

Asking questions either before or after a reading is another great way to support students’ understanding. Prior questions give an idea of what students should be expected to know after reading. Questions after the reading, on the other hand, should aim to help students consolidate the ideas they were exposed to in the reading.

The types of questions are endless. Questions can be based on Bloom’s taxonomy in order to stimulate various thinking skills. Another skill is probing: soliciting responses from students by encouraging them and asking reasonable follow-up questions.

Develop Relevance

Connecting what a student knows to what they do not know is known as relevance. If a teacher can stretch a student from what they know and use it to understand what is new, it will dramatically improve comprehension.

This is trickier than it sounds. It requires the teacher to have a firm grasp of the subject as well as the habits and knowledge of the students. Therefore, patience is required.

Conclusion

Reading is a skill that can improve a great deal through practice. However, mastery will require the knowledge and application of strategies. Without this next level of training, a student will often become more and more frustrated with reading challenging text.

Criticism of Grades

Grading has recently come under attack, with people bringing strong criticism against the practice. Some schools have even stopped using grades altogether. In this post, we will look at problems with grading as well as alternatives.

It Depends on the Subject

The weaknesses of grading are often seen much more clearly in subjects of a more subjective nature from the social sciences and humanities, such as English, history, or music. Subjects from the hard sciences, such as biology, math, and engineering, are more objective in nature. If a student states that 2 + 2 = 5, there is little room for persuasion or critical thinking to influence the grade.

However, when it comes to judging thinking or a musical performance, it is much more difficult to assess without bringing in the subjectivity of opinion. This is not bad, as a teacher should be an expert in their domain, but it still brings an arbitrary unpredictability to the system of grading that is difficult to avoid.

Returning to the math problem, if a student states 2 + 2 = 4, this answer is always right whether the teacher likes the student or not. However, an excellent historical essay on slavery can be graded poorly if the history teacher has issues with the student’s thesis. Assessing the essay requires subjective thought about the quality of the student’s writing, and subjectivity means that the assessment cannot be objective.

Obsession of Students

Many students become obsessed with and almost worship the grades they receive. This often means that the focus becomes more about getting an ‘A’ than about actually learning. As a result, students take no risks in their learning and conform strictly to the directions of the teacher. Mindless conformity is not a sign of future success.

There are many comments on the internet about the differences between ‘A’ and ‘C’ students: how ‘A’ students are conformists and ‘C’ students are innovators. The point is that the better the academic performance of a student, the better they are at obeying orders, and not necessarily at thinking independently.

Alternatives to Grades

There are several alternatives to grading. One of the most common is pass/fail: either the student passes the course or they do not. This is common at the tertiary level, especially in highly subjective courses such as writing a thesis or dissertation. In such cases, the student either meets the “mysterious” standard or they do not.

Another alternative has been the explosion in the use of gamification. As the student acquires badges, hit points, etc., this is taken as evidence of learning. This idea is applied primarily at the K-12 level, but the concept of gamification seems to be used in almost all of the game apps available on cellphones, as well as on many websites.

Lastly, observation is another alternative. In this approach, the teacher makes weekly observations of each student. These observations are then used to provide feedback for the students. Although time-consuming, this is a way to support students without grades.

Conclusion

As long as there is education, there must be some way to determine whether students are meeting expectations. Grades are the current standard. As with any system, grades have their strengths and weaknesses. With this in mind, it is the responsibility of teachers to always search for ways to improve how students are assessed.

Passive vs Active Learning

Passive and active learning are two extremes in the world of teaching. Traditionally, learning has been mostly passive in nature. However, in the last two to three decades, there has been a push, particularly in the United States, to encourage active learning in the classroom.

This post will define passive and active learning and provide examples of each.

Passive Learning

Passive learning is defined from the perspective of the student and means learning in which the students do little to nothing to acquire the knowledge. The most common form of passive learning is direct instruction aka lecture-style teaching.

With passive learning, the student is viewed as an empty receptacle that the teacher must fill with his knowledge. Freire called this banking education, as the student serves as an account in which the teacher, or banker, deposits knowledge, or money.

There is a heavy emphasis on memorizing and recalling information. The objective is the preservation of knowledge and the students should take notes and be ready to repeat or at least paraphrase what the teacher said. The teacher is the all-wise sage on the stage.

Even though it sounds as though passive learning is always bad, there are times when it is beneficial. When people have no prior knowledge of a subject, passive learning can provide a foundation for future active learning activities. In addition, if it is necessary to convey a large amount of information, direct instruction can help achieve this.

Active Learning

Active learning is learning in which the students must do something in order to learn. Common examples of this include project-based learning, flipped classroom, and any form of discussion in the classroom.

Active learning is derived from the philosophy of constructivism, the belief that students use their current knowledge to build new understanding. For example, with project-based learning, students must apply what they know in order to tackle the unknowns of the project.

In the flipped classroom, students review the lecture-style information before class. During class, the students participate in activities in which they use what they learned outside of class. This, in turn, “flips” the learning experience: out of class is the passive part, while in class is the active part.

There is a reduction or total absence of lecturing in an active learning classroom. Rather, students interact with each other and the teacher to develop their understanding of the content. This transactional nature of learning is another characteristic of active learning.

There are some challenges with active learning. Since it is constructivist in nature, it can be difficult to assess what the students learned. This is due in part to the subjective nature of constructivism: if everybody constructs their own understanding, everybody understands differently, which makes it difficult to have one objective assessment.

Furthermore, active learning is time-consuming in terms of preparation and the learning experience. Developing activities and leading a discussion forces the class to move slower. If the demands of the course require large amounts of content this can be challenging for many teachers.

Conclusion

There is room in the world of education for both passive and active learning strategies. The main goal should be to find a balance between these two extremes, as overreliance on either one will probably be a disadvantage to students.

Teacher Burnout

Teacher burnout is a common problem within education. The statistics vary, but you can safely say that about one-third of teachers suffer from some form of burnout at one point or another during their career. This post will define burnout and explain some of its causes, the stages of burnout, and ways to deal with it.

Definition

Essentially, teacher burnout is the experience of a person who is overwhelmed by the stress of teaching. The most common victims are young teachers and female teachers.

Young teachers are often at higher risk because they have not developed coping mechanisms for the rigors of teaching. Women are also more likely to fall victim to teacher burnout because of the added burden of maintaining the home, as well as difficulties with distancing themselves emotionally from their profession as a teacher.

Causes

Teacher burnout is generally caused by stress. Below are several forms of stress that can plague the teaching profession.

  • Workload: This is especially true for those who can never say “no.” Committees, field trips, student activities, grading, lesson plans, accreditation: all of these important tasks can overwhelm a person.
  • Student behavioral problems: Classroom management is always a challenge as families continue to collapse.
  • Issues with leadership
  • Boredom: This stressor is more common with experienced teachers who have taught the same content for years. There are only so many ways to teach content that are appealing to the teacher before there is some repetition. Boredom can also be especially challenging for a teacher who values learning more than personal relationships with students.

Stages of Burnout

The stages of teacher burnout follow the same progression as burnout in other social-work-like professions. Below are the four stages as developed by McMullen.

  1. Closed off: The burnout victim stops socializing and is rigid against feedback. Signs include self-neglect.
  2. Irritable: The victim’s temper shortens. In addition, he begins to complain about everything. Problems are observed everywhere, whether they are legitimate or not.
  3. Paranoia: The teacher is worried about everything. Depression is common at this point, as well as a loss of motivation.
  4. Exhaustion: The teacher is emotionally drained. They no longer “care,” as they see no way to improve the situation. Compassion fatigue sets in, which means that there is no more emotional support to give to students.

Dealing with Burnout

Perhaps the most important step in coping with burnout is to prioritize. It is necessary, for the sake of sanity, to say no to various requests at times. Personal time away from any job is critical to being able to return refreshed. Therefore, teaching cannot be the sole driving force of a person’s life but should be balanced with other activities and even downtime.

It may also be necessary to consider changing professions. If you are not able to give your best in the classroom, perhaps there are other opportunities available. It is impractical to think that someone who becomes a teacher must stay a teacher their entire life, as though there is no other way to use the skills developed in the classroom.

Conclusion

Burnout is a problem, but it is not unique to education. What really matters is that people take control of, and responsibility for, their time and not chase every problem that comes into their life. Doing so will help in coping with the rigors of the teaching profession.

Motivating Students

It can be frustrating for a teacher to spend hours on preparation and planning activities only to have students who have no desire to learn or to enjoy the learning experience. There are ways to help students be more motivated and engaged in their learning. This post will provide some basic ideas.

Types of Motivation

In simple terms, there are two types of motivation: intrinsic and extrinsic. Intrinsic motivation is an inner drive to do something; in other words, being self-motivated.

Extrinsic motivation is when the push to do something comes from outside the person: external circumstances push the person to act.

Each teacher needs to decide which form of motivation to focus on or whether to try and address both in their classroom. A teacher with more of a cognitivist view of teaching will probably lean towards developing intrinsic motivation. On the other hand, a teacher who has more of a behavioral view of teaching may focus more on influencing extrinsic motivation.

Ways to Motivate

Involvement

Nothing motivates like being needed by those around you. Getting students involved in their learning and in the management of the class often affects motivation. When students are called on to help, they realize that they have a role and that others are depending on them. This brings a natural social pressure to fulfill their role.

Make it Relevant

Teachers often fall into the trap of knowing what’s best for students and sticking to teaching that. However, the student does not always agree with what is best for them and thus is not motivated to learn.

To alleviate this problem, a teacher must provide immediate applications of knowledge. If the student can see how they can use the information now rather than several years from now they will probably be more motivated to learn it.

One way to develop relevancy is discovery learning. Instead of teaching everything in advance, let the students work until they can go no further. When they realize they need to learn something, they will be ready to listen.

Acknowledge Excellence

When students are doing good work, it is important to let them know. This helps them understand what acceptable learning behavior is. People like positive reinforcement, and it needs to come from a person of authority, like a teacher.

A slightly different way to acknowledge excellence is simply to expect it. When the standard is set high often students naturally want to reach for it because they often want the approval of the teacher.

Conclusion

We have all faced situations when we were not interested in or motivated to learn and study. It is important to remember this when dealing with students. They face the same challenges with motivation that we all do.

The Fall of Cursive Handwriting

Writing in a cursive style has been around for centuries. However, there has been a steep decline in the use of cursive writing in America over the past several decades. This post will trace the history of cursive writing as well as what is replacing this traditional form of writing.

History

Cursive in one form or another dates back to at least the 11th century, with examples found in documents related to the Norman Conquest of England. Cursive was originally developed to avoid having to raise the quill from the page when writing. Apparently, quills are extremely fragile, and constantly reapplying them to the paper increased the likelihood that they would break.

Cursive was also developed in order to fit more words on a page. This became especially important with the development of the printing press; some people so disliked its condensed fonts that they pushed back by developing cursive writing styles.

In America, people’s writing style and penmanship could be used to identify social rank. However, this changed with the Spencerian method, developed by Platt Rogers Spencer. This writing style standardized cursive, thus democratizing it.

After Spencer, there were several writing systems that each had their moment in the sun. Examples include the cursive styles developed by Palmer, Thurber, and Zaner. Each had its own unique approach, and all influenced children during the 20th and early 21st centuries.

The Decline

The initial decline of cursive writing began with the advent of the typewriter. With typing, a person could write much faster than by hand. Writing by hand has a top speed of around 20 wpm, while even a child with no training in typing can achieve 20 wpm; a trained typist can reach 40 wpm, with professionals reaching 75 wpm.

Typing also removes the confusion of sloppy handwriting. We have all been guilty of poor penmanship or have had to suffer through trying to decipher what someone wrote. Typing removes this problem, even if it allows for the dreaded typo.

With computers arriving in the 1970s, schools began to abandon the teaching of cursive by the 1980s and 90s. Today cursive writing is so unusual that some young people cannot even read it.

Going Forward

Typing has become so ubiquitous that schools often do not even teach it, assuming that students come to school with this skill. As a result, many students use the hunt-and-peck approach, which is slow and bogs down the thought process needed for writing. The irony is that cursive has been forgotten and typing has been assumed, which means that many students never learned either.

To further complicate things, the use of touch screens has further hindered the learning of typing. Fast typing relies on touch, but with screens there is nothing to feel or press when typing. This makes it difficult to type automatically, which takes cognitive power away from writing, as the student now has to focus on remembering where the letter p is on the keyboard rather than shaping their argument.

Critical Thinking Strategies

Developing critical thinking is a primary goal in many classrooms. However, it is difficult to actually achieve this goal as critical thinking is an elusive concept to understand. This post will provide practical ways to help students develop critical thinking skills.

Critical Thinking Defined

Critical thinking is the ability to develop support for one’s position on a subject, as well as the ability to question the reasons and opinions of another person on a given subject. The ability to support one’s own position is exceedingly difficult, as many people are convinced that their feelings can be substituted for evidence.

It is also difficult to question the reasons and opinions of others, as it requires the ability to identify weaknesses in the other person’s positions while thinking on one’s feet. Again, this is why many people stick to their emotions: it requires no thinking, and emotions can be felt much faster than thoughts can be processed. Thinking critically involves assessing the strength of another’s thought process by pushing them with challenging questions or counter-arguments.

Developing Critical Thinking Skills

Debates: Debates provide an opportunity for people both to prepare arguments and to defend them in an extemporaneous manner. The experience of preparation, as well as thinking on one’s feet, helps develop critical thinking in many ways. In addition, the time limits of debates force the participants to be highly engaged.

Reciprocal teaching: Reciprocal teaching involves students taking turns teaching each other. As such, they must take a much closer look at the content when they know they will have to teach it. In addition, reciprocal teaching encourages discussion and the answering of questions, which further supports the development of critical thinking skills.

Discussion: Discussion through the use of open-ended questions is another classic way to develop critical thinking skills. The key is the open-ended nature of the question; there is no single answer. Instead, the quality of an answer is judged on the support the student provides and their reasoning skills.

Open-ended assignments: Often, as teachers, we want to give specific, detailed instructions on how to complete an assignment. This reduces confusion and gives each student a similar context in which learning takes place.

However, open-ended assignments provide a general end goal but allow the students to determine how they will reach it. This open-ended nature really forces the students to think about what they will do. In addition, this is similar to work in the real world, where the boss often wants something done and does not really care how the workers get it done. The lack of direction can cause problems for less critical workers, as they do not know what to do, but those who are trained to deal with ambiguity will be prepared.

Conclusion

Critical thinking requires a context in which free thought is allowed and supported. It is difficult to develop thinking skills without activities that stimulate them. The activities mentioned here are just some of the choices available to a teacher.

Teaching Reflective Thinking

Reflective thinking is the ability to look at the past, develop understanding and insights about what happened, and use this information to develop deeper understanding or to choose a course of action. Many may believe that reflective thinking is a natural part of learning.

However, I have always been surprised at how little reflective thinking my students do. They seem to just do things without ever trying to understand how well they did beyond passing the assignment. Without reflective thinking, it is difficult to learn from past mistakes, as no thought is given to avoiding them.

This post will examine opportunities for and ways of encouraging reflective thinking.

Opportunities for Reflective Thinking

Generally, reflective thinking can happen in two situations:

  1. When you learn something
  2. When you do something

These are similar but different concepts. Learning can happen without doing anything such as listening to a lecture or discussion. You hear a lot of great stuff but you never implement it.

Doing something means the application of knowledge in a particular setting. An example would be teaching or working at a company. With the application of knowledge come consequences that indicate how well you did. For example, teaching kids and then seeing either a look of understanding or confusion on their faces.

Strategies for Reflective Thinking

For situations in which the student learns something without a lot of action, a common model for encouraging reflective thinking is the Connect, Extend, Challenge model. The model is explained below.

  • Connect: Link what you have learned to something you already know
  • Extend: Determine how this new knowledge extends your learning
  • Challenge: Decide what you still do not understand

Connecting is what makes learning relevant for many students and is also derived from constructivism. Extending is a way for a student to see the benefits of the new knowledge. It goes beyond learning because you were told to learn. Lastly, challenging helps the student to determine what they do not know which is another metacognitive strategy.

When a student does something, the reflection process is slightly different. Below is an extremely common model.

  • What went well
  • What went wrong
  • How to fix what went wrong

In this model, the student identifies what they did right, which requires reflective thinking. The student also identifies the things they did wrong during the experience. Lastly, the student must problem solve and develop strategies to overcome the mistakes they made. Often the solutions in this final part are implemented during the next action sequence to see how well they worked out.

Conclusion

Thinking about the past is one of the strongest ways to prepare for the future. Therefore, teachers must provide their students with opportunities to think reflectively. The strategies included here provide a framework for guiding students in this critical process.

Classroom Management at the University Level

Classroom management is different at the university level when compared to K-12. Often the problem is not behavioral in nature (with the exception of cell phones). Rather, a lot of the classroom management problems at a university are academically related. In the classroom, the problem is often inattentiveness or idleness; more generally, the challenge is completing assignments and being prepared for assessments.

Clear Syllabus

Making sure the syllabus is clear is critical for better student performance. The syllabus includes the calendar, assignment requirements, rules, etc. When these are laid out in advance, expectations are set that the students strive to reach.

If the syllabus is unclear, it normally means the expectations are unclear and perhaps even that the teaching is unclear. Most universities have a standard format for their syllabuses, but it is still the teacher's responsibility to explain the expectations clearly.

Stick to the Syllabus

Once the course has begun, the commitments and expectations stipulated in the syllabus should be fully honored. It is better to think of the syllabus as a binding contract between two parties. Once it is distributed and discussed, there is nothing left to negotiate.

Related to this is the need to actually enforce rules. If there is a late policy, it must be enforced; otherwise, students will think that you are not serious, and they will push for more concessions. This can quickly snowball into chaos. If you actually have a rule against cell phones, then it needs to be enforced, or you will develop students who have a disdain for people who don't enforce their rules.

Provide Feedback

Perhaps one of the biggest problems in academia is a lack of feedback. Many professors may have only three assignments in a course. Given that there is almost always a mid-term and a final, and that these are primarily summative assessments rather than assessments for learning, many students have only one assignment that extends beyond multiple choice.

This means that students need constant feedback. This allows students to learn from their mistakes as well as provides them with motivation to complete their studies. It is not always practical to mark every assignment. A shortcut would be to look at a sample of assignments and explain common errors to the class.

Mix Teaching Styles 

The last useful strategy will help to reduce daydreaming and listlessness. The most common teaching approach is usually lecture or direct instruction. The problem is that if everyone does this, it becomes really boring for the students. Therefore, lecturing is only bad if it is the only instructional model being used.

Maintaining engagement means using different teaching methodologies. While the syllabus should be structured and unchanging, good teaching often has a flair and a slight degree of unpredictability that makes the classroom interesting.

Conclusion

Teaching at any level is hard. However, classroom management at the university level can be challenging, as it is not the most widely discussed topic. For success, a professor needs to commit to the syllabus while being flexible in their delivery of content.

Classroom Management Ideas

One of the greatest challenges in teaching is classroom management. Students are always looking for ways and opportunities to test the limits of acceptable behavior. For teachers, this constant experimentation with the boundaries of how to act is extremely tiresome.

However, there are several strategies that teachers can use to limit poor behavior. Some of these ideas include the following.

  • Setting routines
  • Rehearsing transitions
  • Anticipating behavior
  • Non-Verbal cues

Set Routines

Establishing clear routines will help tremendously to regulate the behavior of students. When everybody knows their role and what to do, there is usually less curiosity among students to see what they can get away with.

Routines need to be explained, demonstrated, and practiced in order for students to master them. Once a routine is established, most students enjoy the predictability of having set actions that they need to perform at certain times of the day. While instruction should be varied and exciting, routines provide a sense of stability and security alongside brilliant teaching.

Rehearse Transitions

A specific form of routine is the transition. Transitions are those moments in class when you have to move from one activity to another. An example would be going out to recess or coming in from recess, etc.

It is at moments like these that everyone is active. With so many moving parts and actions taking place, this is when the most breakdowns in behavior can take place. Therefore, the teacher needs to be extra diligent during these moments and make sure the routines are thoroughly drilled to avoid near-total chaos.

Anticipation

Anticipating has to do with seeing what might happen before it actually happens. An analogy would be an athlete who sees an opportunity to make a great play because of the actions of his opponent. A teacher must be able to read the class and be one step ahead of the students.

A term related to this is withitness, which means having a constant awareness of what is happening in the classroom. Or, in other words, having eyes in the back of your head. As a teacher gets to know their students, it becomes easier to predict their actions and to make adjustments beforehand. This can greatly reduce behavioral problems.

Non-Verbal Cues

Talk is cheap, especially with students. Non-verbal cues save the voice while getting students to do things. Every teacher should have several non-verbal commands that they use in their classroom. Examples may include ways to get the class's attention, to grant permission to go to the bathroom, to give permission to get out of one's seat, etc.

Most classes have a rule for students to raise their hand. However, non-verbal cues should not stop there. The more non-verbal cues, the less talking. In addition, non-verbal cues reduce arguing because no words are exchanged.

Conclusion

Behavior is a challenge, but there are ways to overcome at least some of it. Teachers need to consider and employ ways to anticipate and deal with behavioral problems, preferably before they become big problems.

Homeschooling Multiple Children

Homeschooling one child is challenging enough. Now imagine trying to teach more than one or even several. There are things that become more complex but also more efficient with the addition of each new member to the homeschooling context.

It Gets Easier Each Time

When you begin to teach the second child, it is surprisingly easier. You have learned from the mistakes made teaching the first child and are familiar with the curriculum. The content is probably fresh in your mind, and you're no longer trying to remember how to teach all of those basic skills that are now automatic for you, such as reading and counting. You also have learned shortcuts and other tricks that make your teaching more efficient.

The second child has also probably watched you teach the first one. When this happens, they learn a lot of the content almost through osmosis. I have seen a three-year-old playing with how to write when the older sibling could barely write at five years of age. Just watching the older sibling sped up the development of the younger one.

The second child is also more likely to be eager to learn from watching the older child be in school, since there is a culture of learning in the house now. They can't wait for their turn to learn, and this also makes things easier. Combine this with an experienced parent, and adding an additional student is not as burdensome as it seems.

Working Together

To be efficient and not stressed out, many families teach non-core subjects (history, science, art, PE, etc.) to all children at the same time. The reason for this is that in non-core subjects the order in which the content is learned is often not as important or linear. For example, in science, if a second grader learns about the weather before learning about plants, it probably will not cause much damage to their development, if any at all.

Core subjects (reading, math) are taught separately because the difference in skill in these subjects can be extensive and there is a clear linear development in them. The exception to this would be to have the older sibling serve as a teacher or tutor for the younger one. This really helps everyone involved develop a better understanding and reduces the stress on the parent.

Independence of the Senior Student

The addition of a second child to the homeschool calls on the oldest to become more independent. There is less one-on-one time to support them, with that time now given to other children. Therefore, the older child will have to sometimes figure things out on their own. The benefit of this is the development of autonomy, which is a hard-to-find skill in this world.

Instead of watching everything they do, the parent is now more of a monitor who drops by to check progress rather than watching every academic move. This places some of the burden of learning on the child, which is good for developing a sense of responsibility.

Conclusion

With a combination of experience, efficiency, and the help of older children, homeschooling multiple children is highly doable. The key is to get everyone working together to achieve the educational goals of the family.

Supporting ESL Students’ Writing

ESL students usually need to learn to write in their second language. This is especially true for those who have academic goals. Learning to write is difficult even in one’s mother tongue, let alone in a second language.

In this post, we will look at several practical ways to help students learn to write in their L2. Below are some useful strategies.

  • Build on what they know
  • Encourage coherency in writing
  • Encourage collaboration
  • Support Consistency

Build on Prior Knowledge

It is easier for most students to write about what they know rather than what they do not know. As such, as a teacher, it is better to have students write about a familiar topic. This reduces the cognitive load on the students and allows them to focus more on their language issues.

In addition, building on prior knowledge is consistent with constructivism. Therefore, students are deepening their learning through using writing to express ideas and opinions.

Support Coherency 

Coherency has to do with whether the paragraph makes sense or not. In order to support this, the teacher needs to guide the students in developing main ideas and supporting details and illustrate how these concepts work together at the paragraph level. For more complex writing this involves how various paragraphs work together to support a thesis or purpose statement.

Students struggle tremendously with these big-picture ideas. This is in part due to the average student’s obsession with grammar. Grammar is critical after the student has ideas to share clearly, and never before that.

Encourage Collaboration

Students should work together to improve their writing. This can involve peer editing and/or brainstorming activities. These forms of collaboration give students different perspectives on their writing beyond just depending on the teacher.

Collaboration is also consistent with cooperative learning. In today’s marketplace, few people are granted the privilege of working exclusively alone on anything. In addition, working together can help the students to develop their spoken English communication skills.

Consistency

Writing needs to be scheduled and happen frequently in order to see progress at the ESL level. This is different from a native speaking context in which the students may have several large papers that they work on alone. In the ESL classroom, the students should write smaller and more frequent papers to provide more feedback and scaffolding.

Small incremental growth should be the primary goal for ESL students. This should be combined with support from the teacher through a consistent commitment to writing.

Conclusion

Writing is a major component of academic life. Many ESL students learn a second language to pursue academic goals. Therefore, it is important that teachers have ideas on how they can support ESL students in achieving the fluency they desire in their writing for further academic success.

Videoconferencing in Online Courses

Videoconferencing is a standard aspect of the professional world. Most large companies have some sort of video conferencing happening in terms of meetings and training. In terms of personal life, video conferencing is common as well. We probably have all used Skype or Google Hangouts at one time or another to talk with friends. However, video conferencing is not as common in education.

Video Conferencing Before Video Conferencing

Before video conferencing became common, many educators would upload videos to their online course or post them on YouTube. This allowed the student to see the teacher and have more of a traditional classroom experience, but real-time interaction was impossible. Instead, the interaction was asynchronous, meaning not at the same time. As such, communication was stilted, to say the least, because of the lag time between interactions.

Things to Consider Before Video Conferencing

In order to have success with video conferencing, you will need some sort of application that allows it. There are many different applications to choose from, such as Skype, Google Hangouts, and even Facebook. However, you want software that allows you to show your screen as well as control the flow of the conversation.

One app that allows this is called Zoom. This software allows you to schedule meetings. In addition, students do not need to download anything. Instead, the students are sent a web link that takes them to the online meeting. You can share your screen as well as monitor the discussion with the added benefit of being able to record the meeting for future use.

Pros and Cons of Video Conferencing

For whatever reason, video conferencing is engaging for students. The same discussion in class would lull them to sleep, but through webcams, everyone is awake and stimulated. I am not sure what the difference is, but this has been my experience.

The biggest enemy to video conferencing is scheduling. This is particularly true if students are spread out all over the world. The challenges of time zones and other commitments make this hard.

This is one reason that recording a video conference is so important. It allows students who are not available to at least have an asynchronous learning experience. It also serves as a resource for students who need to see something again. Keep in mind you have to post the video either on your LMS or on YouTube so that students have access to it.

Conclusion

Video conferencing provides a familiar learning experience in a different setting. It is able to give students who are not physically present an opportunity to interact with the instructor in meaningful ways. As such, the instructor must be aware of possibilities in how to use this tool in their online teaching.

Maintaining Student Focus During E-Learning

Self-motivation is perhaps one of the biggest problems in e-learning. Students who are left to themselves often do not successfully finish the learning experiences prepared by the teacher. For whatever reason, the internal drive to finish something such as an online class is often missing for many people.

There are several strategies that an online teacher can use in order to help students who may struggle with self-motivation in an online context. These ideas include…

  • Brief Lessons
  • Frequent Assessment
  • Collaboration

Brief Lessons

Nothing is more discouraging to many students than having to read several pages of text or even several hours of video to complete a single lesson or module in an online course. Therefore, the teacher needs to make sure lessons are short. Completing many small lessons is much more motivating for many students than completing a handful of really large lessons. This is because frequent completion of small lessons is rewarding and indicates progress which the brain rewards.

How long a lesson should be depends on many factors, such as the age and expertise of the students. Therefore, it is difficult to give a single magic number to aim for. You want to avoid the extremes of lessons that are too short and lessons that are too long.

In my own experience, most people make their lessons too long, so the majority of us probably need to reduce the content in an individual lesson and spread it over many lessons. All the content can still be there; it is just chunked differently so that students experience progress.

Frequent Assessment

Along with brief lessons should come frequent assessment. Nothing motivates like knowing something is going to be on the quiz or that there is some sort of immediate application. Students need to do something with what they are learning in order to stay engaged. Therefore, constant assessment is not only for grades but also for learning. Besides, the stress of a small quiz provides an emotional stimulus that many students need.

The assessment also allows for feedback, which helps the student to monitor their learning. In addition, the feedback provides more evidence of progress being made in a course, which is itself motivating for many.

Collaboration

Nothing motivates the same as working together. Many people love to work in groups and get energy from this. In addition, it’s harder to quit and give up on a course when you have group members waiting for your contribution. Furthermore, interacting with other students deepens understanding of the course material.

Communicating with other students online to complete assignments is one way of establishing community in an online class. It is similar to a traditional classroom, where everyone has to discuss and work together to have success.

Conclusion

Motivated students are successful students. In order for this to happen in an e-learning class, students need to be engaged through brief lessons that include frequent assessment and social interaction.

Tips for Online Studying

Today it is common for students to study online. This has both pros and cons to it. Although e-learning allows students to study anytime and anywhere it also can lead to a sense of disconnection and frustration. This post will provide some suggestions for how to study online successfully.

Make a Schedule

In a traditional classroom, there is a fixed time to come to class. This regulated discipline helps many students to reach a minimum standard of knowledge even if they never study on their own. In e-learning, the student can study whenever they want. Sadly, many choose to never study which leads to academic failure.

Success in online studying requires a disciplined schedule in which the student determines when they will study as well as what they will do during the study time. As such, you will need to set up some sort of calendar and to-do list that guides you through the learning experience.

It is also important to pace your studying. With flexible courses, sometimes the assignments are due at the end of the course. This tempts students to do all their studying at the last minute. This robs the student of in-depth learning as well as the ability to complete assignments thoroughly. Learning happens best over time and not at the last minute.

Participate

In a traditional class, there are often opportunities to participate in class discussions or question and answer sessions. Such opportunities provide students with a chance to develop a deeper understanding of the ideas and content of the course. Students who actually participate in such two-way dialog usually understand the material of the course better than students who do not.

For the online student, participation is also important and can render the same benefits. Participating in forums and chats will deepen understanding. However, I must admit that with the text-heavy nature of online forums, reading the comments of peers can in many ways boost understanding without participation. This is because you can read others’ ideas at your own speed, which helps with comprehension. This is not possible during an in-class discussion, when people may move faster than you can handle.

Communicate with the Instructor

When a student is confused, they need to speak up. For some reason, students are often shy about contacting the instructor in an online course. However, the teacher is there to help you and expects questions and feedback. As such, reach out to them.

Communicating with the instructor also helps to establish a sense of community, which is important in online learning. It helps the instructor to establish presence and demonstrates that they are there to help you succeed.

Conclusion

E-learning is a major component of the future of learning. Therefore, students need to be familiar with what they need to do in order to be successful in their online studies.

Tips for Teaching Online

Teaching online is a unique experience due in part to the platform of instruction. Often, there is no face-to-face interaction, and all communication is in some sort of digital format. Although this can be a rewarding experience, there are still several things to consider when teaching in this format. Some tips for successful online teaching include the following.

  • Planning in advance
  • Having a presence
  • Knowing your technology
  • Being consistent

Plan in Advance

All teaching involves advance planning. However, there are those teaching moments in a regular classroom where a teacher can change midstream to hit a particular interest in the class. In addition, more experienced teachers tend to plan less as they are so comfortable with the content and have an intuitive sense of how to support students.

In online teaching, the entire course should be planned and laid out accordingly before the course starts. It is a nightmare to try and develop course material while trying to teach online. This is partially due to the fact that there are so many reminders and due dates sprinkled throughout the course that are inflexible. This means a teacher must know the end from the beginning in terms of what the curriculum covers and what assignments are coming. Changing midstream is really tough.

In addition, the asynchronous nature of online teaching means that instructional material must be thoroughly clear or students will be lost. This again places an emphasis on strong preparation. Online teaching isn’t really for the person who likes to live in the moment but rather for the person who plans ahead.

Have Presence

Having presence means making clear that you are monitoring progress and communicating with students frequently. When students complete assignments they should receive feedback. There should be announcements made in terms of assignments due, general feedback about activities, as well as Q&A with students.

Many people think that teaching online takes less time and allows for larger classes. This is far from the case. Online teaching is as time-intensive as regular teaching because you must provide feedback and communication, or the students will often feel abandoned.

Know Your Technology

An online teacher must be familiar with and a proponent of technology. This does not mean that you know everything, but rather that you know how to get stuff done. You don’t need a master’s in web design, but knowing the basics of HTML can really help when communicating with the IT people.

Whatever learning management system you use, you should actually be familiar with it and not just be a consumer. Too many people just upload text for students to read, provide several forums, and call that online learning. In many ways, that’s online boredom, especially for younger students.

Consistency

Consistency is about the user experience. The different modules in the course should have the same format with different activities. This way, students focus on learning and not on trying to figure out what you want them to do. This applies across classes as well. There needs to be some sense of stability in terms of how content is delivered. There is no single best way, but it needs to be similar within and across courses for the sake of learning.

Conclusion

These are just some of many ideas to consider when teaching an online course. The main point is the need for preparation and dedication when teaching online.

Blended Learning Defined

E-learning is a commonly used tool at most educational institutions. Often, the e-learning platform is either fully online or a traditional model of face-to-face instruction is used. Blended learning is something that is available but not as clear in terms of what to do.

In this post, we will look at what blended learning is and what it is not.

What Blended Learning is

Blended learning is an instructional environment in which online learning and traditional face-to-face instruction coexist and are employed in a course. There are at least six common models of blended learning.

  • Face-to-face driver – Traditional instruction is supported by online materials
  • Online driver – The entire course is completed online with teacher support made available
  • Rotation – A course in which students cycle back and forth between online and traditional instruction
  • Labs – Content is delivered online but in a specific location such as a computer lab on-campus
  • Flex – Most of the curriculum is delivered online, and the teacher is available for face-to-face consultation
  • Self-blend – Students choose to augment their traditional learning experience with online coursework

These models mentioned above can be used in combination with each other and are not mutually exclusive.

For a course to be blended, it is probably necessary for at least some sort of learning to happen online. The challenge is in defining learning. For example, the Moodle platform places an emphasis on constructivism. As such, there are a lot of opportunities for collaboration in the use of the modules available in Moodle. Through discussion and interaction with other students through forums, commenting on videos, etc., students are able to demonstrate learning.

For a more individualistic experience, if the course is blended, the students need to do something online. For example, completing a quiz or adding material to a wiki or database are ways to show that learning is taking place without as much collaboration. However a teacher chooses to incorporate blended learning, the students need to do something online for it to truly be blended.

What Blended Learning is not

Many teachers will post their PowerPoints online, have students submit assignments online, and call this blended learning. While it is commendable that online tools are being used, this is not really blended learning because there is no learning actually taking place online. Rather, this is an excellent example of using cloud services to upload and download materials.

The PowerPoints were seen in class and are available for review. Uploading assignments is trickier to classify as online learning or not, but if the students were required to complete a traditional assignment and simply upload it, then there was no real online learning experience. The students neither collaborated nor completed anything online in order to complete this learning experience.

Conclusion

The definition here is not exhaustive. The purpose was to provide a flexible framework in which blended learning is possible. To make it as simple as possible, blended learning is the students actively learning online and actively learning in a traditional format. How much of each component depends on the approach of the teacher.

Benefits of Writing

There are many reasons that a person or student should learn to master the craft of writing in some form or genre. Of course, the average person knows how to write if they have a K-12 education, but here what is meant is excelling at writing beyond the introductory basics. As such, in this post, we will look at the following benefits of learning to write.

  • Makes you a better reader and listener
  • Enhances communication skills
  • Develops thinking skills

Improved Reading and Listening Skills

There seems to be an interesting feedback loop between reading and writing. Avid readers are often good writers and avid writers are often good readers. Reading allows you to observe how others write and communicate. This, in turn, can inspire your own writing. It’s similar to how children copy the behavior of the people around them. When you write it is natural to bring with you the styles you have experienced through reading.

Writing also improves listening skills; however, this happens through the process of listening to others through reading. By reading, we have to assess and evaluate the arguments of the author. This can only happen by listening to the author through his work.

Communication Skills

Writing, regardless of genre, involves finding an audience and sharing your own ideas in a way that is clear to them. As such, writing naturally enhances communication skills. This is because of the need to identify the purpose or reason you are writing as well as how you will share your message.

When writing is unclear it is often because the writer has targeted the wrong audience or has an unclear purpose for writing. A common reason research articles are rejected is that the editor is convinced that the article is not appropriate for the journal’s audience. Therefore, it is critical that an author knows their audience.

Thinking Skills 

Related to communication skills are thinking skills. Writing involves taking information in one medium, the thoughts in your head, and placing it in another medium, words on paper. Whenever content moves from one medium to another there is a loss in meaning. This is why, for many people, their writing makes sense to them but to no one else.

Therefore, a great deal of thought must be placed into writing with clarity. You have to structure the thesis/purpose statement, main ideas, and supporting details. Not to mention that you will often need references and need to adhere to some form of formatting. All this must be juggled while delivering content that is critically stimulating.

Conclusion 

Writing is a vehicle of communication that is not used as much as it used to be. There are so many other forms of communication and interaction that some think writing is obsolete. However, though the means of communication may change, the benefits of writing are still available.

Local Regression in R

Local regression uses something similar to nearest neighbor classification to generate a regression line. In local regression, nearby observations are used to fit the line rather than all observations. It is necessary to indicate the percentage of the observations you want R to use for fitting the local line. The name for this hyperparameter is the span. The higher the span the smoother the line becomes.

Local regression works best when there are only a handful of independent variables in the model. When the total number of variables becomes too numerous the model will struggle. As such, we will only fit a bivariate model. This will allow us to process the model and to visualize it.
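Before turning to real data, the effect of the span can be sketched with synthetic data (the variable names below are illustrative and not part of the example that follows):

```r
# A sketch of the span hyperparameter, using made-up data
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)
d <- data.frame(x = x, y = y)

fit_wiggly <- loess(y ~ x, data = d, span = 0.2)  # each local fit uses 20% of the data
fit_smooth <- loess(y ~ x, data = d, span = 0.9)  # 90% of the data, a much smoother line

# The higher-span fit is less flexible, so its in-sample residuals are larger
sum(residuals(fit_wiggly)^2) < sum(residuals(fit_smooth)^2)
```

The comparison at the end simply confirms that a smaller span tracks the data more closely; whether that is desirable depends on how much noise you are willing to chase.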

In this post, we will use the “Clothing” dataset from the “Ecdat” package and examine the relationship between innovation (inv2) and total sales (tsales). Below is some initial code.

library(Ecdat)
data(Clothing)
str(Clothing)
## 'data.frame':    400 obs. of  13 variables:
##  $ tsales : int  750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
##  $ sales  : num  4412 4281 4167 2670 15000 ...
##  $ margin : num  41 39 40 40 44 41 39 28 41 37 ...
##  $ nown   : num  1 2 1 1 2 ...
##  $ nfull  : num  1 2 2 1 1.96 ...
##  $ npart  : num  1 3 2.22 1.28 1.28 ...
##  $ naux   : num  1.54 1.54 1.41 1.37 1.37 ...
##  $ hoursw : int  76 192 114 100 104 72 161 80 158 87 ...
##  $ hourspw: num  16.8 22.5 17.2 21.5 15.7 ...
##  $ inv1   : num  17167 17167 292857 22207 22207 ...
##  $ inv2   : num  27177 27177 71571 15000 10000 ...
##  $ ssize  : int  170 450 300 260 50 90 400 100 450 75 ...
##  $ start  : num  41 39 40 40 44 41 39 28 41 37 ...

There is no data preparation in this example. The first thing we will do is fit two different models that have different values for the span hyperparameter. “fit” will have a span of .41 which means it will use 41% of the nearest examples. “fit2” will use .82. Below is the code.

fit<-loess(tsales~inv2,span = .41,data = Clothing)
fit2<-loess(tsales~inv2,span = .82,data = Clothing)

In the code above, we used the “loess” function to fit the model. The “span” argument was set to .41 and .82.

We now need to prepare for the visualization. We begin by using the “range” function to find the distance from the lowest to the highest value. Then use the “seq” function to create a grid. Below is the code.

inv2lims<-range(Clothing$inv2)
inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])

The information in the code above is for setting our x-axis in the plot. We are now ready to visualize the models. We will plot the data and draw each regression line.

plot(Clothing$inv2,Clothing$tsales,xlim=inv2lims)
lines(inv2.grid,predict(fit,data.frame(inv2=inv2.grid)),col='blue',lwd=3)
lines(inv2.grid,predict(fit2,data.frame(inv2=inv2.grid)),col='red',lwd=3)


There is not much difference between the two models. For our final task, we will predict with our “fit” model using all possible values of “inv2” and also plot the confidence interval lines.

pred<-predict(fit,newdata=inv2.grid,se=T)
plot(Clothing$inv2,Clothing$tsales)
lines(inv2.grid,pred$fit,col='red',lwd=3)
lines(inv2.grid,pred$fit+2*pred$se.fit,lty="dashed",lwd=2,col='blue')
lines(inv2.grid,pred$fit-2*pred$se.fit,lty="dashed",lwd=2,col='blue')


Conclusion

Local regression provides another way to model complex non-linear relationships in low dimensions. The example here provides just the basics; in practice, local regression can be much more involved than described here.

Smoothing Splines in R

This post will provide information on smoothing splines. Smoothing splines are used in regression when we want to reduce the residual sum of squares by adding more flexibility to the regression line without allowing too much overfitting.

In order to do this, we must tune the smoothing parameter. A smoothing spline is essentially a natural cubic spline with a knot at every unique value of x in the model. Having this many knots can lead to severe overfitting. This is corrected for by controlling the effective degrees of freedom through a penalty parameter called lambda. You can manually set this value or select it through cross-validation.
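As a sketch of how the degrees of freedom relate to lambda, the code below (synthetic data, not the example that follows) fits three smoothing splines. A large lambda yields few effective degrees of freedom and a stiff line; a small lambda yields many degrees of freedom and a wiggly line:

```r
set.seed(2)
x <- runif(100, 0, 10)
y <- sin(x) + rnorm(100, sd = 0.3)

fit_flex  <- smooth.spline(x, y, df = 20)  # small lambda, flexible fit
fit_stiff <- smooth.spline(x, y, df = 3)   # large lambda, stiff fit
fit_cv    <- smooth.spline(x, y)           # lambda chosen by generalized cross-validation

c(fit_flex$df, fit_stiff$df, fit_cv$df)
```

Requesting a target df makes R solve for the lambda that produces it, which is usually easier to reason about than setting lambda directly.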

We will now look at an example of the use of smoothing splines with the “Clothing” dataset from the “Ecdat” package. We want to predict “tsales” based on the use of innovation in the stores. Below is some initial code.

library(Ecdat)
data(Clothing)
str(Clothing)
## 'data.frame':    400 obs. of  13 variables:
##  $ tsales : int  750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
##  $ sales  : num  4412 4281 4167 2670 15000 ...
##  $ margin : num  41 39 40 40 44 41 39 28 41 37 ...
##  $ nown   : num  1 2 1 1 2 ...
##  $ nfull  : num  1 2 2 1 1.96 ...
##  $ npart  : num  1 3 2.22 1.28 1.28 ...
##  $ naux   : num  1.54 1.54 1.41 1.37 1.37 ...
##  $ hoursw : int  76 192 114 100 104 72 161 80 158 87 ...
##  $ hourspw: num  16.8 22.5 17.2 21.5 15.7 ...
##  $ inv1   : num  17167 17167 292857 22207 22207 ...
##  $ inv2   : num  27177 27177 71571 15000 10000 ...
##  $ ssize  : int  170 450 300 260 50 90 400 100 450 75 ...
##  $ start  : num  41 39 40 40 44 41 39 28 41 37 ...

We are going to create three models. Model one will have 57 degrees of freedom, model two will have 7, and model three will have its degrees of freedom determined through cross-validation. Below is the code.

fit1<-smooth.spline(Clothing$inv2,Clothing$tsales,df=57)
fit2<-smooth.spline(Clothing$inv2,Clothing$tsales,df=7)
fit3<-smooth.spline(Clothing$inv2,Clothing$tsales,cv=T)
## Warning in smooth.spline(Clothing$inv2, Clothing$tsales, cv = T): cross-
## validation with non-unique 'x' values seems doubtful
(data.frame(fit1$df,fit2$df,fit3$df))
##   fit1.df  fit2.df  fit3.df
## 1      57 7.000957 2.791762

In the code above we used the “smooth.spline” function, which comes with base R. Notice that we did not use the same coding syntax that the “lm” function calls for. The code above also indicates the degrees of freedom for each model. You can see that for “fit3” cross-validation determined that 2.79 was the most appropriate degrees of freedom. In addition, type in the following code.

sapply(data.frame(fit1$x,fit2$x,fit3$x),length)
## fit1.x fit2.x fit3.x 
##     73     73     73

You will see that there are only 73 data points in each model. The “Clothing” dataset has 400 examples in it. The reason for this reduction is that the “smooth.spline” function only takes unique values from the original dataset. As such, though there are 400 examples in the dataset only 73 of them are unique.
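This collapsing to unique values is easy to verify on a toy example (the numbers below are made up):

```r
x <- rep(1:5, each = 2)  # 10 observations but only 5 unique x values
y <- c(2, 4, 3, 5, 4, 6, 5, 7, 6, 8)
fit <- smooth.spline(x, y, df = 3)
length(x)      # 10 observations go in
length(fit$x)  # only the 5 unique x values are kept
```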

Next, we plot our data and add regression lines

plot(Clothing$inv2,Clothing$tsales)
lines(fit1,col='red',lwd=3)
lines(fit2,col='green',lwd=3)
lines(fit3,col='blue',lwd=3)
legend('topright',lty=1,col=c('red','green','blue'),c("df = 57",'df=7','df=CV 2.8'))

1.png

You can see that as the degrees of freedom increase so does the flexibility in the line. The advantage of smoothing splines is to have a more flexible way to assess the characteristics of a dataset.

Polynomial Spline Regression in R

Normally, when least squares regression is used, you fit one line to the model. However, sometimes you may want enough flexibility that you fit different lines over different regions of your independent variable. This process of fitting different lines over different regions of X is known as Regression Splines.

How this works is that there are different coefficient values based on the regions of X. As the researcher, you can set the cutoff points for each region. The cutoff point is called a “knot.” The more knots you use, the more flexible the model becomes, because there are fewer data points within each region, allowing for more variability.
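The basis that makes this possible can be inspected directly with the “bs” function from the “splines” package (toy input; the knot locations below are arbitrary):

```r
library(splines)

x <- seq(0, 10, length.out = 100)
B <- bs(x, knots = c(2.5, 5, 7.5))  # cubic B-spline basis with three interior knots
dim(B)  # 100 rows; 3 knots + degree 3 = 6 basis columns
```

Each additional knot adds one column to the basis, which is why more knots mean a more flexible fit.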

We will now go through an example of polynomial regression splines. Remember that “polynomial” means we will have a curved line, as we are using higher-order polynomials. Our goal will be to predict total sales based on the amount of innovation a store employs. We will use the “Ecdat” package and the “Clothing” dataset. In addition, we will need the “splines” package. The code is as follows.

library(splines);library(Ecdat)
data(Clothing)

We will now fit our model. We must indicate the number and placement of the knots. This is commonly done at the 25th, 50th, and 75th percentiles. Below is the code.

fit<-lm(tsales~bs(inv2,knots = c(12000,60000,150000)),data = Clothing)

In the code above we used the traditional “lm” function to set the model. However, we also used the “bs” function which allows us to create our spline regression model. The argument “knots” was set to have three different values. Lastly, the dataset was indicated.

Remember that the default spline model in R is a third-degree (cubic) polynomial. This is because a cubic is smooth enough that it is hard for the eye to detect any discontinuity at the knots.

We now need X values that we can use for prediction purposes. In the code below we first find the range of the “inv2” variable. We then create a grid that includes all the possible values of “inv2” in increments of 1. Lastly, we use the “predict” function to develop the prediction model. We set the “se” argument to true as we will need this information. The code is below.

inv2lims<-range(Clothing$inv2)
inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])
pred<-predict(fit,newdata=list(inv2=inv2.grid),se=T)

We are now ready to plot our model. The code below graphs the model and includes the regression line (red), confidence interval (green), as well as the location of each knot (blue)

plot(Clothing$inv2,Clothing$tsales,main="Regression Spline Plot")
lines(inv2.grid,pred$fit,col='red',lwd=3)
lines(inv2.grid,pred$fit+2*pred$se.fit,lty="dashed",lwd=2,col='green')
lines(inv2.grid,pred$fit-2*pred$se.fit,lty="dashed",lwd=2,col='green')
segments(12000,0,x1=12000,y1=5000000,col='blue' )
segments(60000,0,x1=60000,y1=5000000,col='blue' )
segments(150000,0,x1=150000,y1=5000000,col='blue' )


When this model was created, it was essentially separate polynomial pieces connected at the knots: one piece before the first blue line, one from the first blue line to the second, one from the second blue line to the third, and one from the third blue line until the end. This kind of flexibility is valuable in understanding nonlinear relationships.

Logistic Polynomial Regression in R

Polynomial regression is used when you want to develop a regression model that is not linear. It is common to use this method when performing traditional least squares regression. However, it is also possible to use polynomial regression when the dependent variable is categorical. As such, in this post, we will go through an example of logistic polynomial regression.

Specifically, we will use the “Clothing” dataset from the “Ecdat” package. We will divide the “tsales” dependent variable into two categories to run the analysis. Below is the code to get started.

library(Ecdat)
data(Clothing)

There is little preparation for this example. Below is the code for the model

fitglm<-glm(I(tsales>900000)~poly(inv2,4),data=Clothing,family = binomial)

Here is what we did

1. We created an object called “fitglm” to save our results
2. We used the “glm” function to process the model
3. We used the “I” function. This told R to process the information inside the parentheses as is. As such, we did not have to make a new variable in which we split the “tsales” variable. Simply, if sales were greater than 900000 it was coded 1, and 0 if less than this amount.
4. Next, we set the information for the independent variable. We used the “poly” function. Inside this function, we placed the “inv2” variable and the highest order polynomial we want to explore.
5. We set the data to “Clothing”
6. Lastly, we set the “family” argument to “binomial” which is needed for logistic regression
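The on-the-fly dichotomization in step 3 can be seen with toy numbers (these are not values from the dataset):

```r
tsales <- c(500000, 1200000, 800000, 950000)  # toy sales figures
as.numeric(tsales > 900000)  # 0 1 0 1 -- the 0/1 outcome that I(tsales > 900000) feeds to glm
```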

Below are the results.

summary(fitglm)
## 
## Call:
## glm(formula = I(tsales > 9e+05) ~ poly(inv2, 4), family = binomial, 
##     data = Clothing)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5025  -0.8778  -0.8458   1.4534   1.5681  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       3.074      2.685   1.145   0.2523  
## poly(inv2, 4)1  641.710    459.327   1.397   0.1624  
## poly(inv2, 4)2  585.975    421.723   1.389   0.1647  
## poly(inv2, 4)3  259.700    178.081   1.458   0.1448  
## poly(inv2, 4)4   73.425     44.206   1.661   0.0967 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 521.57  on 399  degrees of freedom
## Residual deviance: 493.51  on 395  degrees of freedom
## AIC: 503.51
## 
## Number of Fisher Scoring iterations: 13

It appears that only the 4th-degree polynomial is significant and barely at that. We will now find the range of our independent variable “inv2” and make a grid from this information. Doing this will allow us to run our model using the full range of possible values for our independent variable.

inv2lims<-range(Clothing$inv2)
inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])

The “inv2lims” object has two values: the lowest value in “inv2” and the highest value. These values serve as the endpoints of our “inv2.grid” object. This means that we have values starting at 350 and going to 400000 in increments of 1 in a grid to be used as values for “inv2” in our prediction model. Below is our prediction model.

predsglm<-predict(fitglm,newdata=list(inv2=inv2.grid),se=T) # predictions on the log-odds (link) scale

Next, we need to calculate the probabilities that a given value of “inv2” predicts a store has “tsales” greater than 900000. The equation is as follows.

pfit<-exp(predsglm$fit)/(1+exp(predsglm$fit))
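This is the inverse logit transformation, which maps any log-odds value into the (0, 1) interval. A standalone sketch:

```r
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))
inv_logit(0)   # 0.5: log odds of 0 correspond to a 50/50 probability
inv_logit(2)   # roughly 0.88
inv_logit(-2)  # roughly 0.12
```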

Graphing this leads to interesting insights. Below is the code

plot(pfit)


You can see the curves in the line from the polynomial expression. As “inv2” increases, the probability increases until the values fall between 125000 and 200000, where it dips. This is interesting, to say the least.

We now need to plot the actual model. First, we need to calculate the confidence intervals. This is done with the code below.

se.bandsglm.logit<-cbind(predsglm$fit+2*predsglm$se.fit,predsglm$fit-2*predsglm$se.fit)
se.bandsglm<-exp(se.bandsglm.logit)/(1+exp(se.bandsglm.logit))

The “se.bandsglm.logit” object contains the log odds of each example and the “se.bandsglm” object has the probabilities. Now we plot the results.

plot(Clothing$inv2,I(Clothing$tsales>900000),xlim=inv2lims,type='n')
points(jitter(Clothing$inv2),I((Clothing$tsales>900000)),cex=2,pch='|',col='darkgrey')
lines(inv2.grid,pfit,lwd=4)
matlines(inv2.grid,se.bandsglm,col="green",lty=6,lwd=6)

In the code above we did the following.
1. We plotted our dependent and independent variables. However, we set the argument “type” to n which means nothing. This was done so we can add the information step-by-step.
2. We added the points. This was done using the “points” function. The “jitter” function just helps to spread the information out. The other arguments (cex, pch, col) are for aesthetics and are optional.
3. We add our logistic polynomial line based on our independent variable grid and the “pfit” object which has all of the predicted probabilities.
4. Last, we add the confidence intervals using the “matlines” function. Which includes the grid object as well as the “se.bandsglm” information.

You can see that these results are similar to when we only graphed the “pfit” information. However, we have also added the confidence intervals. You can see the same dip around 125000-200000, where there is also a larger confidence interval. If you look at the plot you can see that there are fewer data points in this range, which may be what is making the intervals wider.

Conclusion

Logistic polynomial regression allows the regression line to have more curves to it if it is necessary. This is useful for fitting data that is non-linear in nature.

Teaching Handwriting to Young Children

Learning to write takes a lifetime. Any author will share with you how they have matured and grown over time in the craft of writing. However, there are some basic fundamentals that need to be mastered before the process of growing as a writer can begin.

This post will provide an approach to teaching writing to young children that includes the following steps.

  1. Learning to write the letters
  2. Learning to write sentences
  3. Learning to write paragraphs

Learning the Letters

The first step in this process is learning to write letters. The challenge is normally developing the fine motor skills for creating letters. If you have ever seen the writing of a 5-year-old you have some idea of what I am talking about.

It is difficult for children to actually write letters. Normally this is taught by having the students trace the letters on a piece of paper. This drill-and-kill style eventually works as the child masters the art of tracing. An analogy would be the use of training wheels on a bicycle.

Generally, straight lines are easier to write than curves. As such, easy letters to learn first are t, i, and l. Curves with straight lines are often easier than slanted lines so the next stage of letters might include b, d, f, h, j, p, r, u, and y. Lastly, slanted lines and full circle letters are the hardest in my experience. As such, a, c, e, g, k, m, n, o, s, v, w, x, and z are the last to learn.

Learning to Write Sentences

It is discouraging to have the child learn the entire alphabet before writing something. It’s better to learn a few letters and begin making sentences immediately. This heightens relevance and it is motivating to the child to be able to read their own writing. For now, the sentences do not really need to make sense. Just have them write using a handful of letters with support.

Simple three-word sentences are enough at this moment. Many worksheets will provide blank lines with space at the top for drawing and coloring, which provides a visual of the sentence.

It is critical to provide support for the development of the sentence. You have to help the child develop the thought that they want to put on paper. This is difficult for many children. You may also be taxed with providing spelling support. Although for now, I would not worry too much about spelling. Students need to create first and follow rules of creating later.

Writing Paragraphs

The typical child will probably not be able to write paragraphs until the 3rd or 4th grade at the earliest. Paragraph writing takes an extensive amount of planning for a small child, as they now must have a beginning, middle, and end, or a main idea with supporting details.

At this stage, the best way to learn to write is to read a lot. This provides a structure and vocabulary on which the child can develop their own ideas in writing. In addition, rules of writing can be taught such as grammar and other components of language.

Conclusion

Writing can be an enjoyable experience if children are guided initially in learning this craft. Over time, a child can provide many insightful ideas and comments through developing the ability to communicate through the use of text.

Polynomial Regression in R

Polynomial regression is one of the easiest ways to fit a non-linear line to a data set. This is done through the use of higher order polynomials such as cubic, quadratic, etc to one or more predictor variables in a model.

Generally, polynomial regression is used for one predictor and one outcome variable. When there are several predictor variables it is more common to use generalized additive modeling. In this post, we will use the “Clothing” dataset from the “Ecdat” package to predict total sales with the use of polynomial regression. Below is some initial code.

library(Ecdat)
data(Clothing)
str(Clothing)
## 'data.frame':    400 obs. of  13 variables:
##  $ tsales : int  750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
##  $ sales  : num  4412 4281 4167 2670 15000 ...
##  $ margin : num  41 39 40 40 44 41 39 28 41 37 ...
##  $ nown   : num  1 2 1 1 2 ...
##  $ nfull  : num  1 2 2 1 1.96 ...
##  $ npart  : num  1 3 2.22 1.28 1.28 ...
##  $ naux   : num  1.54 1.54 1.41 1.37 1.37 ...
##  $ hoursw : int  76 192 114 100 104 72 161 80 158 87 ...
##  $ hourspw: num  16.8 22.5 17.2 21.5 15.7 ...
##  $ inv1   : num  17167 17167 292857 22207 22207 ...
##  $ inv2   : num  27177 27177 71571 15000 10000 ...
##  $ ssize  : int  170 450 300 260 50 90 400 100 450 75 ...
##  $ start  : num  41 39 40 40 44 41 39 28 41 37 ...

We are going to use the “inv2” variable as our predictor. This variable measures the investment in automation by a particular store. We will now run our polynomial regression model.

fit<-lm(tsales~poly(inv2,5),data = Clothing)
summary(fit)
## 
## Call:
## lm(formula = tsales ~ poly(inv2, 5), data = Clothing)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -946668 -336447  -96763  184927 3599267 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      833584      28489  29.259  < 2e-16 ***
## poly(inv2, 5)1  2391309     569789   4.197 3.35e-05 ***
## poly(inv2, 5)2  -665063     569789  -1.167   0.2438    
## poly(inv2, 5)3    49793     569789   0.087   0.9304    
## poly(inv2, 5)4  1279190     569789   2.245   0.0253 *  
## poly(inv2, 5)5  -341189     569789  -0.599   0.5497    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 569800 on 394 degrees of freedom
## Multiple R-squared:  0.05828,    Adjusted R-squared:  0.04633 
## F-statistic: 4.876 on 5 and 394 DF,  p-value: 0.0002428

The code above should be mostly familiar. We use the “lm” function as normal for regression. However, we then used the “poly” function on the “inv2” variable. What this does is add polynomial terms of “inv2” from degree 1 (linear) up to degree 5 to a single model (5 is the number next to “inv2”). The results indicate that the first-degree and fourth-degree terms are significant.
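What “poly” actually produces can be inspected directly (toy input):

```r
x <- 1:10
P <- poly(x, 3)  # orthogonal polynomial terms of degree 1, 2, and 3
dim(P)           # 10 rows, one column per degree

# The columns are orthonormal, which keeps the terms uncorrelated with each other
round(crossprod(P), 10)  # identity matrix
```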

We will now prepare a visual of the results, but first there are several things to set up. First, we want to find the range of our predictor variable “inv2” and save this information in an object. The code is below.

inv2lims<-range(Clothing$inv2)

Second, we need to create a grid that has all the possible values of “inv2” from the lowest to the highest broken up in intervals of one. Below is the code.

inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])
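What “seq” (combined with the earlier “range”) produces can be checked with toy numbers:

```r
lims <- range(c(5, 2, 9))                  # lowest and highest values: 2 and 9
grid <- seq(from = lims[1], to = lims[2])  # 2 3 4 5 6 7 8 9, in steps of 1
length(grid)                               # 8 values
```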

We now have almost 400000 data points in the “inv2.grid” object through this approach. We will now use these values to predict “tsales.” We also want the standard errors, so we set “se” to TRUE.

preds<-predict(fit,newdata=list(inv2=inv2.grid),se=TRUE)

We now need to find the confidence interval for our regression line. This is done by making a dataframe that takes the predicted fit and adds or subtracts two times the standard error, as shown below.

se.bands<-cbind(preds$fit+2*preds$se.fit,preds$fit-2*preds$se.fit)

With these steps completed, we are ready to create our visualization.

To make our visual, we use the “plot” function on the predictor and outcome. Doing this gives us a plot without a regression line. We then use the “lines” function to add the polynomial regression line; however, this line is based on the “inv2.grid” object (almost 400,000 observations) and our predictions. Lastly, we use the “matlines” function to add the confidence intervals we found and stored in the “se.bands” object.

plot(Clothing$inv2,Clothing$tsales)
lines(inv2.grid,preds$fit,lwd=4,col='blue')
matlines(inv2.grid,se.bands,lwd = 4,col = "yellow",lty=4)


Conclusion

You can clearly see the curvature of the line, which helped to improve model fit. Note, though, that we are fitting this line to mostly outliers. This is one reason the standard error gets wider and wider: there are fewer and fewer observations on which to base it. Still, for demonstration purposes, this is a clear example of the power of polynomial regression.

Partial Least Squares Regression in R

Partial least squares regression is a form of regression that involves the development of components of the original variables in a supervised way. What this means is that the dependent variable is used to help create the new components from the original variables. This means that when PLS is used, the linear combination of the new features helps to explain both the independent and dependent variables in the model.

In this post, we will predict “income” in the “Mroz” dataset using PLS. Below is some initial code.

library(pls);library(Ecdat)
data("Mroz")
str(Mroz)
## 'data.frame':    753 obs. of  18 variables:
##  $ work      : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...
##  $ hoursw    : int  1610 1656 1980 456 1568 2032 1440 1020 1458 1600 ...
##  $ child6    : int  1 0 1 0 1 0 0 0 0 0 ...
##  $ child618  : int  0 2 3 3 2 0 2 0 2 2 ...
##  $ agew      : int  32 30 35 34 31 54 37 54 48 39 ...
##  $ educw     : int  12 12 12 12 14 12 16 12 12 12 ...
##  $ hearnw    : num  3.35 1.39 4.55 1.1 4.59 ...
##  $ wagew     : num  2.65 2.65 4.04 3.25 3.6 4.7 5.95 9.98 0 4.15 ...
##  $ hoursh    : int  2708 2310 3072 1920 2000 1040 2670 4120 1995 2100 ...
##  $ ageh      : int  34 30 40 53 32 57 37 53 52 43 ...
##  $ educh     : int  12 9 12 10 12 11 12 8 4 12 ...
##  $ wageh     : num  4.03 8.44 3.58 3.54 10 ...
##  $ income    : int  16310 21800 21040 7300 27300 19495 21152 18900 20405 20425 ...
##  $ educwm    : int  12 7 12 7 12 14 14 3 7 7 ...
##  $ educwf    : int  7 7 7 7 14 7 7 3 7 7 ...
##  $ unemprate : num  5 11 5 5 9.5 7.5 5 5 3 5 ...
##  $ city      : Factor w/ 2 levels "no","yes": 1 2 1 1 2 2 1 1 1 1 ...
##  $ experience: int  14 5 15 6 7 33 11 35 24 21 ...

First, we must prepare our data by dividing it into a training and test set. We will do this by doing a 50/50 split of the data.

set.seed(777)
train<-sample(c(T,F),nrow(Mroz),rep=T) #50/50 train/test split
test<-(!train)

In the code above we used the “set.seed” function in order to ensure reproducibility. Then we created the “train” object and used the “sample” function to make a vector of ‘T’ and ‘F’ values based on the number of rows in “Mroz”. Lastly, we created the “test” object based on everything that is not in the “train” object, which is what the exclamation point is for.
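The logic of this logical-vector split can be checked at a small scale:

```r
set.seed(777)
train <- sample(c(TRUE, FALSE), 10, replace = TRUE)
test  <- (!train)
sum(train) + sum(test)  # 10 -- every row lands in exactly one of the two sets
```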

Now we create our model using the “plsr” function from the “pls” package, and we will examine the results using the “summary” function. We will also scale the data, since scale affects the development of the components, and use cross-validation. Below is the code.

set.seed(777)
pls.fit<-plsr(income~.,data=Mroz,subset=train,scale=T,validation="CV")
summary(pls.fit)
## Data:    X dimension: 392 17 
##  Y dimension: 392 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           11218     8121     6701     6127     5952     5886     5857
## adjCV        11218     8114     6683     6108     5941     5872     5842
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        5853     5849     5854      5853      5853      5852      5852
## adjCV     5837     5833     5837      5836      5836      5835      5835
##        14 comps  15 comps  16 comps  17 comps
## CV         5852      5852      5852      5852
## adjCV      5835      5835      5835      5835
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         17.04    26.64    37.18    49.16    59.63    64.63    69.13
## income    49.26    66.63    72.75    74.16    74.87    75.25    75.44
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         72.82    76.06     78.59     81.79     85.52     89.55     92.14
## income    75.49    75.51     75.51     75.52     75.52     75.52     75.52
##         15 comps  16 comps  17 comps
## X          94.88     97.62    100.00
## income     75.52     75.52     75.52

The printout includes the root mean squared error for each of the components in the VALIDATION section as well as the variance explained in the TRAINING section. There are 17 components because there are 17 independent variables. You can see that after component 3 or 4 there is little improvement in the variance explained in the dependent variable. To plot these results we use the “validationplot” function with the “val.type” argument set to “MSEP”. Below is the code.

validationplot(pls.fit,val.type = "MSEP")


We will now make predictions with our model. We use the “predict” function on the “Mroz” dataset, but only the rows indexed by the “test” vector, and set the number of components to three based on our previous plot. Below is the code.

set.seed(777)
pls.pred<-predict(pls.fit,Mroz[test,],ncomp=3)

After this, we will calculate the mean squared error. This is done by subtracting the results of our predicted model from the dependent variable of the test set. We then square this information and calculate the mean. Below is the code

mean((pls.pred-Mroz$income[test])^2)
## [1] 63386682
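The test-set mean squared error used above boils down to this formula (toy numbers):

```r
mse <- function(pred, actual) mean((pred - actual)^2)
mse(c(1, 2, 3), c(1, 4, 5))  # (0 + 4 + 4) / 3 = 8/3, about 2.67
```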

As you know, this information is only useful when compared to something else. Therefore, we will run the data with a traditional least squares regression model and compare the results.

set.seed(777)
lm.fit<-lm(income~.,data=Mroz,subset=train)
lm.pred<-predict(lm.fit,Mroz[test,])
mean((lm.pred-Mroz$income[test])^2)
## [1] 59432814

The least squares model is slightly better than our partial least squares model, but if we look at the model we see several variables that are not significant. We will remove these and see what the results are.

summary(lm.fit)
## 
## Call:
## lm(formula = income ~ ., data = Mroz, subset = train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20131  -2923  -1065   1670  36246 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.946e+04  3.224e+03  -6.036 3.81e-09 ***
## workno      -4.823e+03  1.037e+03  -4.651 4.59e-06 ***
## hoursw       4.255e+00  5.517e-01   7.712 1.14e-13 ***
## child6      -6.313e+02  6.694e+02  -0.943 0.346258    
## child618     4.847e+02  2.362e+02   2.052 0.040841 *  
## agew         2.782e+02  8.124e+01   3.424 0.000686 ***
## educw        1.268e+02  1.889e+02   0.671 0.502513    
## hearnw       6.401e+02  1.420e+02   4.507 8.79e-06 ***
## wagew        1.945e+02  1.818e+02   1.070 0.285187    
## hoursh       6.030e+00  5.342e-01  11.288  < 2e-16 ***
## ageh        -9.433e+01  7.720e+01  -1.222 0.222488    
## educh        1.784e+02  1.369e+02   1.303 0.193437    
## wageh        2.202e+03  8.714e+01  25.264  < 2e-16 ***
## educwm      -4.394e+01  1.128e+02  -0.390 0.697024    
## educwf       1.392e+02  1.053e+02   1.322 0.186873    
## unemprate   -1.657e+02  9.780e+01  -1.694 0.091055 .  
## cityyes     -3.475e+02  6.686e+02  -0.520 0.603496    
## experience  -1.229e+02  4.490e+01  -2.737 0.006488 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5668 on 374 degrees of freedom
## Multiple R-squared:  0.7552, Adjusted R-squared:  0.744 
## F-statistic: 67.85 on 17 and 374 DF,  p-value: < 2.2e-16
set.seed(777)
lm.fit<-lm(income~work+hoursw+child618+agew+hearnw+hoursh+wageh+experience,data=Mroz,subset=train)
lm.pred<-predict(lm.fit,Mroz[test,])
mean((lm.pred-Mroz$income[test])^2)
## [1] 57839715

As you can see, the error decreased even further, which indicates that the least squares regression model is superior to the partial least squares model. In addition, the partial least squares model is much more difficult to explain because of its use of components. As such, the least squares model is the favored one.

Story Grammar Components

When people tell a story, whether orally or in a movie, there are certain characteristics, shaped by culture, that seem to appear in stories and that children attempt to imitate when they tell a story of their own. These traits are called story grammar components and include the following:

  • Setting statement
  • Initiating event
  • Internal response
  • Internal plan
  • Attempt
  • Direct Consequence
  • Reaction

This post will explore each of these characteristics of a story.

Setting Statement

The setting statement introduces the characters of the story and often identifies who the “good guy” and the “bad guy” are. Many movies do this, from Transformers to any X-Men movie. In the first 10-15 minutes, the characters are introduced and the background is explained. For example, the classic story “The Three Little Pigs” begins by telling you there was a wolf and three pigs.

Initiating Event

The initiating event is the catalyst to get the characters to do something. For example, in the “Three Little Pigs” the pigs need shelter. In other words, the initiating event introduces the problem that the characters need to overcome during the story.

Internal Response

The internal response is the characters’ reaction to the initiating event. The response can take many forms, such as an emotional one. For example, the pigs get excited when they realize they need shelter. Generally, the internal response provides the motivation to do something.

Internal Plan

The internal plan is what the characters will do to overcome the initiating event problem. For the pigs, the plan was to each build a house to prepare for the wolf.

Attempt

The attempt is the action that helps the characters to reach their goal. This is the step in which the internal plan is put into action. Therefore, for the pigs, it is the actual construction of their houses.

Direct Consequence

At this step, the story indicates whether the attempt was successful or not. For the pigs, this is where things get complicated. Of the three pigs, two were unsuccessful and only one was successful. Success depends on who the protagonist and the antagonist are. As such, if the wolf were the protagonist, the successes would be two and the failure one.

Reaction

The reaction is the character’s response to the direct consequence. For the two unsuccessful pigs, there was no reaction because they were eaten by the wolf. However, the last pig was able to live safely because his home protected him.

Conclusion

Even small children will have several of these components in their storytelling. However, it is important to remember that the components are not required in a story, nor do they have to follow the order specified here. Instead, this is a broadly generalized way of how people communicate through storytelling.

Principal Component Regression in R

This post will explain and provide an example of principal component regression (PCR). Principal component regression has the model construct components that are linear combinations of the independent variables. This is done with principal component analysis, so the components are chosen to capture as much of the variance of the independent variables as possible rather than to best explain the dependent variable. Doing this often allows you to use fewer variables in your model and can improve the fit of your model as well.

Since PCR is based on principal component analysis it is an unsupervised method, which means the dependent variable has no influence on the development of the components. As such, there are times when the components that are developed may not be beneficial for explaining the dependent variable.
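As a rough sketch of what happens under the hood, PCR is just principal component analysis followed by ordinary regression on the leading components. The data below are simulated purely for illustration, and base R’s “prcomp” is used instead of the “pls” package:

```r
# Illustrative PCR by hand on made-up data (the "pls" package automates this)
set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)             # three simulated predictors
y <- drop(X %*% c(2, -1, 0.5) + rnorm(100))       # simulated dependent variable

pca <- prcomp(X, scale. = TRUE)                   # unsupervised step: y plays no role here
Z <- pca$x[, 1:2]                                 # keep only the first two components
pcr.by.hand <- lm(y ~ Z)                          # regress y on the components
```

Because the components are built without looking at y, nothing guarantees that the first few components are the ones most useful for prediction, which is the weakness noted above.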

Our example will use the “Mroz” dataset from the “Ecdat” package. Our goal will be to predict “income” based on the variables in the dataset. Below is the initial code.

library(pls);library(Ecdat)
data(Mroz)
str(Mroz)
## 'data.frame':    753 obs. of  18 variables:
##  $ work      : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...
##  $ hoursw    : int  1610 1656 1980 456 1568 2032 1440 1020 1458 1600 ...
##  $ child6    : int  1 0 1 0 1 0 0 0 0 0 ...
##  $ child618  : int  0 2 3 3 2 0 2 0 2 2 ...
##  $ agew      : int  32 30 35 34 31 54 37 54 48 39 ...
##  $ educw     : int  12 12 12 12 14 12 16 12 12 12 ...
##  $ hearnw    : num  3.35 1.39 4.55 1.1 4.59 ...
##  $ wagew     : num  2.65 2.65 4.04 3.25 3.6 4.7 5.95 9.98 0 4.15 ...
##  $ hoursh    : int  2708 2310 3072 1920 2000 1040 2670 4120 1995 2100 ...
##  $ ageh      : int  34 30 40 53 32 57 37 53 52 43 ...
##  $ educh     : int  12 9 12 10 12 11 12 8 4 12 ...
##  $ wageh     : num  4.03 8.44 3.58 3.54 10 ...
##  $ income    : int  16310 21800 21040 7300 27300 19495 21152 18900 20405 20425 ...
##  $ educwm    : int  12 7 12 7 12 14 14 3 7 7 ...
##  $ educwf    : int  7 7 7 7 14 7 7 3 7 7 ...
##  $ unemprate : num  5 11 5 5 9.5 7.5 5 5 3 5 ...
##  $ city      : Factor w/ 2 levels "no","yes": 1 2 1 1 2 2 1 1 1 1 ...
##  $ experience: int  14 5 15 6 7 33 11 35 24 21 ...

Our first step is to divide our dataset into a train and test set. We will do a simple, roughly 50/50 split for this demonstration.

train<-sample(c(T,F),nrow(Mroz),rep=T) #50/50 train/test split
test<-(!train)

In the code above, we use the “sample” function to create a “train” index based on the number of rows in the “Mroz” dataset. Basically, R makes a vector that randomly marks each row of the “Mroz” dataset as TRUE or FALSE. Next, we use the exclamation mark to negate the “train” vector, so every row that is not in the training set is assigned to the test set.
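The logical indexing at work here can be seen with a toy vector (the values are just an illustration):

```r
idx <- c(TRUE, FALSE, TRUE, FALSE)  # pretend train/test marker for four rows
!idx                                # negation flips every value
# some.data[idx, ] would select the training rows; some.data[!idx, ] the test rows
```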

We are now ready to develop our model. Below is the code

set.seed(777)
pcr.fit<-pcr(income~.,data=Mroz,subset=train,scale=T,validation="CV")

To make our model we use the “pcr” function from the “pls” package. The “subset” argument tells R to use the “train” vector to select examples from the “Mroz” dataset. The “scale” argument makes sure everything is measured the same way; this is important when using a component analysis tool, as variables on different scales have a different influence on the components. Lastly, the “validation” argument enables cross-validation, which will help us determine the number of components to use for prediction. Below are the results of the model using the “summary” function.
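To see why the “scale” argument matters, consider two made-up columns on very different scales. Base R’s “scale” function standardizes each column to have mean 0 and standard deviation 1, so neither dominates the components simply because of its units:

```r
m <- cbind(hours = c(1000, 2000, 3000),  # measured in the thousands
           rate  = c(5, 7, 9))           # measured in single digits
s <- scale(m)                            # center and rescale each column
apply(s, 2, sd)                          # both columns now have sd = 1
```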

summary(pcr.fit)
## Data:    X dimension: 381 17 
##  Y dimension: 381 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           12102    11533    11017     9863     9884     9524     9563
## adjCV        12102    11534    11011     9855     9878     9502     9596
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        9149     9133     8811      8527      7265      7234      7120
## adjCV     9126     9123     8798      8877      7199      7172      7100
##        14 comps  15 comps  16 comps  17 comps
## CV         7118      7141      6972      6992
## adjCV      7100      7123      6951      6969
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X        21.359    38.71    51.99    59.67    65.66    71.20    76.28
## income    9.927    19.50    35.41    35.63    41.28    41.28    46.75
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         80.70    84.39     87.32     90.15     92.65     95.02     96.95
## income    47.08    50.98     51.73     68.17     68.29     68.31     68.34
##         15 comps  16 comps  17 comps
## X          98.47     99.38    100.00
## income     68.48     70.29     70.39

There is a lot of information here. The VALIDATION: RMSEP section gives you the root mean squared error of the model broken down by component. The TRAINING section is similar to the printout of any PCA, but it shows the cumulative variance explained by the components, as well as the variance explained in the dependent variable “income.” In this model, we are able to explain up to 70% of the variance if we use all 17 components.
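Note that the TRAINING percentages are cumulative, so each component’s individual contribution is the difference between adjacent columns. Taking the first three X-row figures from the printout above (21.36, then 38.71, then 51.99), the cumulative totals are just a running sum of per-component contributions:

```r
# Per-component contributions recovered from the cumulative printout (%)
per.comp <- c(21.36, 17.35, 13.28)  # 38.71 - 21.36 = 17.35; 51.99 - 38.71 = 13.28
cumsum(per.comp)                    # running sum reproduces the cumulative figures
```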

We can graph the MSE using the “validationplot” function with the argument “val.type” set to “MSEP”. The code is below.

validationplot(pcr.fit,val.type = "MSEP")

[Plot: cross-validated MSEP by number of components]

How many components to pick is subjective; however, there is almost no improvement beyond 13, so we will use 13 components in our prediction model and calculate the mean squared error.

set.seed(777)
pcr.pred<-predict(pcr.fit,Mroz[test,],ncomp=13)
mean((pcr.pred-Mroz$income[test])^2)
## [1] 48958982

The MSE is what you would use to compare this model to other models that you develop. Below is the performance of a least squares regression model.

set.seed(777)
lm.fit<-lm(income~.,data=Mroz,subset=train)
lm.pred<-predict(lm.fit,Mroz[test,])
mean((lm.pred-Mroz$income[test])^2)
## [1] 47794472

If you compare the MSE the least squares model performs slightly better than the PCR one. However, there are a lot of non-significant features in the model as shown below.

summary(lm.fit)
## 
## Call:
## lm(formula = income ~ ., data = Mroz, subset = train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -27646  -3337  -1387   1860  48371 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.215e+04  3.987e+03  -5.556 5.35e-08 ***
## workno      -3.828e+03  1.316e+03  -2.909  0.00385 ** 
## hoursw       3.955e+00  7.085e-01   5.582 4.65e-08 ***
## child6       5.370e+02  8.241e+02   0.652  0.51512    
## child618     4.250e+02  2.850e+02   1.491  0.13673    
## agew         1.962e+02  9.849e+01   1.992  0.04709 *  
## educw        1.097e+02  2.276e+02   0.482  0.63013    
## hearnw       9.835e+02  2.303e+02   4.270 2.50e-05 ***
## wagew        2.292e+02  2.423e+02   0.946  0.34484    
## hoursh       6.386e+00  6.144e-01  10.394  < 2e-16 ***
## ageh        -1.284e+01  9.762e+01  -0.132  0.89542    
## educh        1.460e+02  1.592e+02   0.917  0.35982    
## wageh        2.083e+03  9.930e+01  20.978  < 2e-16 ***
## educwm       1.354e+02  1.335e+02   1.014  0.31115    
## educwf       1.653e+02  1.257e+02   1.315  0.18920    
## unemprate   -1.213e+02  1.148e+02  -1.057  0.29140    
## cityyes     -2.064e+02  7.905e+02  -0.261  0.79421    
## experience  -1.165e+02  5.393e+01  -2.159  0.03147 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6729 on 363 degrees of freedom
## Multiple R-squared:  0.7039, Adjusted R-squared:   0.69 
## F-statistic: 50.76 on 17 and 363 DF,  p-value: < 2.2e-16

Removing these variables gives an MSE that is almost the same for the PCR and least squares models.

set.seed(777)
lm.fit2<-lm(income~work+hoursw+hearnw+hoursh+wageh,data=Mroz,subset=train)
lm.pred2<-predict(lm.fit2,Mroz[test,])
mean((lm.pred2-Mroz$income[test])^2)
## [1] 47968996

Conclusion

Since the least squares model is simpler, it is probably the superior model. PCR is strongest when there are a lot of variables involved and when there are issues with multicollinearity.

Accommodation Theory

Accommodation theory attempts to explain how people adjust the way they talk depending on who the audience is. Generally, there are two ways in which a person can adjust their speech. The two ways are convergence and divergence. In this post, we will look at these two ways of accommodating.

Speech Convergence

Converging is when you change the way you talk to sound more like the person you are talking to. This is seen as polite in many cultures and signals that you accept the person who is speaking.

There are many different ways in which convergence can take place. The speaker may begin to use similar vocabulary. Another way is to imitate the pronunciation of the person you are talking to. A third common way is to translate technical jargon into simpler English.

Speech Divergence

Speech divergence is often seen as the opposite of speech convergence. It is deliberately selecting a style of language different from the speaker’s. This often communicates dissatisfaction with the person you are speaking with. For example, most teenagers deliberately speak differently from their parents. This plays a role in identifying with their peers and distancing themselves from their parents.

However, a slight divergence is expected of non-native speakers. Many people enjoy the accents of athletes and actresses. Having perfect control of two languages is at times seen negatively in some parts of the world.

A famous example of speech divergence is the speaking style of former Federal Reserve Chairman Alan Greenspan, known as “Fedspeak.” Fedspeak was used whenever Greenspan appeared before Congress or made announcements about changing the Federal Reserve interest rate. The goal of this form of communication was to sound as divergent and incoherent as possible. Below is an example.

The members of the Board of Governors and the Reserve Bank presidents foresee an implicit strengthening of activity after the current rebalancing is over, although the central tendency of their individual forecasts for real GDP still shows a substantial slowdown, on balance, for the year as a whole.

This makes little sense unless you have an MBA in finance. It sounds like he sees no change in the growth of the economy.

The reason behind this mysterious form of communication was that people placed a strong emphasis on whatever the Federal Reserve and Alan Greenspan said, which led to swings in the stock market. To prevent this, Greenspan diverged his language to make it as confusing as possible and thus avoid massive changes in the stock market.

Conclusion 

When communicating, we can choose to adapt ourselves or deliberately diverge. Which choice we make depends a great deal on the context in which we find ourselves.