Videoconferencing in Online Courses

Videoconferencing is a standard aspect of the professional world. Most large companies use some sort of video conferencing for meetings and training. In terms of personal life, video conferencing is common as well. We have probably all used Skype or Google Hangouts at one time or another to talk with friends. However, video conferencing is not as common in education.

Video Conferencing Before Video Conferencing

Before video conferencing became common, many educators would upload videos to their online course or post them on YouTube. This allowed the student to see the teacher and have more of a traditional classroom experience, but real-time interaction was impossible. Instead, the interaction was asynchronous, meaning not at the same time. As such, communication was stilted, to say the least, because of the lag time between interactions.

Things to Consider Before Video Conferencing

In order to have success with video conferencing, you will need an application that supports it. There are many different applications to choose from, such as Skype, Google Hangouts, and even Facebook. However, you want software that allows you to share your screen as well as control the flow of the conversation.

One app that allows this is Zoom. This software allows you to schedule meetings, and students do not need to download anything. Instead, the students are sent a web link that takes them to the online meeting. You can share your screen as well as monitor the discussion, with the added benefit of being able to record the meeting for future use.

Pros and Cons of Video Conferencing

For whatever reason, video conferencing is engaging for students. The same discussion in class would lull them to sleep, but over webcams everyone is awake and stimulated. I am not sure what the difference is, but this has been my experience.

The biggest enemy to video conferencing is scheduling. This is particularly true if students are spread out all over the world. The challenges of time zones and other commitments make this hard.

This is one reason that recording a video conference is so important. It allows students who are not available to at least have an asynchronous learning experience. It also serves as a resource for students who need to see something again. Keep in mind you have to post the video either on your LMS or on YouTube so that students have access to it.

Conclusion

Video conferencing provides a familiar learning experience in a different setting. It is able to give students who are not physically present an opportunity to interact with the instructor in meaningful ways. As such, the instructor must be aware of the possibilities for using this tool in their online teaching.


Maintaining Student Focus During E-Learning

Self-motivation is perhaps one of the biggest problems in e-learning. Students who are left to themselves often do not successfully finish the learning experiences prepared by the teacher. For whatever reason, the internal drive to finish something such as an online class is missing for many people.

There are several strategies that an online teacher can use in order to help students who may struggle with self-motivation in an online context. These ideas include…

  • Brief Lessons
  • Frequent Assessment
  • Collaboration

Brief Lessons

Nothing is more discouraging to many students than having to read several pages of text or even several hours of video to complete a single lesson or module in an online course. Therefore, the teacher needs to make sure lessons are short. Completing many small lessons is much more motivating for many students than completing a handful of really large lessons. This is because frequent completion of small lessons is rewarding and indicates progress which the brain rewards.

How long a lesson should be depends on many factors, such as the age and expertise of the students. Therefore, it is difficult to give a single magic number to aim for. You want to avoid the extremes of lessons that are too short and lessons that are too long.

In my own experience, most people make their lessons too long, so the majority of us probably need to reduce the content in an individual lesson and spread it over many lessons. All the content can still be there; it is just chunked differently so that students experience progress.

Frequent Assessment

Along with brief lessons should come frequent assessment. Nothing motivates like knowing something is going to be on the quiz or that there is some sort of immediate application. Students need to do something with what they are learning in order to stay engaged. Therefore, constant assessment is not only for grades but also for learning. Besides, the stress of a small quiz provides an emotional stimulus that many students need.

The assessment also allows for feedback which helps the student to monitor their learning. In addition, the feedback provides more evidence of progress being made in a course which is itself motivating for many.

Collaboration

Nothing motivates quite like working together. Many people love to work in groups and get energy from this. In addition, it is harder to quit and give up on a course when you have group members waiting for your contribution. Interacting with other students also deepens understanding of the course material.

Communicating with other students online to complete assignments is one way of establishing community in an online class. It is similar to a traditional classroom, where everyone has to discuss and work together to have success.

Conclusion

Motivated students are successful students. In order for this to happen in an e-learning class, students need to be engaged through brief lessons that include frequent assessment and social interaction.

Tips for Online Studying

Today it is common for students to study online. This has both pros and cons. Although e-learning allows students to study anytime and anywhere, it can also lead to a sense of disconnection and frustration. This post will provide some suggestions for how to study online successfully.

Make a Schedule

In a traditional classroom, there is a fixed time to come to class. This regulated discipline helps many students to reach a minimum standard of knowledge even if they never study on their own. In e-learning, the student can study whenever they want. Sadly, many choose to never study which leads to academic failure.

Success in online studying requires a disciplined schedule in which the student determines when they will study as well as what they will do during the study time. As such, you will need to set up some sort of calendar and to-do list that guides you through the learning experience.

It is also important to pace your studying. With flexible courses, the assignments are sometimes all due at the end of the course. This tempts students to do all their studying at the last minute, which robs them of in-depth learning as well as the ability to complete assignments thoroughly. Learning happens best over time and not at the last minute.

Participate

In a traditional class, there are often opportunities to participate in class discussions or question and answer sessions. Such opportunities provide students with a chance to develop a deeper understanding of the ideas and content of the course. Students who actually participate in such two-way dialog usually understand the material of the course better than students who do not.

For the online student, participation is also important and can render the same benefits. Participating in forums and chats will deepen understanding. However, I must admit that, given the text-heavy nature of online forums, reading the comments of peers can in many ways boost understanding even without participation. This is because you can read others’ ideas at your own speed, which helps with comprehension. This is not possible during an in-class discussion, when people may move faster than you can handle.

Communicate with the Instructor

When students are confused, they need to speak up. For some reason, students are often shy about contacting the instructor in an online course. However, the teacher is there to help you and expects questions and feedback. As such, reach out to them.

Communicating with the instructor also helps to establish a sense of community, which is important in online learning. It helps the instructor to establish presence and demonstrates that they are there to help you succeed.

Conclusion

E-learning is a major component of the future of learning. Therefore, students need to be familiar with what they need to do in order to be successful in their online studies.

Tips for Teaching Online

Teaching online is a unique experience due in part to the platform of instruction. Often, there is no face-to-face interaction and all communication is in some sort of digital format. Although this can be a rewarding experience, there are still several things to consider when teaching in this format. Some tips for successful online teaching include the following.

  • Planning in advance
  • Having a presence
  • Knowing your technology
  • Being consistent

Plan in Advance

All teaching involves advance planning. However, there are those teaching moments in a regular classroom where a teacher can change midstream to hit a particular interest in the class. In addition, more experienced teachers tend to plan less as they are so comfortable with the content and have an intuitive sense of how to support students.

In online teaching, the entire course should be planned and laid out accordingly before the course starts. It is a nightmare to try and develop course material while trying to teach online. This is partially due to the fact that there are so many reminders and due dates sprinkled throughout the course that are inflexible. This means a teacher must know the end from the beginning in terms of what the curriculum covers and what assignments are coming. Changing midstream is really tough.

In addition, the asynchronous nature of online teaching means that instructional material must be thoroughly clear or students will be lost. This again places an emphasis on strong preparation. Online teaching isn’t really for the person who likes to live in the moment but rather for the person who plans ahead.

Have Presence

Having presence means making clear that you are monitoring progress and communicating with students frequently. When students complete assignments they should receive feedback. There should be announcements made in terms of assignments due, general feedback about activities, as well as Q&A with students.

Many people think that teaching online takes less time and can accommodate larger classes. This is far from the case. Online teaching is as time-intensive as regular teaching because you must provide feedback and communication, or the students will often feel abandoned.

Know Your Technology

An online teacher must be familiar with and a proponent of technology. This does not mean that you know everything but rather that you know how to get things done. You don’t need a master’s in web design, but knowing the basics of HTML can really help when communicating with the IT people.

Whatever learning management system you use, you should actually be familiar with it and not just be a consumer of it. Too many people just upload text for students to read, provide several forums, and call that online learning. In many ways, that’s online boredom, especially for younger students.

Consistency

Consistency is about the user experience. The different modules in the course should have the same format with different activities. This way, students focus on learning and not on trying to figure out what you want them to do. This applies across classes as well. There needs to be some sense of stability in terms of how content is delivered. There is no single best way, but it needs to be similar within and across courses for the sake of learning.

Conclusion

These are just some of many ideas to consider when teaching an online course. The main point is the need for preparation and dedication when teaching online.

Blended Learning Defined

E-learning is a commonly used tool at most educational institutions. Often, either the e-learning platform is fully online or a traditional model of face-to-face instruction is used. Blended learning is also available but is less clear in terms of what to do.

In this post, we will look at what blended learning is and what it is not.

What Blended Learning is

Blended learning is an instructional environment in which online learning and traditional face-to-face instruction coexist and are employed in a course. There are at least six common models of blended learning.

  • Face-to-face driver – Traditional instruction is supported by online materials
  • Online driver –The entire course is completed online with teacher support made available
  • Rotation – A course in which students cycle back and forth between online and traditional instruction
  • Labs – Content is delivered online but in a specific location such as a computer lab on-campus
  • Flex – Most of the curriculum is delivered online and the teacher is available for face-to-face consultation.
  • Self-blend – Students choose to augment their traditional learning experience with online coursework.

These models mentioned above can be used in combination with each other and are not mutually exclusive.

For a course to be blended, it is probably necessary for at least some sort of learning to happen online. The challenge is in defining learning. For example, the Moodle platform places an emphasis on constructivism. As such, there are a lot of opportunities for collaboration in the use of the modules available in Moodle. Through discussion and interaction with other students through forums, commenting on videos, etc., students are able to demonstrate learning.

For a more individualistic experience, a blended course still requires students to do something online. For example, completing a quiz or adding material to a wiki or database are ways to show that learning is taking place without as much collaboration. However a teacher chooses to incorporate blended learning, the students need to do something online for it to truly be blended.

What Blended Learning is not

Many teachers will post their PowerPoints online, have students submit assignments online, and call this blended learning. While it is commendable that online tools are being used, this is not really blended learning because no learning is actually taking place online. Rather, this is an example of using cloud services to upload and download materials.

The PowerPoints were seen in class and are simply available for review. Uploading assignments is trickier to classify, but if the students completed a traditional assignment and simply uploaded it, then there was no real online learning experience. The students neither collaborated nor completed anything online in order to complete this learning experience.

Conclusion

The definition here is not exhaustive. The purpose was to provide a flexible framework in which blended learning is possible. To make it as simple as possible, blended learning means students actively learning online and actively learning in a traditional format. How much of each component is used depends on the approach of the teacher.

Benefits of Writing

There are many reasons that a person or student should learn to master the craft of writing in some form or genre. Of course, the average person knows how to write if they have a K-12 education, but here we mean excelling at writing beyond the introductory basics. As such, in this post, we will look at the following benefits of learning to write.

  • Makes you a better reader and listener
  • Enhances communication skills
  • Develops thinking skills

Improved Reading and Listening Skills

There seems to be an interesting feedback loop between reading and writing. Avid readers are often good writers and avid writers are often good readers. Reading allows you to observe how others write and communicate. This, in turn, can inspire your own writing. It’s similar to how children copy the behavior of the people around them. When you write it is natural to bring with you the styles you have experienced through reading.

Writing also improves listening skills; however, this happens through the process of listening to others through reading. When reading, we have to assess and evaluate the arguments of the author, which can only happen by listening to the author through his or her work.

Communication Skills

Writing, regardless of genre, involves finding an audience and sharing your own ideas in a way that is clear to them. As such, writing naturally enhances communication skills. This is because of the need to identify the purpose or reason you are writing as well as how you will share your message.

When writing is unclear it is often because the writer has targeted the wrong audience or has an unclear purpose for writing. A common reason research articles are rejected is that the editor is convinced that the article is not appropriate for the journal’s audience. Therefore, it is critical that an author knows their audience.

Thinking Skills 

Related to communication skills are thinking skills. Writing involves taking information in one medium, the thoughts in your head, and placing it in another medium, words on paper. Whenever content moves from one medium to another, there is some loss of meaning. This is why, for many people, their writing makes sense to them but to no one else.

Therefore, a great deal of thought must be put into writing with clarity. You have to structure the thesis/purpose statement, main ideas, and supporting details. Not to mention that you will often need references and will need to adhere to some form of formatting. All this must be juggled while delivering content that is critically stimulating.

Conclusion 

Writing is a vehicle of communication that is not used as much as it used to be. There are so many other forms of communication and interaction that writing can sometimes seem obsolete. However, though the means of communication may change, the benefits of writing remain.

Local Regression in R

Local regression uses something similar to nearest neighbor classification to generate a regression line. In local regression, nearby observations are used to fit the line rather than all observations. It is necessary to indicate the percentage of the observations you want R to use for fitting the local line. The name for this hyperparameter is the span. The higher the span the smoother the line becomes.

Local regression works well when there are only a handful of independent variables in the model. When the number of variables becomes too large, the model will struggle. As such, we will only fit a bivariate model, which will allow us to both fit the model and visualize it.

In this post, we will use the “Clothing” dataset from the “Ecdat” package and examine the relationship between innovation (inv2) and total sales (tsales). Below is some initial code.

library(Ecdat)
data(Clothing)
str(Clothing)
## 'data.frame':    400 obs. of  13 variables:
##  $ tsales : int  750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
##  $ sales  : num  4412 4281 4167 2670 15000 ...
##  $ margin : num  41 39 40 40 44 41 39 28 41 37 ...
##  $ nown   : num  1 2 1 1 2 ...
##  $ nfull  : num  1 2 2 1 1.96 ...
##  $ npart  : num  1 3 2.22 1.28 1.28 ...
##  $ naux   : num  1.54 1.54 1.41 1.37 1.37 ...
##  $ hoursw : int  76 192 114 100 104 72 161 80 158 87 ...
##  $ hourspw: num  16.8 22.5 17.2 21.5 15.7 ...
##  $ inv1   : num  17167 17167 292857 22207 22207 ...
##  $ inv2   : num  27177 27177 71571 15000 10000 ...
##  $ ssize  : int  170 450 300 260 50 90 400 100 450 75 ...
##  $ start  : num  41 39 40 40 44 41 39 28 41 37 ...

There is no data preparation in this example. The first thing we will do is fit two different models that have different values for the span hyperparameter. “fit” will have a span of .41 which means it will use 41% of the nearest examples. “fit2” will use .82. Below is the code.

fit<-loess(tsales~inv2,span = .41,data = Clothing)
fit2<-loess(tsales~inv2,span = .82,data = Clothing)

In the code above, we used the “loess” function to fit the model. The “span” argument was set to .41 and .82.
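Beyond the span, the “loess” function also lets you choose the degree of the locally fitted polynomial through its “degree” argument (the default is 2). As a small illustrative sketch, a locally linear fit with the same span could be requested as follows; this model is not used in the rest of the post.

# Locally linear (degree = 1) loess fit; degree = 2 (locally quadratic) is the default
fit.linear<-loess(tsales~inv2,span = .41,degree = 1,data = Clothing)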

We now need to prepare for the visualization. We begin by using the “range” function to find the lowest and highest values of “inv2”. Then we use the “seq” function to create a grid. Below is the code.

inv2lims<-range(Clothing$inv2)
inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])

The information in the code above sets up the x-axis for the plot. We are now ready to plot the data and draw each regression line.

plot(Clothing$inv2,Clothing$tsales,xlim=inv2lims)
lines(inv2.grid,predict(fit,data.frame(inv2=inv2.grid)),col='blue',lwd=3)
lines(inv2.grid,predict(fit2,data.frame(inv2=inv2.grid)),col='red',lwd=3)

[Plot: tsales versus inv2 with the span = .41 (blue) and span = .82 (red) loess lines]

Not much difference in the two models. For our final task, we will predict with our “fit” model using all possible values of “inv2” and also fit the confidence interval lines.

pred<-predict(fit,newdata=inv2.grid,se=T)
plot(Clothing$inv2,Clothing$tsales)
lines(inv2.grid,pred$fit,col='red',lwd=3)
lines(inv2.grid,pred$fit+2*pred$se.fit,lty="dashed",lwd=2,col='blue')
lines(inv2.grid,pred$fit-2*pred$se.fit,lty="dashed",lwd=2,col='blue')

[Plot: loess fit (red) with dashed ±2 standard error bands (blue)]

Conclusion

Local regression provides another way to model complex non-linear relationships in low dimensions. The example here covers just the basics; the technique can be much more involved than what is described here.

Smoothing Splines in R

This post will provide information on smoothing splines. Smoothing splines are used in regression when we want to reduce the residual sum of squares by adding more flexibility to the regression line without allowing too much overfitting.

In order to do this, we must tune a smoothing parameter. A smoothing spline is essentially a natural cubic spline with a knot at every unique value of x in the model. Having this many knots can lead to severe overfitting. This is corrected for by controlling the effective degrees of freedom through a penalty parameter called lambda. You can set this value manually or select it through cross-validation.
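For reference, the “smooth.spline” function used below exposes this tuning in a few interchangeable ways: “df” fixes the effective degrees of freedom, “spar” sets the smoothing level directly (a monotone transform of lambda), and “cv = TRUE” picks it by cross-validation. A minimal sketch using the built-in “cars” data, just to illustrate the arguments:

# Three ways to control the amount of smoothing in smooth.spline
s1<-smooth.spline(cars$speed,cars$dist,df=5)      # fix the effective degrees of freedom
s2<-smooth.spline(cars$speed,cars$dist,spar=0.8)  # set the smoothing level directly
s3<-smooth.spline(cars$speed,cars$dist,cv=TRUE)   # pick lambda by cross-validation (may warn about tied x values)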

We will now look at an example of the use of smoothing splines with the “Clothing” dataset from the “Ecdat” package. We want to predict “tsales” based on the use of innovation in the stores. Below is some initial code.

library(Ecdat)
data(Clothing)
str(Clothing)
## 'data.frame':    400 obs. of  13 variables:
##  $ tsales : int  750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
##  $ sales  : num  4412 4281 4167 2670 15000 ...
##  $ margin : num  41 39 40 40 44 41 39 28 41 37 ...
##  $ nown   : num  1 2 1 1 2 ...
##  $ nfull  : num  1 2 2 1 1.96 ...
##  $ npart  : num  1 3 2.22 1.28 1.28 ...
##  $ naux   : num  1.54 1.54 1.41 1.37 1.37 ...
##  $ hoursw : int  76 192 114 100 104 72 161 80 158 87 ...
##  $ hourspw: num  16.8 22.5 17.2 21.5 15.7 ...
##  $ inv1   : num  17167 17167 292857 22207 22207 ...
##  $ inv2   : num  27177 27177 71571 15000 10000 ...
##  $ ssize  : int  170 450 300 260 50 90 400 100 450 75 ...
##  $ start  : num  41 39 40 40 44 41 39 28 41 37 ...

We are going to create three models. Model one will have 57 degrees of freedom, model two will have 7, and model three will have its degrees of freedom determined through cross-validation. Below is the code.

fit1<-smooth.spline(Clothing$inv2,Clothing$tsales,df=57)
fit2<-smooth.spline(Clothing$inv2,Clothing$tsales,df=7)
fit3<-smooth.spline(Clothing$inv2,Clothing$tsales,cv=T)
## Warning in smooth.spline(Clothing$inv2, Clothing$tsales, cv = T): cross-
## validation with non-unique 'x' values seems doubtful
(data.frame(fit1$df,fit2$df,fit3$df))
##   fit1.df  fit2.df  fit3.df
## 1      57 7.000957 2.791762

In the code above we used the “smooth.spline” function, which comes with base R. Notice that we did not use the formula syntax that the “lm” function calls for. The code above also prints the degrees of freedom for each model. You can see that for “fit3” the cross-validation determined that 2.79 was the most appropriate number of degrees of freedom. In addition, if you type in the following code.

sapply(data.frame(fit1$x,fit2$x,fit3$x),length)
## fit1.x fit2.x fit3.x 
##     73     73     73

You will see that there are only 73 data points in each model. The “Clothing” dataset has 400 examples in it. The reason for this reduction is that the “smooth.spline” function only uses the unique values of the predictor from the original dataset. As such, though there are 400 examples in the dataset, only 73 of the “inv2” values are unique.
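You can verify this directly by counting the unique values of the predictor; based on the discussion above, this should return 73.

length(unique(Clothing$inv2))   # number of unique inv2 values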

Next, we plot our data and add regression lines

plot(Clothing$inv2,Clothing$tsales)
lines(fit1,col='red',lwd=3)
lines(fit2,col='green',lwd=3)
lines(fit3,col='blue',lwd=3)
legend('topright',lty=1,col=c('red','green','blue'),c("df = 57",'df=7','df=CV 2.8'))

[Plot: smoothing spline fits for df = 57 (red), df = 7 (green), and df chosen by CV ≈ 2.8 (blue)]

You can see that as the degrees of freedom increase so does the flexibility in the line. The advantage of smoothing splines is to have a more flexible way to assess the characteristics of a dataset.

Polynomial Spline Regression in R

Normally, when least squares regression is used, you fit one line to the model. However, sometimes you may want enough flexibility that you fit different lines over different regions of your independent variable. This process of fitting different lines over different regions of X is known as Regression Splines.

How this works is that there are different coefficient values based on the regions of X. As the researcher, you can set the cutoff points for each region. The cutoff point is called a “knot.” The more knots you use, the more flexible the model becomes, because there are fewer data points within each region, allowing for more variability.

We will now go through an example of polynomial regression splines. Remember that polynomial means we will have a curved line, as we are using higher-order polynomials. Our goal will be to predict total sales based on the amount of innovation a store employs. We will use the “Ecdat” package and the “Clothing” dataset. In addition, we will need the “splines” package. The code is as follows.

library(splines);library(Ecdat)
data(Clothing)

We will now fit our model. We must indicate the number and placement of the knots. Knots are commonly placed at the 25th, 50th, and 75th percentiles of the predictor. Below is the code.

fit<-lm(tsales~bs(inv2,knots = c(12000,60000,150000)),data = Clothing)

In the code above we used the traditional “lm” function to set the model. However, we also used the “bs” function which allows us to create our spline regression model. The argument “knots” was set to have three different values. Lastly, the dataset was indicated.

Remember that the default spline in R is a third-degree (cubic) polynomial. This is because cubic splines are smooth enough that it is hard for the eye to detect any discontinuity at the knots.
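As an aside, rather than hard-coding knot locations you can let “bs” place interior knots at quantiles of the predictor by supplying “df”, and you can change the polynomial order with “degree”. A hedged sketch of both options, not used in the rest of the post:

# With degree = 3 (the default), df = 6 yields 3 interior knots placed at the
# 25th, 50th, and 75th percentiles of inv2
fit.df<-lm(tsales~bs(inv2,df = 6),data = Clothing)
attr(bs(Clothing$inv2,df = 6),"knots")   # show where the knots were placed
# A quadratic (degree = 2) spline with explicitly quantile-based knots
fit.q<-lm(tsales~bs(inv2,degree = 2,knots = quantile(Clothing$inv2,c(.25,.5,.75))),data = Clothing)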

We now need X values that we can use for prediction purposes. In the code below we first find the range of the “inv2” variable. We then create a grid that includes all the possible values of “inv2” in increments of 1. Lastly, we use the “predict” function to develop the prediction model. We set the “se” argument to true as we will need this information. The code is below.

inv2lims<-range(Clothing$inv2)
inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])
pred<-predict(fit,newdata=list(inv2=inv2.grid),se=T)

We are now ready to plot our model. The code below graphs the model and includes the regression line (red), confidence interval (green), as well as the location of each knot (blue)

plot(Clothing$inv2,Clothing$tsales,main="Regression Spline Plot")
lines(inv2.grid,pred$fit,col='red',lwd=3)
lines(inv2.grid,pred$fit+2*pred$se.fit,lty="dashed",lwd=2,col='green')
lines(inv2.grid,pred$fit-2*pred$se.fit,lty="dashed",lwd=2,col='green')
segments(12000,0,x1=12000,y1=5000000,col='blue' )
segments(60000,0,x1=60000,y1=5000000,col='blue' )
segments(150000,0,x1=150000,y1=5000000,col='blue' )

[Plot: regression spline fit (red) with confidence bands (green) and knot locations (blue)]

When this model was created, it was essentially four polynomial pieces joined together: one before the first blue line, one between the first and second, one between the second and third, and one from the third blue line to the end. This kind of flexibility is valuable in understanding nonlinear relationships.

Logistic Polynomial Regression in R

Polynomial regression is used when you want to develop a regression model that is not linear. It is common to use this method when performing traditional least squares regression. However, it is also possible to use polynomial regression when the dependent variable is categorical. As such, in this post, we will go through an example of logistic polynomial regression.

Specifically, we will use the “Clothing” dataset from the “Ecdat” package. We will divide the “tsales” dependent variable into two categories to run the analysis. Below is the code to get started.

library(Ecdat)
data(Clothing)

There is little preparation for this example. Below is the code for the model

fitglm<-glm(I(tsales>900000)~poly(inv2,4),data=Clothing,family = binomial)

Here is what we did

1. We created an object called “fitglm” to save our results
2. We used the “glm” function to process the model
3. We used the “I” function. This tells R to evaluate the expression inside the parentheses as is. As such, we did not have to make a new variable in which we split the “tsales” variable. Simply put, if sales were greater than 900000 the observation was coded 1 and 0 otherwise (see the sketch after this list for an equivalent approach with an explicit variable).
4. Next, we set the information for the independent variable. We used the “poly” function. Inside this function, we placed the “inv2” variable and the highest order polynomial we want to explore.
5. We set the data to “Clothing”
6. Lastly, we set the “family” argument to “binomial” which is needed for logistic regression
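To make item 3 concrete, here is an equivalent, purely illustrative sketch that creates the binary outcome as an explicit variable instead of using “I”; the name “high.sales” is hypothetical and not part of the original post.

Clothing$high.sales<-as.numeric(Clothing$tsales>900000)   # explicit 0/1 outcome
fitglm2<-glm(high.sales~poly(inv2,4),data=Clothing,family = binomial)
# coef(fitglm2) should match coef(fitglm) from above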

Below are the results.

summary(fitglm)
## 
## Call:
## glm(formula = I(tsales > 9e+05) ~ poly(inv2, 4), family = binomial, 
##     data = Clothing)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5025  -0.8778  -0.8458   1.4534   1.5681  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       3.074      2.685   1.145   0.2523  
## poly(inv2, 4)1  641.710    459.327   1.397   0.1624  
## poly(inv2, 4)2  585.975    421.723   1.389   0.1647  
## poly(inv2, 4)3  259.700    178.081   1.458   0.1448  
## poly(inv2, 4)4   73.425     44.206   1.661   0.0967 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 521.57  on 399  degrees of freedom
## Residual deviance: 493.51  on 395  degrees of freedom
## AIC: 503.51
## 
## Number of Fisher Scoring iterations: 13

It appears that only the 4th-degree polynomial is significant and barely at that. We will now find the range of our independent variable “inv2” and make a grid from this information. Doing this will allow us to run our model using the full range of possible values for our independent variable.

inv2lims<-range(Clothing$inv2)
inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])

The “inv2lims” object has two values: the lowest value in “inv2” and the highest value. These values serve as the lowest and highest values in our “inv2.grid” object. This means we have a grid of values starting at 350 and going to 400000 in increments of 1, to be used as values of “inv2” in our prediction model. Below is our prediction model.

predsglm<-predict(fitglm,newdata=list(inv2=inv2.grid),se=T) # keep predictions on the logit (link) scale

Next, we need to convert these predicted log odds into the probability that a store with a given value of “inv2” has “tsales” greater than 900000. The equation is as follows.

pfit<-exp(predsglm$fit)/(1+exp(predsglm$fit))
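This is the standard inverse-logit transformation. As a side note, base R’s “plogis” function performs the same conversion, so the line above could equivalently be written as:

pfit<-plogis(predsglm$fit)   # identical inverse-logit (logistic) transformation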

Graphing this leads to interesting insights. Below is the code

plot(pfit)

[Plot: predicted probability of tsales > 900000 across the inv2 grid]

You can see the curves in the line that come from the polynomial terms. As “inv2” increases, the probability increases until the values fall between 125000 and 200000, where it dips. This is interesting, to say the least.

We now need to plot the actual model. First, we need to calculate the confidence intervals. This is done with the code below.

se.bandsglm.logit<-cbind(predsglm$fit+2*predsglm$se.fit,predsglm$fit-2*predsglm$se.fit)
se.bandsglm<-exp(se.bandsglm.logit)/(1+exp(se.bandsglm.logit))

The “se.bandsglm.logit” object contains the confidence limits on the log-odds scale and “se.bandsglm” contains them as probabilities. Now we plot the results.

plot(Clothing$inv2,I(Clothing$tsales>900000),xlim=inv2lims,type='n')
points(jitter(Clothing$inv2),I((Clothing$tsales>900000)),cex=2,pch='|',col='darkgrey')
lines(inv2.grid,pfit,lwd=4)
matlines(inv2.grid,se.bandsglm,col="green",lty=6,lwd=6)

[Plot: logistic polynomial fit with jittered data points and confidence bands]

In the code above we did the following.
1. We plotted our dependent and independent variables. However, we set the argument “type” to “n”, which plots nothing. This was done so we can add the information step by step.
2. We added the points. This was done using the “points” function. The “jitter” function just helps to spread the information out. The other arguments (cex, pch, col) are for aesthetics and are optional.
3. We add our logistic polynomial line based on our independent variable grid and the “pfit” object which has all of the predicted probabilities.
4. Last, we add the confidence intervals using the “matlines” function, which takes the grid object as well as the “se.bandsglm” information.

You can see that these results are similar to when we graphed only the “pfit” information, but now we also have the confidence intervals. You can see the same dip around 125000-200000, where there is also a larger confidence interval. If you look at the plot, you can see that there are fewer data points in this range, which may be what is making the intervals wider.

Conclusion

Logistic polynomial regression allows the regression line to have more curves to it if it is necessary. This is useful for fitting data that is non-linear in nature.

Teaching Handwriting to Young Children

Learning to write takes a lifetime. Any author will share with you how they have matured and grown over time in the craft of writing. However, there are some basic fundamentals that need to be mastered before the process of growing as a writer can begin.

This post will provide an approach to teaching writing to young children that includes the following steps.

  1. Learning to write the letters
  2. Learning to write sentences
  3. Learning to write paragraphs

Learning the Letters

The first step in this process is learning to write letters. The challenge is normally developing the fine motor skills for creating letters. If you have ever seen the writing of a 5-year-old you have some idea of what I am talking about.

It is difficult for children to actually write letters. Normally, this is taught by having the students trace the letters on a piece of paper. This drill-and-kill style eventually works as the child masters the art of tracing. An analogy would be the use of training wheels on a bicycle.

Generally, straight lines are easier to write than curves. As such, easy letters to learn first are t, i, and l. Curves with straight lines are often easier than slanted lines so the next stage of letters might include b, d, f, h, j, p, r, u, and y. Lastly, slanted lines and full circle letters are the hardest in my experience. As such, a, c, e, g, k, m, n, o, s, v, w, x, and z are the last to learn.

Learning to Write Sentences

It is discouraging to have the child learn the entire alphabet before writing something. It’s better to learn a few letters and begin making sentences immediately. This heightens relevance and it is motivating to the child to be able to read their own writing. For now, the sentences do not really need to make sense. Just have them write using a handful of letters with support.

Simple three-word sentences are enough at this moment. Many worksheets will provide blank lines with space at the top for drawing and coloring, which provides a visual of the sentence.

It is critical to provide support for the development of the sentence. You have to help the child develop the thought that they want to put on paper. This is difficult for many children. You may also be tasked with providing spelling support, although for now I would not worry too much about spelling. Students need to create first and follow the rules of creating later.

Writing Paragraphs

The typical child will probably not be able to write paragraphs until the 3rd or 4th grade at the earliest. Paragraph writing takes an extensive amount of planning for a small child, as they now must have a beginning, middle, and end, or a main idea with supporting details.

At this stage, the best way to learn to write is to read a lot. This provides a structure and vocabulary on which the child can develop their own ideas in writing. In addition, rules of writing can be taught such as grammar and other components of language.

Conclusion

Writing can be an enjoyable experience if children are guided initially in learning this craft. Over time, a child can provide many insightful ideas and comments through developing the ability to communicate through the use of text.

Polynomial Regression in R

Polynomial regression is one of the easiest ways to fit a non-linear line to a data set. This is done by applying higher-order polynomial terms, such as quadratic, cubic, etc., to one or more predictor variables in a model.

Generally, polynomial regression is used with one predictor and one outcome variable. When there are several predictor variables, it is more common to use generalized additive models. In this post, we will use the “Clothing” dataset from the “Ecdat” package to predict total sales with the use of polynomial regression. Below is some initial code.

library(Ecdat)
data(Clothing)
str(Clothing)
## 'data.frame':    400 obs. of  13 variables:
##  $ tsales : int  750000 1926395 1250000 694227 750000 400000 1300000 495340 1200000 495340 ...
##  $ sales  : num  4412 4281 4167 2670 15000 ...
##  $ margin : num  41 39 40 40 44 41 39 28 41 37 ...
##  $ nown   : num  1 2 1 1 2 ...
##  $ nfull  : num  1 2 2 1 1.96 ...
##  $ npart  : num  1 3 2.22 1.28 1.28 ...
##  $ naux   : num  1.54 1.54 1.41 1.37 1.37 ...
##  $ hoursw : int  76 192 114 100 104 72 161 80 158 87 ...
##  $ hourspw: num  16.8 22.5 17.2 21.5 15.7 ...
##  $ inv1   : num  17167 17167 292857 22207 22207 ...
##  $ inv2   : num  27177 27177 71571 15000 10000 ...
##  $ ssize  : int  170 450 300 260 50 90 400 100 450 75 ...
##  $ start  : num  41 39 40 40 44 41 39 28 41 37 ...

We are going to use the “inv2” variable as our predictor. This variable measures the investment in automation by a particular store. We will now run our polynomial regression model.

fit<-lm(tsales~poly(inv2,5),data = Clothing)
summary(fit)
## 
## Call:
## lm(formula = tsales ~ poly(inv2, 5), data = Clothing)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -946668 -336447  -96763  184927 3599267 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      833584      28489  29.259  < 2e-16 ***
## poly(inv2, 5)1  2391309     569789   4.197 3.35e-05 ***
## poly(inv2, 5)2  -665063     569789  -1.167   0.2438    
## poly(inv2, 5)3    49793     569789   0.087   0.9304    
## poly(inv2, 5)4  1279190     569789   2.245   0.0253 *  
## poly(inv2, 5)5  -341189     569789  -0.599   0.5497    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 569800 on 394 degrees of freedom
## Multiple R-squared:  0.05828,    Adjusted R-squared:  0.04633 
## F-statistic: 4.876 on 5 and 394 DF,  p-value: 0.0002428

The code above should be mostly familiar. We use the “lm” function as normal for regression. However, we then used the “poly” function on the “inv2” variable. Rather than running the model several times, this adds five orthogonal polynomial terms of “inv2” (degree 1 through 5, where 5 is the number next to “inv2”) to a single model. The results indicate that the linear (1st-degree) and 4th-degree terms are significant.
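Note that “poly” produces orthogonal polynomial terms by default rather than raw powers of “inv2”. If you want coefficients on the raw powers you can pass “raw = TRUE”, though with a predictor as large as “inv2” the raw basis can be numerically unstable, which is one reason the orthogonal version is the default. A hedged sketch, not used elsewhere in the post:

# Raw (non-orthogonal) polynomial terms; fits the same curve but the
# individual coefficients are no longer uncorrelated with one another
fit.raw<-lm(tsales~poly(inv2,5,raw = TRUE),data = Clothing)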

We will now prepare a visual of the results, but first there are several things we need to set up. First, we want to find the range of our predictor variable “inv2” and save this information in an object. The code is below.

inv2lims<-range(Clothing$inv2)

Second, we need to create a grid that has all the possible values of “inv2” from the lowest to the highest broken up in intervals of one. Below is the code.

inv2.grid<-seq(from=inv2lims[1],to=inv2lims[2])

We now have almost 400,000 data points in the “inv2.grid” object through this approach. We will now use these values to predict “tsales.” We also want the standard errors, so we set “se” to TRUE.

preds<-predict(fit,newdata=list(inv2=inv2.grid),se=TRUE)

We now need to find the confidence interval for our regression line. This is done by making an object that takes the predicted fit and adds or subtracts 2 times the standard error, as shown below.

se.bands<-cbind(preds$fit+2*preds$se.fit,preds$fit-2*preds$se.fit)

With these steps completed, we are ready to create our visualization.

To make our visual, we use the “plot” function on the predictor and outcome. Doing this gives us a plot without a regression line. We then use the “lines” function to add the polynomial regression line; this line is based on the “inv2.grid” object (roughly 400,000 values) and our predictions. Lastly, we use the “matlines” function to add the confidence intervals we found and stored in the “se.bands” object.

plot(Clothing$inv2,Clothing$tsales)
lines(inv2.grid,preds$fit,lwd=4,col='blue')
matlines(inv2.grid,se.bands,lwd = 4,col = "yellow",lty=4)

[Plot: polynomial regression fit (blue) with confidence bands (yellow)]

Conclusion

You can clearly see the curvature of the line, which helped to improve model fit. However, as you can tell, we are fitting much of this line to outliers. This is one reason the standard error gets wider and wider: there are fewer and fewer observations on which to base it. Still, for demonstration purposes, this is a clear example of the power of polynomial regression.

Partial Least Squares Regression in R

Partial least squares (PLS) regression is a form of regression that involves the development of components from the original variables in a supervised way. What this means is that the dependent variable is used to help create the new components from the original variables. As such, when PLS is used, the linear combinations of the original features help to explain variance in both the independent and the dependent variables of the model.

In this post, we will predict “income” in the “Mroz” dataset using PLS. Below is some initial code.

library(pls);library(Ecdat)
data("Mroz")
str(Mroz)
## 'data.frame':    753 obs. of  18 variables:
##  $ work      : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...
##  $ hoursw    : int  1610 1656 1980 456 1568 2032 1440 1020 1458 1600 ...
##  $ child6    : int  1 0 1 0 1 0 0 0 0 0 ...
##  $ child618  : int  0 2 3 3 2 0 2 0 2 2 ...
##  $ agew      : int  32 30 35 34 31 54 37 54 48 39 ...
##  $ educw     : int  12 12 12 12 14 12 16 12 12 12 ...
##  $ hearnw    : num  3.35 1.39 4.55 1.1 4.59 ...
##  $ wagew     : num  2.65 2.65 4.04 3.25 3.6 4.7 5.95 9.98 0 4.15 ...
##  $ hoursh    : int  2708 2310 3072 1920 2000 1040 2670 4120 1995 2100 ...
##  $ ageh      : int  34 30 40 53 32 57 37 53 52 43 ...
##  $ educh     : int  12 9 12 10 12 11 12 8 4 12 ...
##  $ wageh     : num  4.03 8.44 3.58 3.54 10 ...
##  $ income    : int  16310 21800 21040 7300 27300 19495 21152 18900 20405 20425 ...
##  $ educwm    : int  12 7 12 7 12 14 14 3 7 7 ...
##  $ educwf    : int  7 7 7 7 14 7 7 3 7 7 ...
##  $ unemprate : num  5 11 5 5 9.5 7.5 5 5 3 5 ...
##  $ city      : Factor w/ 2 levels "no","yes": 1 2 1 1 2 2 1 1 1 1 ...
##  $ experience: int  14 5 15 6 7 33 11 35 24 21 ...

First, we must prepare our data by dividing it into a training and test set. We will do this by doing a 50/50 split of the data.

set.seed(777)
train<-sample(c(T,F),nrow(Mroz),rep=T) #50/50 train/test split
test<-(!train)

In the code above we called the “set.seed” function in order to ensure reproducibility. Then we created the “train” object and used the “sample” function to make a vector of ‘T’ and ‘F’ values based on the number of rows in “Mroz”. Lastly, we created the “test” object based on everything that is not in the “train” object, which is what the exclamation point is for.
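As a purely illustrative alternative, if you wanted an exactly even split you could sample row indices instead of a logical vector; the names “train.idx” and “test.idx” are hypothetical and this is not what the original analysis does.

train.idx<-sample(nrow(Mroz),nrow(Mroz) %/% 2)       # exactly half of the rows
test.idx<-setdiff(seq_len(nrow(Mroz)),train.idx)     # the remaining rows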

Now we create our model using the “plsr” function from the “pls” package and examine the results using the “summary” function. We will also scale the data, since the scale of the variables affects the development of the components, and use cross-validation. Below is the code.

set.seed(777)
pls.fit<-plsr(income~.,data=Mroz,subset=train,scale=T,validation="CV")
summary(pls.fit)
## Data:    X dimension: 392 17 
##  Y dimension: 392 1
## Fit method: kernelpls
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           11218     8121     6701     6127     5952     5886     5857
## adjCV        11218     8114     6683     6108     5941     5872     5842
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        5853     5849     5854      5853      5853      5852      5852
## adjCV     5837     5833     5837      5836      5836      5835      5835
##        14 comps  15 comps  16 comps  17 comps
## CV         5852      5852      5852      5852
## adjCV      5835      5835      5835      5835
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X         17.04    26.64    37.18    49.16    59.63    64.63    69.13
## income    49.26    66.63    72.75    74.16    74.87    75.25    75.44
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         72.82    76.06     78.59     81.79     85.52     89.55     92.14
## income    75.49    75.51     75.51     75.52     75.52     75.52     75.52
##         15 comps  16 comps  17 comps
## X          94.88     97.62    100.00
## income     75.52     75.52     75.52

The printout includes the root mean squared error for each number of components in the VALIDATION section as well as the variance explained in the TRAINING section. There are 17 components because there are 17 independent variables. You can see that after component 3 or 4 there is little improvement in the variance explained in the dependent variable. Below is the code for the plot of these results, which uses the “validationplot” function with the “val.type” argument set to “MSEP”.

validationplot(pls.fit,val.type = "MSEP")

[Plot: cross-validated MSEP by number of PLS components]

We will now make predictions with our model. We use the “predict” function with the “Mroz” dataset, keeping only the rows indexed by the “test” vector, and set the number of components to three based on our previous plot. Below is the code.

set.seed(777)
pls.pred<-predict(pls.fit,Mroz[test,],ncomp=3)

After this, we will calculate the mean squared error. This is done by taking the difference between our predictions and the dependent variable of the test set, squaring it, and calculating the mean. Below is the code.

mean((pls.pred-Mroz$income[test])^2)
## [1] 63386682
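Since the same calculation is repeated below for the least squares comparison, a small helper function (illustrative only, not part of the original post) can make the comparisons easier to read.

mse<-function(pred,obs) mean((pred-obs)^2)   # reusable mean squared error helper
mse(pls.pred,Mroz$income[test])              # same value as above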

As you know, this information is only useful when compared to something else. Therefore, we will run the data with a traditional least squares regression model and compare the results.

set.seed(777)
lm.fit<-lm(income~.,data=Mroz,subset=train)
lm.pred<-predict(lm.fit,Mroz[test,])
mean((lm.pred-Mroz$income[test])^2)
## [1] 59432814

The least squares model is slightly better than our partial least squares model, but if we look at the model we see several variables that are not significant. We will remove these and see what the results are.

summary(lm.fit)
## 
## Call:
## lm(formula = income ~ ., data = Mroz, subset = train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -20131  -2923  -1065   1670  36246 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.946e+04  3.224e+03  -6.036 3.81e-09 ***
## workno      -4.823e+03  1.037e+03  -4.651 4.59e-06 ***
## hoursw       4.255e+00  5.517e-01   7.712 1.14e-13 ***
## child6      -6.313e+02  6.694e+02  -0.943 0.346258    
## child618     4.847e+02  2.362e+02   2.052 0.040841 *  
## agew         2.782e+02  8.124e+01   3.424 0.000686 ***
## educw        1.268e+02  1.889e+02   0.671 0.502513    
## hearnw       6.401e+02  1.420e+02   4.507 8.79e-06 ***
## wagew        1.945e+02  1.818e+02   1.070 0.285187    
## hoursh       6.030e+00  5.342e-01  11.288  < 2e-16 ***
## ageh        -9.433e+01  7.720e+01  -1.222 0.222488    
## educh        1.784e+02  1.369e+02   1.303 0.193437    
## wageh        2.202e+03  8.714e+01  25.264  < 2e-16 ***
## educwm      -4.394e+01  1.128e+02  -0.390 0.697024    
## educwf       1.392e+02  1.053e+02   1.322 0.186873    
## unemprate   -1.657e+02  9.780e+01  -1.694 0.091055 .  
## cityyes     -3.475e+02  6.686e+02  -0.520 0.603496    
## experience  -1.229e+02  4.490e+01  -2.737 0.006488 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5668 on 374 degrees of freedom
## Multiple R-squared:  0.7552, Adjusted R-squared:  0.744 
## F-statistic: 67.85 on 17 and 374 DF,  p-value: < 2.2e-16
set.seed(777)
lm.fit<-lm(income~work+hoursw+child618+agew+hearnw+hoursh+wageh+experience,data=Mroz,subset=train)
lm.pred<-predict(lm.fit,Mroz[test,])
mean((lm.pred-Mroz$income[test])^2)
## [1] 57839715

As you can see, the error decreased even more, which indicates that the least squares regression model is superior to the partial least squares model here. In addition, the partial least squares model is much more difficult to explain because of the use of components. As such, the least squares model is the favored one.

Story Grammar Components

When people tell a story, whether orally or in a movie, there are certain characteristics, shaped by culture, that seem to appear in stories and that children attempt to imitate when they tell a story. These traits are called story grammar components and include the following.

  • Setting statement
  • Initiating event
  • Internal response
  • Internal plan
  • Attempt
  • Direct Consequence
  • Reaction

This post will explore each of these characteristics of a story.

Setting Statement

The setting statement introduces the characters of the story and often identifies who the “good guy” and “bad guy” are. Many movies do this, from Transformers to any X-Men movie. In the first 10-15 minutes, the characters are introduced and the background is explained. For example, the classic story “The Three Little Pigs” begins by telling you there was a wolf and three pigs.

Initiating Event

The initiating event is the catalyst to get the characters to do something. For example, in the “Three Little Pigs” the pigs need shelter. In other words, the initiating event introduces the problem that the characters need to overcome during the story.

Internal Response

The internal response is the characters’ reaction to the initiating event. The response can take many forms, such as an emotional one. For example, the pigs get excited when they see they need shelter. Generally, the internal response provides motivation to do something.

Internal Plan

The internal plan is what the characters will do to overcome the initiating event problem. For the pigs, the plan was to each build a house to prepare for the wolf.

Attempt

The attempt is the action that helps the characters to reach their goal. This is the step in which the internal plan is put into action. Therefore, for the pigs, it is the actual construction of their houses.

Direct Consequence

At this step, the story indicates if the attempt was successful or not. For the pigs, this is where things are complicated. Of the three pigs, two were unsuccessful and only one was successful. Success is determined by who is the protagonist and the antagonist. As such, if the wolf is the protagonist the success would be two and the failure one.

Reaction

The reaction is the character’s response to the direct consequence. For the two unsuccessful pigs, there was no reaction because they were eaten by the wolf. However, for the last pig, he was able to live safely after his home protected him.

Conclusion

Even small children will have several of these components in their storytelling. However, it is important to remember that the components are not required in a story, nor do they have to follow the order specified here. Instead, this is a broad, generalized description of how people communicate through storytelling.

Principal Component Regression in R

This post will explain and provide an example of principal component regression (PCR). Principal component regression has the model construct components that are linear combinations of the independent variables. Because it is based on principal component analysis, the components are designed to capture as much of the variance in the predictors as possible rather than being built around the dependent variable. Doing this often allows you to use fewer variables in your model and can improve the fit of your model as well.

Since PCR is based on principal component analysis it is an unsupervised method, which means the dependent variable has no influence on the development of the components. As such, there are times when the components that are developed may not be beneficial for explaining the dependent variable.
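To make the idea concrete, PCR can be thought of as running an ordinary principal component analysis on the predictors and then regressing the outcome on a few of the resulting component scores. A rough, purely illustrative sketch using the built-in “mtcars” data (the “pcr” function used below automates this and adds cross-validation):

# Unsupervised components from the predictors, then least squares on the first few scores
pc<-prcomp(mtcars[,-1],scale. = TRUE)
pcr.by.hand<-lm(mtcars$mpg~pc$x[,1:3])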

Our example will use the “Mroz” dataset from the “Ecdat” package. Our goal will be to predict “income” based on the variables in the dataset. Below is the initial code

library(pls);library(Ecdat)
data(Mroz)
str(Mroz)
## 'data.frame':    753 obs. of  18 variables:
##  $ work      : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...
##  $ hoursw    : int  1610 1656 1980 456 1568 2032 1440 1020 1458 1600 ...
##  $ child6    : int  1 0 1 0 1 0 0 0 0 0 ...
##  $ child618  : int  0 2 3 3 2 0 2 0 2 2 ...
##  $ agew      : int  32 30 35 34 31 54 37 54 48 39 ...
##  $ educw     : int  12 12 12 12 14 12 16 12 12 12 ...
##  $ hearnw    : num  3.35 1.39 4.55 1.1 4.59 ...
##  $ wagew     : num  2.65 2.65 4.04 3.25 3.6 4.7 5.95 9.98 0 4.15 ...
##  $ hoursh    : int  2708 2310 3072 1920 2000 1040 2670 4120 1995 2100 ...
##  $ ageh      : int  34 30 40 53 32 57 37 53 52 43 ...
##  $ educh     : int  12 9 12 10 12 11 12 8 4 12 ...
##  $ wageh     : num  4.03 8.44 3.58 3.54 10 ...
##  $ income    : int  16310 21800 21040 7300 27300 19495 21152 18900 20405 20425 ...
##  $ educwm    : int  12 7 12 7 12 14 14 3 7 7 ...
##  $ educwf    : int  7 7 7 7 14 7 7 3 7 7 ...
##  $ unemprate : num  5 11 5 5 9.5 7.5 5 5 3 5 ...
##  $ city      : Factor w/ 2 levels "no","yes": 1 2 1 1 2 2 1 1 1 1 ...
##  $ experience: int  14 5 15 6 7 33 11 35 24 21 ...

Our first step is to divide our dataset into a train and test set. We will do a simple 50/50 split for this demonstration.

train<-sample(c(T,F),nrow(Mroz),rep=T) #50/50 train/test split
test<-(!train)

In the code above we use the “sample” function to create a “train” index based on the number of rows in the “Mroz” dataset. Basically, R is making a vector that randomly marks each row of the “Mroz” dataset as TRUE or FALSE. Next, we create the “test” vector as everything that is not in the “train” vector, which is what the exclamation mark does.

We are now ready to develop our model. Below is the code

set.seed(777)
pcr.fit<-pcr(income~.,data=Mroz,subset=train,scale=T,validation="CV")

To make our model we use the “pcr” function from the “pls” package. The “subset” argument tells R to use the “train” vector to select examples from the “Mroz” dataset. The “scale” argument makes sure everything is measured on the same scale. This is important when using a component analysis tool, as variables with different scales have a different influence on the components. Lastly, the “validation” argument enables cross-validation, which will help us determine the number of components to use for prediction. Below are the results of the model using the “summary” function.

summary(pcr.fit)
## Data:    X dimension: 381 17 
##  Y dimension: 381 1
## Fit method: svdpc
## Number of components considered: 17
## 
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
##        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
## CV           12102    11533    11017     9863     9884     9524     9563
## adjCV        12102    11534    11011     9855     9878     9502     9596
##        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
## CV        9149     9133     8811      8527      7265      7234      7120
## adjCV     9126     9123     8798      8877      7199      7172      7100
##        14 comps  15 comps  16 comps  17 comps
## CV         7118      7141      6972      6992
## adjCV      7100      7123      6951      6969
## 
## TRAINING: % variance explained
##         1 comps  2 comps  3 comps  4 comps  5 comps  6 comps  7 comps
## X        21.359    38.71    51.99    59.67    65.66    71.20    76.28
## income    9.927    19.50    35.41    35.63    41.28    41.28    46.75
##         8 comps  9 comps  10 comps  11 comps  12 comps  13 comps  14 comps
## X         80.70    84.39     87.32     90.15     92.65     95.02     96.95
## income    47.08    50.98     51.73     68.17     68.29     68.31     68.34
##         15 comps  16 comps  17 comps
## X          98.47     99.38    100.00
## income     68.48     70.29     70.39

There is a lot of information here. The VALIDATION: RMSEP section gives you the cross-validated root mean squared error of prediction broken down by number of components. The TRAINING section is similar to the printout of any PCA: it shows the cumulative variance of the predictors explained by the components, as well as the variance explained for the dependent variable “income.” In this model, we are able to explain up to 70% of the variance in income if we use all 17 components.

We can graph the MSE using the “validationplot” function with the argument “val.type” set to “MSEP”. The code is below.

validationplot(pcr.fit,val.type = "MSEP")

[Plot: cross-validated MSEP by number of components]

How many components to pick is somewhat subjective; however, there is almost no improvement beyond 13, so we will use 13 components in our prediction model and calculate the mean squared error.

set.seed(777)
pcr.pred<-predict(pcr.fit,Mroz[test,],ncomp=13)
mean((pcr.pred-Mroz$income[test])^2)
## [1] 48958982

The MSE is what you would use to compare this model to other models you have developed. Below is the performance of a least squares regression model.

set.seed(777)
lm.fit<-lm(income~.,data=Mroz,subset=train)
lm.pred<-predict(lm.fit,Mroz[test,])
mean((lm.pred-Mroz$income[test])^2)
## [1] 47794472

If you compare the MSE values, the least squares model performs slightly better than the PCR one. However, there are a lot of non-significant features in the model, as shown below.

summary(lm.fit)
## 
## Call:
## lm(formula = income ~ ., data = Mroz, subset = train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -27646  -3337  -1387   1860  48371 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.215e+04  3.987e+03  -5.556 5.35e-08 ***
## workno      -3.828e+03  1.316e+03  -2.909  0.00385 ** 
## hoursw       3.955e+00  7.085e-01   5.582 4.65e-08 ***
## child6       5.370e+02  8.241e+02   0.652  0.51512    
## child618     4.250e+02  2.850e+02   1.491  0.13673    
## agew         1.962e+02  9.849e+01   1.992  0.04709 *  
## educw        1.097e+02  2.276e+02   0.482  0.63013    
## hearnw       9.835e+02  2.303e+02   4.270 2.50e-05 ***
## wagew        2.292e+02  2.423e+02   0.946  0.34484    
## hoursh       6.386e+00  6.144e-01  10.394  < 2e-16 ***
## ageh        -1.284e+01  9.762e+01  -0.132  0.89542    
## educh        1.460e+02  1.592e+02   0.917  0.35982    
## wageh        2.083e+03  9.930e+01  20.978  < 2e-16 ***
## educwm       1.354e+02  1.335e+02   1.014  0.31115    
## educwf       1.653e+02  1.257e+02   1.315  0.18920    
## unemprate   -1.213e+02  1.148e+02  -1.057  0.29140    
## cityyes     -2.064e+02  7.905e+02  -0.261  0.79421    
## experience  -1.165e+02  5.393e+01  -2.159  0.03147 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6729 on 363 degrees of freedom
## Multiple R-squared:  0.7039, Adjusted R-squared:   0.69 
## F-statistic: 50.76 on 17 and 363 DF,  p-value: < 2.2e-16

After removing these, the MSE is almost the same for the PCR and least squares models.

set.seed(777)
lm.fit2<-lm(income~work+hoursw+hearnw+hoursh+wageh,data=Mroz,subset=train)
lm.pred2<-predict(lm.fit2,Mroz[test,])
mean((lm.pred2-Mroz$income[test])^2)
## [1] 47968996

Conclusion

Since the least squares model is simpler, it is probably the superior model here. PCR is strongest when there are a lot of variables involved and when there are issues with multicollinearity.

Accommodation Theory

Accommodation theory attempts to explain how people adjust the way they talk depending on who the audience is. Generally, there are two ways in which a person can adjust their speech. The two ways are convergence and divergence. In this post, we will look at these two ways of accommodating.

Speech Convergence

Converging is when you change the way you talk to sound more like the person you are talking to. This is seen as polite in many cultures and signals that you accept the person with whom you are speaking.

There are many different ways in which convergence can take place. The speaker may begin to use similar vocabulary, imitate the pronunciation of the person they are talking to, or translate technical jargon into simpler English.

Speech Divergence

Speech divergence is often seen as the opposite of speech convergence. Speech divergence is deliberately selecting a style of language different from the speaker. This often communicates dissatisfaction with the person you are speaking with. For example, most teenagers deliberately speak differently from their parents. This helps them identify with their peers and distance themselves from their parents.

However, a slight divergence is expected of non-native speakers. Many people enjoy the accents of athletes and actresses. To have perfect control of two languages is at times seen negatively in some parts of the world.

A famous example of speech divergence is the speech of former Federal Reserve Chairman Alan Greenspan, known as “Fedspeak.” Fedspeak was used whenever Greenspan appeared before Congress or made announcements about changing the Federal Reserve interest rate. The goal of this form of communication was to sound as divergent and incoherent as possible. Below is an example.

The members of the Board of Governors and the Reserve Bank presidents foresee an implicit strengthening of activity after the current rebalancing is over, although the central tendency of their individual forecasts for real GDP still shows a substantial slowdown, on balance, for the year as a whole.

This makes little sense unless you have an MBA in finance. It sounds as though he sees no real change in the growth of the economy.

The reason behind this mysterious form of communication was that people placed strong emphasis on whatever the Federal Reserve and Alan Greenspan said, and this could lead to swings in the stock market. To prevent this, Greenspan diverged his language, making it as confusing as possible in order to avoid triggering massive changes in the stock market.

Conclusion 

When communicating, we can choose to converge toward our audience or deliberately diverge from it. Which choice we make depends a great deal on the context in which we find ourselves.

Ways Language Change Spreads

All languages change. If there is any doubt, just pick up a book that is over 100 years old; regardless of the language, it will sound at least slightly different from current usage, if not radically different.

In this post, we will look at three common ways in which language change spreads. These three ways are…

  • From one group to another
  • From one style to another
  • Lexical diffusion

Changes from Group to Group

This view of language change is that changes in a language move from one group to another. A group can be any sort of social or work circle. Examples include family, colleagues, church affiliation, etc.

The change of language in a group is often facilitated by “gatekeepers.” Gatekeepers are people who are members of different groups. Most people are members of many different groups at the same time.

What happens is that a person picks up language in one group and shares this style of communication in another.  An example would be a child learning slang at school and using it at home. Naturally, language change moves at different speeds in different groups depending on the acceptability of the change.

Changes from Style to Style

A style is a way of communicating. In simple terms, a person’s style can be formal or informal, with varying shades of grey in between. These two extremes can also be viewed as prestigious vs. not prestigious.

Normally, formal/prestigious styles of language move down into informal styles of language. For example, a movie star or some other celebrity speaks a certain way and this style is transferred downward among those who are not so famous.

There are times when informal and un-prestigious language change spreads upward. Normally, this is much slower than change moving downward. In addition, it often involves words and styles that are so old that what really happened is that the young people who used them became part of the “establishment” in middle age and kept using the style. For example, the word “cool” used to be slang but is now commonly used among some of the most elite leaders of the world. Therefore, it wasn’t the language that changed as much as the people who used it, with the passing of one generation to another.

Lexical Diffusion

Lexical diffusion is the gradual spread of a change in pronunciation from word to word. This is often an exceedingly slow process and can take centuries. The English language is full of words that have strange pronunciations when considering the spelling. This is due to English being thoroughly influenced by other languages such as French and Latin.

Conclusion

These three theories are just some of the ways language change can spread. In addition, it may not be practical to think of them each happening independently from the others. Rather, these three theories can often be seen as working at the same time to slowly change a language over time.

Example of Best Subset Regression in R

This post will provide an example of best subset regression. This is a topic that has been covered before in this blog. However, in the current post, we will approach this using slightly different code and a different dataset. We will be using the “HI” dataset from the “Ecdat” package. Our goal will be to predict the number of hours a woman works based on the other variables in the dataset. Below is some initial code.

library(leaps);library(Ecdat)
data(HI)
str(HI)
## 'data.frame':    22272 obs. of  13 variables:
##  $ whrswk    : int  0 50 40 40 0 40 40 25 45 30 ...
##  $ hhi       : Factor w/ 2 levels "no","yes": 1 1 2 1 2 2 2 1 1 1 ...
##  $ whi       : Factor w/ 2 levels "no","yes": 1 2 1 2 1 2 1 1 2 1 ...
##  $ hhi2      : Factor w/ 2 levels "no","yes": 1 1 2 2 2 2 2 1 1 2 ...
##  $ education : Ord.factor w/ 6 levels "<9years"<"9-11years"<..: 4 4 3 4 2 3 5 3 5 4 ...
##  $ race      : Factor w/ 3 levels "white","black",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ hispanic  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ experience: num  13 24 43 17 44.5 32 14 1 4 7 ...
##  $ kidslt6   : int  2 0 0 0 0 0 0 1 0 1 ...
##  $ kids618   : int  1 1 0 1 0 0 0 0 0 0 ...
##  $ husby     : num  12 1.2 31.3 9 0 ...
##  $ region    : Factor w/ 4 levels "other","northcentral",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ wght      : int  214986 210119 219955 210317 219955 208148 213615 181960 214874 214874 ...

To develop a model we use the “regsubsets” function from the “leaps” package. Most of the coding is the same as linear regression. The only difference is the “nvmax” argument, which is set to 13. The default setting for “nvmax” is 8, which is fine if you only have 8 variables. However, the results from the “str” function indicate that we have 13 variables. Therefore, we need to set the “nvmax” argument to 13 instead of the default in order to be sure to include all variables. Below is the code.

regfit.full<-regsubsets(whrswk~.,HI, nvmax = 13)

We can look at the results with the “summary” function. For space reasons, the code is shown but the results will not be shown here.

summary(regfit.full)

If you run the code above on your computer, you will see a column for each term in the model (factor variables appear as separate dummy-variable columns). A star in a column means that the term is included in that model. To the left are the numbers 1-13, which give the model size: one means one variable in the model, two means two variables in the model, and so on.
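
If scanning the star matrix is tedious, the same inclusion information is also stored in the summary object as a logical matrix called “which” (one row per model size). Below is a small sketch of pulling out the terms for one model size; the choice of the 5-variable model here is arbitrary and only for illustration.

incl<-summary(regfit.full)$which #logical matrix: one row per model size
names(which(incl[5,])) #terms included in the 5-variable model (the intercept is always listed)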

Our next step is to determine which of these models is the best. First, we need to decide what our criteria for inclusion will be. Below is a list of available fit indices.

names(summary(regfit.full))
## [1] "which"  "rsq"    "rss"    "adjr2"  "cp"     "bic"    "outmat" "obj"

For our purposes, we will use “rsq” (r-square) and “bic” (Bayesian Information Criterion). In the code below, we save the values for these two fit indices in their own objects.

rsq<-summary(regfit.full)$rsq
bic<-summary(regfit.full)$bic

Now let’s plot them

plot(rsq,type='l',main="R-Square",xlab="Number of Variables")

[Plot: R-Square by number of variables]

plot(bic,type='l',main="BIC",xlab="Number of Variables")

[Plot: BIC by number of variables]

You can see that the r-square values increase while the BIC values decrease. We will now make both of these plots again, but first we will have R tell us the optimal number of variables for each fit index. To do this we use the “which.max” and “which.min” functions to find the maximum r-square and the minimum BIC.

which.max(rsq)
## [1] 13
which.min(bic)
## [1] 12

The model with the best r-square is the one with 13 variables. This makes sense, as r-square always improves as you add variables. Since this is a demonstration we will not correct for this. For BIC, the lowest value was for 12 variables. We will now plot this information and highlight the best model in each plot using the “points” function, which allows you to emphasize one point in a graph.

plot(rsq,type='l',main="R-Square with Best Model Highlighted",xlab="Number of Variables")
points(13,(rsq[13]),col="blue",cex=7,pch=20)

[Plot: R-Square with the best (13-variable) model highlighted]

plot(bic,type='l',main="BIC with Best Model Highlighted",xlab="Number of Variables")
points(12,(bic[12]),col="blue",cex=7,pch=20)

[Plot: BIC with the best (12-variable) model highlighted]

Since BIC calls for only 12 variables it is simpler than the r-square recommendation of 13. Therefore, we will fit our final model using the BIC recommendation of 12. Below is the code.

coef(regfit.full,12)
##        (Intercept)             hhiyes             whiyes 
##        30.31321796         1.16940604        18.25380263 
##        education.L        education^4        education^5 
##         6.63847641         1.54324869        -0.77783663 
##          raceblack        hispanicyes         experience 
##         3.06580207        -1.33731802        -0.41883100 
##            kidslt6            kids618              husby 
##        -6.02251640        -0.82955827        -0.02129349 
## regionnorthcentral 
##         0.94042820

So here is our final model. This is what we would use for our test set.

Conclusion

Best subset regression provides the researcher with insights into every possible model as well as clues as to which model is at least statistically superior. This knowledge can be used for developing models for data science applications.

Koines

There are many different ways that languages or a language can interact with each other. One way is how different dialects of a language interact. A koine is a dialect that is a blend of other dialects that have had direct contact with each other.

In this post, we will discuss the following.

  • Koine vs pidgins and creoles
  • Development of koines

Koine vs Pidgins and Creoles

Koine is a lesser-known term to the general public in comparison to pidgin and creole. The word “koine” comes from the Greek word that means “common.” In other words, a koine is what two or more dialects have in common. Those familiar with biblical languages will know that “Koine” Greek is the language of the New Testament, which means the New Testament was written in common Greek.

As mentioned in the introduction, a koine arises from the interaction of two or more dialects to create a new common dialect. In contrast, a pidgin arises from the interaction of two languages to make a new third language. A pidgin can eventually mature into a creole, which is a pidgin that is spoken as the native language of a group of people. There seems to be no equivalent term for a koine that is spoken natively.

Developing a Koine

The process of developing a koine is known as koineization. There are both linguistic and social factors that contribute to the development of a koine. The linguistic factors include leveling and simplification, while the social factors involve accommodation and prestige.

Leveling is the elimination from the koine of distinctive sounds from the colliding dialects. An example of this is the loss of the post-vocalic [r] in parts of England.

Simplification is the process by which the simplest characteristics of the dialects are included in the new koine. The dialect with fewer rules and exceptions almost always emerges as having more influence on the koine.

The social factor of accommodation means that people will copy something they believe is prestigious or “cool.” If a certain dialect is considered unacceptable, it will not contribute to a koine because of people’s dislike of it. As such, accommodation and prestige are interrelated concepts.

Conclusion

Koine and koineization are two words that help to explain where dialects come from. As such, for those who have an interest in linguistics, these are terms to be familiar with.

High Dimensionality Regression

There are times when least squares regression is not able to provide accurate predictions or explanations. One example is when least squares regression must work with a small sample size. By small, we mean when the total number of variables is greater than the sample size. Another term for this is high dimensionality, which means there are more variables than examples in the dataset.

This post will explain the consequences of high dimensionality and how to address the problem.

Inaccurate measurements

One problem with high dimensions in regression is that the results for the various metrics are overfitted to the data. Below is an example using the “attitude” dataset. There are 2 variables and 3 examples for developing a model. This is not strictly high dimensions but it is an example of a small sample size.

data("attitude")
reg1 <- lm(complaints[1:3]~rating[1:3],data=attitude[1:3]) # model fit on only the first 3 observations
summary(reg1)
## 
## Call:
## lm(formula = complaints[1:3] ~ rating[1:3], data = attitude[1:3])
## 
## Residuals:
##       1       2       3 
##  0.1026 -0.3590  0.2564 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 21.95513    1.33598   16.43   0.0387 *
## rating[1:3]  0.67308    0.02221   30.31   0.0210 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4529 on 1 degrees of freedom
## Multiple R-squared:  0.9989, Adjusted R-squared:  0.9978 
## F-statistic: 918.7 on 1 and 1 DF,  p-value: 0.021

With only 3 data points the fit is nearly perfect. You can also examine the mean squared error of the model. Below is a function for this, followed by the results.

mse <- function(sm){ 
        mean(sm$residuals^2)}
mse(reg1)
## [1] 0.06837607

Almost no error. Lastly, let’s look at a visual of the model.

with(attitude[1:3],plot(complaints[1:3]~ rating[1:3]))
title(main = "Sample Size 3")
abline(lm(complaints[1:3]~rating[1:3],data = attitude))

[Plot: data points and regression line for the sample of size 3]

You can see that the regression line goes almost perfectly through each data point. If we tried to use this model on a test set in a real data science problem, it would perform poorly because it is badly overfitted. Now we will rerun the analysis, this time with the full sample.

reg2<- lm(complaints~rating,data=attitude) 
summary(reg2)
## 
## Call:
## lm(formula = complaints ~ rating, data = attitude)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3880  -6.4553  -0.2997   6.1462  13.3603 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   8.2445     7.6706   1.075    0.292    
## rating        0.9029     0.1167   7.737 1.99e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.65 on 28 degrees of freedom
## Multiple R-squared:  0.6813, Adjusted R-squared:  0.6699 
## F-statistic: 59.86 on 1 and 28 DF,  p-value: 1.988e-08

You can clearly see a huge reduction in the r-square, from .99 to .68. Next is the mean squared error.

mse(reg2)
## [1] 54.61425

The error has increased a great deal. Lastly, we plot the regression line.

with(attitude,plot(complaints~ rating))
title(main = "Full Sample Size")
abline(lm(complaints~rating,data = attitude))

[Plot: data points and regression line for the full sample]

Naturally, the second model is more likely to perform better with a test set. The problem is that least squares regression is too flexible when the number of features is greater than or equal to the number of examples in a dataset.

What to Do?

If regression must be used, one solution for overcoming high dimensionality is to use some form of regularized regression, such as ridge, lasso, or elastic net. Any of these regularization approaches will help to reduce the number of variables or dimensions in the final model, or at least their influence, through the use of shrinkage.

However, keep in mind that no matter what you do, as the number of dimensions increases so does the r-square, even if the added variables are useless. This is one symptom of the curse of dimensionality. Again, regularization can help with this.
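
A quick way to see this inflation is to add purely random predictors to a small regression and watch the r-square climb. Below is a minimal sketch using the “attitude” dataset from above; the ten noise variables are made up on the spot and have no relationship with the outcome.

set.seed(1)
data("attitude")
noise<-as.data.frame(matrix(rnorm(30*10),ncol=10)) #10 columns of pure random noise
dat<-cbind(attitude[,c("complaints","rating")],noise)
summary(lm(complaints~rating,data=dat))$r.squared #fit with the real predictor only
summary(lm(complaints~.,data=dat))$r.squared #fit after adding the 10 useless noise columns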

Remember that with a large number of dimensions there are normally several equally acceptable models. To determine which is most useful depends on understanding the problem and context of the study.

Conclusion

The ability to collect huge amounts of data has led to the growing problem of high dimensionality. Once there are more features than examples, it can lead to statistical errors. However, regularization is one tool for dealing with this problem.

Regression with Shrinkage Methods

One problem with least squares regression is determining what variables to keep in a model. One solution to this problem is the use of shrinkage methods. Shrinkage regression involves constraining or regularizing the coefficient estimates towards zero. The benefit of this is that it is an efficient way to either remove variables from a model or significantly reduce the influence of less important variables.

In this post, we will look at two common forms of regularization: ridge and lasso.

Ridge

Ridge regression includes a tuning parameter called lambda that can be used to shrink weak coefficients almost to zero. This shrinkage penalty helps with the bias-variance trade-off. Lambda can be set to any value from 0 to infinity. A lambda set to 0 is the same as least squares regression, while a lambda set to infinity will produce a null model. The penalty used by ridge regression is based on the “l2 norm” of the coefficients.

Finding the right value of lambda is the primary goal when using this algorithm. Finding it involves running models with several values of lambda and seeing which returns the best results on predetermined metrics, usually through cross-validation.

The primary problem with ridge regression is that it does not actually remove any variables from the model. As such, the prediction might be excellent, but the explanatory power is not improved if there are a large number of variables.
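
Below is a minimal sketch of ridge regression in R using the “glmnet” package, which is an assumption on my part since this post does not name a package. In “glmnet”, setting alpha to 0 gives the ridge penalty and “cv.glmnet” cross-validates lambda. The “Mroz” data from the earlier PCR post is reused only for illustration.

library(glmnet);library(Ecdat)
data(Mroz)
x<-model.matrix(income~.,data=Mroz)[,-1] #predictor matrix with the intercept column dropped
y<-Mroz$income
set.seed(777)
cv.ridge<-cv.glmnet(x,y,alpha=0) #alpha=0 is the ridge penalty
cv.ridge$lambda.min #lambda with the lowest cross-validated error
coef(cv.ridge,s="lambda.min") #coefficients are shrunk but none are exactly zero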

Lasso

Lasso regression has the same characteristics as ridge with one exception: the lasso algorithm can actually remove variables by setting their coefficients exactly to zero. This means that lasso regression models are usually superior in terms of the ability to interpret and explain them. The penalty used by the lasso is based on the “l1 norm” of the coefficients.

It is not always clear whether lasso or ridge is superior. Normally, if the goal is explanation, lasso is often stronger. However, if the goal is prediction, ridge may be an improvement, but not always.
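
A matching lasso sketch, again assuming the “glmnet” package and reusing the “Mroz” data, only needs alpha switched to 1. Unlike ridge, some of the resulting coefficients can be exactly zero, which is the variable removal described above.

library(glmnet);library(Ecdat)
data(Mroz)
x<-model.matrix(income~.,data=Mroz)[,-1]
y<-Mroz$income
set.seed(777)
cv.lasso<-cv.glmnet(x,y,alpha=1) #alpha=1 is the lasso penalty
coef(cv.lasso,s="lambda.min") #coefficients set exactly to zero have been dropped from the model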

Conclusion

Shrinkage methods are not limited to regression. Many other forms of analysis can employ shrinkage such as artificial neural networks. Most machine learning models can accommodate shrinkage.

Generally, ridge and lasso regression are employed when you have a huge number of predictors as well as a large dataset. The primary goal is the simplification of an overly complex model. Therefore, the shrinkage methods mentioned here are additional ways to use statistical models in regression.

Subset Selection Regression

There are many different ways in which the variables of a regression model can be selected. In this post, we look at several common ways in which to select variables or features for a regression model. In particular, we look at the following.

  • Best subset regression
  • Stepwise selection

Best Subset Regression

Best subset regression fits a regression model for every possible combination of variables. The “best” model can be selected based on such criteria as the adjusted r-square, BIC (Bayesian Information Criteria), etc.

The primary drawback to best subset regression is that it becomes impossible to compute the results when you have a large number of variables. Generally, when the number of variables exceeds about 40, best subset regression becomes too difficult to calculate.
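
The reason the computation explodes is that best subset regression must fit a model for every possible subset of the predictors, which is 2 to the power of p models for p variables. A quick check in R makes the point.

2^10 #10 predictors: 1,024 candidate models
2^20 #20 predictors: about 1 million candidate models
2^40 #40 predictors: about 1.1 trillion candidate models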

Stepwise Selection

Stepwise selection involves adding or taking away one variable at a time from a regression model. There are two forms of stepwise selection and they are forward and backward selection.

In forward selection, the computer starts with a null model (a model that only estimates the mean) and adds one variable at a time to the model. The variable chosen is the one that provides the best improvement to the model fit. This process greatly reduces the number of models that need to be fitted in comparison to best subset regression.

Backward selection starts with the full model and removes one variable at a time, based on which removal improves the model fit the most. The main problem with either forward or backward selection is that the best model may not always be selected in this process. In addition, backward selection requires a sample size that is larger than the number of variables.
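
Below is a minimal sketch of forward and backward selection using the same “regsubsets” function from the “leaps” package that appeared in the best subset post; the “HI” dataset is reused purely for illustration.

library(leaps);library(Ecdat)
data(HI)
fwd.fit<-regsubsets(whrswk~.,data=HI,nvmax=13,method="forward") #add one variable at a time
bwd.fit<-regsubsets(whrswk~.,data=HI,nvmax=13,method="backward") #remove one variable at a time
which.min(summary(fwd.fit)$bic) #model size preferred by BIC under forward selection
which.min(summary(bwd.fit)$bic) #model size preferred by BIC under backward selection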

Deciding Which to Choose

Best subset regression is perhaps most appropriate when you have a small number of variables to develop a model with, such as fewer than 40. When the number of variables grows, forward or backward selection is appropriate. If the sample size is small, forward selection may be a better choice. However, if the sample size is large, as in the number of examples being greater than the number of variables, it is also possible to use backward selection.

Conclusion

The examples here are some of the most basic ways to develop a regression model. However, these are not the only ways in which this can be done. What these examples provide is an introduction to regression model development. In addition, these models provide some sort of criteria for the addition or removal of a variable based on statistics rather than intuition.

Leave One Out Cross Validation in R

Leave one out cross validation (LOOCV) is a variation of the validation approach in that, instead of splitting the dataset in half, LOOCV uses one example as the validation set and all the rest as the training set, repeating this for every example. This helps to reduce bias and randomness in the results but, unfortunately, can increase variance. Remember that the goal is always to reduce the error rate, which is often calculated as the mean squared error.
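
To make the idea concrete, below is a small hand-rolled sketch of LOOCV using a for loop and the built-in “mtcars” data; the model and variables are arbitrary and only for illustration. The “cv.glm” function used later in this post automates this same process.

data(mtcars)
errors<-rep(0,nrow(mtcars))
for (i in 1:nrow(mtcars)){
        fit<-lm(mpg~wt+hp,data=mtcars[-i,]) #refit the model without observation i
        errors[i]<-(mtcars$mpg[i]-predict(fit,mtcars[i,]))^2 #squared error of the held-out prediction
}
mean(errors) #LOOCV estimate of the mean squared error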

In this post, we will use the “Hedonic” dataset from the “Ecdat” package to assess several different models that predict the taxes of homes. In order to do this, we will also need to use the “boot” package. Below is the code.

library(Ecdat);library(boot)
data(Hedonic)
str(Hedonic)
## 'data.frame':    506 obs. of  15 variables:
##  $ mv     : num  10.09 9.98 10.45 10.42 10.5 ...
##  $ crim   : num  0.00632 0.02731 0.0273 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 ...
##  $ chas   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ nox    : num  28.9 22 22 21 21 ...
##  $ rm     : num  43.2 41.2 51.6 49 51.1 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 ...
##  $ dis    : num  1.41 1.6 1.6 1.8 1.8 ...
##  $ rad    : num  0 0.693 0.693 1.099 1.099 ...
##  $ tax    : int  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 ...
##  $ blacks : num  0.397 0.397 0.393 0.395 0.397 ...
##  $ lstat  : num  -3 -2.39 -3.21 -3.53 -2.93 ...
##  $ townid : int  1 2 2 3 3 3 4 4 4 4 ...

First, we need to develop our basic least squares regression model. We will do this with the “glm” function. This is because the “cv.glm” function (more on this later) only works when models are developed with the “glm” function. Below is the code.

tax.glm<-glm(tax ~ mv+crim+zn+indus+chas+nox+rm+age+dis+rad+ptratio+blacks+lstat, data = Hedonic)

We now need to calculate the MSE. To do this we will use the “cv.glm” function. Below is the code.

cv.error<-cv.glm(Hedonic,tax.glm)
cv.error$delta
## [1] 4536.345 4536.075

cv.error$delta contains two numbers. The first is the raw cross-validation estimate of the prediction error and the second is a bias-adjusted version of the same estimate. As you can see, the numbers are almost identical.

We will now repeat this process but with the inclusion of different polynomial models. The code for this is a little more complicated and is below.

cv.error=rep(0,5)
for (i in 1:5){
        tax.loocv<-glm(tax ~ mv+poly(crim,i)+zn+indus+chas+nox+rm+poly(age,i)+dis+rad+ptratio+blacks+lstat, data = Hedonic)
        cv.error[i]=cv.glm(Hedonic,tax.loocv)$delta[1]
}
cv.error
## [1] 4536.345 4515.464 4710.878 7047.097 9814.748

Here is what happened.

  1. First, we created an empty object called “cv.error” with five empty spots, which we will use to store information later.
  2. Next, we created a for loop that repeats 5 times.
  3. Inside the for loop, we create the same regression model except that “age” and “crim” are wrapped in the “poly” function. These are the variables for which we want to try polynomial degrees 1-5 to see if this reduces the error.
  4. The results of the polynomial models are stored in the “cv.error” object; we specifically request the first element of “delta.” Finally, we printed “cv.error” to the console.

From the results, you can see that the error decreases with a second-order polynomial but then increases after that. This means that higher-order polynomials are generally not beneficial here.

Conclusion

LOOCV is another option for assessing different models and determining which is most appropriate. As such, this is a tool that is used by many data scientists.

Validation Set for Regression in R

Estimating error and looking for ways to reduce it is a key component of machine learning. In this post, we will look at a simple way of addressing this problem through the use of the validation set method.

The validation set method is a standard approach in model development. To put it simply, you divide your dataset into a training and a hold-out set. The model is developed on the training set and then the hold-out set is used for prediction purposes. The error rate of the hold-out set is assumed to be reflective of the test error rate.

In the example below, we will use the “Carseats” dataset from the “ISLR” package. Our goal is to predict the competitor’s price for a carseat based on the other available variables. Below is some initial code.

library(ISLR)
data("Carseats")
str(Carseats)
## 'data.frame':    400 obs. of  11 variables:
##  $ Sales      : num  9.5 11.22 10.06 7.4 4.15 ...
##  $ CompPrice  : num  138 111 113 117 141 124 115 136 132 132 ...
##  $ Income     : num  73 48 35 100 64 113 105 81 110 113 ...
##  $ Advertising: num  11 16 10 4 3 13 0 15 0 0 ...
##  $ Population : num  276 260 269 466 340 501 45 425 108 131 ...
##  $ Price      : num  120 83 80 97 128 72 108 120 124 124 ...
##  $ ShelveLoc  : Factor w/ 3 levels "Bad","Good","Medium": 1 2 3 3 1 1 3 2 3 3 ...
##  $ Age        : num  42 65 59 55 38 78 71 67 76 76 ...
##  $ Education  : num  17 10 12 14 13 16 15 10 10 17 ...
##  $ Urban      : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 2 2 1 1 ...
##  $ US         : Factor w/ 2 levels "No","Yes": 2 2 2 2 1 2 1 2 1 2 ...

We need to divide our dataset into two parts. One will be the training set and the other the hold-out set. Below is the code.

set.seed(7)
train<-sample(x=400,size=200)

Now, those who are familiar with R know that we haven’t actually made our training set yet. We are going to use the “train” object to index rows from the “Carseats” dataset. What we did was set the seed so that the results can be replicated. Then we used the “sample” function with two arguments, “x” and “size”. “x” represents the number of examples in the “Carseats” dataset. “size” represents how big we want the sample to be. In other words, we want 200 of the 400 examples to be in the training set. Therefore, R will randomly select 200 numbers from 1 to 400.

We will now fit our initial model.

car.lm<-lm(CompPrice ~ Income+Sales+Advertising+Population+Price+ShelveLoc+Age+Education+Urban, data = Carseats,subset = train)

The code above should not be new. However, one unique twist is the use of the “subset” argument. What this argument does is tell R to only use rows that are in the “train” index. Next, we calculate the mean squared error.

mean((Carseats$CompPrice-predict(car.lm,Carseats))[-train]^2)
## [1] 77.13932

Here is what the code above means

  1. We took the actual “CompPrice” values and subtracted from them the predictions made by the “car.lm” model we developed.
  2. We kept only the test set, which here is identified as “-train”; the minus sign means everything that is not in the “train” index.
  3. Finally, the results were squared and averaged.

The results here are the baseline comparison. We will now make two more models, each with a polynomial in one of the variables. First, we will square the “Income” variable.

car.lm2<-lm(CompPrice ~ Income+Sales+Advertising+Population+I(Income^2)+Price+ShelveLoc+Age+Education+Urban, data = Carseats,subset = train)
mean((Carseats$CompPrice-predict(car.lm2,Carseats))[-train]^2)
## [1] 75.68999

You can see that there is a small decrease in the MSE. Also, notice the use of the “I” function, which allows us to square “Income”. Now, let’s try a cubic model.

car.lm3<-lm(CompPrice ~ Income+Sales+Advertising+Population+I(Income^3)+Price+ShelveLoc+Age+Education+Urban, data = Carseats,subset = train)
mean((Carseats$CompPrice-predict(car.lm3,Carseats))[-train]^2)
## [1] 75.84575

This time there was an increase when compared to the second model. As such, higher order polynomials will probably not improve the model.

Conclusion

This post provided a simple example of assessing several different models using the validation set approach. However, in practice, this approach is not used as frequently because there are now so many other ways to do this. Yet, it is still good to be familiar with a standard approach such as this.

Teaching Small Children to Write

Teaching a child to write is an interesting experience. In this post, I will share some basic ideas on one way this can be done.

To Read or not to Read

Often writing is taught after the child has learned to read. A major exception to this is the Montessori method. For Montessori, a child should learn to write before reading. This is probably because writing is a more tactile experience than reading, and Montessori was a huge proponent of experiential learning. In addition, under this assumption, if you can write you can definitely read.

Generally, I teach young children how to read first. This is because I want the child to know the letters before trying to write them.

The Beginning

If the child is already familiar with the basics of reading, writing is probably more about hand-eye coordination than anything else. The first few letters are quite the experience. This is affected by age as well: smaller children will have much more difficulty with writing than older children.

A common strategy to motivate a child to write is to have them first learn to spell their name. This can work depending on how hard the child’s name is to spell. A kid named “Dan” will master writing his name quickly. However, a kid with a longer name or a transliterated name from another language is going to have a tough time. I knew one student who misspelled their name for almost a year and a half because it was so hard to write in English.

A common way to teach actual writing is to have the child trace the letters and words on dot paper. By doing this, they develop the muscle memory for writing. Once this is successful, the child will then attempt to write the letters without the tracing paper. This process can easily take a year.

Sentences and Paragraphs

After they learn to write letters and words, it is time to begin writing sentences. A six-year-old with good penmanship will probably not be able to write a sentence without support. Writing and spelling are different skills initially, and it is the adult’s job to provide support for the spelling aspect as the child explains what they want to write about.

With help, children can create short little stories that may be one to two paragraphs in length. Yet they will still need a lot of support to do this.

By eight years of age, a child can probably write a paragraph on their own about simple concepts or stories. This is when the teaching and learning can really get interesting as the child can now write to learn instead of focusing on learning to write.

Conclusion

Writing is a skill that is hard to find these days. With so many other forms of communication, writing is not a skill that children want to focus on. Nevertheless, learning to write and gaining basic literacy is an excellent way to develop communication skills and to interact with people in situations where face-to-face contact is not possible.

Homeschooling Concerns

Parents frequently have questions about homeschooling. In this post, we look at three common questions related to homeschooling.

  1. How do you know if your child has learned
  2. What do you do about socializing
  3. What about college

How do You know if they Learned

One definition of learning is a change in observable behavior. In other words, one way a parent can know that their child is learning is by watching for changes in behavior. For example, if you are teaching addition and the child begins to do addition on their own, that is evidence that they have learned something. There is no need for standardized testing in order to indicate this.

A lot of the more advanced forms of assessment, including standardized tests, were created in order to assess the progress of a huge number of students. In the context of homeschooling with only a few students, such rigorous measures are unnecessary. Governments need sophisticated measures of achievement because of the huge populations that they serve, but these would be inappropriate when dealing with one or two elementary students.

Another way to know what your child has learned is to look at what they are studying right now. For example, if my child is reading, I know that they have probably mastered the alphabet. Otherwise, how could they read? I also know that they have probably mastered most of the phonics. In other words, what a child is currently working on is an indication of what was mastered before.

What about Socializing

The answer to this question really depends on your position on socializing. Many parents want their child to act like other children. For example, if my child is 7 I want him to act like other 7-year-olds.

Other parents want their child to learn how to act like an adult. They want their 7-year-old child to imitate the behavior of the parents rather than the behavior of other 7-year-olds. A child will only rise to the expectations of those around them. Being around children encourages childish behavior because that is the example being set. Again, for many parents this is what they want; however, others see this differently.

The reality is that until middle-age most of the people we interact with are older than us. As such, it is beneficial for a child to spend a large amount of time around people who are older than them and understand the importance of setting an example that can be imitated.

Not all socializing is the same. Adult-to-child socializing provides a child with an example of how to be an adult rather than how to be a child. Besides, most small children would love to be around their parents all day. They only grow to love friends so much because those are the people who give them the most attention.

What about College

This question is the hardest to answer as it depends on context a great deal. Concerns with college can be alleviated by having the child take the GED in the US or local college entrance examinations in other countries.

It is also important to keep careful records of what the child studies during high school. Most colleges do not care about K-8 learning but really want to know what happens during grades 9-12. Keep records of the courses the child took as well as the grades. It will also be necessary to take the SAT or ACT in most countries as well.

Conclusion

Homeschooling is an option for people who want to spend the maximum amount of time possible with their children. Concerns about learning, socializing, and college are unnecessary if the parents are willing to thoroughly dedicate themselves and provide their children with a learning environment that develops them holistically.

What it Takes to Homeschool

Some may be wondering what it takes to homeschool. Below are some characteristics needed to homeschool successfully.

Time management

Being able to adhere to a schedule is a prerequisite for homeschooling. It is tempting to just do things whenever you feel like it when you have this kind of freedom. However, in order to be successful, you have to hold yourself responsible the way a boss would. This is difficult for most people who are not used to autonomy.

This is not to say there should be no flexibility. Rather, the schedule should not be cheated because of laziness. There must be a set schedule for studying for the sake of behavior management of the children. If the child doesn’t know what to expect they may challenge you when you flippantly decide they need to study. Consistency is a foundational principle of homeschooling.

Discipline

Discipline means being able to do something even when you do not feel like doing it. In homeschooling, you have to teach whether you want to or not. Remember, sometimes we have to work at our jobs when we don’t feel like it, and the same is true of teaching in the home. If you’re tired you still have to teach; if you’re a little sick you still have to teach; if you’re angry you still have to teach.

The child is relying on you to provide them with the academic skills needed to compete in the world. This cannot be neglected for trivial reasons. Lesson plans are key. Either buy them or make them. Keep track of completed assignments and note the progress of the student.

Toughness

As a homeschooling parent, you are the only authority in the child’s life. This means all discipline falls under your jurisdiction. One reason parents enjoy sending their kids to school is to burden the public school teachers with their own child’s poor behavior. “Let the school deal with him” is a comment I heard often when I was a K-12 teacher. However, when you teach as a homeschool parent, only you have the pleasure of disciplining your child.

Discipline is not only about taking away privileges and causing general suffering for unacceptable behavior. Discipline also includes communicating clearly with your child to prevent poor behavior, having clear rules that are always enforced, and providing a stable environment in which to study.

Patience

Homeschooling also requires patience. For example, suppose you are teaching your child a basic first-grade math concept that takes them several weeks to learn. Naturally, you start to get angry with the child and yourself over the lack of progress. You may even begin to question whether you have what it takes to do this. However, after waiting for what seems like an eternity, the child finally gets it.

This is the reality of homeschooling. No matter how bad you think you are, the child will eventually get it when they are ready. This requires patience in the parent and some confidence in their own ability to help their child grow.

Conclusion

There are many more ideas I could share. However, this is sufficient for now. In general, I would not recommend homeschooling for the typical family as the above traits are usually missing in the parents. Many parents want to homeschool for emotional reasons. The problem with this is that when they feel bad they will not want to continue the experience. Homeschooling can involve love but it must transcend emotions in order to endure for several years.

Teaching Math in the Homeschool

Teaching a child to count and do simple math is much more challenging than many would believe. Below is a simple process that I accidentally developed from working with a kindergarten homeschool student for two years. Keep in mind that these steps often overlapped.

  1. Number recognition
  2. Counting
  3. Counting with manipulatives
  4. Flashcards with larger numbers
  5. Writing numbers
  6. Adding with manipulatives
  7. Subtraction with manipulatives
  8. Visual math

1.  Number Recognition

Number recognition simply involved the use of flashcards with the child. I would hold up a number and tell the child what the number was. Memorizing is perhaps one of the easiest things the young mind can do, as critical thinking comes much later. This initial process took about 6 months for a four-year-old to learn numbers 1-20.

2. Counting

With the numbers memorized, the next step was to actually learn to count. I did this by holding up the same flashcards. After the child identified what number it was, I would flip the flashcard over and have them count the number of objects on the card. My goal was to have them make a connection between the abstract number and the actual amount that could be seen and counted.

Again, it took about six months for the four-and-a-half-year-old student to master this for numbers 1-20. It was a really stressful six months.

3. Counting with Manipulatives

The next few steps happened concurrently for the most part. I started to have the student count with manipulatives. I would show or say a number and expect the student to count out the correct number using the manipulatives. This was done with numbers 1-20 only.

4. Flashcards with Larger Numbers 

At the same time, I worked with the student to learn numbers beyond 20. This was strictly for memorization purposes. This continued from 4.5 to 6 years of age. Eventually, the child could identify numbers 1-999. However, they never discovered the pattern of counting. By pattern, I mean how the 0-9 cycle repeats within the tens, how the tens cycle repeats when moving to the hundreds, etc. The child only knew the numbers through brute memorization.

5. Writing Numbers

Writing numbers was used as preparation for doing addition. It was as simple as giving the student some numbers to trace on paper. It took about 8 months for the student to write numbers with any kind of consistency.

6. Adding with Manipulatives

This involved me writing a math problem and having the student solve the problem using manipulatives. For example, 2 + 2 would be solved by having the student count two manipulatives, then count two more, and then count the total.

My biggest concern was having the child understand the + and = signs. The plus sign was easy, but the equal sign was mysterious for a long time. However, the learning rate was picking up, and the child learned this in about 3 months.

7. Subtraction with Manipulatives

This was the same as above but took only one month to learn.

8. Visual Math

At this stage, the child was doing worksheets on their own. Manipulatives were allowed as a crutch to get through the problems. However, the child was now being encouraged to use their fingers for counting purposes. This was a disaster for several weeks, as they lacked the coordination to open and close their fingers independently of each other.

Conclusion

This entire process took two years to complete from ages 4-6 working with the child one-on-one. By the age of six, the child could add and subtract anything from 1-30 and was ready for 1st grade.

I would recommend waiting longer to start math with a child. Being 4 was probably too young for this particular child. It is better to wait until 5 or 6 to learn numbers and counting. There is more danger in starting early than there is in starting late.

Confusing Words for Small Children

In this post, we will look at some commonly used words that can bring a great deal of frustration to adults when communicating with small children. The terms are presented in the following categories

  • Deictic terms
  • Interrogatives
  • Locational terms
  • Temporal terms

Deictic Terms

Deictic terms fall under the umbrella of pragmatic development, or understanding the context in which words are used. Examples of deictic terms include such words as this, that, these, those, here, there, etc. What makes these words confusing for young children and even ESL speakers is that their meaning depends on the context. Below is a clear way to communicate, followed by a way that is unclear because it uses a deictic term.

Clear communication: Take the book
Unclear communication: Take that

The first sentence makes it clear what to take, which in this example is the book. However, for a child or ESL speaker, the second sentence can be mysterious. What does “that” mean? It takes pragmatic or contextual knowledge to determine what “that” refers to in the sentence. Children usually cannot figure this out, while an ESL speaker will watch the body language (nonlinguistic cues) of the speaker to figure it out.

Interrogatives

A unique challenge for children is understanding interrogatives. These are such words as who, what, where, when, and why. The challenge with these questions is that they involve explaining causes, times, and/or reasons. Many parents have asked the following question without receiving an adequate answer.

Why did you take the book?

The typical 3-year-old is going to wonder what the word “why” means. Of course, you can combine a deictic term with an interrogative and completely lose a child.

Why did you do that?

Locational Terms

Locational terms are prepositions such as in, under, above, behind, etc. These words can be challenging for young children because they have to understand the perspective of the person speaking. Below is an example.

Put the book under the table.

Naturally, the child is trying to understand what “under” means. We can also completely confuse a child by using terms from all the categories we have discussed so far.

Why did you put that under the table?

This sentence would probably be unclear to many native speakers. The ambiguity is high especially with the term “that” included.

Temporal Terms

Temporal terms are about time. Commonly used words include before, after, while, etc. These terms are difficult for children because young children do not quite grasp the concept of time. Below is an example of a sentence with a temporal term.

Before dinner, grab the book.

The child is probably wondering when they are supposed to get the book. Naturally, we can combine all of our terms to make a truly nightmarish sentence.

Why did you put that under the table after dinner?

Conclusion

The different terms mentioned here can cause frustration when trying to communicate with small children. To alleviate these problems, parents and teachers should avoid these terms when possible by using nouns instead. In addition, using body language to indicate position, or pointing to whatever you are talking about, can help young children to infer the meaning.