Author Archives: Dr. Darrin

Review of “The Greek News”

In this post, we will take a look at the book History News: The Greek News by Anton Powell and Philip Steele (pp. 32).

The Summary

This book takes actual historical events from Ancient Greece and turns them into newspaper-style articles. The writing style is similar to anything you would see on CNN, NBC, the New York Times, etc. Some of the stories in the book include an article on the anger of Greeks at colonists returning to Greece instead of staying overseas. The anger was due to the lack of food in Greece and the frustration of having to support the returning colonists.

Another story covers the victories of Alexander the Great and his untimely death in his early thirties. There are also several articles on life in Sparta as well as the Olympic Games. There are also advertisements on several pages just as in a real newspaper. My personal favorite is the potty-training toilet for small children (pg. 18). I am assuming this book is historically accurate.

The Good

The authors truly earn an ‘A’ for creativity. Taking the unknown (Greek history) and combining it with the known (modern-day news writing) is an excellent pedagogical tool. Like all newspapers, there are many illustrations. These are hand drawings rather than photos, of course, as photography had not been invented yet.

The stories are interesting and give you a picture of everyday life in Greece. There are interviews with housewives, actors, and even architects.

The Bad

Creativity can also be a curse. I love this approach but it may be confusing to people who cannot juxtapose news articles with ancient Greek history. This is probably especially true with young children as they are unfamiliar with both Ancient Greece and news-style writing.

The writing almost assumes that the reader is Greek. Again this requires a lot of background knowledge prior to using this text. Perhaps at the end of a unit on Greece would be an appropriate time to use this text. You may want to try photocopying a few articles and reading them together as a class.

The Recommendation

This book deserves 4.5/5 stars. It is highly engaging with its use of illustration and its clever use of news-style writing. The kids will enjoy the pictures and the unique approach to teaching. However, students need to be prepared for this type of learning experience through other forms of exposure to Ancient Greece. In other words, this text is excellent supplementary material, but a foundation in the main points of Ancient Greece should be laid in advance to avoid confusion due to the writing style of this text.


Social Dimensions of Language

In sociolinguistics, social dimensions are the characteristics of the context that affect how language is used. Generally, there are four dimensions to the social context, and they are analyzed through the use of five scales. The four dimensions are as follows; the last one accounts for two of the five scales.

  • Social distance
  • Status
  • Formality
  • Functional (which includes a referential and affective function)

This post will explore each of these four social dimensions of language.

Social Distance

Social distance is an indicator of how well we know someone that we are talking to. Many languages have different pronouns and even different verb forms depending on how well the speaker knows the listener.

For example, in English, a person might say “What’s up?” to a friend. However, when speaking to a stranger, regardless of the stranger’s status, a person may say something such as “How are you?” The only reason for the change in language use is the lack of intimacy with the stranger as compared to the friend.

Status

Status is related to social ranking. The way we speak to peers is different than how we speak to superiors. Friends are called by their first name while a boss, in some cultures, is always referred to by Mr/Mrs or sir/madam.

The rules for status can be confusing. Frequently we will refer to our parents as mom or dad but never Mr/Mrs. Even though Mr/Mrs is a sign of respect it violates the intimacy of the relationship between a parent and child. As such, often parents would be upset if their children called them Mr/Mrs.

Formality

Formality can be seen as the presence or absence of colloquial language and slang in a person’s communication. In a highly formal setting, such as a speech, the language will often lack the more earthy style of speaking. Contractions may disappear, idioms may be reduced, etc.

However, when spending time with friends at home, a more laid-back manner of speaking will emerge. One’s accent becomes more prominent, slang terms are permissible, etc.

Function (Referential & Affective)

The referential function is a measure of the amount of information being shared in a discourse, such as the use of facts, statistics, directions, etc. The affective function relates to the emotional content of communication and indicates how someone feels about the topic.

Often the referential and affective functions are interrelated, such as in the following example.

James is a 45 year-old professor of research who has written several books but is still a complete idiot!

The example above conveys a lot of information, as it gives the person’s name, age, job, and accomplishments. However, the emotions of the speaker are highly negative towards James, as the speaker calls James a “complete idiot.”

Conclusion 

The social dimensions of language are useful to know in order to understand what is affecting how people communicate. The concepts behind the four dimensions impact how we talk without most of us knowing why or how. This can be frustrating, but knowing the dimensions is also empowering, as people can understand why they adjust to various contexts of language use.

Journal Writing

A journal is a log that a student uses to record their thoughts about something. This post will provide examples of journals as well as guidelines for using journals in the classroom.

Types of Journals

There are many different types of journals. Normally, all journals have some sort of dialog happening between the student and the teacher. This allows both parties to get to know each other better.

Normally, journals will have a theme or focus. Examples in TESOL would include journals that focus on grammar, learning strategies, language-learning, or recording feelings. Most journals will focus on one of these to the exclusion of the others.

Guidelines for Using Journals

Journals can be useful if they are properly planned. As such, a teacher should consider the following when using journals.

  1. Provide purpose: Students need to know why they are writing journals. Most students seem to despise reflection and will initially reject this learning experience.
  2. Forget grammar: Journals are for writing. Students need to set aside the obsession they have acquired for perfect grammar and focus on developing their thoughts about something. There is a time and place for grammar and that is for summative assessments such as final drafts of research papers.
  3. Explain the grading process: Students need to know what they must demonstrate in order to receive adequate credit.
  4. Provide feedback: Journals are a dialog. As such, the feedback should encourage and/or instruct the students. The feedback should also be provided consistently at scheduled intervals.

Journals take a lot of time to read and provide feedback to. In addition, the handwriting quality of students can vary radically, which means that some students’ journals are unreadable.

Conclusion

Journaling is an experience that allows students to focus on the process of learning rather than the product. This is often neglected in the school experience. Through journals, students are able to focus on the development of ideas without wasting working memory capacity on grammar and syntax. As such, journals can be a powerful tool in developing critical thinking skills.

Review of “Usborne Time Traveler”

This post is a review of the book Usborne Time Traveler (pp. 130).

The Summary

This is a historical text that takes you on a journey through several historical periods: the age of knights, the Vikings, the Romans, and ancient Egypt. An unnamed boy has a “helmet” that allows him to travel to these different periods.

In each period, there is a list of the type of people you will read about as well as a fictitious family. The family is always a wealthy or aristocratic family. For example, in the Knight’s section of the book, you learn about Baron Godfrey’s family. You watch his son Simon become a knight. During the Roman section, we meet Petronius and his family and see his sister Antonia disciplining the children.

Each section of the book depicts daily life and events during that period. For example, during the Viking section, there is preparation for a raid on a village. During the Egyptian section of the book, you get to witness a trip to the market as well as a feast. You also get to witness Baron Godfrey’s castle survive a siege from a rival nobleman.

The Good

This book provides examples of the clothing, food, language, and other customs of each culture. The pictures are simple yet provide excellent examples that young children can understand. The fictitious family used in each section helps pedagogically, as children can relate to the idea of a family, and this knowledge helps them to understand the complex aspects of each time period’s culture and ways.

Watching the families interact with their world was always interesting and helped in making this ancient history engaging and relevant. From Caius walking to school with a torch to the funeral of Olaf, it seems as if you are actually there for these small experiences.

The Bad

It’s hard to find any complaints about this book. Both old and young can enjoy this text. The older students can read the text and the younger can focus on the pictures. However, there are some violent scenes in the text at times that some parents may object to.

The Recommendation

This book is absolutely 5/5. It is well-written, has excellent illustrations, and pays attention to concepts of teaching and communication. This book should be in every elementary school history teacher’s library.

Cradle Approach to Portfolio Development

Portfolio development is one of many forms of alternative assessment available to teachers. When this approach is used, generally the students collect their work and try to make sense of it through reflection.

It is surprisingly easy for portfolio development to amount to nothing more than archiving work. However, the CRADLE approach was developed by Gottlieb to alleviate potential confusion over this process. CRADLE stands for the following.

Collecting
Reflecting
Assessing
Documenting
Linking
Evaluating

Collecting

Collecting is the process in which the students gather materials to include in their portfolio. It is left to the students to decide what to include. However, it is still necessary for the teacher to provide clear guidelines in terms of what can be potentially selected.

Clear guidelines include stating the objectives as well as explaining how the portfolio will be assessed. It is also important to set aside class time for portfolio development.

Some examples of work that can be included in a portfolio include the following.

  • tests, quizzes
  • compositions
  • electronic documents (powerpoints, pdfs, etc)

Reflecting

Reflecting happens through the student thinking about the work they have placed in the portfolio. This can be demonstrated in many different ways. Common ways to reflect include the use of journals in which students comment on their work.

Another way for young students is the use of a checklist. Students simply check the characteristics that are present in their work. In either case, the teacher’s role is to provide class time so that students are able to reflect on their work.

Assessing

Assessing involves checking and maintaining the quality of the portfolio over time. Normally, there should be a gradual improvement in work quality in a portfolio. This is a subjective matter that is negotiated by the student and teacher, often in the form of conferences.

Documenting

Documenting serves more as a reminder than an action. Simply, documenting means that the teacher and student maintain the importance of the portfolio over the course of its usefulness. This is critical as it is easy to forget about portfolios through the pressure of the daily teaching experience.

Linking

Linking is the use of a portfolio to serve as a mode of communication between students, peers, teachers, and even parents. Students can look at each other’s portfolios and provide feedback. Parents can also examine the work of their child through the use of portfolios.

Evaluating

Evaluating is the process of receiving a grade for this experience. For the teacher, the goal is to provide positive washback when assessing the portfolios. The focus is normally less on grades and more qualitative in nature.

Conclusions

Portfolios provide rich opportunities for developing intrinsic motivation, individualized learning, and critical thinking. However, trying to affix a grade to such a learning experience is often impractical. As such, portfolios are useful but it can be hard to prove that any learning took place.

Data Munging with Dplyr

Data preparation, aka data munging, is what most data scientists spend the majority of their time doing. Extracting and transforming data is difficult, to say the least. Every dataset is different with unique problems. This makes it hard to generalize best practices for transforming data so that it is suitable for analysis.

In this post, we will look at how to use the various functions in the “dplyr” package. This package provides numerous ways to develop features as well as explore the data. We will use the “attitude” dataset from base R for our analysis. Below is some initial code.

library(dplyr)
data("attitude")
str(attitude)
## 'data.frame':    30 obs. of  7 variables:
##  $ rating    : num  43 63 71 61 81 43 58 71 72 67 ...
##  $ complaints: num  51 64 70 63 78 55 67 75 82 61 ...
##  $ privileges: num  30 51 68 45 56 49 42 50 72 45 ...
##  $ learning  : num  39 54 69 47 66 44 56 55 67 47 ...
##  $ raises    : num  61 63 76 54 71 54 66 70 71 62 ...
##  $ critical  : num  92 73 86 84 83 49 68 66 83 80 ...
##  $ advance   : num  45 47 48 35 47 34 35 41 31 41 ...

You can see we have seven variables and only 30 observations. The first function that we will learn to use is the “select” function. This function allows you to select the columns of data you want to use. In order to use this feature, you need to know the names of the columns you want. Therefore, we will first use the “names” function to determine the names of the columns and then use the “select” function.

names(attitude)[1:3]
## [1] "rating"     "complaints" "privileges"
smallset<-select(attitude,rating:privileges)
head(smallset)
##   rating complaints privileges
## 1     43         51         30
## 2     63         64         51
## 3     71         70         68
## 4     61         63         45
## 5     81         78         56
## 6     43         55         49

The difference is probably obvious. Using the “select” function we have 3 instead of 7 variables. We can also exclude columns we do not want by placing a minus sign in front of the names of the columns. Below is the code.

head(select(attitude,-(rating:privileges)))
##   learning raises critical advance
## 1       39     61       92      45
## 2       54     63       73      47
## 3       69     76       86      48
## 4       47     54       84      35
## 5       66     71       83      47
## 6       44     54       49      34
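The “select” function is not limited to ranges of adjacent columns. As a minimal sketch, the code below picks three non-adjacent columns by name and then drops two individual columns. The object names “picked” and “dropped” are made up for this illustration.

```r
library(dplyr)
data("attitude")

# Pick non-adjacent columns by listing their names
picked <- select(attitude, rating, learning, advance)
names(picked)

# Drop individual columns by putting a minus sign in front of each name
dropped <- select(attitude, -critical, -advance)
ncol(dropped)
```

Listing names one by one is handy when the columns you want are scattered across the dataset.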

We can also use the “rename” function to change the names of columns. In our example below, we will change the name of the “rating” column to “rates.” The code is below. Keep in mind that the new name for the column is to the left of the equal sign and the old name is to the right.

attitude<-rename(attitude,rates=rating)
head(attitude)
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    63         64         51       54     63       73      47
## 3    71         70         68       69     76       86      48
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    43         55         49       44     54       49      34

The “select” function can be used in combination with other functions to find specific columns in the dataset. For example, we will use the “ends_with” function inside the “select” function to find all columns that end with the letter s.

s_set<-head(select(attitude,ends_with("s")))
s_set
##   rates complaints privileges raises
## 1    43         51         30     61
## 2    63         64         51     63
## 3    71         70         68     76
## 4    61         63         45     54
## 5    81         78         56     71
## 6    43         55         49     54
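The “ends_with” function has siblings that work the same way. The sketch below uses “starts_with” and “contains” on a renamed copy of the dataset. The copy, called “att” here purely for illustration, is used so that the example runs on its own.

```r
library(dplyr)

# Work on a renamed copy so the example is self-contained
att <- rename(datasets::attitude, rates = rating)

# Columns whose names start with "c"
c_cols <- select(att, starts_with("c"))
names(c_cols)

# Columns whose names contain "ing"
ing_cols <- select(att, contains("ing"))
names(ing_cols)
```

These helpers only look at column names, so they never depend on the values stored in the columns.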

The “filter” function allows you to select rows from a dataset based on criteria. In the code below we will select only rows that have a value greater than 75 in the “raises” variable.

bigraise<-filter(attitude,raises>75)
bigraise
##   rates complaints privileges learning raises critical advance
## 1    71         70         68       69     76       86      48
## 2    77         77         54       72     79       77      46
## 3    74         85         64       69     79       79      63
## 4    66         77         66       63     88       76      72
## 5    78         75         58       74     80       78      49
## 6    85         85         71       71     77       74      55

If you look closely, all values in the “raises” column are greater than 75. Of course, you can have more than one criterion. In the code below there are two.

filter(attitude, raises>70 & learning<67)
##   rates complaints privileges learning raises critical advance
## 1    81         78         56       66     71       83      47
## 2    65         70         46       57     75       85      46
## 3    66         77         66       63     88       76      72

The “arrange” function allows you to sort the order of the rows. In the code below we first sort the data in ascending order by the “critical” variable. Then we sort it in descending order by adding the “desc” function.

ascCritical<-arrange(attitude, critical)
head(ascCritical)
##   rates complaints privileges learning raises critical advance
## 1    43         55         49       44     54       49      34
## 2    81         90         50       72     60       54      36
## 3    40         37         42       58     50       57      49
## 4    69         62         57       42     55       63      25
## 5    50         40         33       34     43       64      33
## 6    71         75         50       55     70       66      41
descCritical<-arrange(attitude, desc(critical))
head(descCritical)
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    71         70         68       69     76       86      48
## 3    65         70         46       57     75       85      46
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    72         82         72       67     71       83      31
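Notice that some rows share the same “critical” value. A second sorting variable can be used to break such ties. The sketch below, which assumes a renamed copy called “att”, sorts by “critical” in descending order and then by “rates” in ascending order within ties.

```r
library(dplyr)

att <- rename(datasets::attitude, rates = rating)

# Sort by critical (highest first), then by rates (lowest first) within ties
sorted <- arrange(att, desc(critical), rates)
head(select(sorted, rates, critical), 8)
```

Each additional variable passed to “arrange” only matters when all the earlier ones are tied.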

The “mutate” function is useful for engineering features. In the code below we will transform the “learning” variable by subtracting its mean from itself.

attitude<-mutate(attitude,learningtrend=learning-mean(learning))
head(attitude)
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    63         64         51       54     63       73      47
## 3    71         70         68       69     76       86      48
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    43         55         49       44     54       49      34
##   learningtrend
## 1    -17.366667
## 2     -2.366667
## 3     12.633333
## 4     -9.366667
## 5      9.633333
## 6    -12.366667

You can also create logical variables with the “mutate” function. In the code below, we create a logical variable that is true when the “critical” variable is 80 or higher and false when “critical” is below 80. The new variable is called “highCritical.”

attitude<-mutate(attitude,highCritical=critical>=80)
head(attitude)
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    63         64         51       54     63       73      47
## 3    71         70         68       69     76       86      48
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    43         55         49       44     54       49      34
##   learningtrend highCritical
## 1    -17.366667         TRUE
## 2     -2.366667        FALSE
## 3     12.633333         TRUE
## 4     -9.366667         TRUE
## 5      9.633333         TRUE
## 6    -12.366667        FALSE
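Several new variables can also be created in a single call to “mutate”, and later ones can use earlier ones. The sketch below is one possible way to do this. The “learningz” variable (a standardized version of “learning”) is an extra made up for this illustration and is not used elsewhere in this post.

```r
library(dplyr)

att <- mutate(datasets::attitude,
              # Center learning on its mean
              learningtrend = learning - mean(learning),
              # Reuse the variable just created to get a z-score
              learningz = learningtrend / sd(learning),
              # Logical flag for high critical scores
              highCritical = critical >= 80)

head(select(att, learning, learningtrend, learningz, highCritical))
```

Doing everything in one call keeps related feature engineering steps together and avoids overwriting the dataset several times.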

The “group_by” function is used for creating summary statistics based on a specific variable. It is similar to the “aggregate” function in R. This function works in combination with the “summarize” function for our purposes here. We will group our data by the “highCritical” variable. This means our data will be viewed as either TRUE for “highCritical” or FALSE. The results of this function will be saved in an object called “hcgroups.”

hcgroups<-group_by(attitude,highCritical)
head(hcgroups)
## # A tibble: 6 x 9
## # Groups:   highCritical [2]
##   rates complaints privileges learning raises critical advance
##   <dbl>      <dbl>      <dbl>    <dbl>  <dbl>    <dbl>   <dbl>
## 1    43         51         30       39     61       92      45
## 2    63         64         51       54     63       73      47
## 3    71         70         68       69     76       86      48
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    43         55         49       44     54       49      34
## # ... with 2 more variables: learningtrend <dbl>, highCritical <lgl>

Looking at the data you probably saw no difference. This is because we are not done yet. We need to summarize the data in order to see the results for our two groups in the “highCritical” variable.

We will now generate the summary statistics by using the “summarize” function. We specifically want to know the mean of the “complaints” variable based on the variable “highCritical.” Below is the code.

summarize(hcgroups,complaintsAve=mean(complaints))
## # A tibble: 2 x 2
##   highCritical complaintsAve
##          <lgl>         <dbl>
## 1        FALSE      67.31579
## 2         TRUE      65.36364

Of course, you could have learned this through doing a t.test but this is another approach.
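The steps above can also be chained together with the pipe operator, which “dplyr” makes available. The code below is a minimal sketch of that idea under a few assumptions: it starts from a fresh copy of the dataset, and the extra summary columns “n” and “complaintsSD” as well as the object names “res” and “tt” are made up for this illustration. The t-test at the end is only there to show that it reports the same two group means.

```r
library(dplyr)

# Create the flag, group by it, and summarize in one chain
res <- datasets::attitude %>%
  mutate(highCritical = critical >= 80) %>%
  group_by(highCritical) %>%
  summarize(n = n(),
            complaintsAve = mean(complaints),
            complaintsSD = sd(complaints))
res

# A two-sample t-test on the same grouping reports the same group means
tt <- t.test(complaints ~ highCritical,
             data = mutate(datasets::attitude, highCritical = critical >= 80))
tt$estimate
```

The pipe version avoids saving intermediate objects such as “hcgroups”, which can make longer data munging scripts easier to read.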

Conclusion

The “dplyr” package is one powerful tool for wrestling with data. Nothing here is impossible without the package. Instead, the coding is simpler than what you can execute using base R.

Review of “Eye Wonder: Space”

In this post, we will take a look at the book Eye Wonder: Space (Eye Wonder) by Simon Holland (pp. 48).

The Summary

This book takes you on a journey defining the various characteristics related to space. The journey begins on Earth where you look at the stars. From there, the book talks about the moon, the sun, the planets of the solar system, the Milky Way, and places in space beyond our galaxy.

The Good

This book is rich in photos, which is consistent with its title. Students get to see Mars, asteroids, and even what life is like in space for humans. The book also offers explanations about the characteristics of various features of space. For example, it explains why Mercury is so hot, how stars die, as well as why Mars is red.

This text is definitely for individual reading. The way the text is set up and the pictures make it that way.

The Bad

One of the biggest problems with this text is the choice of font color. If the background is black, the font color is always white, which is acceptable. However, if the background is any other color, the font color is black. This often leads to problems with trying to read black font on the surface of red Mars, on a night sky filled with stars, or when looking at the deep blue Neptune.

There were also times when the text was probably too small for younger readers. However, the small text was normally used for details that did not affect the big picture.

The Recommendation

This book deserves 3/5 stars. It can provide some entertainment for one student or a small group of students. It can also provide supplemental information for both the teacher and students. Add it to your library if you are looking to broaden the number of available books.

Teaching a Child to Read

Learning to read is in no way an easy experience. Reading at even the most basic level requires mastery of syntax, phonology, morphology, and semantics at a minimum. These are skills that we expect a child, normally under the age of 8, to show some proficiency at.

This post will explain a process for teaching reading to small children that worked. Of course, there is no claim here that this is the way but it does provide an example. When I began this experience I had been an educator for years at higher grades but had never actually taught anybody how to read. My training and experience have mostly been in improving reading comprehension skills.

The Process

The process I stumbled upon goes as follows.

  1. Letter recognition
  2. Letter sound production
  3. Word family phonics
  4. Sight words
  5. Reading stories with support from steps 3 & 4

Each step builds on the steps before it.

Letter Recognition

The first step in this process was to have the child recognize the letters of the alphabet. This was done through the use of flashcards. In many ways, this was the easiest step. I thought it would take a year for a 4-year-old to learn this but it only took 3-4 months.

Letter recognition relates to morphology as letters are in many ways morphemes that cannot be further divided. At this point, the learning experience is memorization only, with no application.

Letter Sound Production

Once the alphabet was memorized, I exposed the student to the sounds of the letters. The student then had to reproduce the sound in addition to recognizing what letter it was.

This was much tougher. The student would either forget what letter it was or forget the sound or both. There was a lot of frustration. However, after several more months, we were ready to move on.

Letter sound production is an example of phonology or the understanding of the sounds letters make. This is a crucial step in learning to read.

Word Family Phonics

At this stage, we combined several letters and “sounded” them out to produce words. Often, the words used had the same ending or morpheme, such as “-ap”, “-at”, “-ad”, etc., and only the first letter would change. This helps the student to recognize patterns quickly, at least in theory.

There was also an introduction to vowels and other common morphemes. Looking back I consider this a mistake as it seemed to be confusing for the student. In addition, although phonics are valuable in learning to sound out words, I found them to lack context, and reading “cap”, “tap”, and “map” outside the setting of some story was boring for the student.

Sight Words

Sight words are words that are so common in English that they need to be memorized. Often they cannot be sounded out because they violate the rules of phonology but this is not always the case.

There are two common systems of sight words and these are Dolch and Fry respectively. In terms of which is better, it doesn’t really matter. I used Fry’s and again I think the lack of context was a problem as I was asking the student to learn words that lack an immediate application.

Reading Stories

After about a year of preparatory training, we finally began reading stories. The stories were short and appropriate for kindergarteners. At first, it was difficult but the student began to improve rapidly. It was much easier (usually) to get them to cooperate as well.

Conclusion 

The most important point is perhaps not the most obvious one. Despite my inexperience and mistakes in pedagogy, the student still learned to read. In many ways, the student learned to read in spite of me. This should be reassuring for many teachers. Even bad teaching can get good results if the aspects of planning, discipline, and commitment to success are there. Students seem to grow as long as they have some guidance.

I would say the most important thing in terms of teaching reading is to actually make them read. Reading provides context and motivation as the student can see what they cannot do. Studying all of the theoretical aspects of reading, such as phonics and letters, is only beneficial when the child knows they need to know this.

Therefore, if you are provided with an opportunity to teach a child to read, start with stories and, as they struggle, teach only what they are struggling with. For example, if they are having a hard time with the long “o” sound, reinforcing that with supplemental theoretical work will make sense for the child. As such, children learn best by doing rather than talking about what they will do.

Types of Rubrics for Writing

Grading essays, papers and other forms of writing is subjective and frustrating for teachers at times. One tool that helps in improving the consistency of the marking, as well as the speed, is the use of rubrics. In this post, we will look at three commonly used rubrics which are…

  • Holistic
  • Analytical
  • Primary trait

Holistic Rubric

A holistic rubric looks at the overall quality of the writing. Normally, there are several levels on the rubric and each level has several descriptors on it. Below is an example template

[Figure: holistic rubric template]

The descriptors must be systematic, which means that they are addressed in each level and in the same order. Below is an actual holistic rubric for writing.

[Figure: holistic rubric for writing]

In the example above, there are four levels of marking. The descriptors are

  • idea explanation
  • coherency
  • grammar

Between levels, different adverbs and adjectives are used to distinguish the levels.  For example, in level one, “ideas are thoroughly explained” becomes “ideas are explained” in the second level. The use of adverbs is one of the easiest ways to distinguish between levels in a holistic rubric.

Holistic rubrics offer the convenience of fast marking that is easy to interpret and comes with high reliability. The downside is that there is a lack of strong feedback for improvement.

Analytical Rubrics

Analytical rubrics assign a score to each individual attribute the teacher is looking for in the writing. In other words, instead of lumping all the descriptors together as is done in a holistic rubric, each trait is given its own score. Below is a template of an analytical rubric.

[Figure: analytical rubric template]

You can see that the levels are across the top and the descriptors across the side. Best performance moves from left to right all the way to worst performance. Each level is assigned a range of potential point values.

Below is an actual analytical rubric for writing.

[Figure: analytical rubric for writing]

Analytical rubrics provide much more washback and learning than holistic rubrics. Of course, they also take a lot more time for the teacher to complete.

Primary Trait

A lesser-known way of marking papers is the use of a primary trait rubric. With primary trait, the student is only assessed on one specific function of writing. Examples include persuasion if they are writing an essay, or perhaps vocabulary use for an ESL student writing paragraphs.

The template would be similar to a holistic rubric except that there would only be one descriptor instead of several. The advantage of this is that it allows the teacher and the student to focus on one aspect of writing. Naturally, this can be a disadvantage as writing involves more than one specific skill.

Conclusion

Rubrics are useful for a variety of purposes. For writing, it is critical that you understand what the levels and descriptors are when deciding on what kind of rubric you want to use. In addition, the context also affects which type of rubric to use.

Review of “The Usborne Book of Houses and Homes”

Houses and Homes (World Geography) by Carol Bowyer (pp. 32) provides insights into how people from all over the world live.

The Summary

This book covers how people live in various climates and locations throughout the world. Living in water, living in caves, in icy places, and the jungle are just some of the examples from the text.

The text is not limited to just housing but also discusses the cultures of various people groups. Students learn about the Turcoman women of Iran making felt for their tents, the Huichol of Mexico grinding maize, and the hunting style of the Eskimos of Alaska to name a few.

The Good 

The multitude of illustrations is always a strength of books from Usborne. Students will be able to see how these people live. The book also provides activities that the students can do. For example, they can play an Eskimo game, learn how to make good luck crosses like the Huichol, and learn how to make a tent.

The text is readable for older elementary students. Younger students would enjoy and learn a great deal from seeing the pictures. In many ways, there is a little bit for everybody in this text.

The Bad

Some of the illustrations are small, which relegates this book to the library of your classroom. With so much rich illustration, many kids can bypass reading and just learn through the pictures. This is only a problem if you are trying to get the kids to read. For more sensitive people, there is a little nudity as the illustrator drew pictures of what the people actually wear or do not wear.

The Recommendation

I would give this book 3.5/5 stars. It’s great supplementary material for any social studies course. The activities provided are more for fun than learning. However, the visuals are excellent for exposing children to how people live in the world today and for stimulating discussion about it.

Guiding the Writing Process

How a teacher guides the writing process can depend on a host of factors. Generally, how you support a student at the beginning of the writing process is different from how you support them at the end. In this post, we will look at the differences between these two stages of writing.

The Beginning

At the beginning of writing, there are a lot of decisions that need to be made as well as extensive planning. Generally, at this point, grammar is not the deciding factor in terms of the quality of the writing. Rather, the teacher is trying to help the students to determine the focus of the paper as well as the main ideas.

The teacher needs to help the student to focus on the big picture of the purpose of their writing. This means that only major issues are addressed at least initially. You only want to point at potential disaster decisions rather than mundane details.

It is tempting to try and fix everything when looking at rough drafts. This not only takes up a great deal of your time but it is also discouraging to students as they deal with intense criticism while still trying to determine what they truly want to do. As such, it is better to view your role at this point as a counselor or guide and not as a detail-oriented control freak.

At this stage, the focus is on the discourse and not so much on the grammar.

The End

At the end of the writing process, there is a move from general comments to specific concerns. As the student gets closer and closer to the final draft the “little things” become more and more important. Grammar comes to the forefront. In addition, referencing and the strength of the supporting details become more important.

Now is the time to get “picky”. This is because major decisions have been made and the cognitive load of fixing small stuff is less stressful once the core of the paper is in place. The analogy I like to give is that first, you build the house, which involves lots of big movements such as pouring a foundation, adding walls, and including a roof. This is the beginning of writing. The end of building a house includes more refined aspects such as painting the walls, adding the furniture, etc. This is the end of the writing process.

Conclusion

For writers and teachers, it is important to know where they are in the writing process. In my experience, it seems as if it is all about grammar from the beginning when this is not necessarily the case. At the beginning of a writing experience, the focus is on ideas. At the end of a writing experience, the focus is on grammar. The danger is always in trying to do too much at the same time.

Academic vs Applied Research

Academic and applied research are perhaps the only two ways that research can be performed. In this post, we will look at the differences between these two perspectives on research.

Academic Research

Academic research falls into two categories. These two categories are

  • Research ON your field
  • Research FOR your field

Research ON your field is research that is searching for best practice. It looks at how your academic area is practiced in the real world. A scholar will examine how well a theory is being applied or used in a real-world setting and make recommendations.

For example, in education, if a scholar does research in reading comprehension, they may want to determine some of the most appropriate strategies for teaching reading comprehension. The scholar will look at existing theories and see which one(s) are most appropriate for supporting students.

Research ON your field is focused on existing theories that are tested with the goal of developing recommendations for improving practice.

Research FOR your field is slightly different. This perspective seeks to expand theoretical knowledge about your field. In other words, the scholar develops new theories rather than assessing the application of older ones.

An example of this in education would be developing a new theory in reading comprehension. By theory, it is meant explanation. Famous theories in education include Piaget’s stages of development, Kohlberg’s stages of moral development, and more. In its time, each of these theories pushed the boundaries of our understanding of something.

The main thing about academic research is that it leads to recommendations but not necessarily to answers that solve problems. Answering problems is something that is done with applied research.

Applied Research

Applied research is also known as research IN your field. This type of research is often performed by practitioners in the field.

  • research IN your field

There are several forms of research IN your field and they are as follows

  • Formative
  • Monitoring
  • Summative

Formative research is for identifying problems. For example, a teacher may notice that students are not performing well or doing their homework. Formative applied research is when the detective hat is put on and the teacher begins to search for the cause of this behavior.

The results of formative research lead to some sort of an action plan to solve the problem. During the implementation of the solution, monitoring applied research is conducted. Monitoring research is conducted during implementation of a solution to see how things are going.

For example, if the teacher discovers that students are struggling with reading because they are struggling with phonological awareness, they may implement a review program of this skill for the students. Monitoring would involve assessing student reading performance during the program.

Summative applied research is conducted at the end of implementation to see if the objectives of the program were met. Returning to the reading example, if the teacher’s objective was to improve reading comprehension scores by 10%, the summative research would assess how well the students can now read and whether there was a 10% improvement.
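The 10% check in the reading example comes down to simple arithmetic. The sketch below assumes made-up comprehension scores for four students before and after the program, and it measures improvement as the percentage change in the class average, which is only one of several reasonable ways to define it.

```r
# Hypothetical reading comprehension scores (same four students, before and after).
pre  <- c(60, 55, 70, 65)
post <- c(68, 60, 77, 72)

# Percentage change in the class average.
improvement <- (mean(post) - mean(pre)) / mean(pre) * 100

# Was the 10% objective met?
objective_met <- improvement >= 10

improvement
objective_met
```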

In education, applied research is also known as action research.

Conclusion

Research can serve many different purposes. Academics focus on recommendations rather than action, while practitioners want to solve problems and are less concerned with making recommendations. The point is that understanding what type of research you are trying to conduct can help you in shaping the direction of your study.

Review of “A Child’s History of the World”

The history textbook A Child’s History of the World by V.M. Hillyer (pp. 432) was originally written almost 100 years ago. Since then it has been revised and expanded by several other authors. This review is based on the 2014 edition of the text.

The Summary

This textbook is a survey of world history written at the comprehension level of a child. As with most surveys, the text covers a little bit of everything. Examples of topics in the book include Egyptian, Jewish, Greek, Roman, African, and British civilizations and even the rise of the US and USSR. Naturally, many of the major wars of the past 5,000 years are covered as well.

Famous characters from history who are discussed in the book range from Alexander the Great to Jesus Christ as well as Emperor Constantine and even Richard Wagner the famous German composer of the 19th century.

The Good 

For a child’s book, there is a surprising amount of detail. For example, the book explains Zoroastrianism, which was the religion of the Medo-Persian empire. How many students today are familiar with such a topic? In addition, the text is written in an easy-to-read format.

The chapters are short, which is critical for young readers. There is also support with pronouncing various words that may be unusual to a western student. There are also some illustrations throughout the book.

The Bad

Given its age (almost 100 years) the pedagogical approach of the book is outdated. It’s heavy on text and light on illustrations. Furthermore, the book lacks any sort of learning tools common in today’s textbooks such as inserts, vocabulary words, questions, discussion items, etc. It is literally just text.

At the time that it was written this text could probably be read by a small child. Today, however, the writing style would probably be more appropriate for high school as in-depth reading is not as common as it once was. With so much text it is almost impossible to read this to a class. My students became extremely bored and antsy when I attempted this even though a chapter is only three pages in length at times. I had to scrap reading it aloud and try another way to teach historical concepts. As such, both whole-class and individual reading of this text is difficult because people’s habits have changed since the Depression.

The Recommendation

I would give this book 1.5/5 stars. It needs significant pedagogical support in order to be effective in the 21st-century classroom. The teacher would need to prepare support materials in order to help students with understanding the text. All textbooks require scaffolding support from the teacher but this book requires an extraordinary amount of help to provide learning experiences.

However, this book could be useful as a resource for a teacher who needs additional knowledge to teach history to children. In addition, if a regular textbook is already in use then A Child’s History of the World could serve as supplementary material that would allow the class to go deeper on a particular topic. The days of this text being the main source on history for children are probably over.

Types of Writing

This post will look at several types of writing that are done for assessment purposes. In particular, we will look at this through the four levels of writing, which are

  • Imitative
  • Intensive
  • Responsive
  • Extensive

Imitative 

Imitative writing is focused strictly on the grammatical aspects of writing. The student simply reproduces what they see. This is a common way to teach children how to write. Additional examples of activities at this level include cloze tasks in which the student has to write the word in the blank from a list, spelling tests, matching, and even converting numbers to their word equivalent.

Intensive

Intensive writing is more concerned with selecting the appropriate word for a given context. Example activities include grammatical transformation (such as changing all verbs to past tense), sequencing pictures, describing pictures, completing short sentences, and ordering tasks.

Responsive 

Responsive writing involves the development of sentences into paragraphs. The focus is almost exclusively on the context or function of writing. Form concerns are primarily at the discourse level, which means how the sentences work together to make paragraphs and how the paragraphs work to support a thesis statement. Normally, there are no more than 2-3 paragraphs at this level.

Example activities at the responsive level include short reports, interpreting visual aids, and summary.

Extensive

Extensive writing is responsive writing over the course of an entire essay or research paper. The student is able to shape a purpose, objectives, main ideas, conclusions, etc. into a coherent paper.

For many students, this is exceedingly challenging in their mother tongue and is further exacerbated in a second language. There is also the experience of multiple drafts of a single paper.

Marking Intensive & Responsive Papers

Marking higher level papers requires a high degree of subjectivity. This is because of the authentic nature of this type of assessment. As such, it is critical that the teacher communicate expectations clearly through the use of rubrics or some other form of communication.

Another challenge is the issue of time. Higher level papers take much more time to develop. This means that they normally cannot be used as a form of in class assessment. If they are used as in class assessment then it leads to a decrease in the authenticity of the assessment.

Conclusion

Writing is a critical component of the academic experience. Students need to learn how to shape and develop their ideas in print. For teachers, it is important to know at what level the student is capable of writing in order to support them for further growth.

Analyzing Twitter Data in R

In this post, we will look at analyzing tweets from Twitter using R. Before beginning, if you plan to replicate this on your own, you will need to set up a developer account with Twitter. Below are the steps.

Twitter Setup

  1. Go to https://dev.twitter.com/apps
  2. Create a twitter account if you do not already have one
  3. Next, you want to click “create new app”
  4. After entering the requested information be sure to keep the following information for R: consumer key, consumer secret, request token URL, authorize URL, access token URL

The instructions here are primarily for users of Linux. If you are using a Windows machine, you need to download a cacert.pem file. Below is the code.

download.file(url='http://curl.haxx.se/ca/cacert.pem',destfile='/YOUR_LOCATION/cacert.pem')

You need to save this file where it is appropriate. Below we will begin the analysis by loading the appropriate libraries.

R Setup

library(twitteR);library(ROAuth);library(RCurl);library(tm);library(wordcloud)

Next, we need to use all of the application information we generated when we created the developer account at Twitter. We will save the information in objects to use in R. In the example code below “XXXX” is used where you should provide your own information. Sharing this kind of information would allow others to use my Twitter developer account. Below is the code.

my.key<-"XXXX" #consumer key
my.secret<-"XXXX" #consumer secret
my.accesstoken<-'XXXX' #access token
my.accesssecret<-'XXXX' ##access secret

Some of the information we just stored now needs to be passed to the “OAuthFactory” function of the “ROAuth” package. We will be passing the “my.key” and “my.secret”. We also need to add the request URL, access URL, and auth URL. Below is the code for all this.

cred<-OAuthFactory$new(consumerKey=my.key,consumerSecret=my.secret,requestURL='https://api.twitter.com/oauth/request_token',
                       accessURL='https://api.twitter.com/oauth/access_token',authURL='https://api.twitter.com/oauth/authorize')

If you are a Windows user, you need the cacert.pem file downloaded above. Use “cred$handshake(cainfo="LOCATION OF CACERT.PEM")” to complete the setup process. Make sure to save your authentication and then use “registerTwitterOAuth(cred)” to finish this. For Linux users, the code is below.

setup_twitter_oauth(my.key, my.secret, my.accesstoken, my.accesssecret)

Data Preparation

We can now begin the analysis. We are going to search twitter for the term “Data Science.” We will look for 1,500 of the most recent tweets that contain this term. To do this we will use the “searchTwitter” function. The code is as follows

ds_tweets<-searchTwitter("data science",n=1500)

We now need to do some things that are a little more complicated. First, we need to convert our “ds_tweets” object to a dataframe. This is just to save our search so we don’t have to search again. The code below performs this.

ds_tweets.df<-do.call(rbind,lapply(ds_tweets,as.data.frame))

Second, we need to find all the text in our “ds_tweets” object and convert this into a list. We will use the “sapply” function along with the “getText” method. Below is the code.

ds_tweets.list<-sapply(ds_tweets,function(x) x$getText())

Third, we need to turn our “ds_tweets.list” into a corpus.

ds_tweets.corpus<-Corpus(VectorSource(ds_tweets.list))  

Now we need to do a lot of cleaning of the text. In particular, we need to

  • make all words lower case
  • remove punctuation
  • get rid of funny characters (e.g. #, /, etc.)
  • remove stopwords (words that lack meaning such as “the”)

To do this we need to use a combination of functions in the “tm” package as well as some personally made functions

ds_tweets.corpus<-tm_map(ds_tweets.corpus,content_transformer(tolower)) #lower case first so capitalized stop words are matched
ds_tweets.corpus<-tm_map(ds_tweets.corpus,removePunctuation)
removeSpecialChars <- function(x) gsub("[^a-zA-Z0-9 ]","",x) #remove garbage terms
ds_tweets.corpus<-tm_map(ds_tweets.corpus,content_transformer(removeSpecialChars)) #application of custom function
ds_tweets.corpus<-tm_map(ds_tweets.corpus,removeWords,stopwords()) #removes stop words
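To see what these cleaning steps do to a single tweet, here is a minimal base R sketch. The function name “clean_tweet”, the tiny stop word list, and the sample tweet are all hypothetical and only for illustration. The sketch lower-cases the text first, since a lower-case stop word list will not match a capitalized “The”.

```r
# Base R sketch of the cleaning steps (no packages required).
# clean_tweet and its short stop word list are hypothetical, for illustration only.
clean_tweet <- function(x, stop_words = c("the", "is", "a", "of")) {
  x <- tolower(x)                          # 1. lower case
  x <- gsub("[^a-z0-9 ]", "", x)           # 2. drop punctuation and special characters
  words <- strsplit(x, " +")[[1]]          # 3. split into words
  words <- words[!(words %in% c(stop_words, ""))]  # 4. drop stop words and empty strings
  paste(words, collapse = " ")
}

clean_tweet("The #DataScience of R is FUN!")
# returns "datascience r fun"
```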

Data Analysis

We can make a word cloud for fun now

wordcloud(ds_tweets.corpus)
[Image: word cloud of the tweets]

We now need to convert our corpus to a matrix for further analysis. In addition, we need to remove sparse terms as this reduces the size of the matrix without losing much information. The value to set it to is at the discretion of the researcher. Below is the code

ds_tweets.tdm<-TermDocumentMatrix(ds_tweets.corpus)
ds_tweets.tdm<-removeSparseTerms(ds_tweets.tdm,sparse = .8)#remove sparse terms
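The idea behind removing sparse terms can be sketched with a plain matrix. The toy term-document matrix below is made up, and the filter is my reading of how the sparsity threshold works (a term is dropped when the share of documents that do not contain it reaches the threshold), not code taken from the “tm” package.

```r
# Toy term-document matrix: rows are terms, columns are documents (made-up counts).
toy.tdm <- matrix(c(1, 1, 1, 0, 1,
                    0, 0, 0, 0, 1,
                    2, 0, 1, 1, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("data", "leak", "science"), paste0("doc", 1:5)))

# Share of documents in which each term does NOT appear.
sparsity <- rowMeans(toy.tdm == 0)

# Keep only terms whose sparsity is below the chosen threshold.
toy.kept <- toy.tdm[sparsity < 0.5, , drop = FALSE]

sparsity
rownames(toy.kept)
```

Here “leak” appears in only one of five documents, so it is dropped, while “data” and “science” remain. A higher threshold keeps more terms and a lower one keeps fewer.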

We’ve looked at how to find the most frequent terms in another post. Below is the code for finding the terms that appear at least 15 times.

findFreqTerms(ds_tweets.tdm,15)
##  [1] "datasto"      "demonstrates" "download"     "executed"    
##  [5] "hard"         "key"          "leaka"        "locally"     
##  [9] "memory"       "mitchellvii"  "now"          "portable"    
## [13] "science"      "similarly"    "data"

Below are words that are highly correlated with the term “key”.

findAssocs(ds_tweets.tdm,'key',.95)
## $key
## demonstrates     download     executed        leaka      locally 
##         0.99         0.99         0.99         0.99         0.99 
##       memory      datasto         hard  mitchellvii     portable 
##         0.99         0.98         0.98         0.98         0.98 
##    similarly 
##         0.98

For the final trick, we will make a hierarchical agglomerative cluster. This will clump words that are more similar next to each other. We first need to convert our current “ds_tweets.tdm” into a regular matrix. Then we need to scale it because the distances need to be standardized. Below is the code.

ds_tweets.mat<-as.matrix(ds_tweets.tdm)
ds_tweets.mat.scale<-scale(ds_tweets.mat)

Now, we need to calculate the distance statistically

ds_tweets.dist<-dist(ds_tweets.mat.scale,method = 'euclidean')

At last, we can make the clusters,

ds_tweets.fit<-hclust(ds_tweets.dist,method = 'ward.D') #'ward' was renamed 'ward.D' in R 3.1.0
plot(ds_tweets.fit)

[Image: cluster dendrogram]

Looking at the chart, it appears we have six main clusters. We can highlight them using the code below.

plot(ds_tweets.fit)
groups<-cutree(ds_tweets.fit,k=6)
rect.hclust(ds_tweets.fit,k=6)

[Image: cluster dendrogram with the six clusters highlighted]
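If you do not have Twitter access, the scale, distance, and clustering steps can be tried on a small made-up matrix. The four terms and their counts below are hypothetical and chosen so that two pairs of terms behave alike. Note that “ward.D” is used because recent versions of R no longer accept plain “ward” without a message.

```r
# Toy term-document matrix with two obvious pairs of similar terms (made-up counts).
toy <- matrix(c(5, 4, 0, 0,
                4, 5, 0, 1,
                0, 0, 6, 5,
                0, 1, 5, 6),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("data", "science", "memory", "leak"), paste0("doc", 1:4)))

toy.dist <- dist(scale(toy), method = "euclidean")  # standardize, then compute distances
toy.fit <- hclust(toy.dist, method = "ward.D")      # agglomerative clustering
toy.groups <- cutree(toy.fit, k = 2)                # cut the tree into two clusters

toy.groups
```

Calling “plot(toy.fit)” followed by “rect.hclust(toy.fit,k=2)” draws the dendrogram with the two clusters boxed, in the same way as the tweet example.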

Conclusion

This post provided an example of how to pull data from Twitter for text analysis. There are many steps, but some useful insights can be gained from this sort of research.

Review of “First Encyclopedia of the Human Body”

The First Encyclopedia of the Human Body (First Encyclopedias) by Fiona Chandler (pp. 64) provides insights into science for young children.

The Summary
This book explains all of the major functions of the human body as well as some aspects of health and hygiene. Students will learn about the brain, heart, hormones, where babies come from, as well as healthy eating and visiting the doctor.

The Good
This book is surprisingly well-written. The author was able to take the complexities of the human body and word them in a way that a child can understand. In addition, the illustrations are rich and interesting. For example, there are pictures of an infrared scan of a child’s hands, x-rays of broken bones, as well as pictures of people doing things with their bodies such as running or jumping.

There is also a good mix of small and large photos which allows this book to be used individually or for whole class reading. The large size of the text also allows for younger readers to appreciate not only the pictures but also the reading.

There are also several activities in the book at different places. For example, students are invited to take their pulse, determine how much air is in their lungs, as well as an activity for testing your sense of touch.

In every section of the book, there are links to online activities as well. It seems as though this book has every angle covered in terms of learning.

The Bad
There is little to criticize in this book. It’s a really fun text. Perhaps if you are an expert in the human body you may find things that are disappointing. However, for a layman called to teach young people science, this text is more than adequate.

The Recommendation
I would give this book 5/5 stars. My students loved it and I was able to use it in so many different ways to build activities and discussions. I am sure that the use of this book would be beneficial to almost any teacher in any classroom.

Reading Assessment at the Interactive and Extensive Level

In reading assessment, the interactive and extensive levels are the highest levels of reading. This post will provide examples of assessments at each of these two levels.

Interactive Level

Reading at this level is focused on both form and meaning of the text with an emphasis on top-down processing. Below are some assessment examples

Cloze

Cloze assessment involves removing certain words from a paragraph and expecting the student to supply them. Words are removed either at every nth word (fixed-ratio deletion) or by choosing words that carry meaning (rational deletion).

In terms of marking, you have the choice of marking based on the student providing the exact wording or an appropriate wording. The exact wording is strict but consistent, while appropriate wording can be subjective.
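Fixed-ratio deletion is mechanical enough to sketch in R. The functions “make_cloze” and “score_exact” below are hypothetical helpers written for this illustration, and the passage is made up. The scoring function covers only exact-word marking, since appropriate-word marking needs a teacher’s judgment.

```r
# Hypothetical helper: blank out every nth word of a passage (fixed-ratio deletion).
make_cloze <- function(text, n = 5) {
  words <- strsplit(text, " ")[[1]]
  idx <- seq_along(words)
  idx <- idx[idx %% n == 0]        # positions n, 2n, 3n, ...
  answers <- words[idx]            # keep the removed words as the answer key
  words[idx] <- "_____"
  list(passage = paste(words, collapse = " "), answers = answers)
}

# Hypothetical helper: exact-word marking, ignoring capitalization.
score_exact <- function(responses, answers) {
  sum(tolower(responses) == tolower(answers))
}

cloze <- make_cloze("The cat sat on the mat and looked at the bird", n = 3)
cloze$passage   # "The cat _____ on the _____ and looked _____ the bird"
cloze$answers   # "sat" "mat" "at"

score_exact(c("sat", "rug", "at"), cloze$answers)  # 2 of 3 correct
```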

Read and Answer the Question

This is perhaps the most common form of assessment of reading. The student simply reads a passage and then answers questions such as T/F, multiple choice, or some other format.

Information Transfer

Information transfer involves the students interpreting something. For example, they may be asked to interpret a graph and answer some questions. They may also be asked to elaborate on the graph, make predictions, or explain. Explaining a visual is a common requirement for the IELTS.

Extensive Level

This level involves the highest level of reading. It is strictly top-down and requires the ability to see the “big picture” within a text. Marking at this level is almost always subjective.

Summarize and React

Summarizing and reacting requires the student to be able to read a large amount of information, share the main ideas, and then provide their own opinion on the topic. This is difficult as the student must understand the text to a certain extent and then form an opinion about what they understand.

I also like to have my students write several questions they have about the text. This teaches them to identify what they do not know. These questions are then shared in class so that they can be discussed.

For marking purposes, you can provide directions about the number of words, paragraphs, etc. to provide guidance. However, marking at this level of reading is still subjective. The primary purpose of marking should probably be to find evidence that the student read the text.

Conclusion

The interactive and extensive levels of reading are when teaching can become enjoyable. Students have moved beyond just learning to read to reading to learn. This opens up many possibilities in terms of learning experiences.

Reading Assessment at the Perceptual and Selective Level

This post will provide examples of assessments that can be used for reading at the perceptual and selective level.

Perceptual Level

The perceptual level is focused on bottom-up processing of text. Comprehension ability is not critical at this point. Rather, you are just determining if the student can accomplish the mechanical process of reading.

Examples

Reading Aloud-How this works is probably obvious to most teachers. The students read a text out loud in the presence of an assessor.

Picture-Cued-Students are shown a picture. At the bottom of the picture are words. The students read the word and point to a visual example of it in the picture. For example, if the picture has a cat in it, the word “cat” would appear at the bottom of the picture. The student would read the word “cat” and point to the actual cat in the picture.

This can be extended by using sentences instead of words. For example, if the picture shows a man driving a car, there may be a sentence at the bottom of the picture that says “a man is driving a car”. The student would then point to the man in the picture who is driving.

Another option is T/F statements. Using our cat example from above, we might write “There is one cat in the picture” and the student would then select T or F.

Other Examples-These include multiple-choice and written short answer.

Selective Level

The selective level is the next level above perceptual. At this level, the student should be able to recognize various aspects of grammar.

Examples

Editing Task-Students are given a reading passage and are asked to fix the grammar. This can happen many different ways. They could be asked to pick the incorrect word in a sentence or to add or remove punctuation.

Picture-Cued Task-This task appeared at the perceptual level. Now it is more complicated. For example, the students might be required to read statements and label a diagram appropriately, such as the human body or aspects of geography.

Gap-Filling Task-Students read a sentence and complete it appropriately

Other Examples-Includes multiple-choice and matching. The multiple-choice may focus on grammar, vocabulary, etc. Matching attempts to assess a student’s ability to pair similar items.

Conclusion

Reading assessment can take many forms. The examples here provide ways to deal with this for students who are still highly immature in their reading abilities. As fluency develops, more complex measures can be used to determine a student’s reading capability.

Review of “See How It’s Made”

This is a review of the book See How It’s Made written by Penny Smith and Lorrie Mack.

The Summary

This book takes several everyday products such as ice cream, CDs, t-shirts, crayons, etc. and illustrates the process of how the item is made. The authors take you into the factory where these products are produced and show you through the use of photographs how each item is made. It can be surprising even for teachers to learn how much work goes into making CDs or apple juice.

The Good

The photo rich environment of the text makes it as realistic as possible. In addition, choosing common everyday items really helps in relevancy for students. Many kids find it interesting to know how pencils and crayons are made. The book is truly engaging at least in a one-on-one situation.

The Bad

The text is small in this book. This would make reading it difficult for younger students. In addition, although I appreciate the photos there are so many jammed onto a single page that it would be difficult to share this book with an entire class. This leaves the book for use only in the class library for individual students.

Lastly, kids learn a lot of relevant, interesting things but there seems to be no overall point to the text. It is just a collection of different processes for making things. It is left to the teacher to come up with a reason for reading this.

The Recommendation

This book is 3/5 stars. It’s a great text in terms of the visual stimulus but it can be difficult to read and lacks a sense of direction.

Types of Reading in ESL

Reading for comprehension involves two forms of processing which are bottom-up and top-down. Bottom-up processing involves pulling letters together to make words, words to make sentences, etc. This is most commonly seen as students sounding out words when they read. The goal is primarily to just read the word.

Top-down processing is the use of prior knowledge, usually organized as schemas in the mind to understand what is being read. For example, after a student reads the word “cat” using bottom-up processing they then use top-down processing of what they know about cats such as their appearance, diet, habits, etc.

These two processes work together in order for us to read. Generally, they happen simultaneously as we are frequently reading and using our background knowledge to understand what we are reading.

In the context of reading, there are four types of reading from simplest to most complex and they are

  • Perceptive
  • Selective
  • Interactive
  • Extensive

We will now look at each in detail

Perceptive

Perceptive reading is focused primarily on bottom-up processing. In other words, if a teacher is trying to assess this type of reading they simply want to know if the student can read or not. The ability to understand or comprehend the text is not the primary goal at this level.

Selective

Selective reading involves looking at a reader’s ability to recognize grammar, discourse features, etc. This is done with brief paragraphs and short reading passages. Assessment involves standard assessment items such as multiple-choice, short answer, true/false, etc.

In order to be successful at this level, the student needs to use both bottom-up and top-down processing. Charts and graphs can also be employed.

Interactive

Interactive reading involves deriving meaning from the text. This places even more emphasis on top-down processing. Readings are often chosen from genres that employ implied main ideas rather than stated. The readings are also more authentic in nature and can include announcements, directions, recipes, etc.

Students who lack background knowledge will struggle with this type of reading regardless of their language ability. In addition, inability to think critically will impair performance even if the student can read the text.

Extensive

Extensive reading involves reading large amounts of information and being able to understand the “big picture”. The student needs to be able to separate the details from the main ideas. Many students struggle with this in their native language. As such, this is even more difficult when students are trying to digest large amounts of information in a second language.

Conclusion

Reading is a combination of making sense of the words and using prior knowledge to comprehend text. The levels of reading vary in their difficulty. To succeed at reading, students need to be exposed to many different experiences so that they have background knowledge to call on when reading something new.

Diversity and Lexical Dispersion Analysis in R

In this post, we will learn how to conduct a diversity and lexical dispersion analysis in R. Diversity analysis is a measure of the breadth of an author’s vocabulary in a text. R provides several calculations of this in its output.

Lexical dispersion is used for knowing where or when certain words are used in a text. This is useful for identifying patterns if this is a goal of your data exploration.

We will conduct our two analyses by comparing two famous philosophical texts:

  • Analects
  • The Prince

These books are available at Project Gutenberg. You can go to the site, type in the titles, and download them to your computer.

We will use the “qdap” package in order to complete the diversity and lexical dispersion analysis. Below is some initial code.

library(qdap)

Data Preparation

Below are the steps we need to take to prepare the data

  1. Paste the text files into R
  2. Convert the text files to ASCII format
  3. Convert the ASCII format to data frames
  4. Split the sentences in the data frame
  5. Add a variable that indicates the book name
  6. Combine the two books into one dataframe

We now need to prepare the two texts. First, we move them into R. The “scan” function reads each file word by word and the “paste” function collapses the words into a single string.

analects<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Analects.txt",what='character'),collapse=" ")
prince<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Prince.txt",what='character'),collapse=" ")

We must convert the texts to ASCII format so that R is able to interpret them.

analects<-iconv(analects,"latin1","ASCII","")
prince<-iconv(prince,"latin1","ASCII","")

For each book, we need to make a dataframe. The argument “texts” gives our dataframe one variable called “texts” which contains all the words in each book. Below is the code.

analects<-data.frame(texts=analects)
prince<-data.frame(texts=prince)

With the dataframes completed, we can now split the variable “texts” in each dataframe by sentence. The “sentSplit” function will do this.

analects<-sentSplit(analects,'texts')
prince<-sentSplit(prince,'texts')

Next, we add the variable “book” to each dataframe. What this does is that for each row or sentence in the dataframe the “book” variable will tell you which book the sentence came from. This will be valuable for comparative purposes.

analects$book<-"analects"
prince$book<-"prince"

Now we combine the two books into one dataframe. The data preparation is now complete.

twobooks<-rbind(analects,prince)
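
The preparation steps above can also be sketched as one small helper. This is only an illustration in base R: the function name “prepare_book” and the two demo strings are made up for this example, and a crude regular-expression split stands in for qdap’s “sentSplit”, so the sentence boundaries will not match qdap’s exactly.

```r
# Hypothetical helper (not part of qdap): turn one long string into a
# data frame with one sentence per row and a "book" label.
prepare_book <- function(text, name) {
  # Drop any characters that cannot be represented in ASCII
  text <- iconv(text, "latin1", "ASCII", sub = "")
  # Crude stand-in for sentSplit: split on whitespace that follows . ? or !
  sentences <- unlist(strsplit(text, "(?<=[.?!])\\s+", perl = TRUE))
  data.frame(texts = sentences, book = name, stringsAsFactors = FALSE)
}

# Made-up demo strings, not quotes from the actual books
analects_demo <- prepare_book("The Master said it. Is it not pleasant?", "analects")
prince_demo <- prepare_book("A prince must learn. War is his art. He studies it.", "prince")

# Stack the two books, as in the final preparation step
twobooks_demo <- rbind(analects_demo, prince_demo)
print(twobooks_demo)
```

Wrapping the steps in a function means each additional book needs only one extra line of code.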

Data Analysis

We will begin with the diversity analysis.

div<-diversity(twobooks$texts,twobooks$book)
div
##       book    wc simpson shannon collision berger_parker brillouin
## 1 analects 30995   0.989   6.106     4.480         0.067     5.944
## 2   prince 52105   0.989   6.324     4.531         0.059     6.177

For most of the metrics, the diversity in the use of vocabulary is nearly identical, despite these being different books from different eras in history. How these numbers are calculated is beyond the scope of this post.
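
For readers who want some intuition anyway, here is a rough sketch of the textbook versions of three of these indices, computed on a made-up vector of eight words. The formulas are the standard definitions of the Shannon, Simpson, and Berger-Parker indices. I am assuming qdap follows them closely, but its exact implementation may differ in details.

```r
# Toy "text": a made-up vector of words, not taken from either book
words <- c("the", "master", "said", "the", "way", "is", "the", "way")

n <- as.numeric(table(words))  # how many times each distinct word occurs
N <- sum(n)                    # total number of words
p <- n / N                     # proportion of the text taken by each word

# Shannon index: higher means a more varied vocabulary
shannon <- -sum(p * log(p))

# Simpson index: chance that two words drawn at random are different
simpson <- 1 - sum(n * (n - 1)) / (N * (N - 1))

# Berger-Parker index: share of the text taken by the most common word
berger_parker <- max(n) / N

print(round(c(shannon = shannon, simpson = simpson, berger_parker = berger_parker), 3))
```

Read this way, a Berger-Parker value of 0.067 for Analects would mean that the single most common word makes up about 6.7% of the text.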

Next, we will calculate the lexical dispersion of the two books. We will look at three common themes in history: money, war, and marriage.

dispersion_plot(twobooks$texts,grouping.var=twobooks$book,c("money","war",'marriage'))

The tick marks show where each word appears. For example, money appears only at the beginning of Analects but is more spread out in The Prince. War is evenly dispersed in both books, and marriage appears only in The Prince.
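
To make the idea behind the tick marks concrete, here is a minimal base R sketch on a made-up string. It simply records the word positions at which each term occurs, which is the information a dispersion plot draws. It is an illustration of the concept, not how qdap builds its plot.

```r
# Made-up text, not taken from either book
text <- "war is costly . money funds war . marriage seals peace . war returns"

# Split into lowercase tokens on whitespace
tokens <- strsplit(tolower(text), "\\s+")[[1]]

# Terms whose positions we want to locate
terms <- c("money", "war", "marriage")

# For each term, the token positions where it appears
positions <- lapply(setNames(terms, terms), function(term) which(tokens == term))
print(positions)
```

A term whose positions are spread across the whole range of the text is evenly dispersed, while positions bunched together point to a theme that is confined to one part of the book.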

Conclusion

This analysis showed additional tools that can be used to analyze text in R.

Review of “The Great Wall of China”

In this post, we will look at another book that I have used as a K-12 teacher: The Great Wall of China (Aladdin Picture Books) by Leonard Everett Fisher (pp. 31).

The Summary

The title clearly lets you know what the book is about. It focuses on Ch’in Shih Huang Ti and his quest to build a wall that would protect his empire from the Mongols. According to the text, Ch’in Shih Huang Ti was the first supreme emperor of China as he conquered several other small kingdoms to make what we now know as China.

The book depicts how the Mongols were coming and burning down border villages in China and how the Emperor plans and builds the wall. Men were dragged from their families to go and work on the wall. The Emperor even sent his oldest son and crown prince to help build the wall.

The project was a combination of building a new wall while also restoring walls that were in disrepair. Workers who complained or ran away were buried alive. It took a total of ten years to complete what is now called the Great Wall of China.

The Good 

My favorite aspect of the book is the iconic black and white drawings depicting ancient China. The stern look of the Emperor and the soldiers reminds me of the toughness of characters in old western movies. Nobody smiles in the book until the last page when the Emperor is rejoicing over the completion of the Wall.

The book doesn’t include a lot of text. Rather, the pictures do the majority of the storytelling. The pictures are large enough that you can use this book for a whole-class reading experience where the kids sit around you as you read the text and show them the pictures.

The author also did an excellent job of simplifying the complexity of the building of the Great Wall into a few pages for young children. For example, there is much more to the Emperor’s son being sent to help build the Great Wall. However, the author reduces this complex problem down to the accusation that the Emperor thought his son was a “whiner.”

The Bad

I can’t say there is anything bad in this text as it depends on what your purpose is for buying the book. There is not a lot of text in the book as it is primarily picture-based. If you want your students to read on their own, there is not a lot to read. For those of us who have a background in Chinese history, the text may be oversimplified.

The Recommendation

I would give this book 4.5/5 stars. Whether for your library or for sharing with your entire class this book will provide a great learning experience about a part of history that is normally not studied as much as it should be.

Assessing Speaking in ESL

In this post, we will look at different activities that can be used to assess a language learner’s speaking ability. Unfortunately, we will not go over how to mark or grade the activities; we will only provide examples.

Directed Response

In this activity, the teacher tries to have the student use a particular grammatical form by having the student modify something the teacher says. Below is an example.

Teacher: Tell me he went home
Student: He went home

This is obviously not deep. However, the student had to know to remove the words “tell me” from the sentence and they also had to know that they needed to repeat what the teacher said. As such, this is an appropriate form of assessment for beginning students.

Read Aloud

Read aloud is simply having the student read a passage verbatim out loud. Normally, the teacher will assess such things as pronunciation and fluency. There are several problems with this approach. First, reading aloud is not authentic as this is not an in-demand skill in today’s workplace. Second, it blends reading with speaking, which can be a problem if you do not want to assess both at the same time.

Oral Questionnaires 

Students are expected to respond to and/or complete sentences. Normally, there is some sort of setting such as a mall, school, or bank that provides the context or pragmatics. Below is an example in which a student has to respond to a bank teller. The blank lines indicate where the student would speak.

Teacher (as bank teller): Would you like to open an account?
Student:_______________________
Teacher (as bank teller): How much would you like to deposit?
Student:___________________________

Visual Cues

Visual cues are highly open-ended. For example, you can give the students a map and ask them to give you directions to a location on the map. In addition, students can describe things in a picture or point to things as you ask them to. You can also ask the students to make inferences about what is happening in a picture. Of course, all of these choices are highly difficult to provide a grade for and may be best suited for formative assessment.

Translation

Translating can be a highly appropriate skill to develop in many contexts. In order to assess this, the teacher provides a word, phrase, or perhaps something more complicated such as a stretch of the teacher’s own speech to be translated directly. The student then takes the input and reproduces it in the second language.

This is tricky to do. For one, it is required to be done on the spot, which is challenging for anybody. In addition, this also requires the teacher to have some mastery of the student’s mother tongue, which for many is not possible.

Other Forms

There are many more examples that cannot be covered here. Examples include interviews, role play, and presentations. However, these are much more common forms of speaking assessment, so most teachers are already familiar with them.

Conclusion

Speaking assessment is a major component of the ESL teaching experience. The ideas presented here will hopefully provide some additional ways that this can be done.

Readability and Formality Analysis in R

In this post, we will look at how to assess the readability and formality of a text using R. By readability, we mean the use of a formula that will provide us with the grade level at which the text is roughly written. This is highly useful information in the field of education and even medicine.

Formality provides insights into how the text relates to the reader. The more formal the writing the greater the distance between author and reader. Formal words are nouns, adjectives, prepositions, and articles while informal (contextual) words are pronouns, verbs, adverbs, and interjections.

The F-measure counts and calculates a score of formality based on the proportions of the formal and informal words.

We will conduct our two analyses by comparing two famous philosophical texts:

  • Analects
  • The Prince

These books are available at Project Gutenberg. You can go to the site, type in the titles, and download them to your computer.

We will use the “qdap” package in order to complete the readability and formality analysis. Below is some initial code.

library(qdap)

Data Preparation

Below are the steps we need to take to prepare the data

  1. Paste the text files into R
  2. Convert the text files to ASCII format
  3. Convert the ASCII format to data frames
  4. Split the sentences in the data frame
  5. Add a variable that indicates the book name
  6. Combine the two books into one dataframe

We now need to prepare the two texts. The “scan” function reads each file and the “paste” function collapses the words into a single string in the R environment.

analects<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Analects.txt",what='character'),collapse=" ")
prince<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Prince.txt",what='character'),collapse=" ")

The texts need to be converted to the ASCII format and the code below does this.

analects<-iconv(analects,"latin1","ASCII","")
prince<-iconv(prince,"latin1","ASCII","")

For each book, we need to make a dataframe. The argument “texts” gives our dataframe one variable called “texts” which contains all the words in each book. Below is the code.

analects<-data.frame(texts=analects)
prince<-data.frame(texts=prince)

With the dataframes completed, we can now split the variable “texts” in each dataframe by sentence. The “sentSplit” function will do this.

analects<-sentSplit(analects,'texts')
prince<-sentSplit(prince,'texts')

Next, we add the variable “book” to each dataframe. What this does is that for each row or sentence in the dataframe the “book” variable will tell you which book the sentence came from. This will be useful for comparative purposes.

analects$book<-"analects"
prince$book<-"prince"

Lastly, we combine the two books into one dataframe. The data preparation is now complete.

twobooks<-rbind(analects,prince)

Data Analysis

We will begin with the readability. The “automated_readability_index” function will calculate this for us.

ari<-automated_readability_index(twobooks$texts,twobooks$book)
ari
##       book word.count sentence.count character.count Automated_Readability_Index
## 1 analects      30995           3425          132981                       3.303
## 2   prince      52105           1542          236605                      16.853

Analects is written at a third-grade level but The Prince is written at grade 16. This is equivalent to a senior in college. As such, The Prince is a challenging book to read.
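
As a quick check on where these grade levels come from, the sketch below applies the standard Automated Readability Index formula to the word, sentence, and character counts reported in the output above. The helper name “ari_score” is made up for this example. The only assumption is that qdap uses the standard formula, which the matching results support.

```r
# Standard ARI formula: longer words and longer sentences raise the grade level
ari_score <- function(characters, words, sentences) {
  4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
}

# Counts copied from the qdap output above
counts <- data.frame(
  book = c("analects", "prince"),
  words = c(30995, 52105),
  sentences = c(3425, 1542),
  characters = c(132981, 236605)
)

counts$ari <- round(ari_score(counts$characters, counts$words, counts$sentences), 3)
print(counts[, c("book", "ari")])
```

The two books use words of similar length (about 4.3 versus 4.5 characters per word), so most of the gap comes from sentence length: roughly 9 words per sentence in Analects against roughly 34 in The Prince.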

Next, we will calculate the formality of the two books. The “formality” function is used for this.

form<-formality(twobooks$texts,twobooks$book)
form
##       book word.count formality
## 1   prince      52181     60.02
## 2 analects      31056     58.36

The books are mildly formal. The code below gives you the breakdown of the word use by percentage.

form$form.prop.by
##       book word.count  noun  adj  prep articles pronoun  verb adverb
## 1 analects      31056 25.05 8.63 14.23     8.49   10.84 22.92   5.86
## 2   prince      52181 21.51 9.89 18.42     7.59   10.69 20.74   5.94
##   interj other
## 1   0.05  3.93
## 2   0.00  5.24
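
The percentages above can be plugged into the F-measure by hand. The sketch below uses the formula as it is commonly stated (usually attributed to Heylighen and Dewaele): formal categories are added, contextual categories are subtracted, and the result is rescaled to run from 0 to 100. The function name “f_measure” is made up for this example, and I am assuming qdap applies the same formula.

```r
# F-measure from part-of-speech percentages
f_measure <- function(noun, adj, prep, articles, pronoun, verb, adverb, interj) {
  formal <- noun + adj + prep + articles          # formal word classes
  contextual <- pronoun + verb + adverb + interj  # informal (contextual) word classes
  (formal - contextual + 100) / 2
}

# Percentages copied from the form.prop.by output above
f_analects <- f_measure(25.05, 8.63, 14.23, 8.49, 10.84, 22.92, 5.86, 0.05)
f_prince <- f_measure(21.51, 9.89, 18.42, 7.59, 10.69, 20.74, 5.94, 0.00)

print(c(analects = f_analects, prince = f_prince))
```

The results agree with the scores qdap reported (58.36 and 60.02) up to the rounding of the percentages.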

The proportions are consistent when the two books are compared. Below is a visual of the table we just examined.

plot(form)

Conclusion

Readability and formality are additional text mining tools that can provide insights for data scientists. Both of these analysis tools can provide suggestions that may be needed in order to enhance communication or compare different authors and writing styles.

Review of “The Usborne Book of World History”

As a teacher, I have used hundreds of books in my career to instruct and guide students. As such, I decided I would share my thoughts on some of these books to provide other educators with insights on potential instructional materials.

The Summary

“The Usborne Book of World History” covers, as you can probably guess, the history of the world from the beginning of recorded history to the dawn of the 20th century. Early civilizations such as the Sumerians, Egyptians, and Hittites are covered as well as more recent civilizations such as the Greeks, the Romans, and the British Empire. There are mentions of African and several Asian civilizations such as the Chinese and Japanese.

The Good

This book is rich with illustrations of all of the historical events and cultural topics. For young readers who are visual learners, this is a superb text for such an experience. In the text, there are illustrations of fighting between the Canaanites and the Philistines, an Assyrian king fighting a lion, life in the city of ancient Athens, and even depictions of settlers moving out west in what would later become the United States. This is completely a visually based learning experience.

The Bad

The focus on illustrations is a strength but also a weakness depending on your goals. The students can get so obsessed with the pictures that they never actually read the text in the book. This can be a problem if you are trying to get your students to develop their reading skills. In addition, the reading level of the text is probably at the 4th-5th-grade level, which is beyond younger students.

This book is also not appropriate for a whole class read aloud because several illustrations are crammed onto one page. This would make it challenging for several students to all see what the teacher was talking about at the same time, which could quickly lead to behavioral problems.

Lastly, the book is somewhat detail-oriented in that it provides a bunch of little facts about each kingdom or historical period. If you want the students to see the big picture, you have to trace the themes of history yourself as there seem to be no pedagogical aids beyond the rich illustrations.

The Recommendation

The Usborne Book of World History absolutely deserves 4/5 stars. Buy it and put it in your library as an opportunity for your students to “see” history rather than read about it. If you need something for whole-class instruction, you had better keep looking.

Homeschooling Bilingual Children

A colleague of mine has kids that are half Thai and half African (like Tiger Woods). In the home, both Thai and English are spoken frequently. A major problem with bilingual children is that one of the languages is never truly mastered. This is called semilingualism. The problem was not with the kids learning Thai because their mother was Thai. Instead, my colleague was worried about his kids developing broken, poorly understood pidgin English.

About 8 years ago there was another family whose children were half Thai and half American and they had faced the same problem. However, they overreacted and never spoke Thai in their home in order to make sure their children learned English. This led to the kids knowing only English even though they were half-Thai and lived in Thailand. My friend did not want to make this mistake.

What He Did

I suggested to my friend that he needed to set some sort of schedule in which time was set aside in the home for the use of both languages. Below is the schedule that he developed.

  • Monday-Friday, from waking up until 2 pm: Thai language
  • Monday-Friday, from 2 pm to bedtime: English language
  • Weekends: English only
  • Exceptions: the home school curriculum is in English, with the exception of Thai language

This has worked relatively well. The children are exposed to both languages each day for several hours at a time. Generally, the rule is when dad is home English is used.

To further support the acquisition of English, I encouraged my friend to never speak any Thai to his children. This has stunted his own development in Thai, but it is more important that they learn English than that he learns Thai.

For the oldest daughter who is home schooled, Dan and his wife taught her to read and write in Thai and English at the same time. Many language experts would disagree with this and suggest that it is better to learn one language first and to transfer those skills to learning a second language. I see their point but my friend wanted his daughter to have native fluency in both languages to the point that if she is having a dream both languages could be used without a problem so to speak.

Challenges

With bilingual children, all language goals are delayed. This is because the child has to acquire double the vocabulary of a monolingual child. My friend’s daughter didn’t really talk until she was three. However, by five things started to move at a normal pace, with some “problems”:

  • Word order is sometimes wrong, i.e., my friend’s daughter will use Thai syntax in English and vice versa.
  • Mixing of the two languages at times (code-switching)

Most kids grow out of this.

Conclusion

Raising bilingual children requires finding a balance between the two languages in the home. I have provided one example but I would like to know how you have dealt with this with your children.

Sentiment Analysis in R

In this post, we will perform a sentiment analysis in R. Sentiment analysis employs dictionaries to give each word in a sentence a score. A more positive word is given a higher positive number while a more negative word is given a more negative number. The score is then calculated based on the position of the word, the weight, as well as other more complex factors. This is then performed for the entire corpus to give it a score.
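
To make the dictionary idea concrete, here is a deliberately simplified base R sketch. The two word lists are tiny, made-up dictionaries, and the scoring rule (positive hits minus negative hits, divided by the square root of the sentence length) ignores the word weights and position effects mentioned above. It is an illustration of the principle only, not qdap’s algorithm.

```r
# Tiny illustrative dictionaries (assumptions for this example, not qdap's word lists)
positive <- c("faithful", "honest", "humble", "grateful", "generous", "sincere")
negative <- c("vice", "misery", "darkness", "death", "despair")

# Hypothetical scoring function: dictionary hits scaled by sentence length
score_sentence <- function(sentence) {
  cleaned <- tolower(gsub("[^A-Za-z ]", "", sentence))  # strip punctuation
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[words != ""]
  (sum(words %in% positive) - sum(words %in% negative)) / sqrt(length(words))
}

neg_score <- score_sentence("Apart from Him there is but vice, misery, darkness, death, despair.")
pos_score <- score_sentence("You will be faithful, honest, humble, grateful, generous, a sincere friend, truthful.")
print(c(negative_example = neg_score, positive_example = pos_score))
```

Even this crude version gives the first sentence a clearly negative score and the second a clearly positive one, which is the basic behavior a sentiment dictionary is meant to produce.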

We will do a sentiment analysis in which we will compare three famous philosophical texts:

  • Analects
  • The Prince
  • Pensees

These books are available at Project Gutenberg. You can go to the site, type in the titles, and download them to your computer.

We will use the “qdap” package in order to complete the sentiment analysis. Below is some initial code.

library(qdap)

Data Preparation

Below are the steps we need to take to prepare the data

  1. Paste the text files into R
  2. Convert the text files to ASCII format
  3. Convert the ASCII format to data frames
  4. Split the sentences in the data frame
  5. Add a variable that indicates the book name
  6. Combine the three books into one dataframe

We now need to prepare the three texts. First, we move them into R. The “scan” function reads each file and the “paste” function collapses the words into a single string.

analects<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Analects.txt",what='character'),collapse=" ")
pensees<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Pascal.txt",what='character'),collapse=" ")
prince<-paste(scan(file ="C:/Users/darrin/Documents/R/R working directory/blog/blog/Text/Prince.txt",what='character'),collapse=" ")

We need to convert the texts to ASCII format so that R is able to read them.

analects<-iconv(analects,"latin1","ASCII","")
pensees<-iconv(pensees,"latin1","ASCII","")
prince<-iconv(prince,"latin1","ASCII","")

Now we make our dataframe for each book. The argument “texts” gives our dataframe one variable called “texts” which contains all the words in each book. Below is the code.

analects<-data.frame(texts=analects)
pensees<-data.frame(texts=pensees)
prince<-data.frame(texts=prince)

With the dataframes completed, we can now split the variable “texts” in each dataframe by sentence. We will use the “sentSplit” function to do this.

analects<-sentSplit(analects,'texts')
pensees<-sentSplit(pensees,'texts')
prince<-sentSplit(prince,'texts')

Next, we add the variable “book” to each dataframe. What this does is that for each row or sentence in the dataframe the “book” variable will tell you which book the sentence came from. This will be valuable for comparative purposes.

analects$book<-"analects"
pensees$book<-"pensees"
prince$book<-"prince"

Now we combine all three books into one dataframe. The data preparation is now complete.

threebooks<-rbind(analects,pensees,prince)

Data Analysis

We are now ready to perform the actual sentiment analysis. We will use the “polarity” function for this. Inside the function, we need to use the text and the book variables. Below is the code.

pol<-polarity(threebooks$texts,threebooks$book)

We can see the results and a plot in the code below.

pol
##       book total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
## 1 analects            3425       31383        0.076       0.254              0.299
## 2  pensees            7617      101043        0.008       0.278              0.028
## 3   prince            1542       52281        0.017       0.296              0.056

The table is mostly self-explanatory. After the book name, we have the total number of sentences and words. Next is the average polarity and the standard deviation. Lastly, we have the standardized mean. The last column is commonly used for comparison purposes. As such, it appears that Analects is the most positive book by a large margin, with Pensees and The Prince being about the same and generally neutral.
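
The standardized mean is simply the average polarity divided by the standard deviation. The sketch below recomputes it from the rounded figures in the table, so the results only match the last column approximately. The column name “stan.check” is made up for this example.

```r
# Figures copied from the polarity output above (already rounded to 3 decimals)
pol_table <- data.frame(
  book = c("analects", "pensees", "prince"),
  ave.polarity = c(0.076, 0.008, 0.017),
  sd.polarity = c(0.254, 0.278, 0.296)
)

# Standardized mean polarity: average divided by standard deviation
pol_table$stan.check <- pol_table$ave.polarity / pol_table$sd.polarity
print(pol_table)
```

Dividing by the standard deviation puts the books on a common scale, which is why this column is the one to use when comparing them.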

plot(pol)

The top plot shows the polarity of each sentence over time or through the book. The bluer the more negative and the redder the more positive the sentence. The second plot shows the dispersion of the polarity.

There are many things to interpret from the second plot. For example, Pensees is more dispersed than the other two books in terms of polarity. The Prince is much less dispersed in comparison to the other books.

Another interesting task is to find the most negative and positive sentence. We need to take information from the “pol” dataframe and then use the “which.min” function to find the lowest scoring. The “which.min” function only gives the row. Therefore, we need to take this information and use it to find the actual sentence and the book. Below is the code.

pol.df<-pol$all #take the sentence-level polarity scores from pol
which.min(pol.df$polarity) #find the lowest scored sentence
## [1] 6343
pol.df$text.var[6343] #find the actual sentence
## [1] "Apart from Him there is but vice, misery, darkness, death, despair."
pol.df$book[6343] #find the actual book name
## [1] "pensees"

Pensees had the most negative sentence. You can see for yourself the clearly negative words, which are vice, misery, darkness, death, and despair. We can repeat this for the most positive sentence.

which.max(pol.df$polarity)
## [1] 4839
pol.df$text.var[4839]
## [1] "You will be faithful, honest, humble, grateful, generous, a sincere friend, truthful."
pol.df$book[4839]
## [1] "pensees"

Again Pensees has the most positive sentence with such words as faithful, honest, humble, grateful, generous, sincere, friend, truthful all being positive.
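
If you want both extremes at once, the two lookups can be combined. The sketch below uses a small made-up dataframe with the same column names as “pol.df” (the sentences and scores are invented for this example) and pulls out the lowest and highest scoring rows in one step.

```r
# Made-up stand-in for pol.df: same column names, invented values
scores <- data.frame(
  book = c("analects", "pensees", "prince", "pensees"),
  text.var = c("sentence one", "sentence two", "sentence three", "sentence four"),
  polarity = c(0.2, -1.5, 0.0, 1.7),
  stringsAsFactors = FALSE
)

# which.min and which.max return row numbers, so we can index with both at once
extremes <- scores[c(which.min(scores$polarity), which.max(scores$polarity)), ]
print(extremes)
```

Indexing the dataframe by row number returns the sentence, the book, and the score together, so there is no need to copy the row number by hand.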

Conclusion

Sentiment analysis allows for the efficient analysis of a large body of text in a highly qualitative manner. There are weaknesses to this approach. For example, the dictionary used to classify the words can affect the results. In addition, sentiment analysis only looks at individual sentences and not larger contextual units such as a paragraph. As such, a sentiment analysis provides descriptive insights and not generalizations.

Struggles with Early Childhood Education

I had a friend (Dan) share his experience with me of home schooling his oldest daughter (Jina) and the challenges he faced as he tried to start her education too early, in his opinion. He began homeschooling his oldest daughter when she was about four years of age. His goals for the 1st year were simply for his daughter

  • to learn to count to 10
  • to recognize the letters of the alphabet

That was all he wanted for the first year of instruction. Dan knew Jina was young, perhaps too young, so he did not want to push it. He just wanted to develop a rhythm of learning and instruction in the family along with the two goals above. In addition, his family was one of only two families who home school their kids in his community and he wanted to make sure his daughter was always on par academically with the other children in the neighborhood as a witness to the benefits of homeschooling.

Yet, a strange thing happened. Both academic goals were achieved in less than four months. Now Jina was getting bored with school already. This meant that Dan now had to raise the level of complexity with more goals

  • to recognize numbers
  • to know the sounds of all the letters of the alphabet

By the end of the first year (age 5 now), without any pressure, and by going at her own pace my friend’s daughter could read simple words, count objects, recognize numbers, do simple addition, subtraction, and had the rudiments of telling time. However, near the end of the first year of learning some strange things began to happen.

  • One day Jina would complete a task with no problems but the next day she could not seem to remember the slightest way how to do it. She seemed to inadvertently lose motivation for no reason.
  • Some concepts (telling time) never stuck no matter how many times they were taught and reviewed.
  • She was inconsistent in her ability to recognize words and seemed to lack any ability to generalize concepts (transfer) to other settings. For example, realizing that ‘cap’, ‘snap’, ‘lap’, all end with the -ap ending.

When she turned five, Dan and his wife formally started Jina in an official home school curriculum rather than the ad-hoc stuff they did for the first year. Jina now had the ability to do 1st-grade work thanks to her parents’ prior teaching. Old struggles subsided and new ones appeared. Unlike the ad-hoc curriculum, the formal home school curriculum had weekly lesson plans and Dan was determined to stick to the “schedule.”

Why the Struggle

Dan still wondered what the problem was. Jina was progressing but it was a chore and he couldn’t understand why. Isn’t it good to start kids in school early? That’s when he asked me.

I explained to him some of the basics of Piaget’s theory of cognitive development. This is not just any theory. Piaget’s ideas are taught to almost all undergrad education majors on the planet.

Piaget proposes that there are four stages of cognitive development

  1. Sensorimotor (0-2 years)-Learning only through senses
  2. Preoperational (2-7)-Symbolic thinking and pretend play
  3. Concrete Operational (7-11)-Ideas applied to literal objects, understanding of time and quantity.
  4. Formal Operations (12-adult)-Abstract thinking, logic, transfer possible.

Dan was teaching his daughter all of these abstract ideas (counting, reading, telling time, etc) when she was at a preoperational level cognitively.

Reading is a highly abstract experience. Letters on a page have a sound attached to them and these letters can be combined to make words etc.? This is astounding for a child and their minds will struggle with this if they are not ready. Numbers on a page represent an actual amount in the real world? This is another astounding breakthrough for a young child. Dan was teaching his daughter to tell time when she had no idea what time was! He was frustrated when she could not transfer knowledge to new settings when this is normally not possible until they are 11 years or older.

If a child is not developmentally ready for these complex ideas they will struggle with school. If Piaget’s theory is correct (and not everyone agrees), formal schooling should not begin until age 7 for most children. What is meant by formal schooling is the study of math and reading. They should begin learning math and reading at 7. However, traditionally, students have been studying these subjects for several years by the age of 7.

This is not a totally radical idea. Many parents are delaying the enrollment of their child in kindergarten by a year in order to allow them to develop more. The term for this is redshirting.

What He Did

By the time I had spoken with Dan, Jina was six years old and already in second grade. She was doing better but now Dan and his wife worried about burnout. He did not want to stop her studies completely because stopping now would mean having to fight with her to begin again. I suggested that they slow down the instruction. Now they complete a weekly lesson plan over two weeks instead of one. This helps to minimize the damage that has taken place while still maintaining a structure of learning in the home. Unfortunately, Jina is learning multiplication when she should be learning to count.

Conclusion

I can say that there is evidence that early education is not best for children. If Piaget is correct a child under 7 is not ready for rigorous study and should be allowed more hands on experiences rather than abstract ones. Of course, there are exceptions but generally, you can start too early but it is difficult to start too late. If a child starts too early they will be in a constant state of struggling. All children are different but I think that parents should be aware that waiting is an option when it comes to formal instruction and one benefit of home schooling is the ability to have authority over your child’s education.

Types of Speaking in ESL

In the context of ESL teaching, there are at least five types of speaking that take place in the classroom. This post will define and provide examples of each. The five types are as follows…

  • Imitative
  • Intensive
  • Responsive
  • Interactive
  • Extensive

The list above is ordered from simplest to most complex in terms of the requirements of oral production for the student.

Imitative

At the imitative level, it is probably already clear what the student is trying to do. At this level, the student is simply trying to repeat what was said to them in a way that is understandable and with some adherence to pronunciation as defined by the teacher.

It doesn’t matter if the student comprehends what they are saying or can carry on a conversation. The goal is only to reproduce what was said to them. One common example of this is a “repeat after me” experience in the classroom.

Intensive

Intensive speaking involves producing a limited amount of language in a highly controlled context. An example of this would be to read aloud a passage or give a direct response to a simple question.

Competency at this level is shown through achieving certain grammatical or lexical mastery. This depends on the teacher’s expectations.

Responsive

Responsive is slightly more complex than intensive but the difference is blurry, to say the least. At this level, the dialog includes a simple question with a follow-up question or two. Conversations take place by this point but are simple in content.

Interactive

The unique feature of interactive speaking is that it is usually more interpersonal than transactional. Interpersonal speaking is speaking for the purpose of maintaining relationships. Transactional speaking is for sharing information, as is common at the responsive level.

The challenge of interpersonal speaking is the context, or pragmatics. The speaker has to keep in mind the use of slang, humor, ellipsis, etc. when attempting to communicate. This is much more complex than saying yes or no or giving directions to the bathroom in a second language.

Extensive

Extensive communication is normally some sort of monolog. Examples include speeches, story-telling, etc. This involves a great deal of preparation and is not typically improvisational communication.

It is one thing to survive having a conversation with someone in a second language. You can rely on each other’s body language to make up for communication challenges. However, with extensive communication either the student can speak in a comprehensible way without relying on feedback or they cannot. In my personal experience, the typical ESL student cannot do this in a convincing manner.

Visualizing Clustered Data in R

In this post, we will look at how to visualize multivariate clustered data. We will use the “Hitters” dataset from the “ISLR” package. We will use the features of the various baseball players as the dimensions for the clustering. Below is the initial code.

library(ISLR);library(cluster)
data("Hitters")
str(Hitters)
## 'data.frame':    322 obs. of  20 variables:
##  $ AtBat    : int  293 315 479 496 321 594 185 298 323 401 ...
##  $ Hits     : int  66 81 130 141 87 169 37 73 81 92 ...
##  $ HmRun    : int  1 7 18 20 10 4 1 0 6 17 ...
##  $ Runs     : int  30 24 66 65 39 74 23 24 26 49 ...
##  $ RBI      : int  29 38 72 78 42 51 8 24 32 66 ...
##  $ Walks    : int  14 39 76 37 30 35 21 7 8 65 ...
##  $ Years    : int  1 14 3 11 2 11 2 3 2 13 ...
##  $ CAtBat   : int  293 3449 1624 5628 396 4408 214 509 341 5206 ...
##  $ CHits    : int  66 835 457 1575 101 1133 42 108 86 1332 ...
##  $ CHmRun   : int  1 69 63 225 12 19 1 0 6 253 ...
##  $ CRuns    : int  30 321 224 828 48 501 30 41 32 784 ...
##  $ CRBI     : int  29 414 266 838 46 336 9 37 34 890 ...
##  $ CWalks   : int  14 375 263 354 33 194 24 12 8 866 ...
##  $ League   : Factor w/ 2 levels "A","N": 1 2 1 2 2 1 2 1 2 1 ...
##  $ Division : Factor w/ 2 levels "E","W": 1 2 2 1 1 2 1 2 2 1 ...
##  $ PutOuts  : int  446 632 880 200 805 282 76 121 143 0 ...
##  $ Assists  : int  33 43 82 11 40 421 127 283 290 0 ...
##  $ Errors   : int  20 10 14 3 4 25 7 9 19 0 ...
##  $ Salary   : num  NA 475 480 500 91.5 750 70 100 75 1100 ...
##  $ NewLeague: Factor w/ 2 levels "A","N": 1 2 1 2 2 1 1 1 2 1 ...

Data Preparation

We need to remove all of the factor variables as the kmeans algorithm cannot support factor variables. In addition, we need to remove the “Salary” variable because it is missing data. Lastly, we need to scale the data because kmeans relies on distances, so variables measured on larger scales would otherwise dominate the results of the clustering. The code for all of this is below.

hittersScaled<-scale(Hitters[,c(-14,-15,-19,-20)])
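Dropping columns by position works, but it is easy to miscount. As a minimal sketch, the same preparation can be done by asking R which columns are numeric and complete. The small data frame below is only a stand-in that copies a few values from the str output above, and the names toy, keep, and toyScaled are illustrative, not part of the original analysis.

```r
# Toy stand-in for Hitters (values copied from the str() output above):
# numeric stats, a factor column, and a numeric column with a missing value.
toy <- data.frame(
  AtBat  = c(293, 315, 479, 496, 321),
  Hits   = c(66, 81, 130, 141, 87),
  League = factor(c("A", "N", "A", "N", "N")),
  Salary = c(NA, 475, 480, 500, 91.5)
)

# Keep columns that are numeric AND have no missing values.
isNumeric  <- sapply(toy, is.numeric)
isComplete <- colSums(is.na(toy)) == 0
keep <- isNumeric & isComplete

# scale() centers each column to mean 0 and rescales it to standard deviation 1.
toyScaled <- scale(toy[, keep])

colnames(toyScaled)             # the columns that survived the filter
round(colMeans(toyScaled), 10)  # column means after scaling
apply(toyScaled, 2, sd)         # column standard deviations after scaling
```

Applied to the full dataset, this filter should keep the same 16 columns as the index-based code, since the three factors and “Salary” are the only columns that fail the two checks.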

Data Analysis

We will set the k for the kmeans to 3. This can be set to any number, and it often requires domain knowledge to determine what is most appropriate. Below is the code.

kHitters<-kmeans(hittersScaled,3)
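Because kmeans starts from random centers, the cluster numbering and even the assignments can change from one run to the next. The sketch below uses synthetic data rather than “Hitters”. It fixes the seed, uses the nstart argument to try several random starts, and compares the total within-cluster sum of squares across several values of k as one rough way to choose k. The object names and the range of 1 to 6 are assumptions made for illustration.

```r
set.seed(123)  # fix the random number generator so results are repeatable

# Synthetic data: three well-separated groups in two dimensions.
synth <- rbind(
  matrix(rnorm(60, mean = 0),  ncol = 2),
  matrix(rnorm(60, mean = 5),  ncol = 2),
  matrix(rnorm(60, mean = 10), ncol = 2)
)
synthScaled <- scale(synth)

# Fit kmeans for k = 1..6; nstart = 25 keeps the best of 25 random starts.
ks  <- 1:6
wss <- sapply(ks, function(k) {
  kmeans(synthScaled, centers = k, nstart = 25)$tot.withinss
})

# A sharp bend ("elbow") in this curve suggests a reasonable k.
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a heuristic, not a rule, so domain knowledge still matters when picking the final value.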

We now look at some descriptive stats. First, we will see how many examples are in each cluster.

table(kHitters$cluster)
## 
##   1   2   3 
## 116 144  62

The groups are reasonably balanced, although cluster 3 is about half the size of the other two. Next, we will look at the mean of each feature by cluster. This will be done with the “aggregate” function. We will use the original data and group it by the cluster assignments.

round(aggregate(Hitters[,c(-14,-15,-19,-20)],FUN=mean,by=list(kHitters$cluster)),1)
##   Group.1 AtBat  Hits HmRun Runs  RBI Walks Years CAtBat  CHits CHmRun
## 1       1 522.4 143.4  15.1 73.8 66.0  51.7   5.7 2179.1  597.2   51.3
## 2       2 256.6  64.5   5.5 30.9 28.6  24.3   5.6 1377.1  355.6   24.7
## 3       3 404.9 106.7  14.8 54.6 59.4  48.1  15.1 6480.7 1783.4  207.5
##   CRuns  CRBI CWalks PutOuts Assists Errors
## 1 299.2 256.1  199.7   380.2   181.8   11.7
## 2 170.1 143.6  122.2   209.0    62.4    5.8
## 3 908.5 901.8  694.0   303.7    70.3    6.4

Now we can see some differences. It seems group 1 is young (5.7 years of experience) starters based on the number of at-bats they get. Group 2 is young (5.6 years) players who may not get to start due to the lower at-bats they receive. Group 3 is old (15.1 years) players who receive significant playing time and have put together impressive career statistics.
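The cluster numbers that kmeans returns are arbitrary labels, so a description such as “group 1” can point at a different set of players after a re-run. One possible way to keep the numbering stable, sketched below on synthetic data, is to relabel the clusters by the rank of one of their center coordinates. The names fit, ord, and stableCluster are hypothetical.

```r
set.seed(42)

# Synthetic one-feature example: three groups centered near 0, 5, and 10.
x <- matrix(c(rnorm(20, 0), rnorm(20, 5), rnorm(20, 10)), ncol = 1)
fit <- kmeans(x, centers = 3, nstart = 10)

# Order the original cluster ids by their center, smallest first.
ord <- order(fit$centers[, 1])

# match() turns each original id into its rank, so the cluster with the
# smallest center is always 1 and the one with the largest is always 3.
stableCluster <- match(fit$cluster, ord)

# Group means now increase with the label.
groupMeans <- tapply(x[, 1], stableCluster, mean)
groupMeans
```

With the “Hitters” data, the same idea could be applied using a column such as “Years”, so that the most experienced group always receives the highest label.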

Now we will create our visual of the three clusters. For this, we use the “clusplot” function from the “cluster” package.

clusplot(hittersScaled,kHitters$cluster,color = T,shade = T,labels = 4)

[Figure: clusplot output showing the three clusters]

In general, there is little overlap between the clusters. The overlap that does appear is between the two groups of younger players, which may be due to how they both have a similar amount of experience.
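As far as I understand the “cluster” package, clusplot draws the observations on the first two principal components of the data. A rough base R equivalent is sketched below on synthetic data. It also reports how much of the total variance those two components capture, which gives a sense of how far the 2-D picture can be trusted. All object names are illustrative.

```r
set.seed(7)

# Synthetic data: two groups measured on four features.
synth <- rbind(
  matrix(rnorm(120, mean = 0), ncol = 4),
  matrix(rnorm(120, mean = 4), ncol = 4)
)
synthScaled <- scale(synth)
fit <- kmeans(synthScaled, centers = 2, nstart = 10)

# Principal components of the scaled data.
pca <- prcomp(synthScaled)

# Share of the total variance captured by the first two components.
varExplained <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)
varExplained

# Scatter plot of the first two component scores, colored by cluster.
plot(pca$x[, 1:2], col = fit$cluster, pch = 19,
     xlab = "Component 1", ylab = "Component 2")
```

If the share of variance is low, clusters that look overlapped in the plot may still be well separated in the full set of dimensions.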

Conclusion

Visualizing the clusters can help with developing insights into the groups found during the analysis. This post provided one example of this.