# Social Dimensions of Language

In sociolinguistics, social dimensions are the characteristics of the context that affect how language is used. Generally, four dimensions of the social context are measured and analyzed through the use of five scales. The four dimensions and five scales are as follows.

• Social distance
• Status
• Formality
• Functional (which includes a referential and affective function)

This post will explore each of these four social dimensions of language.

Social Distance

Social distance is an indicator of how well we know the person we are talking to. Many languages have different pronouns, and even different verb forms, based on how well the speakers know each other.

For example, in English, a person might say “what’s up?” to a friend. However, when speaking to a stranger, regardless of the stranger’s status, a person may say something such as “How are you?” The only reason for the change in language use is the lack of intimacy with the stranger as compared to the friend.

Status

Status is related to social ranking. The way we speak to peers is different from how we speak to superiors. Friends are called by their first name, while a boss, in some cultures, is always addressed as Mr/Mrs or sir/madam.

The rules for status can be confusing. We frequently refer to our parents as Mom or Dad but never Mr/Mrs. Even though Mr/Mrs is a sign of respect, it violates the intimacy of the relationship between a parent and a child. As such, many parents would be upset if their children called them Mr/Mrs.

Formality

Formality can be seen as the presence or absence of colloquialisms and slang in a person’s communication. In a highly formal setting, such as a speech, the language will often lack a more earthy style of speaking. Contractions may disappear, idioms may be reduced, etc.

However, when spending time with friends at home, a more laid-back manner of speaking will emerge. One’s accent becomes more prominent, slang terms are permissible, etc.

Function (Referential & Affective)

Referential is a measure of the amount of information being shared in a discourse, such as facts, statistics, directions, etc. Affective relates to the emotional content of communication and indicates how someone feels about the topic.

Often the referential and affective functions are interrelated, as in the following example.

James is a 45-year-old professor of research who has written several books but is still a complete idiot!

The example above shares a lot of information: the person’s name, job, and accomplishments. However, the speaker’s emotions toward James are highly negative, as they call James a “complete idiot.”

Conclusion

The social dimensions of language are useful to know in order to understand what affects how people communicate. The concepts behind the four dimensions shape how we talk without most of us knowing why or how. This can be frustrating but also empowering, as people come to understand why they adjust their language to various contexts.

# Journal Writing

A journal is a log that a student uses to record their thoughts about something. This post will provide examples of journals as well as guidelines for using journals in the classroom.

Types of Journals

There are many different types of journals. Normally, all journals involve some sort of dialog between the student and the teacher. This allows both parties to get to know each other better.

Normally, journals will have a theme or focus. Examples in TESOL would include journals that focus on grammar, learning strategies, language-learning, or recording feelings. Most journals will focus on one of these to the exclusion of the others.

Guidelines for Using Journals

Journals can be useful if they are properly planned. As such, a teacher should consider the following when using journals.

1. Provide purpose: Students need to know why they are writing journals. Most students seem to despise reflection and will initially reject this learning experience.
2. Forget grammar: Journals are for writing. Students need to set aside their acquired obsession with perfect grammar and focus on developing their thoughts about something. There is a time and place for grammar, namely summative assessments such as final drafts of research papers.
3. Explain the grading process: Students need to know what they must demonstrate in order to receive adequate credit.
4. Provide feedback: Journals are a dialog. As such, the feedback should encourage and/or instruct the students. The feedback should also be provided consistently at scheduled intervals.

Journals take a lot of time to read and provide feedback on. In addition, the handwriting quality of students can vary radically, which means that some students’ journals are unreadable.

Conclusion

Journaling is an experience that allows students to focus on the process of learning rather than the product. This is often neglected in the school experience. Through journals, students are able to focus on the development of ideas without wasting working memory capacity on grammar and syntax. As such, journals can be a powerful tool for developing critical thinking skills.

# Cradle Approach to Portfolio Development

Portfolio development is one of many forms of alternative assessment available to teachers. When this approach is used, students generally collect their work and try to make sense of it through reflection.

It is surprisingly easy for portfolio development to amount to nothing more than archiving work. However, the CRADLE approach was developed by Gottlieb to alleviate potential confusion over this process. CRADLE stands for the following:

• Collecting
• Reflecting
• Assessing
• Documenting
• Linking
• Evaluating

Collecting

Collecting is the process in which the students gather materials to include in their portfolio. It is left to the students to decide what to include. However, it is still necessary for the teacher to provide clear guidelines in terms of what can be potentially selected.

Clear guidelines include stating the objectives as well as explaining how the portfolio will be assessed. It is also important to set aside class time for portfolio development.

Some examples of work that can be included in a portfolio include the following.

• tests, quizzes
• compositions
• electronic documents (PowerPoints, PDFs, etc.)

Reflecting

Reflecting happens through the student thinking about the work they have placed in the portfolio. This can be demonstrated in many different ways. A common way to reflect is the use of journals in which students comment on their work.

Another way, suitable for young students, is the use of a checklist. Students simply check off the characteristics that are present in their work. In either case, the teacher’s role is to provide class time so that students are able to reflect on their work.

Assessing

Assessing involves checking and maintaining the quality of the portfolio over time. Normally, there should be a gradual improvement in work quality in a portfolio. This is a subjective matter that is negotiated by the student and teacher, often in the form of conferences.

Documenting

Documenting serves more as a reminder than an action. Simply, documenting means that the teacher and student maintain the importance of the portfolio over the course of its usefulness. This is critical as it is easy to forget about portfolios through the pressure of the daily teaching experience.

Linking

Linking is the use of a portfolio as a mode of communication between students, peers, teachers, and even parents. Students can look at each other’s portfolios and provide feedback. Parents can also examine their child’s work through the portfolio.

Evaluating

Evaluating is the process of receiving a grade for this experience. For the teacher, the goal is to provide positive washback when assessing the portfolios. The focus is normally less on grades and more qualitative in nature.

Conclusions

Portfolios provide rich opportunities for developing intrinsic motivation, individualized learning, and critical thinking. However, trying to affix a grade to such a learning experience is often impractical. As such, portfolios are useful, but it can be hard to prove that any learning took place.

# Data Munging with Dplyr

Data preparation, aka data munging, is what most data scientists spend the majority of their time doing. Extracting and transforming data is difficult, to say the least. Every dataset is different, with unique problems. This makes it hard to generalize best practices for transforming data so that it is suitable for analysis.

In this post, we will look at how to use the various functions in the “dplyr” package. This package provides numerous ways to develop features as well as explore the data. We will use the “attitude” dataset from base R for our analysis. Below is some initial code.

```r
library(dplyr)
data("attitude")
str(attitude)
## 'data.frame':    30 obs. of  7 variables:
##  $ rating    : num  43 63 71 61 81 43 58 71 72 67 ...
##  $ complaints: num  51 64 70 63 78 55 67 75 82 61 ...
##  $ privileges: num  30 51 68 45 56 49 42 50 72 45 ...
##  $ learning  : num  39 54 69 47 66 44 56 55 67 47 ...
##  $ raises    : num  61 63 76 54 71 54 66 70 71 62 ...
##  $ critical  : num  92 73 86 84 83 49 68 66 83 80 ...
##  $ advance   : num  45 47 48 35 47 34 35 41 31 41 ...
```

You can see we have seven variables and only 30 observations. The first function we will learn to use is “select”. This function allows you to pick the columns of data you want to use. To use it, you need to know the names of the columns you want. Therefore, we will first use the “names” function to determine the names of the columns and then use the “select” function.

```r
names(attitude)[1:3]
## [1] "rating"     "complaints" "privileges"

smallset <- select(attitude, rating:privileges)
head(smallset)
##   rating complaints privileges
## 1     43         51         30
## 2     63         64         51
## 3     71         70         68
## 4     61         63         45
## 5     81         78         56
## 6     43         55         49
```

The difference is probably obvious: using the “select” function, we have 3 instead of 7 variables. We can also exclude columns we do not want by placing a negative sign in front of the column names. Below is the code.

```r
head(select(attitude, -(rating:privileges)))
##   learning raises critical advance
## 1       39     61       92      45
## 2       54     63       73      47
## 3       69     76       86      48
## 4       47     54       84      35
## 5       66     71       83      47
## 6       44     54       49      34
```

We can also use the “rename” function to change the names of columns. In the example below, we will change the name of “rating” to “rates.” Keep in mind that the new name for the column is to the left of the equals sign and the old name is to the right.

```r
attitude <- rename(attitude, rates = rating)
head(attitude)
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    63         64         51       54     63       73      47
## 3    71         70         68       69     76       86      48
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    43         55         49       44     54       49      34
```

The “select” function can be used in combination with other functions to find specific columns in the dataset. For example, we will use the “ends_with” function inside the “select” function to find all columns that end with the letter s.
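
For comparison, here is a base R sketch of what “select” and “rename” accomplish, using only indexing and the “names” function. Nothing here is part of dplyr; it is just meant to show what the package is simplifying.

```r
data("attitude")

# like select(attitude, rating:privileges): keep three columns by name
smallset2 <- attitude[, c("rating", "complaints", "privileges")]

# like rename(smallset, rates = rating): change a single column name
names(smallset2)[names(smallset2) == "rating"] <- "rates"

head(smallset2)
```

The dplyr versions are shorter and, with many columns, far easier to read.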

```r
s_set <- head(select(attitude, ends_with("s")))
s_set
##   rates complaints privileges raises
## 1    43         51         30     61
## 2    63         64         51     63
## 3    71         70         68     76
## 4    61         63         45     54
## 5    81         78         56     71
## 6    43         55         49     54
```

The “filter” function allows you to select rows from a dataset based on criteria. In the code below, we will select only rows that have a 75 or higher in the “raises” variable.

```r
bigraise <- filter(attitude, raises > 75)
bigraise
##   rates complaints privileges learning raises critical advance
## 1    71         70         68       69     76       86      48
## 2    77         77         54       72     79       77      46
## 3    74         85         64       69     79       79      63
## 4    66         77         66       63     88       76      72
## 5    78         75         58       74     80       78      49
## 6    85         85         71       71     77       74      55
```

If you look closely, all values in the “raises” column are greater than 75. Of course, you can have more than one criterion. In the code below there are two.

```r
filter(attitude, raises > 70 & learning < 67)
##   rates complaints privileges learning raises critical advance
## 1    81         78         56       66     71       83      47
## 2    65         70         46       57     75       85      46
## 3    66         77         66       63     88       76      72
```

The “arrange” function allows you to sort the order of the rows. In the code below, we first sort the data ascending by the “critical” variable. Then we sort it descending by adding the “desc” function.

```r
ascCritical <- arrange(attitude, critical)
head(ascCritical)
##   rates complaints privileges learning raises critical advance
## 1    43         55         49       44     54       49      34
## 2    81         90         50       72     60       54      36
## 3    40         37         42       58     50       57      49
## 4    69         62         57       42     55       63      25
## 5    50         40         33       34     43       64      33
## 6    71         75         50       55     70       66      41

descCritical <- arrange(attitude, desc(critical))
head(descCritical)
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    71         70         68       69     76       86      48
## 3    65         70         46       57     75       85      46
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    72         82         72       67     71       83      31
```

The “mutate” function is useful for engineering features.
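
As an aside, “filter” and “arrange” also have base R counterparts built on logical indexing and the “order” function. This sketch is only for comparison with the dplyr calls; the variable names ending in 2 are made up here to avoid clashing with the objects above.

```r
data("attitude")

# like filter(attitude, raises > 75): keep rows meeting a condition
bigraise2 <- attitude[attitude$raises > 75, ]

# like arrange(attitude, critical): sort rows ascending by a column
ascCritical2 <- attitude[order(attitude$critical), ]

# like arrange(attitude, desc(critical)): sort descending instead
descCritical2 <- attitude[order(attitude$critical, decreasing = TRUE), ]
```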
In the code below, we will transform the “learning” variable by subtracting its mean from itself.

```r
attitude <- mutate(attitude, learningtrend = learning - mean(learning))
head(attitude)
##   rates complaints privileges learning raises critical advance learningtrend
## 1    43         51         30       39     61       92      45    -17.366667
## 2    63         64         51       54     63       73      47     -2.366667
## 3    71         70         68       69     76       86      48     12.633333
## 4    61         63         45       47     54       84      35     -9.366667
## 5    81         78         56       66     71       83      47      9.633333
## 6    43         55         49       44     54       49      34    -12.366667
```

You can also create logical variables with the “mutate” function. In the code below, we create a logical variable called “highCritical” that is TRUE when the “critical” variable is 80 or higher and FALSE when it is below 80.

```r
attitude <- mutate(attitude, highCritical = critical >= 80)
head(attitude)
##   rates complaints privileges learning raises critical advance learningtrend highCritical
## 1    43         51         30       39     61       92      45    -17.366667         TRUE
## 2    63         64         51       54     63       73      47     -2.366667        FALSE
## 3    71         70         68       69     76       86      48     12.633333         TRUE
## 4    61         63         45       47     54       84      35     -9.366667         TRUE
## 5    81         78         56       66     71       83      47      9.633333         TRUE
## 6    43         55         49       44     54       49      34    -12.366667        FALSE
```

The “group_by” function is used for creating summary statistics based on a specific variable. It is similar to the “aggregate” function in base R. For our purposes here, it works in combination with the “summarize” function. We will group our data by the “highCritical” variable, meaning the data will be viewed as either TRUE or FALSE for “highCritical”. The results are saved in an object called “hcgroups”.

```r
hcgroups <- group_by(attitude, highCritical)
head(hcgroups)
## # A tibble: 6 x 9
## # Groups: highCritical [2]
##   rates complaints privileges learning raises critical advance
## 1    43         51         30       39     61       92      45
## 2    63         64         51       54     63       73      47
## 3    71         70         68       69     76       86      48
## 4    61         63         45       47     54       84      35
## 5    81         78         56       66     71       83      47
## 6    43         55         49       44     54       49      34
## # ... with 2 more variables: learningtrend, highCritical
```

Looking at the data, you probably saw no difference. This is because we are not done yet. We need to summarize the data in order to see the results for the two groups in the “highCritical” variable. We will now generate the summary statistics using the “summarize” function. We specifically want to know the mean of the “complaints” variable based on the “highCritical” variable. Below is the code.

```r
summarize(hcgroups, complaintsAve = mean(complaints))
## # A tibble: 2 x 2
##   highCritical complaintsAve
## 1        FALSE      67.31579
## 2         TRUE      65.36364
```

Of course, you could have learned this through a t-test, but this is another approach.

Conclusion

The “dplyr” package is one powerful tool for wrestling with data. There is nothing conceptually new in this package. Instead, the coding is simpler than what you can execute using base R.

# Select the Missing Word Questions in Moodle

VIDEO

Select the missing word questions in Moodle

# Guiding the Writing Process

How a teacher guides the writing process can depend on a host of factors. Generally, how you support a student at the beginning of the writing process is different from how you support them at the end. In this post, we will look at the differences between these two stages of writing.

The Beginning

At the beginning of writing, there are a lot of decisions that need to be made as well as extensive planning. Generally, at this point, grammar is not the deciding factor in the quality of the writing. Rather, the teacher is trying to help the students determine the focus of the paper as well as the main ideas.

The teacher needs to help the student focus on the big picture of the purpose of their writing. This means that, at least initially, only major issues are addressed. You only want to point out potentially disastrous decisions rather than mundane details. It is tempting to try and fix everything when looking at rough drafts.
This not only takes up a great deal of your time, but it is also discouraging to students as they deal with intense criticism while still trying to determine what they truly want to do. As such, it is better to view your role at this point as a counselor or guide and not as a detail-oriented control freak. At this stage, the focus is on the discourse and not so much on the grammar.

The End

At the end of the writing process, there is a move from general comments to specific concerns. As the student gets closer and closer to the final draft, the “little things” become more and more important. Grammar comes to the forefront. In addition, referencing and the strength of the supporting details become more important. Now is the time to get “picky”; the major decisions have been made, and the cognitive load of fixing the small stuff is less stressful once the core of the paper is in place.

The analogy I like to give is that first you build the house, which involves lots of big movements such as pouring a foundation, adding walls, and including a roof. This is the beginning of writing. The end of building a house includes more refined aspects, such as painting the walls, adding the furniture, etc. This is the end of the writing process.

Conclusion

For writers and teachers, it is important to know where they are in the writing process. In my experience, it often seems as if it is all about grammar from the beginning, when this is not necessarily the case. At the beginning of a writing experience, the focus is on ideas. At the end, the focus is on grammar. The danger is always in trying to do too much at the same time.

# Review of “First Encyclopedia of the Human Body”

The First Encyclopedia of the Human Body (First Encyclopedias) by Fiona Chandler (pp. 64) provides insights into science for young children.

The Summary

This book explains all of the major functions of the human body as well as some aspects of health and hygiene.
Students will learn about the brain, heart, hormones, and where babies come from, as well as healthy eating and visiting the doctor.

The Good

This book is surprisingly well-written. The author was able to take the complexities of the human body and word them in a way that a child can understand. In addition, the illustrations are rich and interesting. For example, there are pictures of an infrared scan of a child’s hands and x-rays of broken bones, as well as pictures of people doing things with their bodies, such as running or jumping.

There is also a good mix of small and large photos, which allows this book to be used individually or for whole-class reading. The large size of the text also allows younger readers to appreciate not only the pictures but also the reading.

There are also several activities at different places in the book. For example, students are invited to take their pulse, determine how much air is in their lungs, and test their sense of touch. In every section of the book, there are links to online activities as well. It seems as though this book has every angle covered in terms of learning.

The Bad

There is little to criticize in this book. It’s a really fun text. Perhaps if you are an expert on the human body, you may find things that are disappointing. However, for a layman called to teach young people science, this text is more than adequate.

The Recommendation

I would give this book 5/5 stars. My students loved it, and I was able to use it in so many different ways to build activities and discussions. I am sure that the use of this book would be beneficial to almost any teacher in any classroom.

# Reading Assessment at the Perceptual and Selective Level

This post will provide examples of assessments that can be used for reading at the perceptual and selective level.

Perceptual Level

The perceptual level is focused on bottom-up processing of text. Comprehension ability is not critical at this point.
Rather, you are just determining if the student can accomplish the mechanical process of reading.

Examples

Reading Aloud: How this works is probably obvious to most teachers. The students read a text out loud in the presence of an assessor.

Picture-Cued: Students are shown a picture with words at the bottom. The students read a word and point to a visual example of it in the picture. For example, if the picture has a cat in it, the word cat would appear at the bottom. The student would read the word cat and point to the actual cat in the picture.

This can be extended by using sentences instead of words. For example, if the picture shows a man driving a car, there may be a sentence at the bottom that says “a man is driving a car”. The student would then point to the man in the picture who is driving. Another option is T/F statements. Using our cat example from above, we might write “There is one cat in the picture”, and the student would then select T/F.

Other Examples: These include multiple-choice and written short answer.

Selective Level

The selective level is the next level above perceptual. At this level, the student should be able to recognize various aspects of grammar.

Examples

Editing Task: Students are given a reading passage and are asked to fix the grammar. This can happen in many different ways. They could be asked to pick the incorrect word in a sentence or to add or remove punctuation.

Picture-Cued Task: This task appeared at the perceptual level. Now it is more complicated. For example, the students might be required to read statements and label a diagram appropriately, such as the human body or aspects of geography.

Gap-Filling Task: Students read a sentence and complete it appropriately.

Other Examples: These include multiple-choice and matching. The multiple-choice may focus on grammar, vocabulary, etc. Matching attempts to assess a student’s ability to pair similar items.
Conclusion

Reading assessment can take many forms. The examples here provide ways to deal with this for students who are still highly immature in their reading abilities. As fluency develops, more complex measures can be used to determine a student’s reading capability.

# Types of Speaking in ESL

In the context of ESL teaching, there are at least five types of speaking that take place in the classroom. This post will define and provide examples of each. The five types are as follows.

• Imitative
• Intensive
• Responsive
• Interactive
• Extensive

The list above is ordered from simplest to most complex in terms of the requirements of oral production for the student.

Imitative

At the imitative level, it is probably already clear what the student is trying to do. At this level, the student is simply trying to repeat what was said to them in a way that is understandable and with some adherence to pronunciation as defined by the teacher. It does not matter whether the student comprehends what they are saying or can carry on a conversation. The goal is only to reproduce what was said to them. One common example of this is a “repeat after me” experience in the classroom.

Intensive

Intensive speaking involves producing a limited amount of language in a highly controlled context. An example of this would be reading a passage aloud or giving a direct response to a simple question. Competency at this level is shown through achieving certain grammatical or lexical mastery. This depends on the teacher’s expectations.

Responsive

Responsive speaking is slightly more complex than intensive, but the difference is blurry, to say the least. At this level, the dialog includes a simple question with a follow-up question or two. Conversations take place by this point but are simple in content.

Interactive

The unique feature of interactive speaking is that it is usually more interpersonal than transactional. By interpersonal it is meant speaking for maintaining relationships.
Transactional speaking is for sharing information, as is common at the responsive level. The challenge of interpersonal speaking is the context, or pragmatics. The speaker has to keep in mind the use of slang, humor, ellipsis, etc. when attempting to communicate. This is much more complex than saying yes or no or giving directions to the bathroom in a second language.

Extensive

Extensive communication is normally some sort of monolog. Examples include speeches, story-telling, etc. This involves a great deal of preparation and is not typically improvisational communication.

It is one thing to survive having a conversation with someone in a second language; you can rely on each other’s body language to make up for communication challenges. However, with extensive communication, either the student can speak in a comprehensible way without relying on feedback or they cannot. In my personal experience, the typical ESL student cannot do this in a convincing manner.

# Intensive Listening and ESL

Intensive listening is listening for the elements (phonemes, intonation, etc.) in words and sentences. This form of listening is often assessed in an ESL setting as a way to measure an individual’s phonological and morphological awareness, along with the ability to paraphrase. In this post, we will look at these three forms of assessment with examples.

Phonological Elements

Phonological elements include phonemic consonant pairs and phonemic vowel pairs. A phonemic consonant pair has to do with identifying consonants. Below is an example of what an ESL student would hear, followed by the potential choices they may have on a multiple-choice test.

Recording: He’s from Thailand
Choices: (a) He’s from Thailand (b) She’s from Thailand

The answer is clearly (a). The confusion is with the added ‘s’ in choice (b). If someone is not listening carefully, they could make a mistake. Below is an example of a phonemic pair involving vowels.

Recording: The girl is leaving?
Choices: (a) The girl is leaving? (b) The girl is living?
Again, if someone is not listening carefully, they will miss the small change in the vowel.

Morphological Elements

Morphological elements follow the same approach as phonological elements. You can manipulate endings, stress patterns, or play with words. Below is an example of ending manipulation.

Recording: I smiled a lot.
Choices: (a) I smiled a lot. (b) I smile a lot.

A sharp listener needs to hear the ‘d’ sound at the end of the word ‘smiled’, which can be challenging for an ESL student. Below is an example of a stress pattern.

Recording: My friend doesn’t smoke.
Choices: (a) My friend doesn’t smoke. (b) My friend does smoke.

The contraction in the example is the stress pattern the listener needs to hear. Below is an example of a play on words.

Recording: wine
Choices: (a) wine (b) vine

This is especially tricky for speakers of languages that do not have both a ‘v’ and a ‘w’ sound, such as Thai.

Paraphrase Recognition

Paraphrase recognition involves listening to an example and being able to reword it in an appropriate manner. This involves not only listening but also vocabulary selection and summarizing skills. Below is one example of sentence paraphrasing.

Recording: My name is James. I come from California.
Choices: (a) James is Californian (b) James loves California

This is trickier because both can be true. However, the goal is to try and rephrase what was heard. Another form of paraphrasing is dialog paraphrasing, as shown below.

Recording:
Man: My name is Thomas. What is your name?
Woman: My name is Janet. Nice to meet you. Are you from Africa?
Man: No, I am an American.
Choices: (a) Thomas is from America (b) Thomas is African

You can see the slight rephrase that is wrong in choice (b). This requires the student to listen to slightly longer audio while still having to rephrase it appropriately.

Conclusion

Intensive listening involves listening for the little details of an audio passage.
This is a skill that provides a foundation for much more complex levels of listening.

# Recommendation Engines in R

In this post, we will look at how to make a recommendation engine. We will use data that makes recommendations about movies, and we will use the “recommenderlab” package to build several different engines.

At this link, you need to download the “ml-latest.zip” file. From there, we will use the “ratings” and “movies” files in this post. Ratings provides the ratings of the movies, while movies provides the names of the movies.

Before going further, it is important to know that “recommenderlab” has five different techniques for developing recommendation engines (IBCF, UBCF, POPULAR, RANDOM, & SVD). We will use all of them for comparative purposes. Below is the code for getting started.

```r
library(recommenderlab)
ratings <- read.csv("~/Downloads/ml-latest-small/ratings.csv")
movies <- read.csv("~/Downloads/ml-latest-small/movies.csv")
```

We now need to merge the two datasets so that they become one. This way, the titles and ratings are in one place. We will then coerce our “movieRatings” dataframe into a “realRatingMatrix” in order to continue our analysis. Below is the code.

```r
movieRatings <- merge(ratings, movies, by = 'movieId') # merge the two files
movieRatings <- as(movieRatings, "realRatingMatrix")   # coerce to realRatingMatrix
```

We will now create two histograms of the ratings, the first from the raw data and the second from the normalized data. The “getRatings” function is used in combination with the “hist” function to make the histogram; the normalized version adds the “normalize” function. Below is the code.

```r
hist(getRatings(movieRatings), breaks = 10)
hist(getRatings(normalize(movieRatings)), breaks = 10)
```

We are now ready to create the evaluation scheme for our analysis.
In this object, we need to set the data name (movieRatings), the method we want to use (cross-validation), the amount of data we want to use for the training set (80%), and how many ratings the algorithm is given during the test set (1), with the rest being used to compute the error. We also need to tell R what a good rating is (4 or higher) and the number of folds for the cross-validation (10). Below is the code for all of this.

```r
set.seed(123)
eSetup <- evaluationScheme(movieRatings, method = 'cross-validation', train = .8, given = 1, goodRating = 4, k = 10)
```

Below is the code for developing our models. To do this, we need to use the “Recommender” function and the “getData” function to get the dataset. Remember, we are using all five modeling techniques.

```r
ubcf <- Recommender(getData(eSetup, "train"), "UBCF")
ibcf <- Recommender(getData(eSetup, "train"), "IBCF")
svd <- Recommender(getData(eSetup, "train"), "SVD")
popular <- Recommender(getData(eSetup, "train"), "POPULAR")
random <- Recommender(getData(eSetup, "train"), "RANDOM")
```

The models have been created. We can now make our predictions using the “predict” function in addition to the “getData” function. We also need to set the argument “type” to “ratings”. Below is the code.

```r
ubcf_pred <- predict(ubcf, getData(eSetup, "known"), type = "ratings")
ibcf_pred <- predict(ibcf, getData(eSetup, "known"), type = "ratings")
svd_pred <- predict(svd, getData(eSetup, "known"), type = "ratings")
pop_pred <- predict(popular, getData(eSetup, "known"), type = "ratings")
rand_pred <- predict(random, getData(eSetup, "known"), type = "ratings")
```

We can now look at the accuracy of the models. We will do this in two steps. First, we will look at the error rates. After completing this, we will do a more detailed analysis of the stronger models.
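
It may help to recall what the three metrics reported by “calcPredictionAccuracy” measure. The snippet below works through the standard formulas on a few made-up ratings (the numbers are invented purely for illustration).

```r
# Made-up predicted and actual ratings, for illustration only
pred <- c(3.5, 4.2, 2.8)
actual <- c(4, 4, 3)

mse <- mean((pred - actual)^2)  # mean squared error
rmse <- sqrt(mse)               # root mean squared error
mae <- mean(abs(pred - actual)) # mean absolute error
```

For all three metrics, lower values mean better predictions.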
Below is the code for the first step.

ubcf_error<-calcPredictionAccuracy(ubcf_pred,getData(eSetup,"unknown")) #calculate error
ibcf_error<-calcPredictionAccuracy(ibcf_pred,getData(eSetup,"unknown"))
svd_error<-calcPredictionAccuracy(svd_pred,getData(eSetup,"unknown"))
pop_error<-calcPredictionAccuracy(pop_pred,getData(eSetup,"unknown"))
rand_error<-calcPredictionAccuracy(rand_pred,getData(eSetup,"unknown"))
error<-rbind(ubcf_error,ibcf_error,svd_error,pop_error,rand_error) #combine objects into one data frame
rownames(error)<-c("UBCF","IBCF","SVD","POP","RAND") #give names to rows
error
##          RMSE      MSE       MAE
## UBCF 1.278074 1.633473 0.9680428
## IBCF 1.484129 2.202640 1.1049733
## SVD  1.277550 1.632135 0.9679505
## POP  1.224838 1.500228 0.9255929
## RAND 1.455207 2.117628 1.1354987

The results indicate that the “RAND” and “IBCF” models are clearly worse than the remaining three. We will now move to the second step and take a closer look at the “UBCF”, “SVD”, and “POP” models. We will do this by making a list and using the “evaluate” function to get other model evaluation metrics. We will make a list called “algorithms” that stores the three strongest models. Then we will make an object called “evlist”; in this object, we use the “evaluate” function, passing it the evaluation scheme (“eSetup”), the list (“algorithms”), and the numbers of movies to assess (5, 10, 15, 20).

algorithms<-list(POPULAR=list(name="POPULAR"),SVD=list(name="SVD"),UBCF=list(name="UBCF"))
evlist<-evaluate(eSetup,algorithms,n=c(5,10,15,20))
avg(evlist)
## $POPULAR
##           TP        FP       FN       TN  precision     recall        TPR
## 5  0.3010965  3.033333 4.917105 661.7485 0.09028443 0.07670381 0.07670381
## 10 0.4539474  6.214912 4.764254 658.5669 0.06806016 0.11289681 0.11289681
## 15 0.5953947  9.407895 4.622807 655.3739 0.05950450 0.14080354 0.14080354
## 20 0.6839912 12.653728 4.534211 652.1281 0.05127635 0.16024740 0.16024740
##            FPR
## 5  0.004566269
## 10 0.009363021
## 15 0.014177091
## 20 0.019075070
##
## $SVD
##           TP        FP       FN       TN  precision     recall        TPR
## 5  0.1025219  3.231908 5.115680 661.5499 0.03077788 0.00968336 0.00968336
## 10 0.1808114  6.488048 5.037390 658.2938 0.02713505 0.01625454 0.01625454
## 15 0.2619518  9.741338 4.956250 655.0405 0.02620515 0.02716656 0.02716656
## 20 0.3313596 13.006360 4.886842 651.7754 0.02486232 0.03698768 0.03698768
##            FPR
## 5  0.004871678
## 10 0.009782266
## 15 0.014689510
## 20 0.019615377
##
## $UBCF
##           TP        FP       FN       TN  precision     recall        TPR
## 5  0.1210526  2.968860 5.097149 661.8129 0.03916652 0.01481106 0.01481106
## 10 0.2075658  5.972259 5.010636 658.8095 0.03357173 0.02352752 0.02352752
## 15 0.3028509  8.966886 4.915351 655.8149 0.03266321 0.03720717 0.03720717
## 20 0.3813596 11.978289 4.836842 652.8035 0.03085246 0.04784538 0.04784538
##            FPR
## 5  0.004475151
## 10 0.009004466
## 15 0.013520481
## 20 0.018063361

Well, the numbers indicate that all the models perform poorly. True positives, false positives, false negatives, true negatives, precision, recall, true positive rate, and false positive rate are all low. Remember that these values are averages over the cross-validation folds. As such, for the “POPULAR” model, when looking at the top five movies, the average number of true positives was about 0.3.
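As a quick sanity check, the precision column can be reproduced from the TP and FP columns, since precision is TP/(TP + FP). Using the top row of the “POPULAR” table above:

```r
# Precision = true positives / (true positives + false positives)
tp <- 0.3010965   # average TP for POPULAR at n = 5
fp <- 3.033333    # average FP for POPULAR at n = 5
tp/(tp + fp)      # roughly 0.090, matching the precision column above
```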

Even though the numbers are poor, the “POPULAR” model consistently performed the best. We can also view the ROC curves with the code below.

plot(evlist,legend="topleft",annotate=T)

We can now determine individual recommendations. We first need to build a model using the POPULAR algorithm. Below is the code.

Rec1<-Recommender(movieRatings,method="POPULAR")
Rec1
## Recommender of type 'POPULAR' for 'realRatingMatrix'
## learned using 9066 users.

We will now pull the top five recommendations for the first five raters and convert them to a list. The numbers are the movie ids, not the actual titles.

recommend<-predict(Rec1,movieRatings[1:5],n=5)
as(recommend,"list")
## $1
## [1] "78"  "95"  "544" "102" "4"
##
## $2
## [1] "242" "232" "294" "577" "95"
##
## $3
## [1] "654" "242" "30"  "232" "287"
##
## $4
## [1] "564" "654" "242" "30"  "232"
##
## $5
## [1] "242" "30"  "232" "287" "577"

Below we can see the predicted score for specific movies. The movie titles come from the original “movies” file.

rating<-predict(Rec1,movieRatings[1:5],type='ratings')
rating
## 5 x 671 rating matrix of class 'realRatingMatrix' with 2873 ratings.
movieresult<-as(rating,'matrix')[1:5,1:3]
colnames(movieresult)<-c("Toy Story","Jumanji","Grumpier Old Men")
movieresult
##   Toy Story  Jumanji Grumpier Old Men
## 1  2.859941 3.822666         3.724566
## 2  2.389340 3.352066         3.253965
## 3  2.148488 3.111213         3.013113
## 4  1.372087 2.334812         2.236711
## 5  2.255328 3.218054         3.119953

This is what the model thinks each person would rate the movie. The error is calculated from the difference between this number and the actual rating. In addition, if someone did not rate a movie, you would see an NA in that spot.

Conclusion

This was a lot of work. However, with additional effort, you can build your own recommendation system based on data that you have collected.

# Clustering Mixed Data in R

One of the major problems with hierarchical and k-means clustering is that they cannot handle nominal data. The reality is that most data is mixed, or a combination of interval/ratio data and nominal/ordinal data. One of many ways to deal with this problem is by using the Gower coefficient. This coefficient compares the pairwise cases in the dataset and calculates a dissimilarity between them, defined as the weighted mean of the contributions of the variables in that row.

Once the dissimilarity calculations are completed using the Gower coefficient (there are naturally other choices), you can then use regular k-means clustering (there are also other choices) to find the traits of the various clusters. In this post, we will use the “MedExp” dataset from the “Ecdat” package. Our goal will be to cluster the mixed data into four clusters. Below is some initial code.

library(cluster);library(Ecdat);library(compareGroups)
data("MedExp")
str(MedExp)
## 'data.frame':    5574 obs. of  15 variables:
##  $ med     : num  62.1 0 27.8 290.6 0 ...
##  $ lc      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ idp     : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 1 1 ...
##  $ lpi     : num  6.91 6.91 6.91 6.91 6.11 ...
##  $ fmde    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ physlim : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 1 ...
##  $ ndisease: num  13.7 13.7 13.7 13.7 13.7 ...
##  $ health  : Factor w/ 4 levels "excellent","good",..: 2 1 1 2 2 2 2 1 2 2 ...
##  $ linc    : num  9.53 9.53 9.53 9.53 8.54 ...
##  $ lfam    : num  1.39 1.39 1.39 1.39 1.1 ...
##  $ educdec : num  12 12 12 12 12 12 12 12 9 9 ...
##  $ age     : num  43.9 17.6 15.5 44.1 14.5 ...
##  $ sex     : Factor w/ 2 levels "male","female": 1 1 2 2 2 2 2 1 2 2 ...
##  $ child   : Factor w/ 2 levels "no","yes": 1 2 2 1 2 2 1 1 2 1 ...
##  $ black   : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...

You can clearly see that our data is mixed, with both numerical and factor variables. Therefore, the first thing we must do is calculate the Gower dissimilarities for the dataset. This is done with the “daisy” function from the “cluster” package.

disMat<-daisy(MedExp,metric = "gower")

Now we can use “kmeans” to make our clusters. This is possible because the factor variables have been converted into numerical dissimilarities. We will set the number of clusters to 4. Below is the code.

set.seed(123)
mixedClusters<-kmeans(disMat, centers=4)

We can now look at a table of the clusters

table(mixedClusters$cluster)
##
##    1    2    3    4
## 1960 1342 1356  916

The groups seem reasonably balanced. We now need to add the results of the k-means to the original dataset. Below is the code.

MedExp$cluster<-mixedClusters$cluster

We can now build a descriptive table that gives us the proportions of each variable in each cluster. To do this, we use the “compareGroups” function. We then take its output and pass it to the “createTable” function to get our actual descriptive stats.

group<-compareGroups(cluster~.,data=MedExp)
clustab<-createTable(group)
clustab
##
## --------Summary descriptives table by 'cluster'---------
##
## __________________________________________________________________________
##                    1             2             3             4      p.overall
##                 N=1960        N=1342        N=1356        N=916
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
## med            211 (1119)    68.2 (333)    269 (820)     83.8 (210)   <0.001
## lc             4.07 (0.60)   4.05 (0.60)   0.04 (0.39)   0.03 (0.34)   0.000
## idp:                                                                  <0.001
##     no         1289 (65.8%)  922 (68.7%)   1123 (82.8%)  781 (85.3%)
##     yes        671 (34.2%)   420 (31.3%)   233 (17.2%)   135 (14.7%)
## lpi            5.72 (1.94)   5.90 (1.73)   3.27 (2.91)   3.05 (2.96)  <0.001
## fmde           6.82 (0.99)   6.93 (0.90)   0.00 (0.12)   0.00 (0.00)   0.000
## physlim:                                                              <0.001
##     no         1609 (82.1%)  1163 (86.7%)  1096 (80.8%)  789 (86.1%)
##     yes        351 (17.9%)   179 (13.3%)   260 (19.2%)   127 (13.9%)
## ndisease       11.5 (8.26)   10.2 (2.97)   12.2 (8.50)   10.6 (3.35)  <0.001
## health:                                                               <0.001
##     excellent  910 (46.4%)   880 (65.6%)   615 (45.4%)   612 (66.8%)
##     good       828 (42.2%)   382 (28.5%)   563 (41.5%)   261 (28.5%)
##     fair       183 (9.34%)   74 (5.51%)    137 (10.1%)   42 (4.59%)
##     poor       39 (1.99%)    6 (0.45%)     41 (3.02%)    1 (0.11%)
## linc           8.68 (1.22)   8.61 (1.37)   8.75 (1.17)   8.78 (1.06)   0.005
## lfam           1.05 (0.57)   1.49 (0.34)   1.08 (0.58)   1.52 (0.35)  <0.001
## educdec        12.1 (2.87)   11.8 (2.58)   12.0 (3.08)   11.8 (2.73)   0.005
## age            36.5 (12.0)   9.26 (5.01)   37.0 (12.5)   9.29 (5.11)   0.000
## sex:                                                                  <0.001
##     male       893 (45.6%)   686 (51.1%)   623 (45.9%)   482 (52.6%)
##     female     1067 (54.4%)  656 (48.9%)   733 (54.1%)   434 (47.4%)
## child:                                                                 0.000
##     no         1960 (100%)   0 (0.00%)     1356 (100%)   0 (0.00%)
##     yes        0 (0.00%)     1342 (100%)   0 (0.00%)     916 (100%)
## black:                                                                <0.001
##     yes        1623 (82.8%)  986 (73.5%)   1148 (84.7%)  730 (79.7%)
##     no         337 (17.2%)   356 (26.5%)   208 (15.3%)   186 (20.3%)
## ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

The table largely speaks for itself. Results for factor variables are given as counts with proportions. For example, in cluster 1, 1289 people, or 65.8%, responded “no” to having an individual deductible plan (idp). Numerical variables show the mean with the standard deviation in parentheses. For example, in cluster 1 the average log family size (lfam) was 1.05, with a standard deviation of 0.57.

Conclusion

Mixed data can be partitioned into clusters with the help of the Gower or another coefficient. In addition, k-means is not the only way to cluster the data. There are other choices, such as partitioning around medoids. The example provided here simply serves as a basic introduction.

# Hierarchical Clustering in R

Hierarchical clustering is a form of unsupervised learning. What this means is that the data points lack any form of label, and the purpose of the analysis is to generate labels for our data points. In other words, we have no Y values in our data.

Hierarchical clustering is an agglomerative technique. This means that each data point starts as its own individual cluster, and clusters are merged over iterations. This is great for small datasets but is difficult to scale. In addition, you need to set the linkage, which is used to place observations in different clusters. There are several choices (ward, complete, single, etc.), and the best choice depends on context.

In this post, we will make a hierarchical clustering analysis of the “MedExp” data from the “Ecdat” package. We are trying to identify distinct subgroups in the sample.
The actual hierarchical clustering creates what is called a dendrogram. Below is some initial code.

library(cluster);library(compareGroups);library(NbClust);library(HDclassif);library(sparcl);library(Ecdat)
data("MedExp")
str(MedExp)
## 'data.frame':    5574 obs. of  15 variables:
##  $ med     : num  62.1 0 27.8 290.6 0 ...
##  $ lc      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ idp     : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 1 1 ...
##  $ lpi     : num  6.91 6.91 6.91 6.91 6.11 ...
##  $ fmde    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ physlim : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 1 ...
##  $ ndisease: num  13.7 13.7 13.7 13.7 13.7 ...
##  $ health  : Factor w/ 4 levels "excellent","good",..: 2 1 1 2 2 2 2 1 2 2 ...
##  $ linc    : num  9.53 9.53 9.53 9.53 8.54 ...
##  $ lfam    : num  1.39 1.39 1.39 1.39 1.1 ...
##  $ educdec : num  12 12 12 12 12 12 12 12 9 9 ...
##  $ age     : num  43.9 17.6 15.5 44.1 14.5 ...
##  $ sex     : Factor w/ 2 levels "male","female": 1 1 2 2 2 2 2 1 2 2 ...
##  $ child   : Factor w/ 2 levels "no","yes": 1 2 2 1 2 2 1 1 2 1 ...
##  $ black   : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...

For the purposes of this post, the dataset is too big. If we try to do the analysis with over 5,500 observations, it will take a long time. Therefore, we will only use the first 1,000 observations. In addition, we need to remove the factor variables, as hierarchical clustering cannot analyze them. Below is the code.

MedExp_small<-MedExp[1:1000,]
MedExp_small$sex<-NULL
MedExp_small$idp<-NULL
MedExp_small$child<-NULL
MedExp_small$black<-NULL
MedExp_small$physlim<-NULL
MedExp_small$health<-NULL

We now need to scale our data. This is important because different scales will cause different variables to have more or less influence on the results. Below is the code.

MedExp_small_df<-as.data.frame(scale(MedExp_small))

We now need to determine how many clusters to create. There is no fixed rule on this, but we can use statistical analysis to help us. The “NbClust” package will conduct several different analyses to provide a suggested number of clusters. You have to set the distance, min/max number of clusters, the method, and the index. The graphs can be understood by looking for the bend or elbow in them; that point suggests the best number of clusters.

numComplete<-NbClust(MedExp_small_df,distance = 'euclidean',min.nc = 2,max.nc = 8,method = 'ward.D2',index = c('all'))

## *** : The Hubert index is a graphical method of determining the number of clusters.
##                 In the plot of Hubert index, we seek a significant knee that corresponds to a
##                 significant increase of the value of the measure i.e the significant peak in Hubert
##                 index second differences plot.
## 

## *** : The D index is a graphical method of determining the number of clusters.
##                 In the plot of D index, we seek a significant knee (the significant peak in Dindex
##                 second differences plot) that corresponds to a significant increase of the value of
##                 the measure.
##
## *******************************************************************
## * Among all indices:
## * 7 proposed 2 as the best number of clusters
## * 9 proposed 3 as the best number of clusters
## * 6 proposed 6 as the best number of clusters
## * 1 proposed 8 as the best number of clusters
##
##                    ***** Conclusion *****
##
## * According to the majority rule, the best number of clusters is  3
##
##
## *******************************************************************
numComplete$Best.nc
##                     KL       CH Hartigan     CCC    Scott      Marriot
## Number_clusters 2.0000   2.0000   6.0000  8.0000    3.000 3.000000e+00
## Value_Index     2.9814 292.0974  56.9262 28.4817 1800.873 4.127267e+24
##                   TrCovW   TraceW Friedman   Rubin Cindex     DB
## Number_clusters      6.0   6.0000   3.0000  6.0000  2.000 3.0000
## Value_Index     166569.3 265.6967   5.3929 -0.0913  0.112 1.0987
##                 Silhouette   Duda PseudoT2  Beale Ratkowsky     Ball
## Number_clusters     2.0000 2.0000   2.0000 2.0000    6.0000    3.000
## Value_Index         0.2809 0.9567  16.1209 0.2712    0.2707 1435.833
##                 PtBiserial Frey McClain   Dunn Hubert SDindex Dindex
## Number_clusters     6.0000    1   3.000 3.0000      0  3.0000      0
## Value_Index         0.4102   NA   0.622 0.1779      0  1.9507      0
##                   SDbw
## Number_clusters 3.0000
## Value_Index     0.5195

A simple majority indicates that three clusters is most appropriate. However, four clusters are probably just as good. Every time you do the analysis, you will get slightly different results unless you set the seed.

To make our actual clusters, we need to calculate the distances between observations using the “dist” function, while also specifying how to calculate them. We will calculate distance using the “euclidean” method. Then we will take the distance information and do the actual clustering with the “hclust” function. Below is the code.

distance<-dist(MedExp_small_df,method = 'euclidean')
hiclust<-hclust(distance,method = 'ward.D2')

We can now plot the results. We will plot “hiclust” and set hang to -1 so the observations are placed at the bottom of the plot. Next, we use the “cutree” function to identify 4 clusters and store the assignments in the “comp” variable. Lastly, we use the “ColorDendrogram” function to highlight our actual clusters.

plot(hiclust,hang=-1, labels=F)
comp<-cutree(hiclust,4)
ColorDendrogram(hiclust,y=comp,branchlength = 100)

We can also create some descriptive stats, such as the number of observations per cluster.
table(comp)
## comp
##   1   2   3   4
## 439 203 357   1

We can also make a table of the descriptive stats by cluster using the “aggregate” function.

aggregate(MedExp_small_df,list(comp),mean)
##   Group.1         med         lc        lpi       fmde     ndisease
## 1       1  0.01355537 -0.7644175  0.2721403 -0.7498859  0.048977122
## 2       2 -0.06470294 -0.5358340 -1.7100649 -0.6703288 -0.105004408
## 3       3 -0.06018129  1.2405612  0.6362697  1.3001820 -0.002099968
## 4       4 28.66860936  1.4732183  0.5252898  1.1117244  0.564626907
##         linc        lfam    educdec         age
## 1 0.12531718 -0.08861109  0.1149516  0.12754008
## 2 -0.44435225  0.22404456 -0.3767211 -0.22681535
## 3 0.09804031 -0.01182114  0.0700381 -0.02765987
## 4 0.18887531 -2.36063161  1.0070155 -0.07200553

Remember that these values are standardized. Cluster 4 is a single outlying observation with an extremely high medical cost (“med”). Among the remaining groups, cluster 3 stands out as having the highest coinsurance (“lc”) and deductible (“fmde”) values, cluster 2 has the lowest annual incentive payment (“lpi”), income (“linc”), and education (“educdec”), and cluster 1 is slightly older (“age”) than the others. You can make boxplots of each of the stats above. Below is just an example of age by cluster.

MedExp_small_df$cluster<-comp
boxplot(age~cluster,MedExp_small_df)

Conclusion

Hierarchical clustering is one way to provide labels for data that does not have them. The main challenge is determining how many clusters to create. However, this can be addressed by using the recommendations that come from various functions in R.

# Attendance Module in Moodle VIDEO

How to setup the attendance module in Moodle

# Forum Options in Moodle VIDEO

Options for forums in Moodle

# Q&A Forums in Moodle VIDEO

Creating Q&A forums in Moodle

# Each Person Post One Discussion Forum in Moodle VIDEO

Using the Moodle forum option  of each person posts one discussion

# Simple Discussion Forum in Moodle VIDEO

How to create a simple discussion forum in Moodle

# Validating a Logistic Model in R

In this post, we are going to continue our analysis of the logistic regression model from the post on logistic regression in R. We need to rerun all of the code from the last post to be ready to continue. As such, the code from the last post is all below.

library(MASS);library(bestglm);library(reshape2);library(corrplot);
library(ggplot2);library(ROCR)
data(survey)
survey$Clap<-NULL
survey$W.Hnd<-NULL
survey$Fold<-NULL
survey$Exer<-NULL
survey$Smoke<-NULL
survey$M.I<-NULL
survey<-na.omit(survey)
pm<-melt(survey, id.var="Sex")
ggplot(pm,aes(Sex,value))+geom_boxplot()+facet_wrap(~variable,ncol = 3)

pc<-cor(survey[,2:5])
corrplot.mixed(pc)

set.seed(123)
ind<-sample(2,nrow(survey),replace=T,prob = c(0.7,0.3))
train<-survey[ind==1,]
test<-survey[ind==2,]
fit<-glm(Sex~.,binomial,train)
exp(coef(fit))

train$probs<-predict(fit, type = 'response')
train$predict<-rep('Female',123)
train$predict[train$probs>0.5]<-"Male"
table(train$predict,train$Sex)
mean(train$predict==train$Sex)
test$prob<-predict(fit,newdata = test, type = 'response')
test$predict<-rep('Female',46)
test$predict[test$prob>0.5]<-"Male"
table(test$predict,test$Sex)
mean(test$predict==test$Sex)

Model Validation

We will now do a K-fold cross validation in order to further see how our model is doing. We cannot use the factor variable “Sex” with the K-fold code so we need to create a dummy variable. First, we create a variable called “y” that has 123 spaces, which is the same size as the “train” dataset. Second, we fill “y” with 1 in every example that is coded “male” in the “Sex” variable.

In addition, we also need to create a new dataset and remove some variables from our prior analysis otherwise we will confuse the functions that we are going to use. We will remove “predict”, “Sex”, and “probs”

train$y<-rep(0,123)
train$y[train$Sex=="Male"]=1
my.cv<-train[,-8]
my.cv$Sex<-NULL
my.cv$probs<-NULL

We can now do our K-fold analysis. The code is complicated, so you can trust it and double-check it on your own.

bestglm(Xy=my.cv,IC="CV",CVArgs = list(Method="HTF",K=10,REP=1),family = binomial)
## Morgan-Tatar search since family is non-gaussian.
## CV(K = 10, REP = 1)
## BICq equivalent for q in (6.66133814775094e-16, 0.0328567092272112)
## Best Model:
##                Estimate Std. Error   z value     Pr(>|z|)
## (Intercept) -45.2329733 7.80146036 -5.798014 6.710501e-09
## Height        0.2615027 0.04534919  5.766425 8.097067e-09

The results confirm what we already knew: only the “Height” variable is valuable in predicting Sex. We will now create our new model using only the recommendation of the k-fold validation analysis. Then we check the new model against both the train and the test datasets. The code below repeats prior code but is based on the cross-validation.

reduce.fit<-glm(Sex~Height, family=binomial,train)
train$cv.probs<-predict(reduce.fit,type='response')
train$cv.predict<-rep('Female',123)
train$cv.predict[train$cv.probs>0.5]='Male'
table(train$cv.predict,train$Sex)
##
##          Female Male
##   Female     61   11
##   Male        7   44
mean(train$cv.predict==train$Sex)
## [1] 0.8536585
test$cv.probs<-predict(reduce.fit,test,type = 'response')
test$cv.predict<-rep('Female',46)
test$cv.predict[test$cv.probs>0.5]='Male'
table(test$cv.predict,test$Sex)
##
##          Female Male
##   Female     16    7
##   Male        1   22
mean(test$cv.predict==test$Sex)
## [1] 0.826087

The results are consistent for both the train and test datasets. We are now going to create the ROC curve. This will provide a visual and the AUC number to further help us assess our model. However, a model is only good when it is compared to another model. Therefore, we will create a deliberately bad model in order to compare it to the original model and the cross-validated model. We will first make the bad model and store its probabilities in the “test” dataset. The bad model will use “Age” to predict “Sex”, which doesn’t make much sense at all. Below is the code, followed by the ROC curve of the bad model.

bad.fit<-glm(Sex~Age,family = binomial,test)
test$bad.probs<-predict(bad.fit,type='response')
pred.bad<-prediction(test$bad.probs,test$Sex)
perf.bad<-performance(pred.bad,'tpr','fpr')
plot(perf.bad,col=1)

The closer the line is to the diagonal, the worse the model. As we can see, the bad model is really bad.

What we just did with the bad model we will now repeat for the full model and the cross-validated model. As before, we need to store the predictions in a way that the ROCR package can use them. We will create a variable called “pred.full” to begin the process of graphing the original full model from the last blog post. Then we will use the “prediction” function. Next, we will create the “perf.full” variable to store the performance of the model. Notice the arguments ‘tpr’ and ‘fpr’ for true positive rate and false positive rate. Lastly, we plot the results.

pred.full<-prediction(test$prob,test$Sex)
perf.full<-performance(pred.full,'tpr','fpr')
plot(perf.full, col=2)

We repeat this process for the cross-validated model

pred.cv<-prediction(test$cv.probs,test$Sex)
perf.cv<-performance(pred.cv,'tpr','fpr')
plot(perf.cv,col=3)

Now let’s put all the different models on one plot

plot(perf.bad,col=1)
plot(perf.full,col=2,add=TRUE)
plot(perf.cv,col=3,add=TRUE)
legend(.7,.4,c("BAD","FULL","CV"), 1:3)

Finally, we can calculate the AUC for each model

auc.bad<-performance(pred.bad,'auc')
auc.bad@y.values
## [[1]]
## [1] 0.4766734
auc.full<-performance(pred.full,"auc")
auc.full@y.values
## [[1]]
## [1] 0.959432
auc.cv<-performance(pred.cv,'auc')
auc.cv@y.values
## [[1]]
## [1] 0.9107505

The higher the AUC, the better. As such, the full model with all variables is superior to the cross-validated or bad model. This is despite the fact that there are many high correlations in the full model as well. Another point to consider is that the cross-validated model is simpler, so this may be a reason to pick it over the full model. As such, the statistics provide support for choosing a model, but they do not trump the ability of the researcher to pick based on factors beyond just the numbers.

# Logistic Regression in R

In this post, we will conduct a logistic regression analysis. Logistic regression is used when you want to predict a categorical dependent variable using continuous or categorical independent variables. In our example, we want to predict Sex (male or female) using several continuous variables from the “survey” dataset in the “MASS” package.

library(MASS);library(bestglm);library(reshape2);library(corrplot)
data(survey)
?MASS::survey #explains the variables in the study

The first thing we need to do is remove the independent factor variables from our dataset. The reason for this is that the function that we will use for the cross-validation does not accept factors. We will first use the “str” function to identify factor variables and then remove them from the dataset. We also need to remove in examples that are missing data so we use the “na.omit” function for this. Below is the code

survey$Clap<-NULL
survey$W.Hnd<-NULL
survey$Fold<-NULL
survey$Exer<-NULL
survey$Smoke<-NULL
survey$M.I<-NULL
survey<-na.omit(survey)

We now need to check for collinearity using the “corrplot.mixed” function from the “corrplot” package.

pc<-cor(survey[,2:5])
corrplot.mixed(pc)

We have an extreme correlation between “Wr.Hnd” and “NW.Hnd”. This makes sense because people’s hands are normally the same size. Since this blog post is a demonstration of logistic regression, we will not worry about this too much.

We now need to divide our dataset into a train and a test set. We set the seed so the split is reproducible. First, we make a variable called “ind” that randomly assigns 1 to roughly 70% of the rows of “survey” and 2 to the remaining 30%. We then subset the “train” dataset by taking all rows that are 1’s based on the “ind” variable, and we create the “test” dataset from all the rows that line up with 2 in the “ind” variable. This means our data split is 70% train and 30% test. Below is the code.

set.seed(123)
ind<-sample(2,nrow(survey),replace=T,prob = c(0.7,0.3))
train<-survey[ind==1,]
test<-survey[ind==2,]

We now make our model. We use the “glm” function for logistic regression. We set the family argument to “binomial”. Next, we look at the results as well as the odds ratios.

fit<-glm(Sex~.,family=binomial,train)
summary(fit)
##
## Call:
## glm(formula = Sex ~ ., family = binomial, data = train)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -1.9875  -0.5466  -0.1395   0.3834   3.4443
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -46.42175    8.74961  -5.306 1.12e-07 ***
## Wr.Hnd       -0.43499    0.66357  -0.656    0.512
## NW.Hnd        1.05633    0.70034   1.508    0.131
## Pulse        -0.02406    0.02356  -1.021    0.307
## Height        0.21062    0.05208   4.044 5.26e-05 ***
## Age           0.00894    0.05368   0.167    0.868
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
##     Null deviance: 169.14  on 122  degrees of freedom
## Residual deviance:  81.15  on 117  degrees of freedom
## AIC: 93.15
##
## Number of Fisher Scoring iterations: 6
exp(coef(fit))
##  (Intercept)       Wr.Hnd       NW.Hnd        Pulse       Height
## 6.907034e-21 6.472741e-01 2.875803e+00 9.762315e-01 1.234447e+00
##          Age
## 1.008980e+00

The results indicate that only height is useful in predicting whether someone is male or female. The second piece of code shares the odds ratios. An odds ratio tells how a one-unit increase in the independent variable changes the odds of being male in our model. For example, for every one-unit increase in height, there is a 1.23-fold increase in the odds of a particular example being male.
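Since the odds ratios are just the exponentiated coefficients, the “Height” entry can be checked by hand from the summary output above:

```r
# Odds ratio for Height = exp(raw coefficient from summary(fit))
exp(0.21062)  # roughly 1.2344, matching the Height value in exp(coef(fit))
```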

We now need to see how well our model does on the train and test dataset. We first capture the probabilities and save them to the train dataset as “probs”. Next we create a “predict” variable and place the string “Female” in the same number of rows as are in the “train” dataset. Then we rewrite the “predict” variable by changing any example that has a probability above 0.5 as “Male”. Then we make a table of our results to see the number correct, false positives/negatives. Lastly, we calculate the accuracy rate. Below is the code.

train$probs<-predict(fit, type = 'response')
train$predict<-rep('Female',123)
train$predict[train$probs>0.5]<-"Male"
table(train$predict,train$Sex)
##
##          Female Male
##   Female     61    7
##   Male        7   48
mean(train$predict==train$Sex)
## [1] 0.8861789

Despite the weaknesses of the model with so many insignificant variables it is surprisingly accurate at 88.6%. Let’s see how well we do on the “test” dataset.
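The accuracy figure can be recovered directly from the confusion matrix above: the correct classifications on the diagonal divided by the total number of cases.

```r
# 61 correctly classified Females and 48 correctly classified Males out of 123
(61 + 48)/(61 + 7 + 7 + 48)  # roughly 0.886, the training accuracy reported above
```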

test$prob<-predict(fit,newdata = test, type = 'response')
test$predict<-rep('Female',46)
test$predict[test$prob>0.5]<-"Male"
table(test$predict,test$Sex)
##
##          Female Male
##   Female     17    3
##   Male        0   26
mean(test$predict==test$Sex)
## [1] 0.9347826

As you can see, we do even better on the test set, with an accuracy of 93.4%. Our model is looking pretty good, and height is an excellent predictor of sex, which makes complete sense. However, in the next post we will use cross-validation and the ROC plot to further assess its quality.

# Generalized Additive Models in R

In this post, we will learn how to create a generalized additive model (GAM). GAMs are non-parametric generalized linear models. This means that the linear predictor of the model uses smooth functions of the predictor variables. As such, you do not need to specify the functional relationship between the response and the continuous variables. This allows you to explore the data for potential relationships that can be more rigorously tested with other statistical models.

In our example, we will use the “Auto” dataset from the “ISLR” package and use the variables “mpg”,“displacement”,“horsepower”,and “weight” to predict “acceleration”. We will also use the “mgcv” package. Below is some initial code to begin the analysis

library(mgcv)
library(ISLR)
data(Auto)

We will now make the model. We want to understand the response of “acceleration” to the explanatory variables “mpg”, “displacement”, “horsepower”, and “weight”. After fitting the model, we will examine the summary. Below is the code.

model1<-gam(acceleration~s(mpg)+s(displacement)+s(horsepower)+s(weight),data=Auto)
summary(model1)
##
## Family: gaussian
##
## Formula:
## acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.54133    0.07205   215.7   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                   edf Ref.df      F  p-value
## s(mpg)          6.382  7.515  3.479  0.00101 **
## s(displacement) 1.000  1.000 36.055 4.35e-09 ***
## s(horsepower)   4.883  6.006 70.187  < 2e-16 ***
## s(weight)       3.785  4.800 41.135  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.733   Deviance explained = 74.4%
## GCV = 2.1276  Scale est. = 2.0351    n = 392

All of the explanatory variables are significant, and the adjusted r-squared is .73, which is excellent. “edf” stands for effective degrees of freedom. This modified version of the degrees of freedom is due to the smoothing process in the model. GCV stands for generalized cross-validation, and this number is useful when comparing models. The model with the lowest GCV is the better model.

We can also examine the model visually by using the “plot” function. This will allow us to examine if the curvature fitted by the smoothing process was useful or not for each variable. Below is the code.

plot(model1)
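By default, the plot function shows each smooth term on its own page. A convenient variation, sketched below using the `pages` and `residuals` arguments of mgcv's `plot.gam`, puts all four smooths on one page with the partial residuals overlaid. The refitting lines are only there so the snippet stands alone.

```r
# Refit the model so this snippet is self-contained
library(mgcv)
library(ISLR)
data(Auto)
model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight), data = Auto)

# pages=1 places all four smooths on a single page;
# residuals=TRUE overlays partial residuals on each smooth
plot(model1, pages = 1, residuals = TRUE)
```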

We can also look at a 3D graph that includes the linear predictor as well as the two strongest predictors. This is done with the “vis.gam” function. Below is the code.

vis.gam(model1)
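By default, “vis.gam” plots over the first two terms in the model. If you want the surface over two specific predictors, the `view` and `theta` arguments (both documented in mgcv) choose the variables and the viewing angle. A sketch, assuming the model fitted above:

```r
library(mgcv)
library(ISLR)
data(Auto)
model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight), data = Auto)

# Perspective plot of the fitted surface over horsepower and weight,
# rotated 35 degrees for a clearer view
vis.gam(model1, view = c("horsepower", "weight"), theta = 35)
```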

If multiple models are developed, you can compare the GCV values to determine which model is the best. In addition, another way to compare models is with the “AIC” function. In the code below, we will create an additional model that includes “year”, compare the GCV scores, and calculate the AIC. Below is the code.

model2<-gam(acceleration~s(mpg)+s(displacement)+s(horsepower)+s(weight)+s(year),data=Auto)
summary(model2)
##
## Family: gaussian
##
## Formula:
## acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight) +
##     s(year)
##
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.54133    0.07203   215.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
##                   edf Ref.df      F p-value
## s(mpg)          5.578  6.726  2.749  0.0106 *
## s(displacement) 2.251  2.870 13.757 3.5e-08 ***
## s(horsepower)   4.936  6.054 66.476 < 2e-16 ***
## s(weight)       3.444  4.397 34.441 < 2e-16 ***
## s(year)         1.682  2.096  0.543  0.6064
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) =  0.733   Deviance explained = 74.5%
## GCV = 2.1368  Scale est. = 2.0338    n = 392
#model1 GCV
model1$gcv.ubre
##   GCV.Cp
## 2.127589
#model2 GCV
model2$gcv.ubre
##   GCV.Cp
## 2.136797

As you can see, the second model has a higher GCV score when compared to the first model. This indicates that the first model is a better choice. This makes sense because in the second model the variable “year” is not significant. To confirm this we will calculate the AIC scores using the AIC function.

AIC(model1,model2)
##              df      AIC
## model1 18.04952 1409.640
## model2 19.89068 1411.156

Again, you can see that model1 is better due to its fewer degrees of freedom and slightly lower AIC score.
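Since model1 is nested inside model2, the two can also be compared with mgcv's `anova` method. This is a sketch; the F-test is approximate for smooth terms, so treat the p-value as a rough guide rather than an exact result.

```r
library(mgcv)
library(ISLR)
data(Auto)
model1 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight), data = Auto)
model2 <- gam(acceleration ~ s(mpg) + s(displacement) + s(horsepower) + s(weight) + s(year), data = Auto)

# Approximate F-test for the added s(year) term
anova(model1, model2, test = "F")
```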

Conclusion

GAMs are most commonly used for exploring potential relationships in your data. This is because they are difficult to interpret and summarize. Therefore, it is normally better to develop a generalized linear model over a GAM due to the difficulty in understanding what the data is trying to tell you when using GAMs.

# Wire Framing with Moodle

Before teaching a Moodle course, it is critical that a teacher design what they want to do. Many teachers believe that the design process begins by going to Moodle and adding activities and other resources to their class. For someone who is thoroughly familiar with Moodle and has developed courses before, this might work. However, the majority of online teachers need to wire frame what they want their Moodle course to look like.

Why Wire Frame a Moodle Course

In the world of web development, a wire frame is a prototype of what a potential website will look like. The actual wire frame can be made in many different platforms, from Word to PowerPoint to just paper and pencil. Since Moodle is online, a Moodle course is in many ways a website, so wire framing applies in this context.

It doesn’t matter how you wire frame your Moodle course. What matters is that you actually do it. Designing what you want to see in your course helps you to make decisions much faster when you are actually adding activities and resources to your Moodle course. It also helps your Moodle support to help you if they have a picture of what you want rather than wild hand gestures and frustration.

Wire framing a course also reduces the cognitive load on the teacher. Instead of designing and building the course at the same time, wire framing splits this task into two steps: designing, then building. This prevents extreme frustration, as it is common for a teacher to just stare at the computer screen when trying to design and develop a Moodle course simultaneously.

You never see an architect making his plans while constructing the building. This would seem careless and even dangerous because the architect doesn’t know what he wants while he is throwing around concrete and steel. The same analogy applies to designing Moodle courses. A teacher must know what they want, write it down, and then implement it by creating the course.

Another benefit of planning in Word is that it is easier to change things in Word than in Moodle. Moodle is amazing, but it is not easy to use for those who are not tech-savvy. However, it’s easy for most of us to copy, paste, and edit in Word.

One Way to Wire Frame a Moodle Course

When supporting teachers to wire frame a Moodle course, I always encourage them to start by developing the course in Microsoft Word. The reason being that the teacher is already familiar with Word and they do not have to struggle to make decisions when using it. This helps them to focus on content and not on how to use Microsoft Word.

One of the easiest ways to wire frame a Moodle course is to take the default topics of a course such as General Information, Week 1, Week 2, etc. and copy these headings into Word, as shown below.

Now, all that is needed is to type in using bullets exactly what activities and resources you want in each section. It is also possible to add pictures and other content to the Word document that can be added to Moodle later.  Below is a preview of a generic Moodle sample course with the general info and week 1 of the course completed.

You can see for yourself how this class is developed. The General Info section has an image to serve as a welcome and includes the name of the course. Under this are the course outline and rubrics for the course. The information in the parentheses indicates what type of module it is.

For Week 1, there are several activities: a forum for introducing yourself and a page that shares the objectives of that week. Following this are the readings for the week, then a discussion forum, and lastly an assignment. This process is repeated for however many weeks or topics you have in the course.

Depending on your need to plan, you can even plan other pages on the site besides the main page. For example, I can wire frame what I want my “Objectives” page to look like or even the discussion topics for my “Discussion” forum.

Of course, the ideas for all these activities come from the course outline or syllabus that was developed first. In other words, before we even wire frame, we have some sort of curriculum document with what the course needs to cover.

Conclusion

The example above is an extremely simple way of utilizing the power of wire framing. With this template, you can confidently go to Moodle and find the different modules to make your class come to life. Trying to conceptualize this in your head is possible but much more difficult. As such, thorough planning is a hallmark of good teaching.

# Creating Assignments in Moodle Video

Below is a simple and brief video on how to create assignments in Moodle.

# Teaching Advanced ESL Students

Advanced ESL students have their own unique set of traits and challenges that an ESL teacher must deal with. This post will explain some of these unique traits as well as how to support advanced ESL students.

By this point, the majority of the language processing is automatic. This means that the teacher no longer needs to change the speed at which they talk in most situations.

In addition, the students have become highly independent. This necessitates that the teacher focus on supporting the learning experience of the students rather than trying to play a more directive role.

The learning activities used in the classroom can now cover a full range of possibilities. Almost all casual reading material is appropriate. Study skills can be addressed at a much deeper level; skills such as skimming, scanning, determining purpose, etc. can be taught and addressed in the learning. Students can also enjoy debates and other opinion-generating experiences.

The Challenges of Advanced ESL Students

One of the challenges of advanced students is that they often have a habit of asking the most detailed questions about the most obscure aspects of the target language. Dealing with this requires a PhD in linguistics or the ability to know what the students really need to know and to steer away from mundane grammatical details. It is very tempting to try and answer these types of questions, but the average native speaker does not know all the details of the imperfect past tense; rather, they are simply adept at using it.

Another frustrating problem with advanced students is helping them continue to make progress in their language development. With any skill, as one gets closer to mastery, the room for improvement becomes smaller and smaller. To move from an advanced student to a superior student takes several small rather than sweeping adjustments.

This is one reason advanced students often like to ask those minute grammar questions. These small questions are where they know they are weak when it comes to communicating. This can be especially stressful if the student is a few points away from reaching some sort of passing score on an English proficiency exam (IELTS, TOEFL, etc.). The minor adjustments needed to reach the minimum score are difficult to determine and train.

Conclusion

After beginners, teaching advanced ESL students is perhaps the next most challenging teaching experience. Advanced ESL students have a strong sense of what they know and do not know. What makes this challenging is that the information they need to understand can be considered somewhat specialized and is not easy for many teachers to articulate.

# Using Groups and Groupings in Activities in Moodle

Making groups and groupings are two features in Moodle that can be used for collaboration and/or for organizational purposes in a class. This post will provide examples of how to use groups in an activity in Moodle.

Using Groups/Groupings in a Forum

Groups and groupings can be used in a forum in order to allow groups to interact during a discussion topic. It is assumed that you already know how to make a forum in Moodle. Therefore, the instruction in this post will start from the settings window for forums in Moodle.

1.  The option that we need to adjust to use groups in forums is the “Common Module Settings”. If you click on this, you will see the following.

2. Depending on your goals there are several different ways that groups can be used.

• Group mode can be set to visible or separate groups. If groups are visible, different groups can see each other’s discussions, but they can only post in their own group’s discussion.
• If separate groups is selected, groups will only be able to see their own group’s discussion and no other.
• If the grouping feature is used, only the groups that are part of that grouping are added to the forum. The group mode determines whether the groups can see each other or not.

In this example, we will set the group mode to “visible groups” and groupings to “none”. Once you click “save and display” you will see the following.

3. To see what each group said in their discussion click “all participants” and a drop down menu will be displayed that shows each group.

Using Grouping for Assignments

To use groups in assignments you repeat the steps above. In this example, we will use the grouping feature.

1. The features are viewable in the picture below. I selected “separate groups” and the grouping I wanted. This means only groups in the grouping will have this assignment available to them.

2. Another set of features you want to set for an assignment is the “group submission settings”. The options are self-explanatory, but here is what I selected.

3. Click “save and display” and you will see the following.

The red message just states that some people are in more than one group or not in any group. For this example, this is not a problem, as I did not assign all students to a group.

Conclusion

The concepts presented here for forums and assignments apply to most activities involving groups in Moodle. Groups are very useful for large classes in which students need a space where they can have meaningful communication with a handful of peers.

# Making Groups in Moodle

One of the many features available for teachers to use is the group mode for activities within a course in Moodle. This post will look at how to setup groups in a Moodle course.

What to Use the Group Mode For?

As with other features in Moodle, the challenge with the group mode is that you can use it for almost anything. The unlimited variety in terms of the application of the group mode makes it challenging for novices to understand and appreciate it. This is because, as humans, we often want a single clear way to use something. Below are several different ways in which the group mode can be used in a Moodle course.

• If the same Moodle course is used for two or more different sections, the group mode can be used to put students in the same Moodle course into different groups by section. For example, if a teacher is teaching two sections of English 101, section 1 would be one group and section 2 would be the other group.
• Groups can also be used so that only certain groups see certain things in a Moodle course. In Moodle, you can limit who sees what by restricting it to a certain group.
• A more traditional use is to have students placed in groups to complete group assignments. Placing them in groups allows the group to submit one assignment that Moodle gives all members of the group credit for when it is marked.

If this is not confusing enough, you can also have students in several different groups simultaneously if you wanted. Therefore, whenever you are trying to use Moodle you need to consider what your goal is rather than whether it is possible to do it in Moodle. As stated before, the problem is the flexibility of Moodle and not its inability to facilitate a learning task.

In this post, we are only going to learn how to make groups. In a future post, we will look at using groups in terms of teaching and assignments.

Creating Groups in Moodle

1. After logging into Moodle and selecting a course, you need to go to course administration->users->groups. If you do this correctly you should see the following

2. There are several things to mention before continuing

First, there are two different ways to create groups. You can create them manually by clicking on “create groups”, or you can have Moodle make the groups using the “Auto-create groups” button. The auto-group option will be explained in a later post, as well as the grouping feature.

Second, there is a tab called “grouping”. This is a feature that allows you to create a group of groups. In other words, several groups can be assigned to a grouping. This allows you to assign several groups to an activity simultaneously rather than having to add each one manually. This is a great feature for a course that has two sections and each section has group activities. For now, we will learn how to make groups manually.

Lastly, the column on the left, called “groups”, will display the name of any groups that are created, while the column on the right, called “members of”, will contain the names of people who are part of the group. Right now both are empty because there are no groups yet.

3. Click on the “create group” button and you will see the following.

4. You now need to give the group a name. You can also add other information if you want, such as a description or even a picture to represent the group. After providing the needed information, click “save changes” in order to see the following.

5. To add members to our practice group we need to click on the “add/remove” button. After doing this, you will see the following.

6. There are two columns, “potential members” and “group members.” To add people to the “group members” section, just highlight whoever you want on the “potential members” side and click “add”. Below is an example of this.

Just a note, at the bottom of both the “group member” and “potential member” list is a search function that can be used to find specific people in either section.

7. After placing people in the group, you can click on the “back to group” button. You will see the following.

The group name is displayed on the left and the members of the group are displayed on the right.

Conclusion

In this post, we learned how to create groups. However, we have not yet learned how to use groups in a Moodle course. This will be explained in a future post.

# Inquiry Learning

From the archives

Inquiry learning is a form of indirect instruction. Indirect instruction is teaching in which the students are actively involved in their learning by seeking solutions to problems or questions. In inquiry learning, students develop and investigate questions that they may have. The focus is on what the students want to learn about a topic, with some support from the teacher. Below are the steps of inquiry learning.

1. Ask
2. Investigate
3. Create
4. Discuss
5. Reflect

The teacher begins this process by taking the topic of the lesson and turning it into a question for the students to consider. For example, if the topic of a lesson is about flowers, a question to ask would be “How are flowers different from each other?” This is called the teacher-initiated stage of asking.

The students then develop their own questions that should help to answer the main question posed by the…


# Teaching Beginning ESL Students

Beginning ESL students have unique pedagogical needs that make them the most difficult to teach. It’s similar to the challenge of teaching kindergarten. The difficulty is not the content but rather stripping what is already basic content down into something that is understandable for the most undeveloped of students. Some of the best teachers cannot do this.

This post will provide some suggestions on how to deal with beginning ESL students.

Beginning students need a great deal of repetition. If you have ever tried to learn a language you probably needed to hear phrases many times to understand them. Repetition helps students to remember and imitate what they heard.

This means that the teacher needs to limit the amount of words, phrases, and sentences they teach. This is not easy, especially for new teachers who are often put in charge of teaching beginners and race through the curriculum to the frustration of the beginning students.

Repetition and a slow pace helps students to develop the automatic processing they need in order to achieve fluency. This can also be enhanced by focusing on purpose in communication rather than the grammatical structure of language.

The techniques used in class should be short and simple with a high degree of variety to offset the boredom of repetition. In other words, find many ways to teach one idea or concept.

Who’s the Center

Beginning students are highly teacher-dependent because of their lack of skills. Therefore, at least initially, the classroom should probably be teacher-centered until the students develop some basic skills. In general, whenever you are dealing with a new subject the students are totally unfamiliar with, it is better to have a higher degree of control over the learning experience.

Being the center of the learning experiences requires the teacher to provide most of the examples of well-spoken, written English. Your feedback is critical for the students to develop their own language skills. The focus should be more towards fluency rather than accuracy.

However, with time cooperative and student-centered activities can become more prominent. In the beginning, too much freedom can be frustrating for language learners who lack any sort of experience to draw upon to complete activities. Within a controlled environment, student creativity can blossom.

Conclusion

Being a beginning level ESL teacher is a tough job. It requires a skill set of patience, perseverance, and a gift for simplicity. Taking your time and determining who the center of learning is are ways to enhance success for those teaching beginners.

# Learner-Centered Instruction

Learner-centered instruction is a term that has been used in education for several decades now. One of the challenges of extremely popular terms in a field, such as learner-centered instruction, is that the term loses its meaning as people throw it into a discussion without knowing exactly what it means.

The purpose of this post is to try and explain some of the characteristics of learner-centered instruction without being exhaustive. In addition, we will look at the challenges to this philosophy as well as ways to make it happen in the classroom.

Focus on the Students

Learner-centered instruction is focused on the students. What this means is that the teacher takes into account the learning goals and objectives of the students in establishing what to teach. This requires the teacher to conduct at least an informal needs assessment to figure out what the students want to learn.

Consultation with the students allows the students to have some control over their learning, which is empowering as viewed by those who ascribe to critical theory. Consultation also allows students to be creative and innovative. This sounds like a perfect learning situation, but to be this centered on the learner can be difficult.

Challenge of Learner-Centered Instruction

Since the learning experience is determined by the students, the teacher does not have any presupposed plan in place prior to consulting with the students. Not having a plan in place beforehand is extremely challenging for new teachers and difficult even for experienced ones. The teacher doesn’t know what to expect in terms of the needs of the students.

In practice, almost no class follows such a stringent approach to learner-centered instruction. Most schools have to meet government requirements, prepare students for the workplace, and/or show improvements in testing. This limits the freedom of the teacher to be learner-centered in many ways. External factors cannot be ignored to adhere to the philosophy of learner-centered instruction.

Finding a Compromise

One way to be learner-centered while still having some sort of plan prior to teaching is to rethink the level at which the students have a voice in the curriculum. For example, if it is not possible to change the objectives of a course, the teacher can have the students develop the assignments they want to do to achieve an objective.

The teacher could also allow the students to pick from several different assignments that all help to achieve the same objective(s). This gives the students some control over their learning while allowing the teacher to adhere to external requirements. It also allows the teacher to be prepared in some way prior to the teaching.

Conclusion

The average educator does not have the autonomy to give to students to allow for the full implementation of learner-centered instruction. However, there are several ways to adjust one’s approach to teaching that will allow students to have a sense of control over their learning.

# Axis and Labels in ggplot2

In this post, we will look at how to manipulate the labels and positioning of the data when using ggplot2. We will use the “Wage” data from the “ISLR” package. Below is the initial code needed to begin.

library(ggplot2);library(ISLR)
data("Wage")

Manipulating Labels

Our first example involves adding labels for the x and y axes as well as a title. To do this, we will create a histogram of the wage variable and save it as a variable in R. Saving the histogram as a variable saves time, as we do not have to recreate all of the code but only add the additional information. After creating the histogram and saving it to a variable, we will add the code for creating the labels. Below is the code.

myHistogram<-ggplot(Wage, aes(wage, fill=..count..))+geom_histogram()
myHistogram+labs(title="This is My Histogram", x="Salary as a Wage", y="Number")

By using the “labs” function you can add a title and information for the x and y axes. If your title is really long, you can use “\n” to break the information into separate lines, as shown below.

myHistogram+labs(title="This is the Longest Title for a Histogram \n that I have ever Seen in My Entire Life", x="Salary as a Wage", y="Number")

Discrete Axis Scale

We will now turn our attention to working with discrete scales. Discrete scales deal with categorical data such as boxplots and bar charts. First, we will store a boxplot of the wages subsetted by level of education in a variable and we will display it.

myBoxplot<-ggplot(Wage, aes(education, wage,fill=education))+geom_boxplot()
myBoxplot

Now, by using the “scale_x_discrete” function along with the “limits” argument, we are able to change the order of the groups, as shown below.

myBoxplot+scale_x_discrete(limits=c("5. Advanced Degree","2. HS Grad","1. < HS Grad","4. College Grad","3. Some College"))
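The same “scale_x_discrete” function can also relabel the groups through its `labels` argument, which is handy when the factor levels carry numbering you do not want displayed. A sketch, assuming the boxplot stored above; the display names are made up for illustration.

```r
library(ggplot2)
library(ISLR)
data("Wage")
myBoxplot <- ggplot(Wage, aes(education, wage, fill = education)) + geom_boxplot()

# Map each original factor level to a cleaner display label
myBoxplot + scale_x_discrete(labels = c("1. < HS Grad" = "No Diploma",
                                        "2. HS Grad" = "HS Grad",
                                        "3. Some College" = "Some College",
                                        "4. College Grad" = "College Grad",
                                        "5. Advanced Degree" = "Advanced Degree"))
```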

Continuous Scale

The most common modification to a continuous scale is to modify the range. In the code below, we change the default range of “myBoxplot” to something that is larger.

myBoxplot+scale_y_continuous(limits=c(0,400))
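One caution: setting `limits` in “scale_y_continuous” drops any observations outside the range before the plot statistics are computed, which can silently change a boxplot. If you only want to zoom the view without discarding data, `coord_cartesian` is the safer choice. A sketch:

```r
library(ggplot2)
library(ISLR)
data("Wage")
myBoxplot <- ggplot(Wage, aes(education, wage, fill = education)) + geom_boxplot()

# Zooms the axes to 0-400 without removing any data from the computation
myBoxplot + coord_cartesian(ylim = c(0, 400))
```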

Conclusion

This post provided some basic insights into modifying plots using ggplot2.

# Pie Charts and More Using ggplot2

This post will explain several types of visuals that can be developed using ggplot2. In particular, we are going to make three specific types of charts, and they are…

• Pie chart
• Bullseye chart
• Coxcomb diagram

To complete this task, we will use the “Wage” dataset from the “ISLR” package. We will use the “education” variable which has five factors in it. Below is the initial code to get started.

library(ggplot2);library(ISLR)
data("Wage")

Pie Chart

In order to make a pie chart, we first need to make a bar chart and add several pieces of code to change it into a pie chart. Below is the code for making a regular bar plot.

ggplot(Wage, aes(education, fill=education))+geom_bar()

We will now modify two parts of the code. First, we do not want separate bars; instead, we want one bar. The reason is that we only want one pie chart, so we first need one bar. Therefore, for the x value in the “aes” function, we will use the argument “factor(1)”, which tells R to force the data into one factor on the chart, thus making one bar. We also need to add “width=1” inside the “geom_bar” function. This helps with spacing. Below is the code for this.

ggplot(Wage, aes(factor(1), fill=education))+geom_bar(width=1)

To make the pie chart, we need to add the “coord_polar” function to the code, which adjusts the mapping. We will include the argument “theta="y"”, which tells R that the size of the slice a factor gets depends on the number of people in that factor. Below is the code for the pie chart.

ggplot(Wage, aes(factor(1), fill=education))+
geom_bar(width=1)+coord_polar(theta="y")

By changing the “width” argument you can place a circle in the middle of the chart as shown below.

ggplot(Wage, aes(factor(1), fill=education))+
geom_bar(width=.5)+coord_polar(theta="y")
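The pie charts above still carry the axis labels and grid lines of the underlying bar chart. A common cleanup, sketched here with ggplot2's `theme_void`, strips all axis elements so only the slices and legend remain:

```r
library(ggplot2)
library(ISLR)
data("Wage")

# Same pie chart as above, with axes, ticks, and background removed
ggplot(Wage, aes(factor(1), fill = education)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  theme_void()
```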

Bullseye Chart

A bullseye chart is a pie chart that shares the information in a concentric way. The coding is mostly the same, except that you remove the “theta” argument from the “coord_polar” function. The thicker the ring, the more respondents within it. Below is the code.

ggplot(Wage, aes(factor(1), fill=education))+
geom_bar(width=1)+coord_polar()

Coxcomb Diagram

The Coxcomb diagram is similar to the pie chart, but the data is not normalized to fit the entire area of the circle. To make this plot, we modify the code by removing the “factor(1)” argument, replacing it with the name of the variable, and keeping the “coord_polar” function. Below is the code.

ggplot(Wage, aes(education, fill=education))+
geom_bar(width=1)+coord_polar()

Conclusion

These are just some of the many forms of visualizations available using ggplot2. Which to use depends on many factors from personal preference to the needs of the audience.

# Providing Quiz Feedback in Moodle

Like many of its other features, Moodle’s quiz module has so many options as to make it difficult to use. In this post, we are going to look at providing feedback to students for their participation in a quiz.

In this post, we are going to use a quiz that was already developed in a prior post as the example.

The first step is to click on “edit settings” to display all of the various options available for the quiz. Once there, you want to scroll down to “review options”. After doing this you will see the following

As you can see, there are four columns, and under each column there are seven choices. The columns are about the timing of the feedback. Feedback can happen immediately after an attempt, it can happen after the student finishes the quiz but while it is still available for others to take, or it can happen after everyone has taken the quiz and the quiz is no longer available.

Which type of timing you pick depends on your goals. If the quiz is for learning and not for assessment perhaps “immediately after the attempt” is best. However, if this is a formal summative assessment it might be better to provide feedback after the quiz is closed.

The options under each column are the same. By clicking on the question mark you can get a better explanation of what it is.

Overall Feedback

One important feedback feature is “Overall Feedback”. This gives the student a general idea of their understanding. You can set it up so that different overall feedback is given based on their score. Below is a screenshot of overall feedback.

In the example, the first boundary is for scores of 100 and above, and the second boundary is for scores 1–99. Students who get 100 know they are OK, while students with less than 100 will get different feedback. You have to add boundaries manually. Also, remember to add the percent sign after the number.

General Feedback and Specific Feedback for a Question

General feedback for a question is the feedback a person gets regardless of their answer. To find this option you need to either create a question or edit a question.

Specific feedback depends on the answer they pick. Below is a visual of both general and specific feedback.

Below is an example of the feedback a student would get taking the example quiz in this post. In the picture below, the student got the question wrong and received the feedback for an incorrect response.

Conclusion

The quiz module is a great way to achieve many different forms of assessment online. Whether the assessment is formative or summative the quiz module is one option. However, due to the complex nature of Moodle it is important that a teacher knows exactly what they want before attempting to use the quiz module.

# Making Quiz Questions in Moodle

One of Moodle’s many features is the quiz activity, which allows a teacher to assess a student’s knowledge in a variety of ways. However, before developing a quiz, a teacher needs to have questions developed and ready to be incorporated into the quiz.

The purpose of this post is to explain how to develop questions that are available to be used in a quiz.

Make a Category

When making questions, it is important to be organized, and this involves making categories in which to put your questions. To do this, click on Course administration | Question bank | Categories. After doing this, you will see something similar to the image below.

You want to click "Add category" and type a name for your category. In the picture below, we named the category "example." When you are finished, click "Add category" and you will see the following.

Finding the Question Bank

Now that we have a question category, we need to go to the question bank. To do so, click on Course administration | Question bank. You should see something similar to the following.

Select the "example" category you made and then click "Create a new question." You should see the following.

As you can see, many different question types are available. Which types you should use depends on many factors. For now, we will make an example true/false question. Once you select the True/False option, you will see the following.

The question name is for identifying the question in the bank, not on the quiz itself. Therefore, avoid calling your questions "question 1, 2, 3, etc." because if you have multiple quizzes you will not know which question to take from your bank. You need to develop some sort of cataloging system for your questions, such as the following.

1 Q1 TF 2016

This means the following:

• 1 means this is question number 1
• Q1 means this is quiz 1
• TF means the question is true false
• 2016 is the year the question was developed

How you do this is your own decision and this is just an example.
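As an aside, questions do not have to be typed into the web form one at a time. The question bank also has an Import option that accepts questions written in plain text using Moodle's GIFT format, where the question name appears between double colons. A minimal sketch of a true/false question using the hypothetical cataloging scheme above might look like this:

```
// A true/false question in Moodle's GIFT import format.
// "::1 Q1 TF 2016::" is the question name, following the
// cataloging scheme suggested above (hypothetical values).
::1 Q1 TF 2016::The quiz module can give students feedback automatically.{T}
```

A file of such questions can be imported into a category through the question bank's Import option with the GIFT format selected, which is handy when writing many questions at once.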

The other boxes on this page are self-explanatory. General feedback is what the student receives whether they are right or wrong; the other feedback fields depend on the response given. After writing the question and selecting whether the correct answer is true or false, you will see the following.

In a future post, we will learn how to take questions from the question bank and incorporate them into an actual quiz.
