See the fireworks educationalresearchtechniques created by blogging on WordPress.com. Check out their 2015 annual report.
Many countries in the world have a national and an official language. The origins of these two distinctions are wrapped up in politics, history, and culture.
A national language is a language with a political, cultural, and social unit connected with it. An official language is a language used by the government of a country. However, both of these terms are used for political ends in many countries.
A national language is often used to unite the people. Examples of this include Japanese in Japan, French in France, and even English in Great Britain. Each country has a complex history behind its selection of a national language.
The process of developing a national language involves four steps which are…
Selection
Selecting a language to serve as the national language is a political process. Picking the wrong language could rip a nation apart. Different countries have approached this in different ways. Indonesia selected a Malay pidgin as its national language to unite its country. The Philippines chose Tagalog, or Filipino, as its national language, which was met with great resistance.
Codification
Codification involves standardizing the language. This involves the development of grammar rules and dictionaries. American English was heavily influenced by Noah Webster and his work in developing dictionaries. Webster specifically wanted to develop an American dialect of English in order to unify the new country.
Elaboration
Elaboration is the process of extending the language into new domains such as academics, medicine, or some other field. Many languages, pidgins, and creoles do not have ways of communicating highly abstract terms. In order to serve as a national language, terms need to be developed to handle any form of communication.
Acceptance
After a language has been developed to serve as the national language, steps must be taken to convince the people to use it. This is often done through a combination of propaganda and follow-the-leader. When government officials use the language, locals often begin to follow.
Conclusion
The adoption of a language by a nation is a complex process that involves several steps. Every country has some story behind the development of its language. This rarely happens by chance.
Grounded theory is a systematic approach to qualitative research that involves the development of theory or the description of a process/action. The key characteristic of grounded theory is its systematic nature. This is in contrast to most qualitative methods, which are highly flexible in how a researcher can go about the collection and analysis of data.
Due to its structured nature, grounded theory is an excellent beginning point for those who are interested in qualitative research. This is especially true for those who come from a quantitative background in which the steps of conducting research are clear.
However, there is some disagreement about how to conduct grounded theory, as there are several different approaches that vary in the amount of structure they provide. In this post, we will look specifically at the grounded theory design known as the systematic approach.
Systematic Approach
The systematic approach to grounded theory focuses heavily on inductive thinking. In many ways, the researcher starts with the most specific information they collected, summarizes it, and moves toward the most abstract characteristics they were able to find through analyzing the data. This process involves three steps of coding.
Open Coding
Open coding involves making the initial categories in which to place the data. For example, let’s say you are looking at how principals support their teachers in professional development. You notice that several teachers share how the principals serve a leadership role in their professional development. This idea of principals as leaders in professional development could serve as a category.
A category can also have dimensionalized properties. This means that there is a continuum on which the trait is seen. For example, a principal can be one of several types of leaders in professional development. He can be a dictator or, at the other extreme, he can be laissez-faire. Both of these are examples of leadership, and there would be many examples in between.
Axial Coding
Step two involves axial coding. This involves taking one of your categories and making it the central phenomenon of the study. For example, if you are convinced that the heart of professional development for teachers is the leadership of the principal, this would become the central phenomenon. All the other categories are then related to this central phenomenon.
Another name for this is the coding paradigm. An example is attached (Doc1).
The attachment shows the coding for systematic grounded theory and the interrelation among the various factors. It is very similar to developing a statistical model which is why grounded theory is an excellent starting point for first-time qualitative researchers.
Selective Coding
Selective coding involves taking the coding paradigm and converting it to written text. It involves writing out the storyline in which the process happens and providing an explanation. It is not at all easy to take all of the information involved with interviews, developing a paradigm, and finally writing this down in coherent language.
Conclusion
Grounded theory is an established qualitative method. This method involves three steps that take data, diagram it, and lastly describe a process using words.
Prediction is one of the key concepts of machine learning. Machine learning is a field of study that is focused on the development of algorithms that can be used to make predictions.
Anyone who has shopped online has experienced machine learning. When you make a purchase at an online store, the website will recommend additional purchases for you to make. Often these recommendations are based on whatever you have purchased or whatever you click on while at the site.
There are two common forms of machine learning, unsupervised and supervised learning. Unsupervised learning involves using data that is not labeled, and attempts are made to find patterns within the data. Since the data is not labeled, there is no indication of what is right or wrong.
Supervised machine learning uses cleaned and properly labeled data. Since the data is labeled, there is some indication of whether the model that is developed is accurate or not. If the model is incorrect, then you need to make adjustments to it. In other words, the model learns based on its ability to accurately predict results. However, it is up to the human to make adjustments to the model in order to improve its accuracy.
In this post, we will look at using R for supervised machine learning. The definition presented so far will make more sense with an example.
The Example
We are going to make a simple prediction about whether emails are spam or not using the “spam” data from the “kernlab” package.
The first thing that you need to do is to install and load the “kernlab” package using the following code
install.packages("kernlab")
library(kernlab)
data(spam)
If you use the “View” function to examine the data you will see that there are several columns. Each column tells you the frequency of a word that kernlab found in a collection of emails. We are going to use the word/variable “money” to predict whether an email is spam or not. First, we need to plot the density of the use of the word “money” when the email was not coded as spam. Below is the code for this.
plot(density(spam$money[spam$type=="nonspam"]), col='blue',main="", xlab="Frequency of 'money'")
This is an advanced R post, so I am assuming you can read the code. The plot should look like the following.

As you can see, ‘money’ is not used too frequently in emails that are not spam in this dataset. However, you really cannot say this unless you compare how often ‘money’ appears in nonspam emails with how often it appears in spam emails. To learn this, we need to add a second line that shows the density of the word ‘money’ in emails classified as spam. The code for this is below with the prior code included.
plot(density(spam$money[spam$type=="nonspam"]), col='blue',main="", xlab="Frequency of 'money'")
lines(density(spam$money[spam$type=="spam"]), col="red")
Your new plot should look like the following

If you inspect the plot visually, the point where the blue line for nonspam and the red line for spam cross is a reasonable cutoff for deciding whether an email is spam or not. In other words, emails to the left of that point are mostly nonspam, while emails to the right of it are mostly spam.
The next code and graph show that this cutoff point is around 0.1. This means that any email in which the frequency of the word ‘money’ is greater than 0.1 is classified as spam. Below is the code and the graph with the cutoff point indicated by a black line.
plot(density(spam$money[spam$type=="nonspam"]), col='blue',main="", xlab="Frequency of 'money'")
lines(density(spam$money[spam$type=="spam"]), col="red")
abline(v=0.1, col="black", lwd=3)
Now we need to calculate the accuracy of using the word ‘money’ to predict spam. For our current example, we will simply use the “ifelse” function: if the frequency is greater than 0.1, the email is predicted to be spam; otherwise, it is predicted to be nonspam.
We then need to make a table to see the results. The code for the “ifelse” function and the table are below followed by the table.
predict<-ifelse(spam$money > 0.1, "spam","nonspam")
table(predict, spam$type)/length(spam$type)
predict       nonspam        spam
  nonspam 0.596392089 0.266898500
  spam    0.009563138 0.127146273
Based on the table, which I am assuming you can read, our model correctly classifies an email as spam or nonspam about 72% (0.596 + 0.127) of the time based on the frequency of the word ‘money’ being greater than 0.1.
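As a small check on the arithmetic, the sketch below rebuilds the table by hand from the proportions reported above and sums the diagonal, which holds the correctly classified emails. The object names “results” and “accuracy” are only for illustration and are not part of the original analysis.

```r
# Proportions copied from the table above.
# Rows are the predicted class, columns are the actual class.
results <- matrix(c(0.596392089, 0.009563138,   # actual nonspam column
                    0.266898500, 0.127146273),  # actual spam column
                  nrow = 2,
                  dimnames = list(predict = c("nonspam", "spam"),
                                  actual  = c("nonspam", "spam")))

# Correct predictions sit on the diagonal (nonspam/nonspam and spam/spam)
accuracy <- sum(diag(results))
round(accuracy, 3)  # 0.724
```

The off-diagonal value of about 0.267 is the share of all emails that were spam but were predicted to be nonspam, which is where most of the errors of this simple rule come from.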
Of course, for this to be true machine learning we would repeat this process by trying to improve the accuracy of the prediction. However, this is an adequate introduction to this topic.
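One way to sketch this repetition is to try several cutoff values and keep the one with the highest accuracy. The code below is only an illustration under stated assumptions: the data is simulated so the example runs without “kernlab”, and the function name “cutoff_accuracy” is hypothetical. With the real data you would pass spam$money and spam$type instead.

```r
# Simulated stand-in data (an assumption, not the kernlab spam data).
set.seed(1)
label <- rep(c("nonspam", "spam"), times = c(600, 400))
freq  <- c(rexp(600, rate = 20), rexp(400, rate = 3))

# Accuracy of the rule "spam if freq > cutoff" for a single cutoff value
cutoff_accuracy <- function(cutoff, freq, label) {
  predicted <- ifelse(freq > cutoff, "spam", "nonspam")
  mean(predicted == label)
}

# Try a grid of cutoffs and keep the best one
cutoffs    <- seq(0, 1, by = 0.05)
accuracies <- sapply(cutoffs, cutoff_accuracy, freq = freq, label = label)
best       <- cutoffs[which.max(accuracies)]
best
```

Choosing the cutoff on the same data used to measure accuracy tends to overstate how well the rule will work on new emails, so in practice the cutoff would be chosen on one part of the data and checked on another.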
Pidgin and creole are two common terms used in linguistics to describe a language. This post will define and explain some of the characteristics of these two linguistic terms.
Pidgin
A pidgin is a language that does not have any native speakers. In other words, it is a younger language that is developed as a means of communicating between two groups who do not speak the same language.
Pidgins are frequently developed for business and trading. Buying and selling and other transactions are reasons for the development of a pidgin. Pidgins are not used as a form of group identification but rather for practical communication.
A pidgin is also the combination of two different languages. The language that provides the majority of the vocabulary is called the superstrate and the minority language is called the substrate.
Pidgins are highly simplified in their grammar and syntax. For example, pidgins often lack affixes and inflections and have a smaller vocabulary compared to other languages.
A pidgin usually sounds ridiculous to a speaker of either of the two languages it is derived from. As such, pidgins are often difficult for a speaker of either the superstrate or the substrate language to learn, as they do not follow the normal rules of grammar found in either language.
There are many pidgins in the world today. Many came as a result of slavery in the western hemisphere. Slaves came from different parts of Africa and often could not communicate without developing a pidgin.
In Asia, most countries have or had some form of pidgin English. For example, Thailand has “Tinglish” and Japan has “Japanese Bamboo English.” Over time, many pidgins mature into what we call creoles.
Creole
A creole is a pidgin that now has native speakers. Children grow up speaking a creole as their first language. There are also other differences between a pidgin and a creole.
Since it is the first language of a group, a creole is used in many more areas of life and has a much richer structure. Furthermore, a creole has much more standardized grammar rules.
People’s attitudes towards a creole are often different as well. Since it is the first language of many people, there is a sense of pride over using the language. A creole can also be used to identify members of a group. This was not possible with a pidgin as pidgins serve as a way of communicating between two groups while creoles are for communicating both between groups and within a group.
Examples of creoles include “Manglish” (Malaysian English), “Singlish” (Singaporean English) and “Taglish” (Tagalog English).
Conclusion
Pidgins and creoles serve the purpose of communicating among people groups who have different languages. With time a pidgin may become a creole if native speakers of a pidgin develop.
Survey design is used to describe the opinions, beliefs, behaviors, and/or characteristics of a population based on the results of a sample. This design involves the use of surveys that include questions, statements, and/or other ways of soliciting information from the sample. This design is used primarily for descriptive purposes but can at times be combined with other designs (correlational, experimental) as well. In this post, we will look at the types and characteristics of survey design.
Types of Survey Design
There are two common forms of survey design which are cross-sectional and longitudinal. A cross-sectional survey design is the collection of data at one specific point in time. Data is only collected once in a cross-sectional design.
A cross-sectional design can be used to measure opinions/beliefs, compare two or more groups, evaluate a program, and/or measure the needs of a specific group. The main goal is to analyze the data from a sample at a given moment in time.
A longitudinal design is similar to a cross-sectional design, with the difference being that longitudinal designs require collection over time. Longitudinal studies involve cohorts and panels in which data is collected over days, months, years, and even decades. Through doing this, a longitudinal study is able to expose trends over time in a sample.
Characteristics of Survey Design
There are certain traits that are associated with survey design. Questionnaires and interviews are a common component of survey design. The questionnaires can happen by mail, phone, internet, and in person. Interviews can happen by phone, in focus groups, or one-on-one.
The design of a survey instrument often includes personal, behavioral and attitudinal questions and open/closed questions.
Another important characteristic of survey design is monitoring the response rate. The response rate is the number of participants who completed the survey expressed as a percentage of the number of surveys that were distributed. The response rate varies depending on how the data was collected. Normally, personal interviews have the highest rate while email requests have the lowest.
It is sometimes necessary to report the response rate when trying to publish. As such, you should at the very least be aware of what the rate is for a study you are conducting.
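The calculation itself is simple. Below is a minimal sketch in R, assuming made-up counts; the numbers and variable names are hypothetical and only illustrate the formula.

```r
# Hypothetical counts for illustration only
distributed <- 250   # surveys sent out
returned    <- 90    # surveys completed and returned

# Response rate as a percentage of distributed surveys
response_rate <- 100 * returned / distributed
response_rate  # 36
```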
Conclusion
Surveys are used to collect data at one point in time or over time. The purpose of this approach is to develop insights into the population in order to describe what is happening or to be used to make decisions and inform practice.
One of the strongest points of R, in the opinion of many, is its various features for creating graphs and other visualizations of data. In this post, we begin to look at using the various visualization features of R. Specifically, we are going to look at using plots and at changing the color and shape of points in a graph.
Using Plots
The ‘plot’ function is one of the basic options for graphing data. We are going to go through an example using the ‘islands’ data that comes with the R software. The ‘islands’ dataset contains the land mass, in thousands of square miles, of the world’s major landmasses. We want to plot the land mass of the seven largest. Below is the code for doing this.
islandgraph<-head(sort(islands, decreasing=TRUE), 7)
plot(islandgraph, main = "Land Area", ylab = "Thousands of Square Miles")
text(islandgraph, labels=names(islandgraph), adj=c(0.5,1))
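Here is what we did. We sorted the ‘islands’ data from largest to smallest and kept the first seven values with the ‘head’ function. We then plotted these values with a title and a y-axis label, and used the ‘text’ function to label each point with the name of its land mass.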
Below is what the graph should look like.

Changing Point Color and Shape in a Graph
For visual purposes, it may be beneficial to manipulate the color and appearance of several data points in a graph. To do this, we are going to use the ‘faithful’ dataset in R. The ‘faithful’ dataset indicates the length of eruption time and how long people had to wait for the eruption. The first thing we want to do is plot the data using the “plot” function.

As you can see in the data, there are two clear clusters. One contains eruptions lasting from 1.5 to 3 minutes and the second contains eruptions lasting from 3.5 to 5 minutes. To help people see this distinction, we are going to change the color and shape of the data points in the 1.5-3 range. Below is the code for this.
eruption_time<-with(faithful, faithful[eruptions < 3, ])
plot(faithful)
points(eruption_time, col = "blue", pch = 24)
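Here is what we did. We created a subset of ‘faithful’ called ‘eruption_time’ that contains only the eruptions shorter than 3 minutes. We then plotted the full dataset and used the ‘points’ function to redraw the subset in blue with a triangle shape (pch = 24).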

Conclusion
In this post, we learned how to use the ‘plot’ function and how to change the color and shape of points in a graph.
Diglossia literally means “two tongues.” This definition gives the impression that diglossia and bilingualism are the same thing. However, diglossia is a distinct form of bilingualism in that the use of the two languages is determined by function.
A diglossia consists of a high and a low language. The high language is used for specific purposes such as business transactions, ceremonies, and religious rites. The low language is used for everyday conversation. You would rarely hear a person use the high language for normal conversation.
The contexts in which the high and low languages are used are called domains. There are many different domains such as family, work, school, church, etc. Each of these domains calls for either the high or low language. For example, the high language may be used when speaking of politics while the low language may be used for speaking about sports.
There are several examples of diglossia in the world. In America, African Americans often have their own distinct form of English which functions as a low language. Regular or standard English would be the high language in this situation. At home, African American English is spoken and in public, a switch to standard English is often made.
There is often an interaction between diglossia and bilingualism in language. In general, there are four ways in which diglossia and bilingualism can interact in a community.
Below are examples of each
Diglossia and Bilingualism
An example of this is an African American community where the people can speak standard English (high language), African American English (low language) while also being fluent in another language like Spanish (second language).
Diglossia but not Bilingualism
Same as above, the African American community knows standard English as well as African American English but the community does not speak Spanish or any other language.
Bilingualism but no Diglossia
The African American community speaks standard English and also speaks another language, such as Spanish, but does not use African American English.
Neither Diglossia nor Bilingualism
The African American community only speaks standard English and does not speak African American English or any other language such as Spanish.
Conclusion
Communities vary in their perception of their high and low languages. Some look down on the low language while using it, while others are proud of the low language while feeling forced to learn the high. The point is that with diglossia, the use of a second language is connected to a particular social setting.
Correlational research is focused on examining the relationships among two or more variables. This information can be used either to explain a phenomenon or to make predictions. This post will explain the two forms of correlational design as well as the characteristics of correlational design in general.
Explanatory Design
An explanatory design seeks to determine to what extent two or more variables co-vary. To co-vary simply means that the variables change together. In general, two or more variables can have a strong, weak, or no relationship. The strength is measured by the Pearson product-moment correlation coefficient, which is usually referred to as r. The r is measured on a scale of -1 to 1. The higher the absolute value, the stronger the relationship.
For example, let’s say we do a study to determine the strength of the relationship between exercise and health. Exercise is the explanatory variable and health is the response variable. This means that we are hoping that exercise will explain health, or you can say we are hoping that health responds to exercise. In this example, let’s say that there is a strong relationship between exercise and health with an r of 0.7. This means that the more exercise a person gets, the healthier they tend to be. In standardized terms, when exercise goes up by one standard deviation, health is expected to improve by 0.7 standard deviations. In other words, when one increases the other increases as well.
Exercise is able to explain a certain amount of the variance (amount of change) in health. This amount is found by squaring r to get r-squared. The higher the r-squared, the better the model explains the relationship between the explanatory and response variable. This is where regression comes from.
This also holds true for a negative relationship, but in negative relationships when the explanatory variable increases the response variable decreases. For example, let’s say we do a study that examines the relationship between exercise and age and we calculate an r of -0.85. In standardized terms, this means that when exercise increases by one standard deviation, age is expected to decrease by 0.85 standard deviations. In other words, more exercise means that the person is probably younger. In this example, the relationship is strong but indicates that the variables move in opposite directions.
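To make the link between r and r-squared concrete, here is a minimal sketch in R. The first part squares the r of 0.7 from the example. The second part uses simulated data, which is an assumption made purely for illustration, to show that in a regression with one explanatory variable the r-squared reported by “lm” equals the squared correlation.

```r
# r from the exercise and health example
r <- 0.7
r_squared <- r^2
r_squared  # 0.49, so exercise explains about 49% of the variance in health

# Simulated (hypothetical) data, not from a real study
set.seed(42)
exercise <- rnorm(100)
health   <- 0.7 * exercise + rnorm(100, sd = 0.7)

r_sim <- cor(exercise, health)    # Pearson correlation
fit   <- lm(health ~ exercise)    # simple linear regression

# With a single explanatory variable these two values match
all.equal(r_sim^2, summary(fit)$r.squared)  # TRUE
```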
Prediction Design
Prediction design has most of the same functions as explanatory design with a few minor changes. In prediction design, we normally do not use the term explanatory and response variable. Rather we have predictor and outcome variable as terms. This is because we are trying to predict and not explain. In research, there are many terms for independent and dependent variable and this is because different designs often use different terms.
Another difference is that prediction designs are focused on determining future results or forecasting. For example, if we are using exercise to predict age, we can develop an equation that allows us to estimate a person’s age based on how much they exercise, or vice versa. Of course, no model is 100% accurate, but a good model can be useful even if it is wrong at times.
What both designs have in common is the use of r and r-squared and the analysis of the strength of the relationship among the variables.
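As a rough illustration of such an equation, the sketch below fits a simple linear regression in R and uses it to predict age from exercise. The data is simulated, and the variable names and the choice of hours per week as the unit are assumptions made for the example, not results from a real study.

```r
# Simulated (hypothetical) data: weekly exercise hours and age
set.seed(7)
exercise_hours <- runif(50, min = 0, max = 10)
age <- 60 - 3 * exercise_hours + rnorm(50, sd = 5)
study <- data.frame(exercise_hours, age)

# Exercise is the predictor variable and age is the outcome variable
model <- lm(age ~ exercise_hours, data = study)
coef(model)  # intercept and slope of the prediction equation

# Predict the age of a new person who exercises 4 hours per week
new_person <- data.frame(exercise_hours = 4)
predicted_age <- predict(model, newdata = new_person)
predicted_age
```

The predicted value is an estimate with error around it, which is the sense in which a model can be useful even when it is wrong for a particular person.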
Conclusion
In research, explanatory and prediction correlational designs have a place in understanding data. Which to use depends on the goals of the research and/or the research questions. Both designs have