Tag Archives: assessment

Discrete-Point and Integrative Language Testing Methods

Within language testing, at least two major viewpoints on assessment have arisen over time. Originally, the view was that language assessment should look at specific elements, or discrete aspects, of the language.

A reaction to these discrete methods came about with the idea that language is holistic, so testing should be integrative, addressing many aspects of language simultaneously. In this post, we will take a closer look at discrete and integrative language testing methods by providing examples of each along with a comparison.

Discrete-Point Testing

Discrete-point testing works on the assumption that language can be reduced to several discrete component “points” and that these “points” can be assessed. Examples of discrete-point test items in language testing include multiple choice, true/false, fill in the blank, and spelling.

What all of these example items have in common is that they usually isolate an aspect of the language from the broader context. For example, a simple spelling test is highly focused on the orthographic characteristics of the language. True/false items can be used to assess knowledge of various grammar rules.

The primary criticism of discrete-point testing was its discreteness. Many believe that language is holistic and that in the real world students will never have to deal with language in such an isolated way. This led to the development of integrative language testing methods.

Integrative Language Testing Methods

Integrative language testing is based on the unitary trait hypothesis, which states that language is indivisible. This is in complete contrast to discrete-point methods, which support dividing language into specific components. Two common integrative language assessments are the cloze test and dictation.

A cloze test involves taking an authentic reading passage and removing words from it. Which words are removed depends on the test creator. Normally, it is every 6th or 7th word, but the interval could be larger or smaller, or only key vocabulary could be removed. In addition, sometimes a list of potential words is given to the student to select from, and sometimes it is not.

The student’s job is to look at the context of the entire passage to determine which words to write into the blank spaces. This is an integrative experience, as the student has to consider grammar, vocabulary, context, etc. to complete the assessment.
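
To make the mechanics concrete, here is a minimal Python sketch of building a cloze item by blanking every nth word. The function name `make_cloze`, the blank marker, and the sample passage are illustrative assumptions rather than part of any standard tool, and any punctuation attached to a removed word disappears along with it.

```python
def make_cloze(passage, n=7, blank="_____"):
    """Blank out every n-th word; return the gapped text and the answer key."""
    words = passage.split()
    answers = []
    # Positions n-1, 2n-1, ... are the n-th, 2n-th, ... words of the passage.
    for i in range(n - 1, len(words), n):
        answers.append(words[i])
        words[i] = blank
    return " ".join(words), answers


# Hypothetical sample passage, used only for illustration.
sample = "The quick brown fox jumps over the lazy dog near the quiet river bank today"
gapped, key = make_cloze(sample, n=7)
print(gapped)
print(key)
```

A test creator would still want to review the result by hand, for example to avoid blanks that have several acceptable answers.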

Dictation is simply writing down what was heard. This also requires the use of several language skills simultaneously in a realistic context.

Integrative language testing has also faced criticism. For example, discrete-point testing has consistently shown that people score differently in different language skills, and this finding has been replicated in many studies. As such, the exclusive use of integrative language approaches is not supported by most TESOL scholars.


As with many other concepts in education, the best choice between discrete-point and integrative testing is a combination of both. The exclusive use of either will not allow the students to demonstrate mastery of the language.


Distributed Practice: A Key Learning Technique

A key concept in teaching and learning is the idea of distributed practice. Distributed practice is a process in which the teacher deliberately arranges for their students to practice a skill or use knowledge in many learning sessions that are short in length and distributed over time.

The purpose behind employing distributed practice is to reinforce the material in the students’ minds by having them experience the content several times. In this post, we will look at the pros and cons of distributed practice as well as practical applications of this teaching technique.

Pros and Cons

Distributed practice helps to maintain student motivation by requiring only short spans of attention. For most students, it is difficult to study anything for long periods of time. Through constant review and exposure, students become familiar with the content.

Another benefit is the prevention of mental and physical fatigue. This is related to the first point. Fatigue interferes with information processing. Therefore, a strategy that reduces fatigue can help students learn new material.

However, there are times when short, intense sessions are not enough to achieve mastery. Project learning may be one example. Completing a project often requires several long stretches of work on a task, which is not conducive to distributed practice.

Application Examples

When using distributed practice, it is important to remember to keep the length of each practice session short. This maintains motivation. In addition, the time between sessions should initially be short as well and lengthen as mastery develops. If the practice sessions are too far apart, students will forget.

Lastly, the skill should be practiced over and over for a long period of time. How long depends on the circumstances. The point is that distributed practice takes a commitment to returning to a concept the students need to master over a long stretch of time.
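
One way to picture this advice is a review schedule whose gaps start short and grow. The Python sketch below is only an illustration: the one-day first gap and the doubling factor are assumptions chosen for the example, not recommendations, and `practice_schedule` is a hypothetical name.

```python
from datetime import date, timedelta


def practice_schedule(start, sessions=5, first_gap_days=1, growth=2.0):
    """Return session dates whose spacing starts short and lengthens each time."""
    dates = [start]
    gap = first_gap_days
    for _ in range(sessions - 1):
        dates.append(dates[-1] + timedelta(days=round(gap)))
        gap *= growth  # assumed rule: each gap is `growth` times the previous one
    return dates


schedule = practice_schedule(date(2024, 1, 1))
for session_date in schedule:
    print(session_date.isoformat())
```

With these assumed settings the gaps between sessions are 1, 2, 4, and 8 days. A teacher would adjust the numbers to the class and shorten the gaps again if students start forgetting.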

One of the most practical examples of distributed practice may be in any curriculum that employs a spiral approach. A spiral curriculum is one in which key ideas are visited over and over through a year or even over several years of curriculum.

For our purposes, distributed practice is perhaps a spiral approach employed within a unit plan or over the course of a semester. This can be done in many ways, such as the following.

  • The use of study guides to prepare for quizzes
  • Class discussion
  • Student presentations of key ideas
  • Collaborative projects

The primary goal should be to employ several different activities that require students to return to the same material from different perspectives.


Distributed practice is a key teaching technique that many teachers employ even if they are not familiar with the term. Students cannot master an idea or skill by seeing it once. They must be exposed to it several times in order to develop mastery. As such, understanding how to distribute practice is important for student learning.

Direct and Indirect Test Items

In assessment, most test items fall into one of two categories: direct and indirect. Direct test items ask the student to complete some sort of authentic action. Indirect test items measure a student’s knowledge about a subject. This post will provide examples of both direct and indirect test items.

Direct Test Items

Direct test items use authentic assessment approaches. Examples in TESOL would include the following…

  • For speaking: Interviews and presentations
  • For writing: Essay questions
  • For reading: Using real reading material and having the student respond to questions verbally and/or in writing
  • For listening: Following oral directions to complete a task

The primary goal of direct test items is to be as much like real life as possible. Often, direct test items are integrative, which means that the student has to apply several skills at once. For example, a presentation involves more than just speaking. It also involves writing the speech, reading or memorizing it, and the critical thinking skills needed to develop it.

Indirect Test Items

Indirect test items assess knowledge without authentic application. Below are some common examples of indirect test items.

  • Multiple choice questions
  • Cloze items
  • Paraphrasing
  • Sentence re-ordering

Multiple Choice

Multiple choice questions involve the use of a question followed by several potential answers. It is the job of the student to determine which is the most appropriate answer. One challenge with multiple choice questions is the difficulty of writing incorrect choices. For every correct answer, you need several wrong ones. Another problem is that, with training, students can learn how to improve their success on multiple choice tests without having a stronger knowledge of the subject matter.

Cloze Items

Cloze items involve giving the student a paragraph or sentence with one or more blanks in it that the student has to complete. One problem with cloze items is that more than one answer may be acceptable for a blank. This can lead to a great deal of confusion when marking the test.


Paraphrasing

Paraphrasing is strictly for TESOL and involves having the student rewrite a sentence in a slightly different way, as in the example below.

“I’m sorry I did not go to the assembly”

I wish________________________________

In the example above, the student needs to rewrite the sentence in quotes starting with the phrase “I wish.” The challenge is determining whether the paraphrase is reasonable, as this is highly subjective.

Sentence Re-Ordering

In this item for TESOL assessment, a student is given a sentence that is out of order and has to arrange the words so that an understandable sentence results. This is one way to assess knowledge of syntax. The challenge is that for complex sentences more than one answer may be possible.

It is important to remember that all indirect items can be integrative or discrete-point. Unlike integrative items, discrete-point items measure only one narrow aspect of knowledge at a time.


A combination of direct and indirect test items would probably best ensure that a teacher is assessing students in a way that allows them to succeed. What mixture of the two to use always depends on the context and needs of the students.


Validity is often seen as a close companion of reliability. Validity is the assessment of the evidence that indicates that an instrument is measuring what it claims to measure. An instrument can be highly reliable (consistent in measuring something) yet lack validity. For example, an instrument may reliably measure motivation but not be valid for measuring income. The problem is that an instrument that measures motivation would not measure income appropriately.

In general, there are several ways to assess validity, including the following.

  • Content validity
  • Response process validity
  • Criterion-related evidence of validity
  • Consequence testing validity

Content Validity

Content validity is perhaps the easiest way to assess validity. In this approach, the instrument is given to several experts who assess the appropriateness or validity of the instrument. Based on their feedback, a determination of the validity is made.

Response Process Validity

In this approach, the respondents to an instrument are interviewed to see if they considered the instrument to be valid. Another approach is to compare the responses of different respondents for the same items on the instrument. High validity is determined by the consistency of the responses among the respondents.

Criterion-Related Evidence of Validity

This form of validity involves measuring the same variable with two different instruments. The instruments can be administered over time (predictive validity) or simultaneously (concurrent validity). The results are then analyzed by finding the correlation between the two instruments. The stronger the correlation, the stronger the evidence for the validity of both instruments.

Consequence Testing Validity

This form of validity looks at what has happened to the environment after an instrument was administered. An example of this would be improved learning due to a test. If the students are studying harder, it can be inferred that this is due to the test they just experienced.


Validity plays an important role in the development of instruments in quantitative research. Which form of validity to use to assess the instrument depends on the researcher and the context that he or she is facing.

Assessing Reliability

In quantitative research, reliability measures an instrument’s stability and consistency. In simpler terms, reliability is how well an instrument is able to measure something repeatedly. There are several factors that can influence reliability. Some of the factors include unclear questions/statements, poor test administration procedures, and even the participants in the study.

In this post, we will look at different ways that a researcher can assess the reliability of an instrument. In particular, we will look at the following ways of measuring reliability…

  • Test-retest reliability
  • Alternative forms reliability
  • Kuder-Richardson Split Half Test
  • Coefficient Alpha

Test-Retest Reliability

Test-retest reliability assesses the reliability of an instrument by comparing results from the same sample at different points in time. A researcher administers the instrument at two different times to the same participants. The researcher then analyzes the data and looks for a correlation between the results of the two administrations of the instrument. In general, a correlation above about 0.6 is considered evidence of reasonable reliability of an instrument.

One major drawback of this approach is that giving the same instrument to the same people a second time often influences the results of the second administration. It is important that a researcher is aware of this, as it indicates that test-retest reliability is not foolproof.
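
The comparison described above can be sketched in a few lines of Python using the Pearson correlation. The scores are made up for illustration, `pearson_r` is a hypothetical helper name, and the 0.6 cutoff is simply the rule of thumb mentioned in this post.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Invented scores for the same five participants at two administrations.
first_administration = [10, 12, 14, 16, 18]
second_administration = [11, 13, 13, 17, 19]

r = pearson_r(first_administration, second_administration)
print(round(r, 2))
print("reasonable reliability" if r > 0.6 else "questionable reliability")
```

The same calculation applies to alternative forms reliability, with the two lists holding scores from the two different instruments.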

Alternative Forms Reliability 

Alternative forms reliability involves the use of two different instruments that measure the same thing. The two different instruments are given to the same sample. The data from the two instruments are analyzed by calculating the correlation between them. Again, a correlation around 0.6 or higher is considered as an indication of reliability.

The major problem with this is that it is difficult to find two instruments that really measure the same thing. Often scales may claim to measure the same concept but they may both have different operational definitions of the concept.

Kuder-Richardson Split Half Test

The Kuder-Richardson test assesses the reliability of instruments built from categorical items, typically dichotomous ones such as right/wrong answers. In this approach, an instrument is cut in half and the correlation is found between the two halves of the instrument. This approach looks at the internal consistency of the items of an instrument.
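
As a rough sketch of the split-half idea, the Python below splits a set of right/wrong items into odd and even halves and correlates the two half scores. The data are invented, the helper names are hypothetical, and the Spearman-Brown step (which estimates full-length reliability from the half-test correlation) is an addition that this post does not cover. This is not the KR-20 formula itself.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def split_half(responses):
    """responses: one list of 0/1 item scores per respondent."""
    odd_half = [sum(row[0::2]) for row in responses]   # 1st, 3rd, 5th ... items
    even_half = [sum(row[1::2]) for row in responses]  # 2nd, 4th, 6th ... items
    half_r = pearson_r(odd_half, even_half)
    full_estimate = (2 * half_r) / (1 + half_r)        # Spearman-Brown correction
    return half_r, full_estimate


# Invented answers: five respondents, six items scored right (1) or wrong (0).
answers = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
half_r, full_estimate = split_half(answers)
print(round(half_r, 2), round(full_estimate, 2))
```

Splitting by odd and even items is one common choice. Other ways of halving the instrument can give somewhat different values.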

Coefficient Alpha

Another approach that looks at internal consistency is coefficient alpha. This approach involves administering an instrument and calculating Cronbach’s alpha. Most statistical programs can calculate this number. Normally, scores above 0.7 indicate adequate reliability. Coefficient alpha can only be used for continuous variables such as Likert scales.
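
For readers without a statistics package at hand, the usual formula for Cronbach’s alpha, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items, can be sketched in Python. The responses below are invented and `cronbach_alpha` is a hypothetical function name.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one list of item scores per respondent, all the same length."""
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # transpose: one tuple per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Invented Likert-style responses: four respondents, three items.
likert = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(likert)
print(round(alpha, 2))  # 0.97
```

A real study would use far more respondents than this toy example.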


Assessing reliability is important when conducting research. The approaches discussed here are among the most common. Which approach is best depends on the circumstances of the study that is being conducted.

Reasons for Testing

Testing is done for many different reasons in various fields such as education, business, and even government. There are many motivations that people have for using evaluation. In this post, we will look at five reasons that testing is done. The five reasons are…

  • For placement
  • For diagnoses
  • For assessing progress
  • For determining proficiency
  • For providing evidence of proficiency

For Placement

Placement tests serve the purpose of determining at what level a student should be placed. They are often given at the beginning of a student’s learning experience at an institution, often before taking any classes. Normally, the test will consist of specific subject knowledge that a student needs to know in order to have success at a certain level.

For Diagnoses

Diagnostic tests are for identifying weaknesses or learning problems. They are similar to a doctor looking over a patient and trying to diagnose the patient’s health problem. Diagnostic tests help in identifying gaps in knowledge and help teachers to know what they need to do to help their students.

For Assessing Progress

Progress tests are used to assess how the students are doing in comparison to the goals and objectives of the curriculum. At the university level, these are the mid-terms and final exams that students take. How well the students are able to achieve the objectives of the course is measured by progress tests.

For Determining Proficiency 

Testing for proficiency provides a snapshot of what the student is able to do right now. These tests do not identify weaknesses as diagnostic tests do, nor do they assess progress in comparison to a curriculum as progress tests do. Common examples of this type of test are tests that are used to determine admission into a program, such as the SAT, MCAT, or GRE.

For Providing Evidence of Proficiency 

Sometimes, people are not satisfied with traditional means of evaluation. They want to see what the student can do by examining the student’s performance on several assignments over the course of a semester. This form of assessment provides a way of having students produce work that demonstrates improvement in the classroom.

One of the most common forms of assessment that provides evidence of proficiency is the portfolio. In this approach, the students collect assignments that they have done over the course of the semester to submit. The teacher is able to see the students’ progress and improvement over time. Such evidence is harder to track through tests.


How to assess is best left for the teacher to decide. However, teachers need options that they can use when determining how to assess their students. The examples provided here give teachers ideas on what assessments they can use in various situations.

Giving Feedback on Written Work

Marking papers and providing feedback is always a chore. However, nothing seems to be more challenging in teaching than providing feedback for written work. There are so many things that can go wrong when students write. Furthermore, the mistakes made are often totally unique to each student. This makes it challenging to try and solve problems by teaching all the students at once. Feedback for writing must often be tailor-made for each student. Doing this for a small class is doable, but few have the luxury of teaching a handful of students.

Despite the challenge, there are several practical ways to streamline the experience of providing feedback for papers. Some ideas include the following

  • Structuring the response
  • Training the students
  • Understanding your purpose for marking

Structuring the Response

A response to a student should include the following two points

  1. What went well (positive feedback)
  2. What needs improvement (constructive feedback)

The response should be short and sweet, no more than a few sentences. It is not necessary to report every flaw to the student. Rather, point out the major ones and deal with other problems later.

If it is too hard to explain what went wrong, sometimes providing an example of a rewritten paragraph from the student’s paper is enough to give feedback. The student compares your writing with their own to see what needs to be done.

Training Students

Students need to know what you want. This means that clear communication about expectations saves time on providing feedback. Providing rubrics is one way of lessening a teacher’s workload. Students see the expectations for the grade they want and target those expectations accordingly. The rubric also helps the teacher to be more consistent in marking papers and providing feedback.

Peer-evaluation is another tool for saving time. Students are more likely to think about what they are doing when hearing it from peers. In addition, students can find some of the smaller problems, such as grammar, so that the teacher can focus on shaping the ideas of the paper. Depending on the maturity of the students, it is better to let them look at it before you invest any energy in providing feedback.

What’s Your Purpose

Many teachers will mark papers and try to catch everything every single time. This means that they are looking at the flow of the paragraph and the connection of the main ideas while also catching typos and grammatical mistakes. This approach is often overwhelming and extremely time-consuming. In addition, it is discouraging to students who receive papers that are covered in red.

Another approach is what is called selective marking. Selective marking is when a teacher focuses only on specific issues in a paper. For example, a teacher might only focus on paragraph organization for a first draft and focus on the overall flow of the paper later. With this focus, the teacher and students can handle similar issues at the same time, and those issues are much more clearly defined than when checking everything at once.

Personally, I believe it is best to focus on macro issues such as paragraph organization and overall consistency first before focusing on grammatical issues. If the ideas are going in the right direction it is easy to spot grammar issues. In addition, if the students know English well, most grammar issues are irritating rather than completely crippling in understanding the thrust of the paper. However, perfect grammar without a thesis is a hopeless paper.


There is no reason to overwork ourselves in marking papers. Basic adjustments in strategy can ensure that students receive feedback without the teacher overdoing it.

Dealing with Mistakes and Providing Feedback

Students are in school to learn. We learn most efficiently when we make mistakes. Understanding how students make mistakes and the various types of mistakes that can happen can help teachers to provide feedback.

Julian Edge describes three types of mistakes

  • Slips: miscalculations that students can fix themselves
  • Errors: mistakes students cannot fix on their own and that require assistance
  • Attempts: a student tries but does not yet know how to do it

As teachers, it is the last two that we are most concerned with. Helping students with errors and providing assistance with attempts is critical to the development of student learning.

Assessing Students

Students need to know at least two things whenever they are given feedback

  1. What they did well (positive feedback)
  2. What they need to do in order to improve (constructive feedback)

Positive feedback provides students with an understanding of what they have mastered. Whatever they did correctly are things they do not need to worry about for now. Knowing this helps students to focus on their growth areas.

Constructive feedback indicates to students what they need to work on. It is not enough to tell students what is wrong. A teacher should also provide suggestions on how to deal with the mistakes. The suggestions for improvement become the standard by which the student is judged in the future.

For example, if a student is writing an essay and is struggling with passive voice, the teacher indicates what the problem is. After this, the teacher provides suggestions or even examples of switching from passive to active voice. Whenever the essay is submitted again, the teacher looks for improvement in this particular area of the assignment.

Ways of Giving Feedback

Below are some ways to provide feedback to students

  • Comments: A common method. The teacher writes the positive and constructive feedback on the assignment. This can be used in almost any situation but can be very time-consuming.
  • Grades: This approach is most useful for a summative assessment or when students are submitting something for the final time. The grade indicates the level of mastery that the student has achieved.
  • Self-evaluation: Students judge themselves. This is best done by providing them with a rubric so that they can evaluate their own performance. It is very useful for projects and saves the teacher a great deal of time.
  • Peer-evaluation: Same as above, except that peers evaluate the student instead of the student evaluating himself or herself.

Mistakes are what students do. It is the teacher’s responsibility to turn mistakes into learning opportunities. This can happen through careful feedback that encourages growth and not discouragement.

Assessing Learning

Assessment is focused on determining a student’s progress as related to academics. In this post, we will examine several types of assessment common in education today. The types we will look at are

  • Direct observation
  • Written responses
  • Oral responses
  • Rating by others

Direct Observation

Direct observation refers to instances in which a teacher watches a student to see if learning has occurred. For example, a parent who has instructed a child in how to tie their shoe will watch the child doing this. When the child is observed to be successful, the parent is assured that learning has occurred. If the child is not successful, the parent knows to provide some form of intervention, such as reteaching, to help the child to have success.

Problems with direct observation include the issue of only being able to focus on what is seen. There is no way of knowing what is going on in the child’s mind. Another challenge is that just because the behavior is not observed does not mean that no learning has happened. Students can, at times, understand without being able to perform.

Written Response

Written response is the assessing of a student’s response in writing. These can take the form of tests, quizzes, homework, and more. The teacher reads the student’s response and determines if there is adequate evidence to indicate that learning has happened. Appropriate answers indicate evidence of learning.

In terms of problems, written responses can be difficult for students who lack writing skills. This is especially true for ESL students. In addition, writing takes substantial thinking skills that some students may not possess.

Oral Responses

Oral responses involve a student responding verbally to a question or sharing their opinion. Again, issues with language can be a barrier, along with difficulties with expressing and articulating one’s opinion. Culturally, many parts of the world do not encourage students to express themselves verbally. This puts some students at a disadvantage when this form of assessment is employed.

For teachers leading discussion, it is often critical that they develop methods for rephrasing student comments as well as strategies for developing thinking skills through the use of questions.

Rating by Others

Rating by others can involve teachers, parents, administrators, peers, etc. These individuals assess the performance of a student and provide feedback. The advantages of this include having multiple perspectives on a student’s progress. Every individual has their own biases, but when several people assess, such threats to validity are lessened.

Problems with rating by others include finding people who have the time to come and watch a particular student. Another issue is training the raters to assess appropriately. As such, though this is an excellent method, it is often difficult to use.


The tools mentioned in this post are intended to help people new to teaching to see different options in assessment. When assessing students, multiple approaches are often the best. They provide a fuller picture of what the student can do. Therefore, when looking to assess students, consider several different approaches to verify that learning has occurred.

Portfolio Assessment

One type of assessment that has been popular for a long time is the portfolio. A portfolio is usually a collection of student work over a period of time. There are five common steps to developing student portfolios. These steps are

  1. Determine the purpose of the portfolio.
  2. Identify evidence of skill mastery to be in the portfolio.
  3. Decide who will develop the portfolio.
  4. Pick evidence to place in the portfolio.
  5. Create a portfolio rubric.

1. Determine the Purpose of the Portfolio

The student needs to understand the point of the portfolio experience. This helps in creating relevance for the student as well as enhancing the authenticity of the experience. Common reasons for developing portfolios include the following…

  • assessing progress
  • assigning grades
  • communicating with parents

2. Identify Evidence of Skill Mastery

The teacher and the students need to determine what skills the portfolio will provide evidence for. Common skills that portfolios provide evidence for are the following

  • Complex thinking processes: the use of information, such as in essays
  • Products: development of items such as drawings, graphs, and songs
  • Social skills: evidence of group work

3. Who will Develop the Portfolio

This step has to do with deciding on who will set the course for the overall development of the portfolio. At times, it is the student who has complete authority to determine what to include in a portfolio. At other times, it is the student and the teacher working together. Sometimes, even parents provide input into this process.

4. Pick the Evidence for the Portfolio

The evidence provided must support the skills mentioned in step two. Whoever has the power to select evidence may still need support in determining if the evidence they selected is appropriate. Regardless of the requirement, the student needs a sense of ownership in the portfolio.

5. Develop Portfolio Rubric

The teacher needs to develop a rubric for the purpose of grading the student. The teacher needs to explain what they want to see as well as what the various degrees of quality are.


Portfolios are a useful tool for helping students in assessing their own work. Such a project helps in developing a deeper understanding of what is happening in the classroom. Teachers need to determine for themselves when portfolios are appropriate for their students.

After the Exam: Grading Systems II

In this post, we conclude our discussion on grading systems by looking at less common approaches. There are at least three other approaches to grading. These systems are comparison with aptitude, comparison with effort, and comparison with improvement.

Comparison with Aptitude

In this approach, a student is compared with their own potential. In other words, the teacher grades the student on whether or not the student is reaching their full potential on an assignment as determined by the teacher. For example, if an average student does average work, they get an “A.” However, if an excellent student does average work they get a “C”.  To get an “A”, the excellent student must do excellent work as determined by the teacher.

The advantage of this system is that everyone, regardless of ability, has a chance at earning high grades. However, the disadvantages are serious. The teacher gets to decide what potential a student has. If the teacher is wrong, weak students may be pushed too hard and strong students may not be pushed hard enough, or vice versa. This grading is also unfair to stronger students, as weaker students earn the same grade for inferior work.

Comparison with Effort

This approach does not look at potential as much as it looks at how hard a student works. To receive a higher grade an average student must demonstrate a great deal of effort on a test. For the strong student, if they show little effort on an assessment they will receive a lower grade.

This system has the same advantages and disadvantages as the aptitude system. It is unfair to the stronger students to be held to a different standard in comparison to their peers. Also, it is hard to be objective when determining the amount of effort a student puts forth.

Comparison with Improvement

This system of grading looks at the progress a student makes over time to assign a grade. Students who improve the most will receive the highest grade. Students who show little improvement will not do so well.

This system is more objective than the previous two examples because it relies on data collected over time rather than only on a teacher’s impression. However, one significant drawback concerns the student who does well from the beginning. If a student is strong from the beginning, there will be little improvement. Committing to this grading system could hurt high-performing students.
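
The drawback is easy to see with a small Python sketch of gain scores. The names and scores are invented for illustration, and `improvement` is a hypothetical helper.

```python
def improvement(records):
    """records maps each student to (first score, final score); returns the gains."""
    return {name: final - first for name, (first, final) in records.items()}


# Invented scores: Ben outperforms Ana on both occasions.
records = {"Ana": (55, 80), "Ben": (92, 95)}
gains = improvement(records)
ranked = sorted(gains, key=gains.get, reverse=True)
print(gains)
print(ranked)
```

Ben scores higher than Ana both times, yet a grade based purely on improvement would rank Ana first.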


Which system to use depends on the context and needs of your students. The number one rule for grading is to remain consistent within one assessment, but it is perhaps okay to be flexible from one assignment to the next.

After the Exam: Grading Systems

After the students submit their exams and they have been marked by you, it is time to determine the grades. This can actually be very controversial as there are different grading systems. In this discussion, we will look at two of the most common grading systems and examine their advantages and disadvantages. The grading systems discussed in this blog are comparison with other students and comparison with a standard.

Comparison with Students

Comparison with students is the process of comparing the results of one student with the results of another student. Another term for this is “grading on the curve.” For example, if a test is worth 100 points and the highest score is 85, the total points possible would be reduced to 85. The removal of 15 points raises the grade of all of the students significantly because the standard is the 85 of the highest-performing student rather than the absolute value of 100.
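The arithmetic behind this kind of curve can be sketched in a few lines of Python. The function name and the two lower scores are made up for illustration; only the 100-point test and the top score of 85 come from the example above.

```python
def curve_to_top_score(raw_scores):
    """Rescale raw scores so that the highest raw score counts as 100%."""
    top = max(raw_scores)
    if top <= 0:
        raise ValueError("the highest score must be positive to curve against it")
    return [round(score / top * 100, 1) for score in raw_scores]


# Hypothetical results on a 100-point test; 85 is the highest score.
raw = [85, 68, 51]
curved = curve_to_top_score(raw)
print(curved)  # [100.0, 80.0, 60.0]
```

A raw 68 becomes an 80 and a raw 51 becomes a 60, without either student showing any additional proficiency.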

Students, particularly the average and low performing ones, love this approach. The reason for this is that they get a boost in their grade without having to demonstrate any further evidence of proficiency in meeting the objectives. Teachers often appreciate this method as well, as it helps students and reduces the pressure of having to fail individuals or give students low grades.

A drawback to this approach is the pressure it places on high-performing students. The good students face pressure to not study as much in order to have a lower grade that benefits the group. Students also have a way of finding out who got the highest score and this can lead to social problems for stronger students.

One way to avoid the pressure on the top student is to specify a percentage of students who will receive a certain grade. For example, the top 10% of students will receive an “A,” the next 10% of students will receive a “B,” and so on. This makes the top performers a group of students rather than an individual. However, student performance becomes categorical rather than continuous, which some may claim is not accurate.
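Here is a rough Python sketch of grading by rank. Only the first two bands (top 10% “A,” next 10% “B”) come from the example above; the remaining bands, the student names, and the scores are assumptions made for illustration, and ties are not handled.

```python
def grade_by_rank(scores, bands):
    """Assign letter grades by class rank.

    scores: dict mapping student name -> score.
    bands:  list of (letter, fraction_of_class) pairs, best grade first.
            Anyone left over after the last band gets that band's letter.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    grades = {}
    start = 0
    for letter, fraction in bands:
        count = round(fraction * len(ranked))
        for name in ranked[start:start + count]:
            grades[name] = letter
        start += count
    for name in ranked[start:]:
        grades[name] = bands[-1][0]
    return grades


# The "A" and "B" bands follow the text; the other bands are assumed.
bands = [("A", 0.10), ("B", 0.10), ("C", 0.60), ("D", 0.10), ("F", 0.10)]

# Ten hypothetical students scoring 95, 90, ..., 50.
scores = {f"S{i}": 100 - 5 * i for i in range(1, 11)}
grades = grade_by_rank(scores, bands)
print(grades["S1"], grades["S2"], grades["S3"], grades["S10"])  # A B C F
```

Notice that S2 (90) and S3 (85) are only five points apart yet land in different letter grades, which is the categorical effect mentioned above.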

A question to ask yourself when determining the appropriateness of “grading on a curve” is the context of the subject. It may be okay for someone with an 85 to get an “A” in philosophy. However, do you want a heart doctor operating on you who earned an “A” by earning an 85 or a heart doctor who earned an “A” by scoring a 100? Sometimes this difference is significant.

Comparison with a Standard

Comparison with a standard is comparing students to specific criteria such as the ABCDF system. Each letter is assigned a percentage out of a hundred and the grade is determined from this. For example, using a traditional grading scale, a student with a “94” would receive an A.

The advantage of this system is the objectivity of assigning grades (even though marking itself can be highly subjective, especially for essay items). Either a student received a 94 or they did not. There is no subjective curve. Those who received a high grade truly earned it while those who received a low grade deserved it.

One problem is that different places can use different scales. For example, an “A” in many US universities is normally 90% and above. However, an “A” in Thai universities is set at only 80%. Both are seen as “excellent” students. This makes comparisons of students difficult. Using the doctor analogy, who do you want to perform heart surgery on you: the 80% “A” doctor or the 90% “A” doctor?
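A small Python sketch shows how the same score maps to different letters under different scales. Only the two “A” cutoffs (90% and 80%) come from the text above; every lower cutoff and all of the names are assumptions for illustration.

```python
def letter_grade(percent, cutoffs):
    """Return the first letter whose minimum percentage the score meets."""
    for letter, minimum in cutoffs:
        if percent >= minimum:
            return letter
    return "F"


# Only the "A" cutoffs are taken from the text; the rest are assumed.
US_STYLE = [("A", 90), ("B", 80), ("C", 70), ("D", 60)]
THAI_STYLE = [("A", 80), ("B", 70), ("C", 60), ("D", 50)]

print(letter_grade(94, US_STYLE))    # A
print(letter_grade(85, US_STYLE))    # B
print(letter_grade(85, THAI_STYLE))  # A
```

The same 85 is a “B” on one scale and an “A” on the other, which is why the letters alone are hard to compare across institutions.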


In the next post, we will look at lesser-known grading systems that will provide alternatives for teachers searching for ways to help their students. If you have any suggestions or ways of dealing with grading, please share this information in the comments.

Tips for Writing Excellent Essay Items

In the last post, there was a discussion on developing essay items. This post will provide ideas on when to use essay items, how to write essay items, and ways to mark essay items.

When to Use Essays

Here are several considerations for deciding when essays may be appropriate. Of course, this is not an exhaustive list, but it will provide a framework for you to make your own decision.

  • Class size–Even the most ambitious teacher does not want to read 50 essays. Keep in mind the size of the class when deciding if essay items work for you. Generally, classes under 20 can use long response or limited response, classes of 20-40 can use limited response, and above 40 another form of assessment may be best, but it is your personal decision.
  • Cheating–Normally, it is much more difficult for students to copy from one another when using essay items, although I once caught my middle school students attempting to do this. Each answer for essay items must be unique, which is not possible with objective items.
  • Objectives–If your objectives are from the higher levels of Bloom’s Taxonomy, essays are one way to assess if the students have met the objectives. However, sophisticated multiple choice items can also do this.

How to Write Essay Items

One of the clearest ways to write essay items is to approach them the same way as writing objectives. This means that, for the most part, essay items should include:

  • an action (what they will do) such as explain, predict, organize, evaluate, etc.
  • a condition (the context)
  • a proficiency (criteria for grading) such as content, clarity, thinking, consistency, etc.

Below is an example.

Within Southeast Asia, predict which country will have the strongest economic growth over the next 20 years. You will be assessed upon the clarity, content, organization, and depth of thinking of your response. Your response should be 1,000-1,500 words.

Here are the three components in parentheses.

Within Southeast Asia (condition), predict which country will have the strongest economic growth over the next 20 years (action). You will be assessed upon the clarity, content, organization, and depth of thinking of your response (proficiency). Your response should be 1,000-1,500 words.

Here are some other tips.

  • Define the task or action for the students. See the previous example.
  • Avoid using optional items. This leads to students being evaluated based on different items which makes comparison difficult from a statistical point. It is recommended that all students answer the same items for this reason.
  • Establish limits in words (see example above). This relates again to comparison. If one student writes 5,000 words and another writes 500, it is hard to compare since there was no standard set.
  • Make sure the essay item relates to your objectives. This happens by developing a test blueprint.

Marking Essay Items

The criteria for grading should be a part of the essay item and fall under the proficiency component. These same traits in the proficiency component should be a part of a rubric the teacher uses to mark the assignment. Rubrics help with grading consistently. The details of making rubrics are the topic of another post.

The ideas here are just an introduction to making essay items. There are always other and better ways to approach a problem. If you have other ideas, please share in the comments section.

Developing Essay Items

Essay items are questions that require the student to supply and develop the correct answer. This is different from objective items in which the options are provided and the student selects from among them. Essay items focus upon higher level thinking in comparison to the lower level thinking focus of objective items. There are two common types of essay items: the long response essay and the limited response essay.

Long Response Essay

The long response essay is a complex essay of several or more paragraphs that addresses a challenging question that requires deep thinking. An example of a long response essay item is below.

Compare and contrast Ancient Egypt and Ancient Mesopotamia. Consider the geographic, economic, social, and military approaches. Your response will be graded upon accuracy, depth of thinking, organization, and clarity.

A question such as the one above requires significant critical thinking in order to identify how these two nations were similar and how they were different. There are an infinite number of potential answers and approaches. A distinct trait of essay items is the potential for so many equally acceptable solutions. Success is determined by the quality of the response rather than by finding one correct answer.

Limited Response Essay

Limited response essay items require a student to recall information in an organized way in order to address a specific problem. The length of the response may be a paragraph or two and the answer does not have the same depth as a long response. One reason the answers are shorter and simpler is that these types of questions may only address one issue per item. Long response essay items will deal with several issues in each item. Below is an example of a limited response item.

Explain two differences between Ancient Egypt and Ancient Mesopotamia. 

The answer to this question could easily be supplied in a short paragraph. The student lists two differences and should receive full credit. If you compare this item to the long response item, you can see the difference in difficulty. One difference is that there are no criteria for how the student will be graded. The assumption is that listing two differences is enough for full credit. Another difference is the expectations. The long response wanted several comparisons and contrasts while the limited response only required two short contrasts.

In the next post we will discuss when to use essay items, give suggestions for their development, and ideas for marking.

Writing Test Items for Exams with Power: Part III Multiple Choice Items

Multiple choice items are probably the most popular objective items used on tests. The advantage of multiple choice items in comparison to true and false and matching is that multiple choice can assess higher levels of thinking. In other words, multiple choice items can go beyond recall and deal with matters such as application and justification.

There are two components to a multiple choice item. The statement or question of the multiple choice item is called the stem. The answer choices are called options. There are usually four or five options per stem for multiple choice items.

Below are some tips for developing multiple choice items.

Stem Clues

A stem clue is when the words in the stem are similar to the words in the options. This similarity serves as a signal for sharp students. Consider the example.

When the Israelites were in Canaan, which of the following was a threat to them?
A. Canaanites
B. Indians
C. Americans
D. Spanish

The word Canaan is in the stem and the word Canaanites is one of the options and most students would rightly guess it is the correct answer.

Grammatical Clues

Sometimes grammar can give the answer away. Take a look at the example.

Steve Jobs was an____________.
A. Lawyer
B. Doctor
C. Entrepreneur
D. Movie Star

The giveaway here is the indefinite article “an” in the stem. Only the option “entrepreneur” can be correct grammatically.

Unequal Option Lengths

The longest answer is often the correct answer for whatever reason. I do not think this requires an example.

Other Uses of Multiple Choice

Multiple choice can also be used for higher level thinking. For example, in mathematics a teacher writes a word problem and provides several options as potential answers. The student must calculate the correct answer on a separate piece of paper and then select the correct answer on the test.

For geography, a teacher can provide a map and have students answer multiple choice items about the map. Students must use the map to find the answers. These are just some of many ways that multiple choice items can go beyond recall.

Tips and Conclusion

Here are some simple tips for improving multiple choice items.

  • All wrong answers should be believable and related to the question
  • Avoid negative questions as they are confusing to many students
  • Make sure there is only one correct answer
  • Rotate the position of the correct answer. Remember the most common answer is “C.” Therefore, force yourself not to use this option too often
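If you build tests on a computer, one way to rotate the position of the correct answer is to let the computer shuffle the options for you. Below is a minimal Python sketch, not a recommended tool. The function name is hypothetical, and the options are borrowed from the Steve Jobs example earlier in this post.

```python
import random


def shuffle_options(options, correct, rng=random):
    """Return the options in random order plus the letter of the correct one."""
    shuffled = list(options)
    rng.shuffle(shuffled)
    letter = "ABCDEFGH"[shuffled.index(correct)]
    return shuffled, letter


options = ["Lawyer", "Doctor", "Entrepreneur", "Movie Star"]

# A seeded generator makes the shuffle repeatable; the seed value is arbitrary.
shuffled, key = shuffle_options(options, "Entrepreneur", random.Random(7))
for letter, option in zip("ABCD", shuffled):
    print(f"{letter}. {option}")
print("Answer key:", key)
```

Because the position is chosen at random for every item, the correct answer should not cluster under any one letter over the course of a long test.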

There is much more that can be said about this topic. However, for those new to developing multiple choice items, the information provided will serve as a starting point for finding your own way of developing test items.

Testing with Power: How to Develop Great Test Items Part II

Today we will continue our discussion on developing excellent test items by looking at how to write matching items. Matching test items involve two columns. The side to the left has the descriptions (or it should) and the side to the right has the terms. Below is an example.

Directions: Column A contains descriptions of various famous basketball players. Column B contains the names of several famous basketball players. After examining both columns, select the basketball player who matches each description. Each answer can be used only once.

Column A                                   Column B

  1. Played for the Cleveland Cavaliers    A. Michael Jordan
  2. Is Jewish                             B. Tim Duncan
  3. Won six NBA championships             C. LeBron James
  4. Studied at Syracuse                   D. Carmelo Anthony
  5. Grew up in Italy                      E. Kobe Bryant
                                           F. Amar’e Stoudemire
                                           G. Dwyane Wade

This example has several strong points.

  • Homogeneity– All of the items have something in common in that they all are basketball players. The name of this is homogeneity. This makes it harder for the students to guess but makes it easier for them to remember what the correct item is because they are accessing information on one subject instead of several. A common mistake in developing matching items is to put disparate terms together which is confusing for learners.
  • Order of Columns– The descriptions should go on the left and the terms on the right. This is because the descriptions are longer and take more time to read. Students read the long description first and then find the short answer in the right column. Many people put the terms on the left and the descriptions on the right, which is detrimental to student performance. Students read one short answer and then have to shuffle through several long descriptions.
  • More Terms than Descriptions– There should be more terms than descriptions in order to prevent guessing. This also helps to prevent students from losing two points instead of one. If the number of descriptions and terms is the same and a student gets one wrong, they get two wrong because two answers will be in the wrong place. If there are extra terms, this can be avoided.
  • One Description for One Term– There should be one correct item for each description. Anything else is confusing for many students.
  • Miscellaneous– Number the descriptions and give letters to the terms. Descriptions should be longer than the information in the terms column.

Developing matching items with these concepts in mind will help students to have success in the examinations you give them. Are there other strategies for matching? If so, please share in the comments section.

Test Items I

In developing assessments, there are two types of test items. The two types of test items are objective test items and essay test items. Objective test items are items that only have one correct answer. Examples include true and false, multiple choice, matching, and completion.

Essay items are items that can have many different answers. There are two common types: restricted response and extended essay. With either of these items, the student responds to an open-ended question.

In this post, we are going to take a closer look at true and false items by defining and providing information on how to develop them.

True and False Items

True and false items are easy to make. You write a statement and you ask the student if the statement is true or not. In comparison to other test items, true and false items can be made quickly. However, there are some concerns with their use.

One problem with true and false items is that a student who guesses has a 50% chance of success on each item. This means that, on average, a student who knows nothing could get 50% on a true and false test if all they did was guess. Even though they failed, this is a high grade for someone who knows nothing.
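The guessing problem can be put into numbers with a short Python sketch. The 20-item test and the pass mark of 12 correct (60%) are assumptions chosen for illustration; only the 50% chance per item comes from the text above.

```python
from math import comb


def prob_at_least(correct_needed, n_items, p=0.5):
    """Probability of getting at least `correct_needed` items right by pure guessing."""
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(correct_needed, n_items + 1)
    )


n_items = 20               # hypothetical test length
expected = n_items * 0.5   # average number correct when every answer is a guess
chance_of_passing = prob_at_least(12, n_items)  # assumed pass mark: 12 of 20

print(expected)                     # 10.0
print(round(chance_of_passing, 3))  # 0.252
```

Under these assumptions, roughly one student in four who knows nothing at all would still reach 60% by guessing alone.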

One way to deal with this is to require that, when a student identifies a statement as false, they also indicate what aspect of the statement is incorrect. Below is an example.

Directions: For each statement below, read the statement and place a check next to true if the statement is true or next to false if the statement is false. If the statement is false, underline the word in the statement that is false and write the correct word in the space provided.

T   F Thailand is south of Malaysia


In the example above, the answer is false. Therefore, we will put a check next to F. Next, we have to determine what word in the statement is incorrect. In this example, the word “south” is wrong: Thailand is north of Malaysia. Therefore, we underline south and write the word north in the space provided. Our completed example is below.

T   F√ Thailand is south of Malaysia


This approach helps to reduce the risk of random guessing by students. Now they have to know why an answer is false in order to receive full credit. The only problem with this is that if the answer is true and the student guesses correctly, there is no way of knowing. Below are some suggestions for developing true and false items.

Tips for True and False Items

  1. Make sure the statement is absolutely true or absolutely false at all times. If there are exceptions, it is neither clearly true nor clearly false.
  2. In relation to the first tip, avoid words that are indefinite, such as long, short, hard, and soft, and also words that are absolute, such as always, never, and permanently. Statements containing such absolute words are normally false.
  3. Avoid developing a predictable pattern in the responses such as TTFF or FFTT or TFTFT. Students will find the pattern and follow it.

We hope that these ideas help you in developing true and false statements. If you have other suggestions, please include them in the comments.

Test Blueprint

Developing assessments is often difficult. Teachers wonder if they have covered all the material, they have to think about how to assess the students, and they need to consider reaching them at different levels of thinking. This is not in any way easy for most teachers.

One way to deal with these problems is through the use of what is called a test blueprint. A test blueprint is a map of the objectives that are assessed on the test as well as a map of the different levels of learning that each question addresses. Below is an example.

[Image: example test blueprint chart]

The example above is a test on curriculum development. There are four objectives that are assessed on this test:

  1. Develop aims from needs assessment.
  2. Develop goals
  3. Develop standards
  4. Develop objectives

These four objectives are assessed at one of three levels from Bloom’s Taxonomy. These levels are:

  1. Knowledge
  2. Comprehension
  3. Application

For this test, I am looking at how my students perform these four objectives at the knowledge, comprehension, and application levels of Bloom’s Taxonomy.

For each level of Bloom’s Taxonomy, I asked 1 or 2 questions related to my objectives. This number is then totaled at the far right. For example, for objective one, developing aims from needs assessment, I have 1 true and false question at the knowledge level, 1 multiple choice question at the comprehension level, and 1 long essay question at the application level. This gives me a total of 3 questions that focus on the “developing aims from needs assessment” objective.

These three questions for objective 1 represent 15% of the total questions on the test. In other words, not only do I know how many questions I asked in each level for this objective, I also know how much of the total test is represented in my first objective. This helps in maintaining a balanced test as sometimes teachers give too much weight to one specific piece of information.

Not only can you know how many questions you ask about each objective, you can also know how many questions you have for each level of Bloom’s Taxonomy. For example, looking at the chart, you can see that there are 6 knowledge questions, which represent 30% of the total questions on this exam. Again, such information helps in maintaining a balanced exam.

At the very bottom of the chart you can also find out how many of each type of question you asked on the exam. For example, there are 6 true and false questions and 8 multiple choice questions, which make up 30% and 40% of the total questions, respectively. This also helps with balance. You can make sure that different forms of questions are an appropriate percentage of the total exam.
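The percentages in a blueprint are simple to check with a few lines of Python. This sketch uses only the counts mentioned above; the total of 20 questions is inferred from the fact that 3 questions make up 15% of the test, and the function and label names are my own.

```python
def share_of_test(count, total):
    """Percentage of the whole test taken up by `count` questions."""
    return count / total * 100


# 3 questions being 15% of the test implies 20 questions in total.
TOTAL_QUESTIONS = 20

tallies = {
    "objective 1 (aims from needs assessment)": 3,
    "knowledge-level questions": 6,
    "true and false questions": 6,
    "multiple choice questions": 8,
}

for label, count in tallies.items():
    percent = share_of_test(count, TOTAL_QUESTIONS)
    print(f"{label}: {count} questions, {percent:.0f}% of the test")
```

Running a check like this for every row and column of your own blueprint makes it easy to spot an objective or question type that carries too much weight.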

The test blueprint helps teachers to develop balanced exams. Objectives, question type, and the level of the questions can all be taken into account when developing an assessment.