Finding Text in a Vector in R

This post covers how to search for text within a vector in R. When you are working with a lot of data, you often need to find a specific piece of it. For example, suppose you have a vector of names that is not in alphabetical order and you want to know how many names start with the letter “E”. To solve this problem, you search the text for a pattern. Below is an example, followed by an explanation.

    > Student.names
     [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Ed"      "Frank"   "Gabe"    "Hank"   
     [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"   
    [17] "Quinton" "Alex"    "Andre"   "Aron"    "Bob"     "Rick"    "Simon"   "Steve"  
    [25] "Thomas"  "Tim"     "Victor"  "Vince"   "William" "Warren"  "Wilson"  "Ted"    
    [33] "Dan"     "Eric"    "Ernest"  "Fred"    "Jim"     "Ethan"   "Lance"   "Mitch"  
    [41] "Pete"    "John"   
    > grep("E",Student.names)
    [1]  5 34 35 38
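  1. Create the variable ‘Student.names’ as a character vector containing all the names above.
  2. Call the ‘grep’ function with the pattern “E” to search ‘Student.names’. The pattern matches any name that contains an uppercase “E”; in this vector, those are exactly the names that start with “E”. To match only at the start of a name, the pattern “^E” can be used instead.
  3. R returns the positions (indexes) of the matching names: 5, 34, 35 and 38.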
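
The difference between “E” and “^E” only shows up when a name has an uppercase “E” somewhere other than the start. Below is a minimal sketch of that difference. It assumes a short, made-up vector called ‘names_demo’ (not the class list above), and it also counts the matches with ‘length’, which answers the “how many” question from the introduction.

```r
# Hypothetical short vector for illustration; not the Student.names data
names_demo <- c("Andy", "Ed", "JoEllen", "Eric", "Pete")

# "E" matches an uppercase E anywhere in the name, so "JoEllen" is included
contains_e <- grep("E", names_demo)

# "^E" anchors the match to the start of the name
starts_with_e <- grep("^E", names_demo)

# Number of names that start with "E"
n_starts_with_e <- length(starts_with_e)

print(contains_e)
print(starts_with_e)
print(n_starts_with_e)
```

Because ‘grep’ is case-sensitive by default, the lowercase “e” in “Pete” is not matched by either pattern.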

Now you know where the names that start with ‘E’ are, but you don’t know the actual names. Below is how you extract the names from the variable.

> Student.names[grep("E", Student.names)]
[1] "Ed"     "Eric"   "Ernest" "Ethan"

Here is what happened:

  1. You asked R for the subset of ‘Student.names’ containing the names that start with “E”.
  2. You used the ‘grep’ function inside the square brackets to supply the positions of those names.
  3. R returned the names that start with “E”.
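
The same result can be reached without indexing by hand. The sketch below shows two alternatives, again on a made-up vector called ‘names_demo’ rather than the class list: the ‘value = TRUE’ argument of ‘grep’, and the related ‘grepl’ function, which returns TRUE or FALSE for each element.

```r
# Hypothetical short vector for illustration; not the Student.names data
names_demo <- c("Andy", "Ed", "Frank", "Eric", "Pete")

# value = TRUE makes grep return the matching names instead of their positions
e_names <- grep("^E", names_demo, value = TRUE)

# grepl returns one logical value per element
is_e <- grepl("^E", names_demo)

# A logical vector can be used to subset, and summing it counts the matches
e_names_alt <- names_demo[is_e]
n_e <- sum(is_e)

print(e_names)
print(is_e)
print(n_e)
```

Which form to use is a matter of taste; ‘grepl’ tends to be convenient when the TRUE/FALSE result is needed for filtering or counting.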

Substituting Text

You can also substitute text in a vector. For example, suppose you want to replace the name ‘Ed’ in the ‘Student.names’ variable with the more formal ‘Edward’. Here is how it is done. Note that ‘Ed’ is the 5th name in the vector; in the output below it is replaced with ‘Edward’.

> gsub("Ed", "Edward", Student.names)
 [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Edward"  "Frank"   "Gabe"    "Hank"   
 [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"   
[17] "Quinton" "Alex"    "Andre"   "Aron"    "Bob"     "Rick"    "Simon"   "Steve"  
[25] "Thomas"  "Tim"     "Victor"  "Vince"   "William" "Warren"  "Wilson"  "Ted"    
[33] "Dan"     "Eric"    "Ernest"  "Fred"    "Jim"     "Ethan"   "Lance"   "Mitch"  
[41] "Pete"    "John"
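  1. In this example, we used the ‘gsub’ function to replace the name ‘Ed’ with ‘Edward’.
  2. Using ‘gsub’, we tell R to find every occurrence of the pattern ‘Ed’ in ‘Student.names’ and replace it with ‘Edward’. The match is case-sensitive, so ‘Ted’ and ‘Fred’ are left alone.
  3. R prints the resulting vector as seen above. ‘gsub’ returns a new vector and does not change ‘Student.names’ itself; assign the result back to the variable to keep the change.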
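
Two details of the substitution are easy to miss: the pattern matches inside longer names, and the original vector is untouched until the result is assigned. The sketch below illustrates both on a made-up vector called ‘names_demo’ (not the class list); the anchored pattern “^Ed$” is one possible way to match only the whole name.

```r
# Hypothetical short vector for illustration; not the Student.names data
names_demo <- c("Andy", "Ed", "Eddie", "Frank")

# An unanchored pattern also rewrites the "Ed" inside "Eddie"
loose <- gsub("Ed", "Edward", names_demo)

# gsub returned a new vector; names_demo still holds the original names
unchanged <- identical(names_demo, c("Andy", "Ed", "Eddie", "Frank"))

# "^Ed$" matches only when the whole name is "Ed";
# assigning the result back stores the change
names_demo <- gsub("^Ed$", "Edward", names_demo)

print(loose)
print(unchanged)
print(names_demo)
```

In the ‘Student.names’ vector no other name contains “Ed” with an uppercase “E”, which is why the unanchored pattern was enough there.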

Hopefully, this gives you some ideas for working with text in R.
