Finding Text in a Vector in R

This post covers how to search for text within a vector in R. When you are working with a lot of information, you often need to find one specific piece of it. For example, suppose you have a list of names that are not in alphabetical order and you want to know how many of them start with the letter “E”. To solve this, you search the text for a pattern. Below is an example, followed by an explanation.

> Student.names
 [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Ed"      "Frank"   "Gabe"    "Hank"   
 [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"   
[17] "Quinton" "Alex"    "Andre"   "Aron"    "Bob"     "Rick"    "Simon"   "Steve"  
[25] "Thomas"  "Tim"     "Victor"  "Vince"   "William" "Warren"  "Wilson"  "Ted"    
[33] "Dan"     "Eric"    "Ernest"  "Fred"    "Jim"     "Ethan"   "Lance"   "Mitch"  
[41] "Pete"    "John"   
> grep("E", Student.names)
[1]  5 34 35 38
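  1. First, create the variable ‘Student.names’ as a character vector containing all the names above.
  2. Next, use the ‘grep’ function to search ‘Student.names’ for the pattern “E”. ‘grep’ matches the pattern anywhere in a string and is case-sensitive. In this list the only uppercase “E” in any name is a first letter, so the search finds the names that start with “E”. The pattern “^E” anchors the match to the start of the string and makes that intent explicit.
  3. R returns the positions (indices) of the matching names: 5, 34, 35, and 38.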
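The transcript above only prints ‘Student.names’. It does not show how the vector was created. Here is a minimal sketch that assumes you type the names with ‘c()’. It also uses the anchored pattern “^E”, which matches only at the start of a string, instead of plain “E”, which matches anywhere. The variable name ‘positions’ is my own.

```r
# Build the character vector of names with c()
Student.names <- c("Andy", "Billy", "Chris", "Darrin", "Ed", "Frank", "Gabe", "Hank",
                   "Ivan", "James", "Karl", "Larry", "Matt", "Norman", "Oscar", "Paul",
                   "Quinton", "Alex", "Andre", "Aron", "Bob", "Rick", "Simon", "Steve",
                   "Thomas", "Tim", "Victor", "Vince", "William", "Warren", "Wilson", "Ted",
                   "Dan", "Eric", "Ernest", "Fred", "Jim", "Ethan", "Lance", "Mitch",
                   "Pete", "John")

# "^" anchors the pattern to the start of each string, so "^E" means "starts with E"
positions <- grep("^E", Student.names)
print(positions)   # 5 34 35 38

# grep() is case-sensitive by default; ignore.case = TRUE lets "^e" match "Ed", "Eric", ...
print(grep("^e", Student.names, ignore.case = TRUE))   # 5 34 35 38
```

For this particular list the anchored and unanchored patterns give the same positions. The anchor matters once a name contains an uppercase “E” somewhere other than its first letter.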

Now you know where the names that start with ‘E’ are, but you don’t know the actual names. Below is how to extract them from the variable.

>  Student.names[grep("E", Student.names)]
[1] "Ed"     "Eric"   "Ernest" "Ethan"

Here is what happened:

  1. You asked R for a subset of ‘Student.names’ by putting an index inside the square brackets.
  2. The index is the result of the ‘grep’ function, that is, the positions of the names that start with “E”.
  3. R returned the names at those positions.
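The introduction asked how many names start with “E”. The sketch below shows a few base R shortcuts for that question. ‘grep’ with ‘value = TRUE’ returns the matching names directly, without the square-bracket step. ‘grepl’ returns TRUE or FALSE for every element, so ‘sum’ counts the matches. ‘startsWith’ (available in base R 3.3.0 and later) checks a fixed prefix without a regular expression. The variable names ‘e.names’ and ‘n.e’ are my own.

```r
Student.names <- c("Andy", "Billy", "Chris", "Darrin", "Ed", "Frank", "Gabe", "Hank",
                   "Ivan", "James", "Karl", "Larry", "Matt", "Norman", "Oscar", "Paul",
                   "Quinton", "Alex", "Andre", "Aron", "Bob", "Rick", "Simon", "Steve",
                   "Thomas", "Tim", "Victor", "Vince", "William", "Warren", "Wilson", "Ted",
                   "Dan", "Eric", "Ernest", "Fred", "Jim", "Ethan", "Lance", "Mitch",
                   "Pete", "John")

# value = TRUE makes grep() return the matching elements instead of their positions
e.names <- grep("^E", Student.names, value = TRUE)
print(e.names)   # "Ed" "Eric" "Ernest" "Ethan"

# grepl() gives one TRUE/FALSE per element; sum() counts the TRUE values
n.e <- sum(grepl("^E", Student.names))
print(n.e)       # 4

# startsWith() tests a literal prefix, no regular expression involved
print(identical(Student.names[startsWith(Student.names, "E")], e.names))   # TRUE
```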

Substituting Text

You can also substitute text in a vector. For example, suppose you want to replace the name ‘Ed’ in the ‘Student.names’ variable with the more formal ‘Edward’. ‘Ed’ is the 5th name in the list, and below it is replaced with ‘Edward’.

> gsub("Ed", "Edward", Student.names)
 [1] "Andy"    "Billy"   "Chris"   "Darrin"  "Edward"  "Frank"   "Gabe"    "Hank"   
 [9] "Ivan"    "James"   "Karl"    "Larry"   "Matt"    "Norman"  "Oscar"   "Paul"   
[17] "Quinton" "Alex"    "Andre"   "Aron"    "Bob"     "Rick"    "Simon"   "Steve"  
[25] "Thomas"  "Tim"     "Victor"  "Vince"   "William" "Warren"  "Wilson"  "Ted"    
[33] "Dan"     "Eric"    "Ernest"  "Fred"    "Jim"     "Ethan"   "Lance"   "Mitch"  
[41] "Pete"    "John"
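  1. In this example, we used the ‘gsub’ function to replace the name ‘Ed’ with ‘Edward’.
  2. The arguments to ‘gsub’ are, in order, the pattern to find (‘Ed’), the replacement (‘Edward’), and the vector to search (‘Student.names’). ‘gsub’ replaces every match within each element. Matching is case-sensitive, so “Fred” and “Ted” are left alone.
  3. R prints the resulting vector. ‘gsub’ returns a modified copy, so ‘Student.names’ itself is unchanged unless you assign the result back to it.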
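‘gsub’ does not change ‘Student.names’ in place, so you have to assign its result back to keep the change. The sketch below does that. It also anchors the pattern with “^” and “$” so that only an element that is exactly “Ed” is replaced. The name “Eddie” is a hypothetical example used for illustration and is not part of the list. The variable name ‘renamed’ is also my own.

```r
Student.names <- c("Andy", "Billy", "Chris", "Darrin", "Ed", "Frank", "Gabe", "Hank",
                   "Ivan", "James", "Karl", "Larry", "Matt", "Norman", "Oscar", "Paul",
                   "Quinton", "Alex", "Andre", "Aron", "Bob", "Rick", "Simon", "Steve",
                   "Thomas", "Tim", "Victor", "Vince", "William", "Warren", "Wilson", "Ted",
                   "Dan", "Eric", "Ernest", "Fred", "Jim", "Ethan", "Lance", "Mitch",
                   "Pete", "John")

# gsub() returns a modified copy; Student.names itself is untouched
renamed <- gsub("Ed", "Edward", Student.names)
print(Student.names[5])   # still "Ed"
print(renamed[5])         # "Edward"

# An unanchored pattern also rewrites names that merely contain "Ed"
# ("Eddie" is a hypothetical name, not part of the list)
print(gsub("Ed", "Edward", "Eddie"))     # "Edwarddie"
print(gsub("^Ed$", "Edward", "Eddie"))   # "Eddie" (not an exact match, so unchanged)

# Assign the result back to keep the change
Student.names <- gsub("^Ed$", "Edward", Student.names)
print(Student.names[5])   # "Edward"
```

If you only want the first match within each element replaced, ‘sub’ takes the same arguments as ‘gsub’.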

Hopefully, this gives you some ideas for working with text in R.
