Monthly Archives: January 2025

Using Glue in R

The glue package in R provides tools for string interpolation: it lets you embed R variables and expressions directly inside a string. In this post, we will look at examples that use only the glue() function from this package.

Paste vs Glue

The base R paste() function is an older way of doing many of the things we can now do with glue(): it combines strings into one. Below we load our packages and run a command with paste().

> library(glue)
> library(dplyr)

> people<-"Dan"
> paste("Hello",people)
[1] "Hello Dan"

In the code above, we load the glue and dplyr packages (we will need dplyr later). We then create an object called “people” that contains the string “Dan”. Next, we use the paste() function to combine the “people” vector with the string “Hello”; paste() separates its pieces with a space by default. The output is at the bottom of the code.

Below is the same output produced with the glue() function:

> glue("Hello {people}")
Hello Dan

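Inside the glue() function, the whole template is written as a single string in quotation marks. The object “people” is placed inside curly braces, which tells glue() to evaluate it and insert its value into the string. The result matches the paste() output, but it prints without quotation marks or the [1] index because glue() returns a glue object rather than a plain character vector.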
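To make the difference between the two results concrete, here is a small sketch that builds the same greeting both ways and compares them. It assumes only the glue package and the “people” object from above; the names greeting_paste and greeting_glue are just illustrative.

```r
library(glue)

people <- "Dan"

greeting_paste <- paste("Hello", people)   # base R: pieces separated by a space
greeting_glue  <- glue("Hello {people}")   # glue: the braced expression is interpolated

class(greeting_paste)   # "character"
class(greeting_glue)    # "glue" "character"

# Convert back to a plain character vector if another function needs one
identical(as.character(greeting_glue), greeting_paste)   # TRUE
```

Because a glue object is also a character vector, it can usually be passed anywhere a string is expected; as.character() just removes the extra class.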

Multiple Strings

Below is an example of including multiple objects in the same glue() call:

> people<-"Dan"
> people_2<-"Darrell"
> glue("Hello {people} and {people_2}")
Hello Dan and Darrell

In the first two lines above, we make our objects. In line 3, we call glue() again and include both objects, each in its own curly braces.
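A related case is when the names are stored in one vector instead of separate objects. The sketch below, using a hypothetical three-name vector, shows that glue() is vectorized (one string per element) and that the package's glue_collapse() function can join the names into a single phrase first.

```r
library(glue)

people <- c("Dan", "Darrell", "Tom")

# glue() is vectorized: a length-3 vector gives three separate greetings
glue("Hello {people}")

# glue_collapse() joins a vector into one string; `last` sets the final separator
everyone <- glue_collapse(people, sep = ", ", last = " and ")
glue("Hello {everyone}")
#> Hello Dan, Darrell and Tom
```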

In another example with multiple strings, we will replace missing (NA) values with replacement text.

> people<-"Dan"
> people_2<-NA
> glue("Hi {people} and {people_2}",.na="What")
Hi Dan and What

In the code above, we start by creating two objects. The second object (people_2) stores NA. The code in the third line is the same as before except for the “.na” argument, which is set to the string “What”. This tells glue() to replace any NA value with “What”. The output is in the final line.
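For comparison, here is a short sketch of what happens with the default “.na” setting and with .na = NULL, following the behavior described in the glue documentation. The vector “others” is a hypothetical example.

```r
library(glue)

people <- "Dan"
people_2 <- NA

# Default (.na = "NA"): the missing value is inserted as the text "NA"
glue("Hi {people} and {people_2}")
#> Hi Dan and NA

# .na = NULL propagates the missing value, so the whole result is NA
glue("Hi {people} and {people_2}", .na = NULL)

# With a vector, only the missing element is replaced
others <- c("Darrell", NA)
glue("Hi {people} and {others}", .na = "someone")
```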

Temporary Variables

It is also possible to create temporary variables that exist only inside a single glue() call. You define them as named arguments to glue(). Below is an example with a named variable.

> glue("Dan is {height} cm tall.",height=175)
Dan is 175 cm tall.

The temporary variable “height” is inside the curly braces. The value for “height” is set inside the function to 175.
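As a small extension, the sketch below passes two named arguments at once. It also shows what glue() does with additional unnamed arguments: per the glue documentation, they are treated as pieces of the template and joined together (with no separator by default) before the braces are filled in. Neither “name” nor “height” is created in your workspace by these calls.

```r
library(glue)

# Several named arguments act as temporary variables for this call only
glue("{name} is {height} cm tall.", name = "Dan", height = 175)
#> Dan is 175 cm tall.

# Unnamed arguments are template pieces; glue() joins them (default .sep = "")
glue("Dan is ", "{height} cm tall.", height = 175)
#> Dan is 175 cm tall.
```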

You can also put any R expression, including a function call, directly inside the curly braces. Below we use the mean() function inside the braces.

> glue("The average number is {mean(c(2,3,4,5))}")
The average number is 3.5

We used the mean() function inside the curly braces, and glue() evaluated it and inserted the calculated value into the string. Any valid R expression can be used this way.
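Here is a slightly larger sketch of the same idea, using a hypothetical numeric vector “nums”. It puts two expressions in one template and then shows that an expression over a vector produces one output string per element.

```r
library(glue)

nums <- c(2, 3, 4, 6)

# Each braced expression is evaluated and its result inserted
glue("The average is {round(mean(nums), 2)} and the largest value is {max(nums)}.")
#> The average is 3.75 and the largest value is 6.

# Expressions are vectorized too: one string per element of x
glue("{x} squared is {x^2}", x = 1:3)
```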

Using Dataframes

In our last example, we will create a data frame and use the values in one column to create a new column.

> df<-data.frame(column_1="Dan")
> df
  column_1
1      Dan
> df %>% mutate(new_column = glue("Hi {column_1}"))
  column_1 new_column
1      Dan     Hi Dan

Here is what we did.

We made a data frame called df. This data frame contains one column called column_1, which holds the string Dan.
In line 2, we display the values of the data frame.
Next, we use the mutate() function to create a new column. Inside mutate(), glue() builds a string that puts the word “Hi” in front of each value of column_1.
Lastly, the result of the pipeline is printed.
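The same pattern works with several columns and several rows. The sketch below assumes a hypothetical data frame “df2”; inside mutate(), glue() can refer to any column by name and returns one string per row. The package's glue_data() function does the same thing without dplyr, taking the data frame as its first argument.

```r
library(glue)
library(dplyr)

df2 <- data.frame(
  name   = c("Dan", "Darrell"),
  height = c(175, 180)
)

# glue() inside mutate() sees every column and returns one string per row
with_sentences <- df2 %>% mutate(sentence = glue("{name} is {height} cm tall."))
with_sentences

# glue_data(): the data frame supplies the variables directly
glue_data(df2, "{name} is {height} cm tall.")
```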

Conclusion

The glue package provides powerful tools for building and formatting strings. The examples provided here focus on only one function, glue(), so the package has more to offer data analysts who need to combine text with their data.

Confusing Words for Small Children VIDEO


Communicating with children is always difficult. However, sometimes it is the adult’s fault that children do not understand. The video below provides examples of terms adults love to use that can be hard for children to understand.

Regular Expression with R


Regular expressions are patterns for matching text, and they are used for many purposes. One of the most common is finding the values in a dataset that meet specific criteria. In this post, we will use regular expressions for several different tasks.

Initial Setup

The only package we need is the stringr package. We will also create a vector of names that will serve as our data for the first few examples. Below is the code.

library(stringr)
people<-c("Bob","Brad","Dan","Jason","Tony","Tom")

Commonly Used Symbols

We are going to use the people vector as our data. The first function we will use is str_detect(), which checks each element of a character vector and returns TRUE if it matches a pattern and FALSE if it does not. str_detect() takes the data as its first argument and the pattern as its second argument.

Next, we subset the people vector using str_detect(). We want to find all words that start with the letter B. To tell R to look for words that start with B, we put the caret (^) symbol in front of the letter B in the pattern argument. Below are the code and the output.

> people[str_detect(people,pattern = "^B")]
[1] "Bob"  "Brad"

The code starts with the vector people. Next, we place all of the search code inside square brackets, which subset the vector: only the elements for which str_detect() returns TRUE are kept. Inside str_detect(), the data we are subsetting comes first, followed by the pattern argument. Inside the quotes, the caret symbol means “at the beginning of the string” and is followed by the letter B. The output shows the two words that meet this criterion.
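To see the two steps separately, the sketch below prints the logical vector that str_detect() produces before it is used for subsetting. It also shows stringr's str_subset(), which combines detection and subsetting in one call.

```r
library(stringr)

people <- c("Bob", "Brad", "Dan", "Jason", "Tony", "Tom")

# str_detect() returns one TRUE/FALSE per element
str_detect(people, pattern = "^B")
#> [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

# The brackets keep only the TRUE positions
people[str_detect(people, pattern = "^B")]

# str_subset() does both steps at once
str_subset(people, pattern = "^B")
#> [1] "Bob"  "Brad"
```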

The caret symbol anchors the match to the beginning of a string. The dollar sign “$”, in contrast, anchors the match to the end of a string. Below are the code and output for this symbol.

> people[str_detect(people,pattern = "n$")]
[1] "Dan"   "Jason"

The code is mostly the same as in the previous example. The only difference is the pattern, which says we want words that end with the letter “n”. The output shows the two words that meet this criterion.
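The two anchors can also be used together. In the sketch below, which uses a hypothetical vector of names, “^Dan” matches any string that starts with Dan, while “^Dan$” only matches a string that is exactly Dan from start to end.

```r
library(stringr)

names_to_check <- c("Dan", "Danny", "Dan B")

# ^ alone: the string must start with "Dan"
str_detect(names_to_check, "^Dan")
#> [1] TRUE TRUE TRUE

# ^ and $ together: the whole string must be exactly "Dan"
str_detect(names_to_check, "^Dan$")
#> [1]  TRUE FALSE FALSE
```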

The next symbol we will learn is the period “.”, which matches any single character. Below are the code and output.

> people[str_detect(people,pattern = "a.")]
[1] "Brad"  "Dan"   "Jason"

Again, the only difference is the pattern. “a.” asks for the letter “a” followed by any one character. Brad (“ad”), Dan (“an”), and Jason (“as”) all match. A name whose only “a” is its last letter would not match, because nothing follows the “a”.
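The sketch below makes that difference visible with a hypothetical vector of names. Without the period, any lowercase “a” matches; with it, the “a” must be followed by another character. It also shows that a literal period has to be escaped with a doubled backslash.

```r
library(stringr)

names_to_check <- c("Tina", "Dan", "Ana")

# "a" alone: a lowercase a anywhere in the string
str_detect(names_to_check, "a")
#> [1] TRUE TRUE TRUE

# "a.": a lowercase a followed by any single character
str_detect(names_to_check, "a.")
#> [1] FALSE  TRUE FALSE

# To match a literal period, escape it
str_detect("Dr. Dan", "\\.")
#> [1] TRUE
```

“Ana” fails the second test because its capital “A” does not match the lowercase “a”, and its lowercase “a” is the last character.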

Multiple Criteria

All of the previous examples looked for a single, specific character. However, regular expressions also provide shortcuts, such as character classes, that match any character from a whole set. For the next examples, we need a different vector of data, and we will use the str_match_all() function, which finds every match of the pattern within each string.

In the code below, we create a new vector that contains both names and numbers. We then use str_match_all() to find every digit. To match a digit, we use the expression “\\d”; the backslash is doubled because a single backslash has to be escaped inside an R string.

> people_and_numbers<-c("Bob","Brad","Dan",1,2,3)
> str_match_all(people_and_numbers,"\\d")
[[1]]
     [,1]

[[2]]
     [,1]

[[3]]
     [,1]

[[4]]
     [,1]
[1,] "1" 

[[5]]
     [,1]
[1,] "2" 

[[6]]
     [,1]
[1,] "3"

The output looks a little strange because it is a list. c() converted the numbers to the strings “1”, “2”, and “3”, so the vector holds six strings, and the list has one item for each of them. Each item is a matrix of matches. The first three matrices are empty because the names contain no digits. The last three each contain the single digit found in that string.
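If the matrix-per-item layout is more than you need, stringr offers simpler alternatives. The sketch below uses str_detect() for a yes/no answer and str_extract_all(), which returns a list of plain character vectors. It also shows “\\d+”, which matches a run of one or more digits so that multi-digit numbers stay together; the sentence used there is a made-up example.

```r
library(stringr)

people_and_numbers <- c("Bob", "Brad", "Dan", 1, 2, 3)

# c() turned the numbers into the strings "1", "2", "3"
people_and_numbers

# A simple yes/no per element
str_detect(people_and_numbers, "\\d")
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

# A list of character vectors (character(0) where nothing matched)
str_extract_all(people_and_numbers, "\\d")

# "\\d+" keeps multi-digit numbers together
str_extract_all("Dan is 175 cm and Darrell is 180 cm", "\\d+")[[1]]
#> [1] "175" "180"
```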

The next expression we will learn is “\\w”, which matches any word character: a letter, a digit, or an underscore. Below is an example.

> str_match_all(people_and_numbers,"\\w")
[[1]]
     [,1]
[1,] "B" 
[2,] "o" 
[3,] "b" 

[[2]]
     [,1]
[1,] "B" 
[2,] "r" 
[3,] "a" 
[4,] "d" 

[[3]]
     [,1]
[1,] "D" 
[2,] "a" 
[3,] "n" 

[[4]]
     [,1]
[1,] "1" 

[[5]]
     [,1]
[1,] "2" 

[[6]]
     [,1]
[1,] "3"

Notice how the output splits each word into its individual characters, because “\\w” matches one character at a time. Apart from this, the output is what we would expect.
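To get whole words back instead of single characters, add the “+” quantifier, which means “one or more of the preceding item”. The sketch below applies “\\w+” to a made-up sentence; spaces and punctuation are not word characters, so they separate the matches, while the underscore stays inside a word.

```r
library(stringr)

# "\\w+" matches runs of word characters, i.e. whole words
str_extract_all("Hi Dan, meet Darrell_2!", "\\w+")[[1]]
# returns "Hi", "Dan", "meet", "Darrell_2"

# Applied to the vector from above, each element is matched as one piece
str_extract_all(c("Bob", "Brad", "Dan"), "\\w+")
```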

We can also match only letters. This is done with a character class: square brackets containing ranges written with dashes, such as A-Z for uppercase letters and a-z for lowercase letters. Below are the code and output.

> str_match_all(people_and_numbers,"[A-Za-z]")
[[1]]
     [,1]
[1,] "B" 
[2,] "o" 
[3,] "b" 

[[2]]
     [,1]
[1,] "B" 
[2,] "r" 
[3,] "a" 
[4,] "d" 

[[3]]
     [,1]
[1,] "D" 
[2,] "a" 
[3,] "n" 

[[4]]
     [,1]

[[5]]
     [,1]

[[6]]
     [,1]

The output is mostly the same: the three names are split into individual letters. However, the last three items are empty because the numbers contain no letters.
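Two small variations on the letter class are sketched below with a hypothetical string “mixed”. Adding “+” keeps consecutive letters together, and a caret placed just inside the opening bracket negates the class so that it matches anything that is not a letter. Note that this caret means something different from the “start of string” caret used earlier.

```r
library(stringr)

mixed <- "Dan2Darrell"

# Runs of letters stay together
str_extract_all(mixed, "[A-Za-z]+")[[1]]
# returns "Dan", "Darrell"

# [^A-Za-z]: any single character that is NOT a letter
str_extract_all(mixed, "[^A-Za-z]")[[1]]
# returns "2"
```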

You can list almost any set of characters inside the brackets. In the example below, we look only for lowercase vowels.

> str_match_all(people_and_numbers,"[aeiou]")
[[1]]
     [,1]
[1,] "o" 

[[2]]
     [,1]
[1,] "a" 

[[3]]
     [,1]
[1,] "a" 

[[4]]
     [,1]

[[5]]
     [,1]

[[6]]
     [,1]

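The list still has six items, but only the names contain vowels, so the items for the three numbers are empty. Note that the pattern is case-sensitive, so an uppercase vowel would not be matched.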
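The sketch below shows that case sensitivity with a hypothetical vector of names, using stringr's str_count() to count matches per string. Either list both cases inside the brackets, or wrap the pattern in regex() with ignore_case = TRUE.

```r
library(stringr)

names_to_check <- c("Ana", "Bob", "Tony")

# [aeiou] is case-sensitive, so the capital A in "Ana" is not counted
str_count(names_to_check, "[aeiou]")
#> [1] 1 1 1

# List both cases in the brackets ...
str_count(names_to_check, "[aeiouAEIOU]")
#> [1] 2 1 1

# ... or ask stringr to ignore case
str_count(names_to_check, regex("[aeiou]", ignore_case = TRUE))
#> [1] 2 1 1
```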

Conclusion

These are just some of the things regular expressions allow you to do. Whenever you need to wrestle with text, remember that regular expressions can help.

Regular Expressions with R VIDEO


The video below will provide examples of using regular expressions with R. Regular expressions are a great tool for finding various information within strings.