What are the different ways of executing code within a string of text in Python and R? This post looks into the good and arguably best forms of string interpolation in two of the most popular programming languages for data science.
In layman’s terms, string interpolation is executing code within a string (text).
Let’s keep this post short and to the point. We’ll look at some R code, then move on to Python. I’ll simply show how to use each method of string interpolation, and highlight my preferred method for each language.
paste is a good way to paste together text and variables, although it is not my favorite.
name <- 'Avery'
age <- 24
paste('Hello! My name is', name, 'and I am', age, 'years old.')
[1] "Hello! My name is Avery and I am 24 years old."
Remember that R is vectorized, so no need for a for loop in cases like this:
name <- c('Avery', 'Susan', 'Joe')
age <- c(24, 20, 40)
paste('Hello! My name is', name, 'and I am', age, 'years old.')
[1] "Hello! My name is Avery and I am 24 years old."
[2] "Hello! My name is Susan and I am 20 years old."
[3] "Hello! My name is Joe and I am 40 years old."
The default separator in paste is a space (" "), but you can change it to something else with the sep argument.
x <- 25
y <- 15
paste('x + y', x + y, sep = ' = ')
[1] "x + y = 40"
Run ?paste for more information.
paste is good, but glue is best. Ever since I discovered the glue function from the glue package, I rarely use paste anymore.
Don’t forget to load the package:
library(glue)
glue is easy to use. Just put the code that you want to execute inside braces { }. Everything, text and code alike, goes inside the quotes.
size <- c("Small", "Medium", "Large")
cyls <- sort(unique(mtcars$cyl)) # mtcars is a built-in dataset that comes with R
glue("{size} cars sometimes have {cyls} cylinders. But don't quote me, I'm not a car guy.")
Small cars sometimes have 4 cylinders. But don't quote me, I'm not a car guy.
Medium cars sometimes have 6 cylinders. But don't quote me, I'm not a car guy.
Large cars sometimes have 8 cylinders. But don't quote me, I'm not a car guy.
Personally, I find the glue { } syntax cleaner, easier to read and type, and more intuitive than base R's paste. For tidyverse users, glue-style syntax is also popping up in other places in the tidyverse (for example, see the .names argument in the relatively new dplyr::across function).
library(reticulate) # package for running Python within R
Similar to R’s paste:
name = 'Avery'
age = 24
print('Hello! My name is ' + name + ' and I am ' + str(age) + ' years old!')
Hello! My name is Avery and I am 24 years old!
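Part of what makes concatenation awkward is that + only joins strings to other strings, which is why str(age) appears in the code above. Here is a minimal sketch of what happens without the conversion, reusing the same name and age values (the greeting and fixed variables are just names chosen for this illustration):

```python
name = 'Avery'
age = 24

# Without str(), Python refuses to add a str and an int.
try:
    greeting = 'I am ' + age + ' years old!'
except TypeError as err:
    greeting = None
    print('TypeError:', err)

# Converting the number explicitly makes the concatenation work.
fixed = 'Hello! My name is ' + name + ' and I am ' + str(age) + ' years old!'
print(fixed)
```

R's paste does this conversion for you, so the Python version needs a little more care.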
This method is also pretty clunky. Let’s try something better.
Using the format method is not too shabby. Things are starting to look like R’s glue.
print('Hello! My name is {name} and I am {age} years old!'.format(name = name, age = age))
Hello! My name is Avery and I am 24 years old!
Notice above how we specify name = name inside of the format method. The placeholders don’t automatically refer to our variables like you might think. You, the programmer, have to specify placeholder = some_variable. You also don’t have to put anything inside of the {}. If you leave the curly braces empty, Python fills them in using the order of the arguments that you pass to the format method.
emotion = 'sad'
print('I am sick and tired of {}! I am so {}.'.format('Covid', emotion))
I am sick and tired of Covid! I am so sad.
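To put the placeholder styles side by side, here is a small sketch using the same name and age values as before. The person dictionary and the by_order, by_index, and by_name variables are made up for this illustration:

```python
name = 'Avery'
age = 24

# Empty braces are filled in argument order.
by_order = 'My name is {} and I am {}.'.format(name, age)

# Numbered braces pick arguments by position, so they can repeat or reorder.
by_index = '{1} years old, that is {0}. Yes, {1}.'.format(name, age)

# Named braces can be filled from a dict unpacked with **.
person = {'name': name, 'age': age}  # hypothetical dict for illustration
by_name = 'My name is {name} and I am {age}.'.format(**person)

print(by_order)
print(by_index)
print(by_name)
```

In every case the braces are only placeholders. The values come from whatever is passed to format.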
format works fine, but I think Python really knocks it out of the park with something called f-strings.
The syntax is almost exactly the same as glue. Instead of writing glue('some text {code}'), you just add the letter f before any string. This allows you to use the same curly brace syntax as before, easily executing the code within.
language = 'French'
time = '3 years'
print(f'I have been speaking {language} for about {time}. I feel accomplished.')
I have been speaking French for about 3 years. I feel accomplished.
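Since the braces really do execute code, they can hold more than a bare variable name. Here is a short sketch, borrowing x and y from the R paste example earlier (the total, shout, and ratio names are just for illustration):

```python
x = 25
y = 15

# Any expression can go inside the braces, not just a variable name.
total = f'x + y = {x + y}'

# Method calls work too.
language = 'French'
shout = f'I speak {language.upper()}!'

# A format spec after a colon controls how the value is rendered.
ratio = f'{x / y:.2f}'

print(total)
print(shout)
print(ratio)
```

The format spec after the colon follows the same mini-language that the format method uses.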
Careful, though. Python isn’t vectorized like R is, so the following code prints each whole list instead of one sentence per element.
languages = ['French', 'Spanish', 'English']
times = ['3 years', '1 year', 'my entire life']
print(f'I have been speaking {languages} for {times}. I feel accomplished.')
I have been speaking ['French', 'Spanish', 'English'] for ['3 years', '1 year', 'my entire life']. I feel accomplished.
You have to do more work, which isn’t too terrible.
for l, t in zip(languages, times):
    print(f'I have been speaking {l} for {t}. I feel accomplished.')
I have been speaking French for 3 years. I feel accomplished.
I have been speaking Spanish for 1 year. I feel accomplished.
I have been speaking English for my entire life. I feel accomplished.
Many experienced programmers would say that if you are using a for loop, you probably shouldn’t be. There is usually a better option, since loops in general are quite error-prone. It’s probably not apparent with this toy example, but in case you are curious, here is the same thing as above accomplished with map and a lambda function.
list(
    map(
        lambda l, t: print(f'I have been speaking {l} for {t}. I feel accomplished.'),
        languages, times
    )
)
I have been speaking French for 3 years. I feel accomplished.
I have been speaking Spanish for 1 year. I feel accomplished.
I have been speaking English for my entire life. I feel accomplished.
[None, None, None]
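The [None, None, None] at the end of that output shows up because print returns None and list() collects those return values. One alternative is to build the sentences first with a list comprehension and print them afterwards. The sketch below is offered as an option rather than as the approach used in this post, and the sentences variable is a name chosen for illustration:

```python
languages = ['French', 'Spanish', 'English']
times = ['3 years', '1 year', 'my entire life']

# Build one sentence per (language, time) pair without printing yet.
sentences = [
    f'I have been speaking {l} for {t}. I feel accomplished.'
    for l, t in zip(languages, times)
]

# Print them all at once, one per line.
print('\n'.join(sentences))
```

Keeping the strings in a list also means they can be reused later, which is closer to the character vector that glue returns in R.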
I won’t get into map and lambda here, but there are tons of great resources out there on the web. If you don’t understand the code above, just google “python map and lambda.”
Like I said, short and to the point. If you learned something here, especially if you didn’t know about glue and f-strings and you think they are useful, well then that is awesome. Thanks for reading. Stay safe and happy coding!
For attribution, please cite this work as
Robbins (2020, Oct. 6). Coding with Avery: String interpolation in Python and R (the best ways). Retrieved from https://codingwithavery.com/posts/2020-10-06-string-interpolation/
BibTeX citation
@misc{robbins2020string, author = {Robbins, Avery}, title = {Coding with Avery: String interpolation in Python and R (the best ways)}, url = {https://codingwithavery.com/posts/2020-10-06-string-interpolation/}, year = {2020} }