Writing verbatim SQL in pipe chains using {dbplyr}
Author
Charlie Hadley
Published
June 15, 2022
I’ve been trying to use the {dbplyr} package when working with SQL and R. It provides a really nice interface for writing tidyverse code against a database connection. I feel that {dbplyr} gives a generally nice interface than using {DBI} which is the approach used in db.rstudio.com, but it’s under documented.
Here’s a comparison of code using {DBI} and {dbplyr}
# Source: SQL [5 x 2]
# Database: sqlite 3.38.5 [:memory:]
clean_test mean_budget_2013
<chr> <dbl>
1 dubious 59202525.
2 men 45896513.
3 notalk 57963237.
4 nowomen 51891839.
5 ok 44990138.
Now, that’s great for when we want to write new queries. But I wanted to go in the other direction. There’s lots of existing SQL code that I don’t want to re-write into R code, so it would be nice to write verbatim SQL within the pipe syntax. I couldn’t figure it out from the documentation so asked the question on Twitter, and as is often the case Garrick Aden-Buie gave me the solution.
bechdel_tbl %>%remote_con() %>%tbl(sql("SELECT `clean_test`, AVG(`budget_2013`) AS `mean_budget_2013`FROM `bechdel`WHERE (`year` < 2000.0)GROUP BY `clean_test`"))
# Source: SQL [5 x 2]
# Database: sqlite 3.38.5 [:memory:]
clean_test mean_budget_2013
<chr> <dbl>
1 dubious 59202525.
2 men 45896513.
3 notalk 57963237.
4 nowomen 51891839.
5 ok 44990138.
Note that we’re still connected to the database here. If we want to return a local tibble we need to use collect() which means our entire workflow is as follows:
bechdel_tbl <-memdb_frame(bechdel, .name ="bechdel")bechdel_tbl %>%remote_con() %>%tbl(sql("SELECT `clean_test`, AVG(`budget_2013`) AS `mean_budget_2013`FROM `bechdel`WHERE (`year` < 2000.0)GROUP BY `clean_test`")) %>%collect()
# A tibble: 5 × 2
clean_test mean_budget_2013
<chr> <dbl>
1 dubious 59202525.
2 men 45896513.
3 notalk 57963237.
4 nowomen 51891839.
5 ok 44990138.