(More) Reproducible Data Analysis in R using {targets}
Warwick R User Group, University of Warwick, UK
Saranjeet Kaur Bhogal
Imperial College London
2024-12-05
Motivation
- What is reproducibility?
- Ability for others (including your future self) to reproduce your analysis.
- Reproducibility is not a binary concept.
- There is a scale from less reproducible to more reproducible.
- Various tools and practices can help enhance it.
- {targets} is one such tool.
- Be your own best collaborator! (You are helping your future self!)
Show of hands
- Who uses R?
- Who has heard of {targets}?
What does {targets} really do?
- Working with data analysis projects can get messy sometimes.
- You might be working on an analysis and then have to stop and work on something else …
… What happens next?
- When you come back, there is a chance that you forgot what you were doing.
- What would you do in this situation? Re-run the whole analysis?
… What if I told you …
- There is a better way! {targets} can help you with this!
- {targets} can help you pick up immediately where you left off, without confusion or having to remember what you were doing.
- I will be demonstrating how to use {targets} in a data analysis project today!
Acknowledgements
- Allison Horst, Alison Presmanes Hill, Kristen Gorman: For the {palmerpenguins} package
- Joel Nitta: For his Carpentries workshop on {targets}
- Nick Tierney: For his talk on {targets} at RSECon24
- Will Landau: For developing and maintaining {targets}
How to follow along
- Not designed as a code-along (but you are welcome to try!)
- Might be best to observe, take notes, ask questions
Data for the analysis: {palmerpenguins}
The three species of penguins in the palmerpenguins dataset. Artwork by @allison_horst
Some takeaways …
- Use only one active _targets.R file at a time in a given project.
- The _targets.R file should be placed at the top level/root directory of your project.
- Use targets::tar_make() to run and targets::tar_visnetwork() to visualise the workflow.
- Even if you close your R session and then restart it, you can still load or read the workflow objects with targets::tar_load() or targets::tar_read(). In other words, the workflow output is saved across R sessions.
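The takeaways above can be sketched as one short script. This is a minimal sketch, not the pipeline demonstrated in the talk: the target names (penguins_raw, penguins_clean, mass_by_species) and the summary step are illustrative assumptions. It assumes {targets} and {palmerpenguins} are installed and that it is run from the root of an empty project, because tar_script() overwrites any existing _targets.R file.

```r
library(targets)

# Write a _targets.R file at the project root.
# WARNING: with ask = FALSE this overwrites an existing _targets.R.
# The three targets are hypothetical: raw data, rows with a known
# body mass, and the mean body mass per species.
tar_script(
  {
    library(targets)
    list(
      tar_target(penguins_raw, palmerpenguins::penguins),
      tar_target(
        penguins_clean,
        penguins_raw[!is.na(penguins_raw$body_mass_g), ]
      ),
      tar_target(
        mass_by_species,
        aggregate(body_mass_g ~ species, data = penguins_clean, FUN = mean)
      )
    )
  },
  ask = FALSE
)

# Run the workflow. Results are stored in the _targets/ folder, and
# targets that are already up to date are skipped on later runs.
tar_make()

# List the targets that would be rebuilt by the next tar_make().
tar_outdated()

# Visualise the dependency graph (needs the suggested {visNetwork} package).
if (requireNamespace("visNetwork", quietly = TRUE)) {
  tar_visnetwork()
}

# Retrieve stored results, in this or any later R session.
mass_by_species <- tar_read(mass_by_species)  # returns the stored value
tar_load(penguins_clean)                      # assigns penguins_clean here
```

Restarting R and running only the last two calls from the same project root should still return the objects, since they are read from the _targets/ store rather than recomputed.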
Some more things to try
- How to work with external files? (tarchetypes::tar_file())
- Organise the functions better, instead of a single functions.R file.
- Explore making reports using tarchetypes::tar_quarto().
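As a starting point for these ideas, a _targets.R file might look something like the sketch below. The file paths (data/penguins_raw.csv, report.qmd), the R/ folder layout, and the summarise_penguins() function are hypothetical placeholders, not files from the talk. It assumes {tarchetypes} is installed, and tar_quarto() additionally needs the {quarto} package and the Quarto command-line tool.

```r
# _targets.R (sketch): all paths and function names are hypothetical.
library(targets)
library(tarchetypes)

# Source every script in the R/ folder, so functions can live in
# several files instead of a single functions.R.
tar_source()

list(
  # Track an external file: downstream targets rerun when its contents change.
  tar_file(penguins_csv, "data/penguins_raw.csv"),

  # Read the tracked file; penguins_csv holds the file path.
  tar_target(penguins_data, read.csv(penguins_csv)),

  # summarise_penguins() is assumed to be defined in a script under R/.
  tar_target(penguins_summary, summarise_penguins(penguins_data)),

  # Render a Quarto report. Targets that the report reads with
  # tar_read() or tar_load() are detected as its dependencies.
  tar_quarto(report, path = "report.qmd")
)
```

With this layout, editing the CSV file or any function under R/ marks only the affected targets as outdated, so the next targets::tar_make() rebuilds just those.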
Thanks
- Allison Horst, Alison Presmanes Hill, Kristen Gorman
- Joel Nitta
- Nick Tierney
- Will Landau
Learning more
About targets: https://books.ropensci.org/targets/
Talk link: saranjeetkaur.github.io/reproducible-analysis-targets/
GitHub: SaranjeetKaur
Email: kaur.saranjeet3@gmail.com