(More) Reproducible Data Analysis in R using {targets}

Warwick R User Group, University of Warwick, UK

Saranjeet Kaur Bhogal

Imperial College London

2024-12-05

Motivation

  • What is reproducibility?
  • Ability for others (including your future self) to reproduce your analysis.
  • Reproducibility is not a binary concept.
  • There is a scale from less reproducible to more reproducible.
  • There are various tools and practices can help enhance it.
  • {targets} is one such tool.
  • Be your own best collaborator! (you are helping your future self!) πŸ˜ƒ

Show of hands

  • Who uses R?
  • Who has heard of {targets}?

What does {targets} really do?

  • Working with data analysis projects can get messy sometimes. πŸ˜•
  • You might be working on an analysis and then have to stop and work on something else πŸ›‘ …

… What happens next?

  • When you come back, there is a chance that you forgot what you were doing. πŸ˜•
  • What would you do in this situation? Re-run the whole analysis? πŸ€”

… What if I told you …

  • There is a better way! {targets} can help you with this! πŸ˜ƒ
  • {targets} can support to immediately pick up where you left off without confusion or trying to remember what you were doing. πŸŽ‰
  • I will be demonstrating how to use {targets} in a data analysis project today! πŸ’»

Acknowledgements

  • Allison Horst, Alison Presmanes Hill, Kristen Gorman: For the {palmerpenguins} package
  • Joel Nitta: For his Carpentries workshop on {targets}
  • Nick Tierney: For his talk on {targets} at RSECon24
  • Will Landau: Developed and maintains {targets}

Plan for today

How to follow along

  • Not designed as a code along (but, you are welcome to try!)
  • Might be best to observe, take notes, ask questions

Data for the analysis: {palmerpenguins}

palmerpenguins logo

palmerpenguins logo

palmerpenguins

The three species of penguins in the palmerpenguins dataset. Artwork by @allison_horst

The three species of penguins in the palmerpenguins dataset. Artwork by @allison_horst.

The three species of penguins in the palmerpenguins dataset. Artwork by @allison_horst.

Let’s get started!

Some takeaways …

  • Use only one active _targets.R file at a time in a given project.
  • The _targets.R file should be placed at the top level/root directory of your project.
  • Use targets::tar_make() to run and targets::tar_visnetwork() to visualise the workflow.
  • Even if you close your R session, then re-start it and use targets::tar_load() or targets::tar_read(), you will still be able to read load/read the workflow objects. In other words, the workflow output is saved across R sessions.

Some more things to try

  • How to work with external files? (tarchetypes::tar_file())
  • Organise the functions better, instead of a single functions.R file.
  • Explore making reports using tarchetypes::tar_quarto.

Thanks

  • Allison Horst, Alison Presmanes Hill, Kristen Gorman
  • Joel Nitta
  • Nick Tierney
  • Will Landau

Resources

Slides

Learning more

  • About targets: https://books.ropensci.org/targets/

  • talk link: saranjeetkaur.github.io/reproducible-analysis-targets/

  • GitHub: SaranjeetKaur

  • Email: kaur.saranjeet3@gmail.com

Thank you!