A Reproducible Analysis Pipeline Is Akin to Cooking: A Simplified Guide

Categories: reproducibility, beginner, cooking
Author: Soundarya Soundararajan
Published: December 2, 2024

You may have read in an earlier post on my blog about how reproducibility relates to rasam. Today I came up with another analogy after reading The Practice of Reproducible Research (Kitzes, Turek, and Deniz 2018), an excellent book.

When you decide to cook something, what do you do?

  1. You buy groceries and bring them home.
  2. You prepare them.
  3. You cook them.

Similarly, a reproducible analysis pipeline has three major steps:

  1. Importing your data.
  2. Cleaning and preparing the data for analysis.
  3. Conducting the data analysis.
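To make the three steps concrete, here is a minimal sketch in Python. It is only an illustration, not the book's method: the tiny dataset, the column names, and the function names (`import_data`, `clean_data`, `analyze`, `run_pipeline`) are all hypothetical. Each step is its own function, so the whole pipeline can be rerun from raw data to result in one call.

```python
import csv
import io
import statistics

# Hypothetical raw data; in a real project you would read this from a file.
RAW_CSV = """name,minutes
rasam,25
sambar,40
rasam,
payasam,35
"""

def import_data(text):
    """Step 1: import the data (bring the groceries home)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_data(rows):
    """Step 2: clean and prepare the data (wash and chop)."""
    cleaned = []
    for row in rows:
        if row["minutes"].strip():  # drop rows with a missing cooking time
            cleaned.append({"name": row["name"].strip(),
                            "minutes": int(row["minutes"])})
    return cleaned

def analyze(rows):
    """Step 3: analyze the data (cook): average cooking time in minutes."""
    return statistics.mean(r["minutes"] for r in rows)

def run_pipeline(text):
    """Run all three steps in order, from raw data to result."""
    return analyze(clean_data(import_data(text)))

if __name__ == "__main__":
    print(run_pipeline(RAW_CSV))
```

Keeping the steps separate also makes it easy to check each one on its own, much like tasting as you cook.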

Reproducible data analysis is akin to cooking (Viz by Soundarya)

But it doesn’t stop there!
When you cook, you often follow a recipe, a written guide. Analysis scripts play the same role: they record, step by step, how your analysis was conducted. With a recipe, you can recreate your dish. Similarly, with scripts, you can recreate your analysis.

Now imagine doing this without a recipe. It’d be chaotic! But when you hand over the recipe card, others can reproduce your dish. Likewise, when you share your scripts, others can reproduce your analysis. Of course, you’d also need to provide the same groceries, just as reproducible analysis requires you to share the data alongside your scripts.
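How can someone be sure they have the same groceries? One common approach, sketched here under my own assumptions rather than taken from the book, is to share a checksum of the data file alongside your scripts. If the recipient's file produces the same SHA-256 digest, the contents are identical. The file name `groceries.csv` and the helper names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_data(path, expected):
    """True if the file matches the checksum shared with the scripts."""
    return sha256_of(path) == expected

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "groceries.csv"             # hypothetical data file
        data.write_text("name,minutes\nrasam,25\n")
        recorded = sha256_of(data)                      # the author shares this value
        print(verify_data(data, recorded))              # True: same groceries
        data.write_text("name,minutes\nrasam,30\n")     # the data has changed
        print(verify_data(data, recorded))              # False: different groceries
```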

Scripts are recipe cards (Viz by Soundarya)

Lastly, let’s talk about tidying up. When cooking, you clear the clutter and clean the workspace so you’re ready for the next task. In data analysis, this means ending your session without saving leftover objects, so that only the scripts and data remain. The next analysis then has to be rebuilt from the most recent data and scripts, much like making sure your meal uses fresh ingredients and the latest recipe.
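One way to check that your analysis really depends only on the scripts and data is to run it in a brand-new session. The Python sketch below does this by launching a fresh interpreter with `subprocess`. The script contents and the `run_fresh` helper are hypothetical examples, not a prescribed tool.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical analysis script; in practice this is your saved recipe card.
SCRIPT = """
data = [25, 40, 35]
print(sum(data) / len(data))
"""

def run_fresh(script_text):
    """Run a script in a new Python interpreter and return its output.

    Nothing defined in the current session is visible to the new process,
    so the result depends only on the script (and any data it reads).
    Raises subprocess.CalledProcessError if the script fails."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "analysis.py"
        path.write_text(script_text)
        result = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, check=True,
        )
    return result.stdout.strip()

if __name__ == "__main__":
    leftover = [1, 2, 3]      # lives only in this session; the script never sees it
    print(run_fresh(SCRIPT))  # computed from the script's own data
```

If a script only works because of something left over in your current session, running it this way fails loudly instead of silently giving a different result.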

References

Kitzes, Justin, Daniel Turek, and Fatma Deniz, eds. 2018. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. University of California Press.