This repository contains a reproducible exploratory data analysis (EDA) of the Edible Plants Database (EPD), conducted as a training exercise in data analysis and reproducibility for the KODAQS Data Quality Academy Certificate Program. The analysis uses R and Quarto within the Positron IDE, with Git for version control and GitHub for the public repository.
The Edible Plants Database originates from the GROW Observatory, a European Citizen Science project. It documents 140 edible plant species and their ideal growing conditions (sunlight, water, soil, pH, temperature, cultivation class, time to germination/harvest, and nutritional information). The dataset was curated for the TidyTuesday project (2026, Week 5) by Nicola Rennie and is available from the TidyTuesday repository.
The analysis focuses on basic descriptive statistics and two questions proposed by TidyTuesday:
- Do plants that require more sunlight also require higher temperatures?
- What cultivation classes require the most water?
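As a rough illustration of how the first question could be approached, the sketch below encodes sunlight as an ordered factor and computes Spearman's rank correlation against temperature. The rows are invented for illustration and are not taken from the EPD; the column names (`sunlight`, `temperature`) and category labels are assumptions too, and the repository's own scripts may well use a different method.

```r
# Toy rows invented for illustration; they are NOT taken from the EPD.
# Column names and category labels are assumptions as well.
toy <- data.frame(
  sunlight = c("Partial shade", "Full sun", "Full sun",
               "Partial shade", "Full sun", "Full shade"),
  temperature = c(15, 22, 25, 18, 20, 12),
  stringsAsFactors = FALSE
)

# Encode the sunlight categories as an ordered factor, lowest light first.
light_levels <- c("Full shade", "Partial shade", "Full sun")
toy$sunlight <- factor(toy$sunlight, levels = light_levels, ordered = TRUE)

# Spearman's rank correlation suits an ordinal-vs-numeric comparison,
# because it only relies on the ordering of the sunlight categories.
rho <- cor(as.integer(toy$sunlight), toy$temperature, method = "spearman")
print(rho)
```

A positive `rho` would suggest that higher sunlight categories tend to go with higher temperatures. With real data, ties and missing values would need separate handling.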
The data can be obtained from the following sources:
- TidyTuesday repository: https://github.com/rfordatascience/tidytuesday/tree/main/data/2026/2026-02-03
- Original source: Edible Plant Database — University of Dundee
- Direct CSV download: https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2026/2026-02-03/edible_plants.csv
kodaqs-d1/
├── data/ # Raw data (edible_plants.csv)
├── processed_data/ # Cleaned dataset (edible_plants_cleaned.csv)
├── src/ # Individual R source scripts
│ ├── 00_setup.R # Environment setup, packages, theme
│ ├── 01_data.R # Data import
│ ├── 02_codebook.R # Variable codebook definition
│ ├── 03_cleaning.R # Data cleaning and export
│ ├── 04_helpers.R # Helper functions (save tables/plots)
│ ├── 05_descriptive-univariate.R # Univariate statistics & plots
│ ├── 06_descriptive-bivariate.R # Bivariate statistics & plots
│ ├── 07_rq1-sunlight-temperature.R # RQ1: Sunlight vs Temperature
│ └── 08_rq2-cultivation-water.R # RQ2: Cultivation vs Water
├── results/ # Output tables (.html) and plots (.svg)
├── scripts/
│ └── analysis_script.qmd # Main Quarto analysis document
├── .gitignore
├── README.md
└── LICENSE
- Clone this repository.
- Check for `edible_plants.csv` in the `data/` folder, download it from the TidyTuesday GitHub yourself, or let the provided script download it automatically.
- Open the project folder in Positron (or RStudio) so that `here::here()` resolves paths correctly.
- Packages install automatically via `pacman` on first run.
- Render the analysis: `quarto render scripts/analysis_script.qmd`
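The data check in the second step could look roughly like the sketch below, which downloads the CSV only when it is missing. It uses base R only. The helper name `ensure_raw_data` and its arguments are hypothetical and not taken from the repository's scripts; the URL is the direct CSV link given in this README.

```r
# Hypothetical helper; the repository's own import script may differ.
csv_url <- paste0(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/",
  "main/data/2026/2026-02-03/edible_plants.csv"
)

ensure_raw_data <- function(dest, url = csv_url,
                            fetch = utils::download.file) {
  # Download only when the file is not already present.
  if (!file.exists(dest)) {
    # Create the target folder (e.g. data/) if it does not exist yet.
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    fetch(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage from the project root (the project itself resolves paths
# with here::here(), which this sketch does not depend on):
# ensure_raw_data(file.path("data", "edible_plants.csv"))
```

Passing the download function as an argument keeps the helper easy to try out offline, since a stand-in function can be supplied instead of `utils::download.file`.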
| Component | Version / Details |
|---|---|
| R | 4.5.1 |
| Quarto | 1.8.25 |
| IDE | Positron 2026.02.0 build 139 (with Git & GitHub integration) |
| Key R packages | tidyverse, here, pacman, janitor, skimr, modelsummary, tinytable, khroma, ggcorrplot, cowplot, knitr, sessioninfo |
A full session information printout (including exact package versions and platform details) is included at the end of the rendered analysis_script.qmd document.
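For reference, a session printout of this kind can be produced with a call along the following lines. This is a minimal sketch, not the exact chunk used in the analysis document; the fallback to base R's `sessionInfo()` is an assumption added so the snippet also runs when `sessioninfo` is not installed.

```r
# Prefer the sessioninfo package listed above; fall back to base R.
if (requireNamespace("sessioninfo", quietly = TRUE)) {
  info <- sessioninfo::session_info()
} else {
  info <- utils::sessionInfo()
}

# Prints R version, platform details, and loaded package versions.
print(info)
```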
This project is licensed under CC BY-NC-SA 4.0; see LICENSE.
The underlying data is shared under a CC0 1.0 Universal (Public Domain) dedication via TidyTuesday. It contains no personal or sensitive information, only botanical and agricultural properties of edible plant species.