A collection of reusable Python utilities for data loading, exploratory data analysis (EDA), and data cleaning — built as part of the Google Advanced Data Analytics Professional Certificate.
```
python-data-fundamentals/
├── src/
│   ├── data_loader.py     # CSV loading, validation, dataset description
│   ├── eda_toolkit.py     # Summary stats, distributions, correlations, outlier detection
│   └── data_cleaning.py   # Deduplication, missing values, type conversion, outlier clipping
├── notebooks/
│   ├── 01_python_basics_demo.ipynb
│   └── 02_pandas_eda_walkthrough.ipynb
├── data/
│   └── c2_epa_air_quality.csv
├── requirements.txt
└── README.md
```
| Module | Highlights |
|---|---|
| `data_loader` | `load_csv()` with auto-preview, `validate_dataframe()` with missing-data thresholds, `describe_dataset()` |
| `eda_toolkit` | Extended `summary_statistics()` (IQR, skew, kurtosis), distribution/boxplot/correlation plots, IQR-based outlier detection |
| `data_cleaning` | Duplicate removal, 5 missing-value strategies (drop/mean/median/mode/ffill), dtype conversion, quantile-based outlier clipping |
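The cleaning techniques in the last row can be sketched with plain pandas. This is a minimal illustration, not the code in `src/data_cleaning.py`: the names `fill_missing` and `clip_quantiles`, their parameters, and the default 1st/99th-percentile bounds are hypothetical.

```python
import pandas as pd


# Hypothetical helper: the strategy names mirror the five listed above,
# but the signature is an assumption, not the module's actual API.
def fill_missing(df: pd.DataFrame, column: str, strategy: str = "median") -> pd.DataFrame:
    """Return a copy of df with missing values in `column` handled by `strategy`."""
    out = df.copy()
    if strategy == "drop":
        return out.dropna(subset=[column])
    if strategy == "mean":
        out[column] = out[column].fillna(out[column].mean())
    elif strategy == "median":
        out[column] = out[column].fillna(out[column].median())
    elif strategy == "mode":
        # mode() may return several (sorted) values; use the first one
        out[column] = out[column].fillna(out[column].mode().iloc[0])
    elif strategy == "ffill":
        # carry the last observed value forward
        out[column] = out[column].ffill()
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return out


# Hypothetical helper: quantile-based clipping keeps every row but caps extremes.
def clip_quantiles(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values below the `lower` quantile and above the `upper` quantile to those quantiles."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lower=lo, upper=hi)
```

For example, `clip_quantiles(df["aqi"])` would set AQI readings below the 1st percentile or above the 99th percentile to those percentile values instead of dropping the rows.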
EPA Air Quality Index (AQI) — 1,725 observations of air quality measurements across U.S. states and counties.
| Column | Description |
|---|---|
| `state_name` | U.S. state |
| `county_name` | County within the state |
| `aqi` | Air Quality Index value |
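The extended statistics and the IQR rule listed for `eda_toolkit` can be illustrated on a numeric column such as `aqi`. This sketch assumes pandas only; `extended_summary` and `iqr_outliers` are hypothetical names, and the conventional 1.5 * IQR fence is an assumed default rather than the toolkit's documented setting.

```python
import pandas as pd


# Hypothetical helper: describe() extended with IQR, skewness, and kurtosis.
def extended_summary(s: pd.Series) -> pd.Series:
    """Return describe() output plus iqr, skew, and kurtosis for a numeric Series."""
    stats = s.describe()
    stats["iqr"] = stats["75%"] - stats["25%"]
    stats["skew"] = s.skew()
    stats["kurtosis"] = s.kurt()  # pandas reports excess (Fisher) kurtosis
    return stats


# Hypothetical helper: Tukey-style IQR fences.
def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask that is True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)
```

With the CSV loaded as `df`, `df.loc[iqr_outliers(df["aqi"]), ["state_name", "county_name", "aqi"]]` would list the rows whose AQI falls outside the fences.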
```bash
# Clone the repository
git clone https://github.com/asenabeshiktepeli/python-data-fundamentals.git
cd python-data-fundamentals

# Install dependencies
pip install -r requirements.txt

# Quick usage
python -c "
from src.data_loader import load_csv, validate_dataframe
df = load_csv('data/c2_epa_air_quality.csv')
print(validate_dataframe(df))
"
```

Notebooks:

- Python Basics Demo: variables, control flow, functions, and list comprehensions demonstrated with real data
- Pandas EDA Walkthrough: end-to-end exploratory analysis of the EPA Air Quality dataset using the `src/` utilities
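A condensed version of such an end-to-end pass (validate, deduplicate, aggregate) might look like the sketch below. It uses plain pandas rather than the `src/` utilities, and it assumes only the three documented columns; the function name `mean_aqi_by_state` and the 5% missing-data threshold are hypothetical choices for illustration.

```python
import pandas as pd


# Hypothetical helper: a compact validate -> deduplicate -> aggregate pipeline.
def mean_aqi_by_state(df: pd.DataFrame, max_missing: float = 0.05) -> pd.Series:
    """Mean AQI per state, highest first, after basic validation and deduplication."""
    required = ["state_name", "county_name", "aqi"]

    # Fail early if a documented column is absent.
    absent = [c for c in required if c not in df.columns]
    if absent:
        raise KeyError(f"missing columns: {absent}")

    # Fail early if any required column exceeds the missing-value threshold.
    missing_share = df[required].isna().mean()
    too_sparse = missing_share[missing_share > max_missing]
    if not too_sparse.empty:
        raise ValueError(f"too many missing values:\n{too_sparse}")

    # Drop exact duplicate rows, then rank states by mean AQI.
    deduped = df.drop_duplicates()
    return deduped.groupby("state_name")["aqi"].mean().sort_values(ascending=False)
```

Calling `mean_aqi_by_state(pd.read_csv("data/c2_epa_air_quality.csv")).head(10)` would return the ten states with the highest mean AQI in the file.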
Tech stack:

- Python 3.10+
- pandas, NumPy
- matplotlib, seaborn
This project is for educational and portfolio purposes. The EPA Air Quality dataset is publicly available from the U.S. Environmental Protection Agency.