A production-ready MLOps structure for the waste classification project with:
- Modular pipelines in `src/`
- Experiment tracking and artifacts via Weights & Biases (W&B)
- Type-safe configs via Pydantic
- W&B Hyperparameter Sweeps
- DVC pipeline scaffolding
- CI (GitHub Actions) with linting and optional training
- Reproducible environments with a pinned `requirements.txt`
- Containerization via Docker

Key files:
- `src/data/data_loader.py` — dataset creation, class mapping, and augmentations
- `src/model/model_builder.py` — DenseNet121 model builder
- `src/pipelines/train_pipeline.py` — training with W&B logging and artifact export
- `src/pipelines/evaluation_pipeline.py` — evaluation for Keras and TFLite models with W&B logging
- `src/pipelines/inference_pipeline.py` — webcam inference using a TFLite or Keras model
- `requirements.txt` — pinned dependencies
- `Dockerfile` — container for training
- `.dockerignore` — excludes local artifacts and data by default
- Python 3.10+ recommended
- For GPU (optional): Proper NVIDIA drivers + CUDA/cuDNN compatible with your TF version
- A W&B account: https://wandb.ai/
- Create a virtual environment and install dependencies:

```
python -m venv .venv
. .venv/Scripts/activate  # Windows PowerShell: .venv\Scripts\Activate.ps1
pip install --upgrade pip
pip install -r requirements.txt
```

- Log in to W&B:

```
wandb login
```

- Prepare the data directory (expected default):

```
RealWaste/
├─ ClassA/
│  ├─ img1.jpg
│  └─ ...
├─ ClassB/
└─ ...
```
During training, the original classes are mapped to 5 target classes: `['Organic', 'Inorganic', 'Metal', 'Electronics', 'Others']` (see `DataHandler.class_mapping`).
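As a sketch of how such a mapping can be expressed (the source class names below are illustrative placeholders, not the actual RealWaste folder names — check `data_loader.py` for the real mapping):

```python
# Illustrative sketch of DataHandler.class_mapping: original dataset folder
# names are remapped onto the 5 target classes.
CLASS_MAPPING = {
    "FoodScraps": "Organic",       # source names here are hypothetical
    "Vegetation": "Organic",
    "Plastic": "Inorganic",
    "Glass": "Inorganic",
    "Metal": "Metal",
    "EWaste": "Electronics",
    "MiscTrash": "Others",
}

TARGET_CLASSES = ["Organic", "Inorganic", "Metal", "Electronics", "Others"]

def remap_label(original: str) -> int:
    """Map an original folder name to its target-class index."""
    return TARGET_CLASSES.index(CLASS_MAPPING[original])
```

Several source classes collapsing onto one target label is expected here; the model is trained on the 5-way target space only.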
Run training:

```
python -m src.pipelines.train_pipeline
```

Parameters (via the W&B config) you can change directly in `train_pipeline.py`'s `main`:

- `data_dir` (default `RealWaste`)
- `img_size` (default `(224, 224)`)
- `batch_size` (default `32`)
- `epochs` (default `10`)
- `learning_rate` (default `1e-3`)
- `with_augmentation` (default `True`)
- `model_path` (default `waste_classifier.h5`)
- `tflite_path` (default `waste_classifier.tflite`)
- `quantize` (default `True`)
Model files (H5 and TFLite) are automatically logged to W&B as Artifacts.
Config is type-safe via Pydantic in `src/config/schemas.py`:
- Override parameters by editing code, via the environment, or through W&B overrides (e.g., when using sweeps).

Model selection / custom models:
- Select the backbone by setting `model_name` in the config (`densenet121`, `resnet50`, `mobilenetv2`).
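A minimal sketch of what such a schema could look like, assuming the field names and defaults listed above (the class name `TrainConfig` is illustrative, not necessarily what `schemas.py` uses):

```python
from pydantic import BaseModel

class TrainConfig(BaseModel):
    """Illustrative config schema mirroring the documented defaults."""
    data_dir: str = "RealWaste"
    img_size: tuple = (224, 224)
    batch_size: int = 32
    epochs: int = 10
    learning_rate: float = 1e-3
    with_augmentation: bool = True
    model_name: str = "densenet121"
    model_path: str = "waste_classifier.h5"
    tflite_path: str = "waste_classifier.tflite"
    quantize: bool = True

# Pydantic coerces and validates on construction, so a string override
# from the environment or a sweep still yields a typed value:
cfg = TrainConfig(batch_size="64")  # batch_size becomes int 64
```

The payoff is that a typo'd or out-of-type override fails loudly at startup instead of deep inside training.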
Run evaluation:

```
python -m src.pipelines.evaluation_pipeline
```

This logs a classification report and confusion matrices to W&B for the Keras model and, if present, the TFLite model.
Run webcam inference:

```
python -m src.pipelines.inference_pipeline
```

Notes:
- By default, expects `waste_classifier.tflite` in the project root.
- Preprocessing matches training: the exported model includes `preprocess_input`, so raw RGB frames are fed to the interpreter.
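To illustrate what "raw RGB in" means in practice, here is a hypothetical frame-preparation helper (`prepare_frame` is not part of the codebase): it only resizes and adds a batch dimension, deliberately doing no mean/std normalization, because `preprocess_input` is baked into the exported model.

```python
import numpy as np

def prepare_frame(frame: np.ndarray, size=(224, 224)) -> np.ndarray:
    """Resize an HxWx3 RGB frame (nearest neighbour) and add a batch dim.

    No normalization on purpose: preprocess_input was folded into the
    exported model, so the interpreter expects raw pixel values.
    """
    h, w = frame.shape[:2]
    rows = np.arange(size[0]) * h // size[0]   # source row index per output row
    cols = np.arange(size[1]) * w // size[1]   # source col index per output col
    resized = frame[rows][:, cols]             # index-based nearest-neighbour resize
    return resized[np.newaxis].astype(np.float32)  # shape (1, H, W, 3)
```

The output would then be passed to the TFLite interpreter's input tensor as-is.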
- Create a sweep:

```
wandb sweep sweeps/wandb_sweep.yaml
```

- Run an agent (repeat for parallelism):

```
wandb agent <SWEEP_ID>
```

The sweep can vary `learning_rate`, `batch_size`, `epochs`, `img_size_h`/`img_size_w`, and `model_name`.
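A sketch of what `sweeps/wandb_sweep.yaml` might contain, assuming a random-search sweep over the parameters above (the value ranges and metric name are illustrative):

```yaml
program: src/pipelines/train_pipeline.py
method: random
metric:
  name: val_accuracy
  goal: maximize
parameters:
  learning_rate:
    distribution: log_uniform_values
    min: 0.0001
    max: 0.01
  batch_size:
    values: [16, 32, 64]
  epochs:
    values: [5, 10, 20]
  img_size_h:
    values: [224, 256]
  img_size_w:
    values: [224, 256]
  model_name:
    values: [densenet121, resnet50, mobilenetv2]
```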
Initialize DVC in the repo (one-time):

```
dvc init
```

Run the stages:

```
dvc repro train
dvc repro evaluate
```

By default, the raw `RealWaste/` folder is not versioned (see `.dvcignore`). If you want to track snapshots, add it as a DVC-tracked directory.
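The two stages above imply a `dvc.yaml` roughly along these lines (a sketch; the exact `deps` and `outs` are assumptions, not the project's actual pipeline file):

```yaml
stages:
  train:
    cmd: python -m src.pipelines.train_pipeline
    deps:
      - src/
    outs:
      - waste_classifier.h5
      - waste_classifier.tflite
  evaluate:
    cmd: python -m src.pipelines.evaluation_pipeline
    deps:
      - src/
      - waste_classifier.h5
```

Declaring the model files as `outs` of `train` and `deps` of `evaluate` is what lets `dvc repro evaluate` re-run training automatically when it is stale.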
Workflow at `.github/workflows/ci.yml` runs on push/PR:
- Lints with flake8
- Optionally runs training when manually dispatched with `run-training: true` and the `WANDB_API_KEY` secret set

To run training from the workflow_dispatch UI, add the repository secret `WANDB_API_KEY` and trigger the workflow with `run-training=true`.
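For orientation, a workflow with that shape could look like the following (a sketch, not the repository's actual `ci.yml`; action versions and the lint target are assumptions):

```yaml
name: CI
on:
  push:
  pull_request:
  workflow_dispatch:
    inputs:
      run-training:
        type: boolean
        default: false

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install flake8
      - run: flake8 src

  train:
    if: github.event_name == 'workflow_dispatch' && inputs['run-training']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
      - run: pip install -r requirements.txt
      - run: python -m src.pipelines.train_pipeline
        env:
          WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
```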
Build the image:

```
docker build -t waste-classifier:latest .
```

Run training inside Docker, mounting your data and passing your W&B API key (`%cd%` is for the Windows command prompt; use `$(pwd)` on Linux/macOS):

```
docker run --rm \
  -e WANDB_API_KEY=YOUR_WANDB_API_KEY \
  -v %cd%/RealWaste:/data/RealWaste \
  -v %cd%:/app \
  waste-classifier:latest
```

The default ENTRYPOINT runs `src.pipelines.train_pipeline`. Data is expected at `/data/RealWaste`; either change `config.data_dir` in the pipeline or mount accordingly.
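For reference, a minimal `Dockerfile` consistent with the above might look like this (a sketch, not the project's actual Dockerfile; the base image is an assumption):

```dockerfile
FROM python:3.10-slim

WORKDIR /app

# Install pinned dependencies first to take advantage of layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the source tree; data is mounted at runtime (see .dockerignore)
COPY . .

ENTRYPOINT ["python", "-m", "src.pipelines.train_pipeline"]
```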
If you want to run evaluation instead, override the entrypoint (since the image's default ENTRYPOINT runs training):

```
docker run --rm \
  -e WANDB_API_KEY=YOUR_WANDB_API_KEY \
  -v %cd%:/app \
  --entrypoint python \
  waste-classifier:latest \
  -m src.pipelines.evaluation_pipeline
```

Notes:
- Replace `YOUR_ENTITY` in `train_pipeline.py` and `evaluation_pipeline.py` with your W&B entity/org.
- All key hyperparameters are recorded, and artifacts (H5/TFLite) are versioned as W&B Artifacts.
- Add CI/CD (GitHub Actions) to run lint/tests and optionally kick off training jobs
- Add dataset versioning via W&B Artifacts or DVC for raw data snapshots
- Add unit tests for `DataHandler` and `create_model`
- Add a model registry and promote the best models by validation metric