Exploring GANs to generate synthetic images from Google Quick, Draw! (e.g., “coffee cup”). This repository is a lab/portfolio for MLOps and data science, covering data collection and processing, adversarial modeling, sample generation, experiment versioning with MLflow, and CI integration.
- Badge: CCDS Template
- https://cookiecutter-data-science.drivendata.org/
- Learn and demonstrate DS/MLOps project best practices.
- Train a simple DCGAN and generate synthetic images (class “coffee cup”).
- Track experiments with MLflow (local and optional Docker server).
- Ensure a minimum quality bar via tests and CI (GitHub Actions).
data/: data at different stages.docs/: additional documentation (optional mkdocs).mlflow-server/: Docker environment for MLflow + Postgres + MinIO.mlruns/: MLflow’s default local store (local tracking URI).models/: trained artifacts (pkl/pth).notebooks/: Jupyter notebooks (includesquick_start.ipynb).references/: supporting materials.reports/: generated outputs and figures.src/: project source code.tests/: automated tests (pytest)..github/workflows/ci.yaml: CI pipeline (installs and runs tests).params.yaml: hyperparameters (image size, epochs, etc.).Makefile: automation targets (install, test, lint, format…).pyproject.toml: project dependencies and config (Poetry/PEP 621).
- Python 3.10
- Poetry (environment and dependency management)
- Git
- Docker and Docker Compose (optional, for the MLflow server)
- CUDA GPU (optional; speeds up training)
# 1) Clone the repository
git clone https://github.com/moreira-and/gan-lab-quickdraw.git
cd gan-lab-quickdraw
# 2) Create/select the Poetry environment (Python 3.10)
poetry env use 3.10
# 3) Install dependencies
poetry install
# 4) (Optional) Activate Poetry shell
poetry shell
# 5) Validate installation (tests)
poetry run pytestThe notebook orchestrates the full flow: download data, process features, train the DCGAN, generate a figure, and log artifacts to MLflow (local tracking by default).
# Launch Jupyter
poetry run jupyter lab
# or
poetry run jupyter notebookOpen notebooks/quick_start.ipynb and run the cells in order.
- Load Raw Data: downloads ~N “coffee cup” drawings from Quick, Draw! and saves them to
data/raw/coffee_cup. - Features: transforms images into tensors (grayscale, resize) and saves them per class in
data/processed. - Model Training: trains the DCGAN (hyperparams in
params.yaml) and saves artifacts tomodels/. - Generate: produces a synthetic image and saves it to
reports/figures/generated_image.png. - Run Experiment: logs artifacts and parameters to MLflow (local tracking in
mlruns/).
data/raw/coffee_cup/: downloaded PNGs.data/processed/<class>/:.pttensors per class.models/: generator/discriminator artifacts.reports/figures/generated_image.png: generated sample.mlruns/: MLflow local directory (default tracking store).
-
params.yaml: key hyperparameters (e.g.,dataset.batch_size,model.generator.latent_dim,train.epochs). -
src/config.py: project paths and MLflow default tracking URI atmlruns/.Note: by default, tracking is local (
mlruns). To point to an external server, adjust the tracking URI in this file.
- The code sets
mlflow.set_tracking_uritomlruns/.
poetry run mlflow ui --backend-store-uri mlruns --host 0.0.0.0 --port 5000
# Access http://localhost:5000- Adjust variables in
mlflow-server/.envif needed. - Bring services up:
cd mlflow-server
docker compose up -d- Access the UI at
http://localhost:5000 - Point the project to the server (edit
src/config.py):
mlflow.set_tracking_uri("http://localhost:5000")- GitHub Actions pipeline in
.github/workflows/ci.yaml. - Triggers: push and pull request to
main. - Steps: checkout, setup Python 3.10.5, install with Poetry, run pytest.
make requirements # install dependencies (poetry install)
make test # run tests (pytest)
make lint # style checks (flake8, isort, black --check)
make format # format code (isort, black)
make data # download data via src/dataset.py-
data/data/raw/: original data (Quick, Draw!).data/processed/: transformed data (tensors per class).data/interim/,data/external/: reserved for other stages/sources.
-
mlflow-server/mlflow-server/compose.yaml: orchestrates Postgres, MinIO, and MLflow.mlflow-server/.env: config for credentials/ports/bucket.
-
notebooks/notebooks/quick_start.ipynb: end-to-end project flow.
-
src/src/config.py: sets paths, loadsparams.yaml, defines tracking URI.src/dataset.py: downloads and saves Quick, Draw! sketches.src/features.py: transforms and saves tensors per class.src/modeling/: models (Generator/Discriminator), training, and generation.src/plots.py: visualizations from processed data.
-
tests/: unit tests (e.g.,tests/test_noise.py). -
.github/workflows/ci.yaml: CI pipeline.
- Training: per
params.yaml(epochs=100 by default); GPU recommended. - Environment: project targets Python >= 3.10 and uses Poetry; the notebook adds
src/tosys.pathautomatically.
- MIT License (see
LICENSE).
- This repository is part of my portfolio. Questions or suggestions are welcome via issues/PRs.