GitHub - Danylo16/activity-classification-ml: Baseline machine-learning models for classifying human activity using wearable sensor features. Includes preprocessing, feature selection, evaluation metrics, and visualization


## Activity Classification (IMU-only Baseline)

This repository contains a clean baseline pipeline for human activity classification using wearable sensor data (accelerometer, gyroscope, and optionally heart rate / skin temperature).

The pipeline is written in Python and is modular and reproducible. It extracts a set of time-, frequency- and rhythm-based features and is evaluated on synthetic wearable sessions.

---

## Project structure

├── data/
│   ├── raw/                # generated sensor sessions (CSV)
│   ├── features/           # extracted per-session features
│   └── dataset.csv         # merged dataset
├── notebooks/
│   ├── 01_signal_exploration.ipynb   # signal EDA (time/frequency/rhythm)
│   └── 02_modeling_baselines.ipynb   # models, splits, metrics, feature importances
├── src/
│   ├── utils_signal.py     # feature functions (signal ops)
│   ├── features.py         # feature extraction CLI
│   ├── build_dataset.py    # dataset builder
│   ├── train_rf.py         # RandomForest training
│   └── eval.py             # evaluation script
└── artifacts/              # trained model, reports, feature importance




## ⚙️ How to reproduce

```bash
# 1. Generate synthetic sessions (if needed)
python ../ml-wearable-data-prep/src/generate_data.py --minutes 20 --fs 50 --seed 1 --out data/raw/session_01.csv
python ../ml-wearable-data-prep/src/generate_data.py --minutes 20 --fs 50 --seed 2 --out data/raw/session_02.csv
python ../ml-wearable-data-prep/src/generate_data.py --minutes 20 --fs 50 --seed 3 --out data/raw/session_03.csv

# 2. Extract IMU-only features
python -m src.features --csv data/raw/session_01.csv --session_id s1 --imu_only --out data/features/s1.csv
python -m src.features --csv data/raw/session_02.csv --session_id s2 --imu_only --out data/features/s2.csv
python -m src.features --csv data/raw/session_03.csv --session_id s3 --imu_only --out data/features/s3.csv

# 3. Build dataset and train model
python -m src.build_dataset --glob "data/features/*.csv" --out data/dataset.csv
python -m src.train_rf --features data/dataset.csv --outdir artifacts --group_by session

# 4. Evaluate
python -m src.eval --model artifacts/rf_model.pkl --features data/dataset.csv

```
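
The implementation of `src.build_dataset` is not shown in this README. As a rough illustration of the merge step only, the sketch below concatenates per-session feature CSVs with the standard library. The function name `merge_feature_files` and the demo columns (`session_id`, `label`, `az_peak_freq` as a column) are assumptions, not the repo's actual schema.

```python
import csv
import glob
import os
import tempfile

def merge_feature_files(pattern, out_path):
    """Concatenate all CSVs matching `pattern` into `out_path`; return the row count.

    Hypothetical stand-in for the dataset-building step, not the repo's code.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    header, rows = None, []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                # Refuse to merge files whose columns disagree.
                raise ValueError(f"column mismatch in {path}")
            rows.extend(reader)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

# Demo on two tiny, made-up per-session files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for sid in ("s1", "s2"):
        with open(os.path.join(tmp, f"{sid}.csv"), "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["session_id", "label", "az_peak_freq"])
            w.writerow([sid, "walking", "2.0"])
            w.writerow([sid, "sitting", "0.0"])
    out_path = os.path.join(tmp, "dataset.csv")
    n_rows = merge_feature_files(os.path.join(tmp, "s*.csv"), out_path)
    with open(out_path, newline="") as f:
        merged = list(csv.DictReader(f))

print(n_rows, sorted({row["session_id"] for row in merged}))
```

Keeping a session identifier column in the merged file is what makes a later split by session possible.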
## Results

| Class         | Precision | Recall   | F1-score |
| ------------- | --------- | -------- | -------- |
| running       | 0.88      | 1.00     | 0.94     |
| sitting       | 1.00      | 1.00     | 1.00     |
| stairs        | 0.79      | 0.54     | 0.64     |
| walking       | 1.00      | 0.98     | 0.99     |
| **Macro avg** | **0.92**  | **0.88** | **0.89** |
| **Accuracy**  | **0.96**  |          |          |

Macro F1: 0.66 (cross-session, group-wise split)
Weighted F1: 0.96
Model: RandomForest (400 trees, balanced weights, IMU-only features)
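
The macro-average row in the table is the unweighted mean of the per-class scores. The check below recomputes it from the four table rows only; the variable and function names are illustrative.

```python
# Per-class scores copied from the results table: (precision, recall, F1).
per_class = {
    "running": (0.88, 1.00, 0.94),
    "sitting": (1.00, 1.00, 1.00),
    "stairs":  (0.79, 0.54, 0.64),
    "walking": (1.00, 0.98, 0.99),
}

def macro_average(scores):
    """Unweighted mean of each metric across classes."""
    n = len(scores)
    columns = zip(*scores.values())  # precisions, recalls, F1 scores
    return tuple(sum(col) / n for col in columns)

macro_p, macro_r, macro_f1 = macro_average(per_class)
print(f"macro precision={macro_p:.2f} recall={macro_r:.2f} F1={macro_f1:.2f}")
```

Recomputed this way, the macro averages round to 0.92, 0.88 and 0.89, matching the table row. The separately quoted macro F1 of 0.66 cannot be derived from these four rows, so it may come from a different evaluation than the one tabulated.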

## Top 10 features

| Rank | Feature              | Importance |
| ---- | -------------------- | ---------- |
| 1    | `az_peak_freq`       | 0.062      |
| 2    | `mag_energy_1_3`     | 0.057      |
| 3    | `zcr_az`             | 0.053      |
| 4    | `jerk_z_std`         | 0.047      |
| 5    | `planar_over_z`      | 0.044      |
| 6    | `acf_first_peak_lag` | 0.041      |
| 7    | `varz_over_varm`     | 0.040      |
| 8    | `hurst_mag`          | 0.037      |
| 9    | `az_energy_1_3`      | 0.035      |
| 10   | `jm_std`             | 0.032      |
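
A ranking like this can be read from a fitted scikit-learn forest. The sketch below assumes the model is a `RandomForestClassifier` with 400 trees and `class_weight="balanced"`, in line with the model description above, and that the importances are the impurity-based `feature_importances_` (the repo may compute them differently). The data is random and the feature names are placeholders, so the printed ranking is meaningless; only the mechanics carry over.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder data: 200 windows x 12 features, 4 activity classes.
# Feature names are hypothetical stand-ins for the real feature columns.
feature_names = [f"feat_{i}" for i in range(12)]
X = rng.normal(size=(200, len(feature_names)))
y = rng.integers(0, 4, size=200)

model = RandomForestClassifier(
    n_estimators=400,          # "400 trees"
    class_weight="balanced",   # "balanced weights"
    random_state=0,
)
model.fit(X, y)

# Impurity-based importances; they sum to 1 across all features.
importances = model.feature_importances_
order = np.argsort(importances)[::-1]
top10 = [(feature_names[i], float(importances[i])) for i in order[:10]]

for rank, (name, score) in enumerate(top10, start=1):
    print(f"{rank:2d}  {name:10s}  {score:.3f}")
```

Because the importances are normalized to sum to 1, values around 0.03 to 0.06 for the top features indicate that importance is spread over many features rather than concentrated in a few.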

## Notes

Evaluation is group-wise (by session): all windows from a session fall on the same side of the train/test split, which prevents data leakage across overlapping windows.
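
One way to get a split by session is scikit-learn's `GroupKFold`, sketched below. The session IDs `s1` to `s3` follow the commands above; the window counts and random features are placeholders, and the repo's own `--group_by session` logic may differ.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Placeholder dataset: 30 windows per session, 5 features each.
sessions = np.repeat(["s1", "s2", "s3"], 30)
X = rng.normal(size=(len(sessions), 5))
y = rng.integers(0, 4, size=len(sessions))

# Three groups and three splits: each session is held out exactly once.
cv = GroupKFold(n_splits=3)
folds = []
for train_idx, test_idx in cv.split(X, y, groups=sessions):
    train_sessions = set(sessions[train_idx].tolist())
    test_sessions = set(sessions[test_idx].tolist())
    # No session contributes windows to both sides of the split.
    assert train_sessions.isdisjoint(test_sessions)
    folds.append((sorted(train_sessions), sorted(test_sessions)))
    print("train:", sorted(train_sessions), "test:", sorted(test_sessions))
```

A plain random split would instead scatter neighbouring, overlapping windows across train and test, which tends to inflate the scores.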

Features include frequency, energy, rhythm, jerk, variance ratios and ACF-based periodicity.
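
A few of these feature families can be sketched with NumPy alone. The function names, the 1 to 3 Hz band (guessed from the `*_energy_1_3` feature names) and the test signal are assumptions for illustration; the actual definitions in `src/utils_signal.py` may differ.

```python
import numpy as np

FS = 50  # sampling rate in Hz, as in the data-generation commands

def _power_spectrum(x, fs):
    """One-sided power spectrum of the mean-removed signal."""
    x = x - x.mean()
    return np.fft.rfftfreq(len(x), d=1 / fs), np.abs(np.fft.rfft(x)) ** 2

def peak_frequency(x, fs=FS):
    """Frequency (Hz) of the largest non-DC spectral peak."""
    freqs, power = _power_spectrum(x, fs)
    return float(freqs[1:][np.argmax(power[1:])])

def band_energy_ratio(x, lo=1.0, hi=3.0, fs=FS):
    """Share of spectral energy that lies in [lo, hi] Hz."""
    freqs, power = _power_spectrum(x, fs)
    band = (freqs >= lo) & (freqs <= hi)
    return float(power[band].sum() / power.sum())

def zero_crossing_rate(x):
    """Fraction of consecutive samples where the mean-removed signal changes sign."""
    s = np.signbit(x - x.mean())
    return float(np.mean(s[1:] != s[:-1]))

def jerk_std(x, fs=FS):
    """Standard deviation of the first time derivative of acceleration."""
    return float(np.std(np.diff(x) * fs))

def acf_first_peak_lag(x, fs=FS):
    """Lag (s) of the first local maximum of the autocorrelation after lag 0."""
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    for k in range(1, len(acf) - 1):
        if acf[k - 1] < acf[k] >= acf[k + 1]:
            return k / fs
    return float("nan")

# Test signal: 4 s of a 2 Hz sinusoid, a made-up stand-in for vertical acceleration.
t = np.arange(0, 4, 1 / FS)
az = np.sin(2 * np.pi * 2.0 * t)

features = {
    "peak_freq": peak_frequency(az),
    "energy_1_3": band_energy_ratio(az),
    "zcr": zero_crossing_rate(az),
    "jerk_std": jerk_std(az),
    "acf_first_peak_lag": acf_first_peak_lag(az),
}
print(features)
```

For a periodic signal the spectral peak and the first autocorrelation peak describe the same rhythm from two angles: a 2 Hz oscillation has a period of 0.5 s.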

The model separates walking, running, and sitting reliably (F1 of 0.94 or higher); stairs is the weakest class (recall 0.54, F1 0.64) and remains partially overlapping with the other classes.
