Danylo16/activity-classification-ml
## Activity Classification (IMU-only Baseline)

This repository contains a clean baseline pipeline for human activity classification from wearable sensor data (accelerometer, gyroscope, and optionally heart rate / skin temperature). The pipeline is written in Python and is modular and reproducible. It extracts time-, frequency-, and rhythm-based features, which are evaluated on synthetic wearable sessions.

---

## Project structure

- `data/`
  - `raw/`: generated sensor sessions (CSV)
  - `features/`: extracted per-session features
  - `dataset.csv`: merged dataset
- `notebooks/`
  - `01_signal_exploration.ipynb`: signal EDA (time / frequency / rhythm)
  - `02_modeling_baselines.ipynb`: models, splits, metrics, feature importances
- `src/`
  - `utils_signal.py`: feature functions (signal ops)
  - `features.py`: feature extraction CLI
  - `build_dataset.py`: dataset builder
  - `train_rf.py`: RandomForest training
  - `eval.py`: evaluation script
- `artifacts/`: trained model, reports, feature importance

## ⚙️ How to reproduce

Step 1 calls the data generator from the sibling `ml-wearable-data-prep` repository, which is expected to be checked out next to this one.

```bash
# 1. Generate synthetic sessions (if needed)
python ../ml-wearable-data-prep/src/generate_data.py --minutes 20 --fs 50 --seed 1 --out data/raw/session_01.csv
python ../ml-wearable-data-prep/src/generate_data.py --minutes 20 --fs 50 --seed 2 --out data/raw/session_02.csv
python ../ml-wearable-data-prep/src/generate_data.py --minutes 20 --fs 50 --seed 3 --out data/raw/session_03.csv

# 2. Extract IMU-only features
python -m src.features --csv data/raw/session_01.csv --session_id s1 --imu_only --out data/features/s1.csv
python -m src.features --csv data/raw/session_02.csv --session_id s2 --imu_only --out data/features/s2.csv
python -m src.features --csv data/raw/session_03.csv --session_id s3 --imu_only --out data/features/s3.csv

# 3. Build dataset and train model
python -m src.build_dataset --glob "data/features/*.csv" --out data/dataset.csv
python -m src.train_rf --features data/dataset.csv --outdir artifacts --group_by session

# 4. Evaluate
python -m src.eval --model artifacts/rf_model.pkl --features data/dataset.csv
```

## RESULTS

| Class | Precision | Recall | F1-score |
| ------------- | --------- | -------- | -------- |
| running | 0.88 | 1.00 | 0.94 |
| sitting | 1.00 | 1.00 | 1.00 |
| stairs | 0.79 | 0.54 | 0.64 |
| walking | 1.00 | 0.98 | 0.99 |
| **Macro avg** | **0.92** | **0.88** | **0.89** |

- **Accuracy:** 0.96
- **Weighted F1:** 0.96
- **Macro F1 (cross-session, group-wise split):** 0.66
- **Model:** RandomForest (400 trees, balanced class weights, IMU-only features)

## Top 10 features

| Rank | Feature | Importance |
| ---- | -------------------- | ---------- |
| 1 | `az_peak_freq` | 0.062 |
| 2 | `mag_energy_1_3` | 0.057 |
| 3 | `zcr_az` | 0.053 |
| 4 | `jerk_z_std` | 0.047 |
| 5 | `planar_over_z` | 0.044 |
| 6 | `acf_first_peak_lag` | 0.041 |
| 7 | `varz_over_varm` | 0.040 |
| 8 | `hurst_mag` | 0.037 |
| 9 | `az_energy_1_3` | 0.035 |
| 10 | `jm_std` | 0.032 |

## NOTES

- Evaluation is group-wise (by session). This prevents data leakage between overlapping windows from the same session.
- Features cover frequency, energy, rhythm, jerk, variance ratios, and ACF-based periodicity.
- Walking, running, and sitting are separated reliably. Stairs still partially overlaps with other classes (recall 0.54).
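The NOTES section motivates the group-wise split by the overlap between windows. The README does not state the window length or overlap, so this sketch is hypothetical. It assumes 2 s windows with 50% overlap at the 50 Hz rate used in step 1. `sliding_windows` is an illustrative helper and is not taken from `src/`. The example shows that consecutive windows share samples, which is why windows from one session must not end up in both the training and the test data.

```python
import numpy as np


def sliding_windows(x: np.ndarray, win: int, step: int) -> np.ndarray:
    """Return a (n_windows, win) array of windows advancing by `step` samples.

    Hypothetical helper: trailing samples that do not fill a full window are dropped.
    """
    if len(x) < win:
        return np.empty((0, win), dtype=x.dtype)
    n = 1 + (len(x) - win) // step
    # Row i holds the sample indices i*step .. i*step + win - 1.
    idx = step * np.arange(n)[:, None] + np.arange(win)[None, :]
    return x[idx]


fs = 50                  # sampling rate passed as --fs 50 in step 1
win, step = 2 * fs, fs   # assumed: 2 s windows, 50% overlap
signal = np.arange(10 * fs, dtype=float)  # 10 s of dummy samples

windows = sliding_windows(signal, win, step)
print(windows.shape)  # (9, 100)
# The second half of window 0 is identical to the first half of window 1.
print(np.array_equal(windows[0, step:], windows[1, :win - step]))  # True
```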
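The top-ranked features include `az_peak_freq`, `mag_energy_1_3`, and `az_energy_1_3`. Their exact definitions live in `src/utils_signal.py` and are not shown in this README. The sketch assumes two definitions: the peak frequency is the strongest non-DC FFT bin of a mean-removed window, and the `_energy_1_3` features are the share of spectral energy between 1 and 3 Hz. The function names `peak_freq` and `band_energy_ratio` are hypothetical.

```python
import numpy as np


def _power_spectrum(x, fs):
    """One-sided power spectrum of a mean-removed window."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # remove DC (e.g. the gravity offset on az)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, power


def peak_freq(x, fs):
    """Frequency (Hz) of the strongest non-DC spectral bin (assumed definition)."""
    freqs, power = _power_spectrum(x, fs)
    return float(freqs[1 + np.argmax(power[1:])])


def band_energy_ratio(x, fs, lo=1.0, hi=3.0):
    """Fraction of non-DC spectral energy in [lo, hi] Hz (assumed definition)."""
    freqs, power = _power_spectrum(x, fs)
    band = (freqs >= lo) & (freqs <= hi)
    return float(power[band].sum() / power[1:].sum())


fs = 50
t = np.arange(10 * fs) / fs
# Synthetic az: gravity + 2 Hz step rhythm + weaker 6 Hz component.
az = 9.81 + np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 6 * t)

print(peak_freq(az, fs))                    # 2.0
print(round(band_energy_ratio(az, fs), 3))  # 0.8
```

The mean is removed before the FFT. Otherwise the constant gravity component on `az` would dominate the spectrum.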
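The name `zcr_az` suggests a zero-crossing rate of the vertical axis. This is a minimal sketch that assumes crossings are counted on the mean-removed window and reported per second. `zero_crossing_rate` is a hypothetical helper. For a periodic signal the rate is about twice the dominant frequency, so it acts as a cheap cadence estimate alongside `az_peak_freq`.

```python
import numpy as np


def zero_crossing_rate(x, fs):
    """Sign changes per second of the mean-removed signal (assumed definition)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    # np.diff on a boolean array is True wherever consecutive signs differ.
    crossings = np.count_nonzero(np.diff(np.signbit(x)))
    return crossings * fs / len(x)


fs = 50
t = np.arange(10 * fs) / fs
# 2 Hz rhythm; the 0.3 rad phase keeps samples away from exact zeros.
az = 9.81 + np.sin(2 * np.pi * 2 * t + 0.3)

print(zero_crossing_rate(az, fs))  # 4.0
```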
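The names `jerk_z_std` and `jm_std` suggest the standard deviation of jerk (the time derivative of acceleration), computed on the vertical axis and on the acceleration magnitude respectively. This sketch assumes a first-difference derivative scaled by the sampling rate. `jerk_std` is a hypothetical helper.

```python
import numpy as np


def jerk_std(x, fs):
    """Std of the first-difference jerk, in signal units per second (assumed definition)."""
    jerk = np.diff(np.asarray(x, dtype=float)) * fs
    return float(np.std(jerk))


fs = 50
t = np.arange(10 * fs) / fs
ramp = 0.5 * t                    # constant jerk -> std of jerk is ~0
slow = np.sin(2 * np.pi * 2 * t)  # 2 Hz oscillation
fast = np.sin(2 * np.pi * 4 * t)  # 4 Hz oscillation, same amplitude

print(jerk_std(ramp, fs) < 1e-9)                # True
print(jerk_std(fast, fs) > jerk_std(slow, fs))  # True
```

Under the same assumption, a magnitude variant would apply `jerk_std` to `np.sqrt(ax**2 + ay**2 + az**2)` instead of a single axis.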
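The feature `acf_first_peak_lag` points to an autocorrelation-based periodicity measure. This minimal sketch assumes the feature is the lag, in samples, of the first positive local maximum of the normalized autocorrelation after lag 0. Dividing the lag by `fs` converts it to a period in seconds. The function is an illustrative reimplementation and is not taken from `src/utils_signal.py`.

```python
import numpy as np


def acf_first_peak_lag(x, max_lag=None):
    """Lag (samples) of the first positive local maximum of the autocorrelation."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if n < 3:
        return None
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[n - 1:]  # lags 0 .. n-1
    if acf[0] == 0:
        return None  # constant window, no periodicity to measure
    acf = acf / acf[0]  # normalize so acf[0] == 1
    max_lag = min(max_lag or n // 2, n - 1)
    for k in range(1, max_lag):
        if acf[k] > 0 and acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]:
            return k
    return None  # no periodic peak found


fs = 50
t = np.arange(10 * fs) / fs
az = 9.81 + np.sin(2 * np.pi * 2 * t + 0.3)  # 2 Hz rhythm -> period of 25 samples

lag = acf_first_peak_lag(az)
print(lag, lag / fs)  # 25 0.5
```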
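The features `varz_over_varm` and `planar_over_z` appear to compare how motion is distributed between the vertical axis, the horizontal plane, and the acceleration magnitude. This README does not give the exact formulas. The sketch assumes `varz_over_varm = var(az) / var(|a|)` and `planar_over_z = (var(ax) + var(ay)) / var(az)`, with a small epsilon to avoid division by zero. The column names `ax`, `ay`, and `az` are also assumptions about the raw CSV layout.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero on near-static windows


def variance_ratios(ax, ay, az):
    """Hypothetical variance-ratio features for one window of accelerometer data."""
    ax, ay, az = (np.asarray(v, dtype=float) for v in (ax, ay, az))
    mag = np.sqrt(ax**2 + ay**2 + az**2)  # acceleration magnitude |a|
    var_z = az.var()
    return {
        "varz_over_varm": var_z / (mag.var() + EPS),
        "planar_over_z": (ax.var() + ay.var()) / (var_z + EPS),
    }


fs = 50
t = np.arange(10 * fs) / fs
s = np.sin(2 * np.pi * 2 * t + 0.3)
# Mostly vertical motion with a weaker in-phase horizontal component.
feats = variance_ratios(ax=0.5 * s, ay=np.zeros_like(t), az=9.81 + s)
print(round(feats["planar_over_z"], 3))  # 0.25
```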
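Step 3 merges the per-session feature files with `src.build_dataset --glob ... --out ...`. The actual script is not reproduced in this README. The following is a stdlib-only sketch of the same idea, and it assumes every per-session CSV shares one header (for example, a `session_id` column plus feature columns). `merge_feature_csvs` is a hypothetical name.

```python
import csv
import glob
import os
import tempfile


def merge_feature_csvs(pattern: str, out_path: str) -> int:
    """Concatenate all CSVs matching `pattern` into `out_path`; return the data-row count."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    header, rows = None, []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path}: header differs from {paths[0]}")
            rows.extend(reader)
    os.makedirs(os.path.dirname(out_path) or ".", exist_ok=True)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


# Demo on two tiny per-session files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for sid, n in (("s1", 2), ("s2", 3)):
        with open(os.path.join(tmp, f"{sid}.csv"), "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["session_id", "zcr_az"])
            w.writerows([sid, i] for i in range(n))
    merged_path = os.path.join(tmp, "merged", "dataset.csv")
    n_rows = merge_feature_csvs(os.path.join(tmp, "*.csv"), merged_path)
    with open(merged_path, newline="") as f:
        merged = list(csv.reader(f))

print(n_rows, merged[0])  # 5 ['session_id', 'zcr_az']
```

Failing on a header mismatch, instead of silently aligning columns, surfaces sessions whose feature extraction drifted.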
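The NOTES section states that evaluation is group-wise by session, and step 3 passes `--group_by session` to `src.train_rf`. The README does not show how `train_rf` implements the split. This minimal NumPy sketch of leave-one-session-out splitting makes the guarantee explicit: no session contributes windows to both the training and the test indices of a fold. scikit-learn's `LeaveOneGroupOut` provides the same scheme. The helper name `leave_one_session_out` is hypothetical.

```python
import numpy as np


def leave_one_session_out(session_ids):
    """Yield (held_out_session, train_idx, test_idx), one fold per session."""
    session_ids = np.asarray(session_ids)
    for session in np.unique(session_ids):
        test_idx = np.flatnonzero(session_ids == session)
        train_idx = np.flatnonzero(session_ids != session)
        yield session, train_idx, test_idx


# One entry per feature window; s1..s3 mirror the session ids from step 2.
sessions = np.array(["s1"] * 4 + ["s2"] * 3 + ["s3"] * 5)

for held_out, train_idx, test_idx in leave_one_session_out(sessions):
    overlap = set(sessions[train_idx]) & set(sessions[test_idx])
    print(held_out, len(train_idx), len(test_idx), overlap)
# s1 8 4 set()
# s2 9 3 set()
# s3 7 5 set()
```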
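The model is described as a RandomForest with 400 trees and balanced class weights, and the top-10 table ranks features by importance. The README does not name the library. Assuming it is scikit-learn's `RandomForestClassifier`, the configuration and the ranking step look roughly like this. The data is synthetic, and `informative` and `noise_*` are placeholder feature names.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300
informative = rng.normal(size=n)
X = np.column_stack([informative, rng.normal(size=(n, 3))])
feature_names = ["informative", "noise_1", "noise_2", "noise_3"]
# Imbalanced labels derived only from the first column.
y = np.where(informative > 0.8, "stairs", "walking")

model = RandomForestClassifier(
    n_estimators=400,         # "400 trees"
    class_weight="balanced",  # "balanced weights"
    random_state=0,
)
model.fit(X, y)

# Impurity-based importances sum to 1; sort descending to build a "top k" table.
order = np.argsort(model.feature_importances_)[::-1]
ranking = [(feature_names[i], float(model.feature_importances_[i])) for i in order]
for rank, (name, imp) in enumerate(ranking, start=1):
    print(rank, name, round(imp, 3))
```

With `class_weight="balanced"`, each class is weighted inversely to its frequency. This keeps a rarer class such as stairs from being ignored by the impurity criterion.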
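The macro average in the results table is the unweighted mean of the per-class scores, so the weakest class (stairs) pulls it down directly. The weighted F1, by contrast, depends on per-class support, which the README does not report. The arithmetic below uses only the values from the table.

```python
# Per-class (precision, recall, F1) copied from the results table.
per_class = {
    "running": (0.88, 1.00, 0.94),
    "sitting": (1.00, 1.00, 1.00),
    "stairs": (0.79, 0.54, 0.64),
    "walking": (1.00, 0.98, 0.99),
}

# Macro average: plain mean over classes, ignoring how many samples each class has.
macro = [sum(scores[i] for scores in per_class.values()) / len(per_class) for i in range(3)]
print([round(m, 2) for m in macro])  # [0.92, 0.88, 0.89]

# The same F1 average without the stairs class, to show how much that class costs.
others = [f1 for name, (_, _, f1) in per_class.items() if name != "stairs"]
print(round(sum(others) / len(others), 3))  # 0.977
```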