Can Espectre be extended to provide bounding boxes or skeleton tracking for detected motion? #73
Espectre provides strong motion and anomaly detection using spatio-temporal statistics, but it does not output spatial coordinates (bounding boxes or keypoints).
I am exploring whether:
• Espectre can be extended to approximate person localization (e.g., bounding box via spatial variance maps), or
• Espectre is better used as a motion trigger combined with a downstream object detection or pose estimation model.
My target use case is mobile camera input (non-VR), where a moving person is detected and visually tracked with a bounding box or skeleton overlay.
Has anyone experimented with:
• Using Espectre's spatial features as an ROI selector?
• Combining Espectre with MediaPipe / YOLO / OpenPose?
• Mathematical approaches to infer approximate position from spatio-temporal variance?
Any insights or prior work would be appreciated.
Replies: 1 comment
Hi @sanjulaonline, this is a very interesting topic!
Espectre does not output explicit spatial coordinates (bounding boxes or keypoints), and it is not designed as a geometric localizer.
That said, its spatio-temporal statistics can be interpreted as motion / saliency signals:
• Spatial variance or energy maps may provide a very coarse ROI, e.g. via thresholding and connected-component clustering
• This is inherently noisy and unstable (background motion, lighting changes, overlapping motion), so it should be seen as a weak spatial prior, not a reliable localization method
For your use case, the most practical approach is:
Espectre as motion / attention trigger → downstream CV model
• Use…
Without a learned spatial model, variance-based methods cannot reliably infer: …
So Espectre works well as an anomaly / motion detector or attention mechanism, but localization and pose estimation are better handled by standard vision models downstream.
Note: if by "localization" you mean physical position estimation from RF signals, that is a different problem space. WiFi CSI-based localization typically requires ≥3 phase-synchronized devices to estimate Angle of Arrival (AoA) and triangulate 2D/3D coordinates (YouTube video).
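To make the "coarse ROI via thresholding and connected-component clustering" idea concrete, here is a minimal sketch in Python. It is not Espectre code and uses no Espectre API: it assumes you already have a short stack of grayscale camera frames as a NumPy array of shape (T, H, W). The function names and the threshold rule (mean plus k standard deviations of the variance map) are hypothetical choices for illustration only.

```python
from collections import deque

import numpy as np


def temporal_variance_map(frames: np.ndarray) -> np.ndarray:
    """Per-pixel variance over time for a (T, H, W) stack of grayscale frames."""
    return frames.astype(np.float64).var(axis=0)


def largest_component_bbox(mask: np.ndarray):
    """Return (top, left, bottom, right) of the largest 4-connected region, or None."""
    height, width = mask.shape
    visited = np.zeros_like(mask, dtype=bool)
    best_size, best_bbox = 0, None
    for r0, c0 in zip(*np.nonzero(mask)):
        if visited[r0, c0]:
            continue
        # Breadth-first flood fill over one connected region.
        queue = deque([(r0, c0)])
        visited[r0, c0] = True
        size = 0
        top, left, bottom, right = r0, c0, r0, c0
        while queue:
            r, c = queue.popleft()
            size += 1
            top, bottom = min(top, r), max(bottom, r)
            left, right = min(left, c), max(right, c)
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= nr < height and 0 <= nc < width
                if inside and mask[nr, nc] and not visited[nr, nc]:
                    visited[nr, nc] = True
                    queue.append((nr, nc))
        if size > best_size:
            best_size = size
            best_bbox = (int(top), int(left), int(bottom), int(right))
    return best_bbox


def coarse_roi(frames: np.ndarray, k: float = 3.0):
    """Threshold the variance map (assumed rule: mean + k * std) and box the largest blob."""
    var_map = temporal_variance_map(frames)
    threshold = var_map.mean() + k * var_map.std()
    return largest_component_bbox(var_map > threshold)


# Synthetic demo: a static background with one 4x4 patch that flickers.
frames = np.zeros((8, 20, 30), dtype=np.uint8)
frames[::2, 5:9, 10:14] = 255
print(coarse_roi(frames))
```

The returned box uses inclusive pixel indices and could be padded and passed as a crop to a downstream detector or pose model. On real footage, expect exactly the instability described above: camera shake, lighting changes, or a second moving object will shift or merge the blobs, so treat the box as a weak prior only.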