Description
Feature Request
The current implementation of discover_network computes pairwise distances via a full distance matrix, which scales as O(N^2 T log T).
This becomes a major bottleneck for large datasets. I propose adding an optional kd_tree=True parameter to leverage KD-Tree–based neighbor searches for both kNN and geometric-kNN causation entropy algorithms.
Mathematical Motivation
- kNN and geometric-kNN causation entropy rely on identifying nearest neighbors in phase space or within a radius r.
- Using a brute-force distance matrix requires evaluating all pairwise distances even though only local neighborhoods are needed.
- A KD-Tree (or Ball Tree) data structure can reduce nearest-neighbor queries to approximately O(N log N) in low-to-moderate dimensions.

This optimization is mathematically equivalent: the same neighborhood sets are found, but with much faster query times.
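The equivalence claim can be checked directly: a minimal sketch comparing a brute-force distance matrix against `scipy.spatial.cKDTree` on random phase-space points (the data and sizes here are illustrative, not taken from the library):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))  # N=1000 points in a 3-D phase space
k = 5

# Brute force: build the full N x N distance matrix, then sort each row.
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
brute_idx = np.argsort(d, axis=1)[:, :k + 1]  # includes the point itself

# KD-Tree: query only local neighborhoods, no full matrix in memory.
tree = cKDTree(X)
_, tree_idx = tree.query(X, k=k + 1)

# Identical neighborhood sets (ordering can differ only on exact distance ties).
assert np.array_equal(np.sort(brute_idx, axis=1), np.sort(tree_idx, axis=1))
```

With continuous random data, exact distance ties are essentially impossible, so the two index sets agree point for point.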
Proposed API Change
```python
discover_network(data, method="knn", k=5, kd_tree=True, ...)
```
- kd_tree=True: Use scipy.spatial.KDTree (or sklearn.neighbors.KDTree) for neighbor search.
- kd_tree=False: Fall back to the existing distance-matrix implementation.
Benefits
- Significant speedup for large N, especially when repeatedly computing neighborhoods during causation entropy estimation.
- Reduced memory footprint (no need to store the full distance matrix).
- Backward compatible: default behavior remains unchanged.
Next Steps
- Implement the KD-Tree option inside the neighbor search routines for kNN and geometric-kNN.
- Add unit tests ensuring identical results between kd_tree=True and kd_tree=False.
- Benchmark runtime scaling on synthetic datasets.
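For the benchmarking step, a minimal timing harness along these lines would do (the sizes and `bench` helper are illustrative assumptions, not part of the library):

```python
import time
import numpy as np
from scipy.spatial import cKDTree

def bench(n, k=5, dim=3, seed=0):
    """Time brute-force vs KD-Tree kNN on n random points (illustrative benchmark)."""
    X = np.random.default_rng(seed).standard_normal((n, dim))

    t0 = time.perf_counter()
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.argsort(d, axis=1)[:, 1:k + 1]
    brute = time.perf_counter() - t0

    t0 = time.perf_counter()
    cKDTree(X).query(X, k=k + 1)
    kdtree = time.perf_counter() - t0
    return brute, kdtree

for n in (500, 1000, 2000):
    b, t = bench(n)
    print(f"N={n:5d}  brute={b:.4f}s  kdtree={t:.4f}s")
```

Doubling N should show the brute-force time growing roughly 4x while the KD-Tree time grows only slightly faster than 2x in low dimensions.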