Skip to content

📣 Optimize the knn and geometric-knn causation entropy network discovery algorithms by using a KD-Tree instead of a distance matrix #6

@kslote1

Description

@kslote1

Feature Request

The current implementation of discover_network computes pairwise distances via a full distance matrix, which scales as O(N^2 T log T).

This becomes a major bottleneck for large datasets. I propose adding an optional kd_tree=True parameter to leverage KD-Tree–based neighbor searches for both kNN and geometric-kNN causation entropy algorithms.

Mathematical Motivation

  • kNN and geometric-kNN causation entropy rely on identifying nearest neighbors in phase space or within a radius r.

  • Using a brute-force distance matrix requires evaluating all pairwise distances even though only local neighborhoods are needed.

  • A KD-Tree (or Ball Tree) data structure can reduce nearest-neighbor queries to approximately O(N log N) in low-to-moderate dimensions.

This optimization is mathematically equivalent: the same neighborhood sets are found, but with much faster query times.

Proposed API Change

discover_network(data, method="knn", k=5, kd_tree=True, ...)

  • kd_tree=True: Use scipy.spatial.KDTree (or sklearn.neighbors.KDTree) for neighbor search.

  • kd_tree=False: Fall back to the existing distance-matrix implementation.

Benefits

Significant speedup for large N, especially when repeatedly computing neighborhoods during causation entropy estimation.

Reduced memory footprint (no need to store the full distance matrix).

Backward compatible: default behavior remains unchanged.

Next Steps

Implement KD-Tree option inside neighbor search routines for kNN and geometric-kNN.

Add unit tests ensuring identical results between kd_tree=True and kd_tree=False.

Benchmark runtime scaling on synthetic datasets.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions