Skip to content

Graph embedding for centrality measures approximation

License

Notifications You must be signed in to change notification settings

sashakolpakov/graphem

 
 

Repository files navigation

graphem logo

Graph embedding and node influence maximization

License: MIT Python 3.8+ Pepy Total Downloads DOI badge

PyPI CI Docs Docs Status

Features

  • Graph Embedding: Laplacian-based layout with force-directed refinement
  • JAX Backend: GPU/TPU acceleration for large graphs
  • Influence Maximization: Novel embedding-based seed selection algorithm
  • Graph Generators: Standard models (Erdős–Rényi, Barabási–Albert, Watts-Strogatz, etc.)
  • Visualization: Interactive 2D/3D plots with Plotly
  • Benchmarking: Centrality correlation analysis and performance testing
  • Datasets: Built-in loaders for SNAP and Network Repository datasets

Installation

pip install graphem-jax

Note: For GPU or TPU acceleration, JAX needs to be specifically installed with hardware support. See the JAX documentation for more details on enabling GPU/TPU support.

From source:

pip install git+https://github.com/igorrivin/graphem.git

Quick Start Open in Colab

Graph Embedding

import graphem as ge

# Generate graph (returns sparse adjacency matrix)
adjacency = ge.generate_er(n=500, p=0.01)

# Create embedder
embedder = ge.GraphEmbedder(
    adjacency=adjacency,
    n_components=3
)

# Compute layout
embedder.run_layout(num_iterations=50)

# Visualize
embedder.display_layout()

Influence Maximization

# Select influential nodes
seeds = ge.graphem_seed_selection(embedder, k=10)

# Estimate influence spread
import networkx as nx
G = nx.from_scipy_sparse_array(adjacency)
influence, _ = ge.ndlib_estimated_influence(G, seeds, p=0.1)
print(f"Influence: {influence} nodes ({influence/500:.1%})")

Benchmarking

from graphem.benchmark import benchmark_correlations

# Compare embedding radii with centrality measures
results = benchmark_correlations(
    ge.generate_er,
    graph_params={'n': 200, 'p': 0.05},
    n_components=3,
    num_iterations=40
)

# Display correlation matrix
ge.report_full_correlation_matrix(
    results['radii'],
    results['degree'],
    results['betweenness'],
    results['eigenvector'],
    results['pagerank'],
    results['closeness'],
    results['node_load']
)

Key Components

Core Class

  • GraphEmbedder: Main embedding engine with Laplacian initialization and force-directed layout

Algorithms

  • Graph embedding: Spectral initialization + spring forces + intersection avoidance
  • Influence maximization: Radial distance-based seed selection vs traditional greedy
  • Generators: 12+ graph models including SBM, small-world, scale-free

Datasets

Built-in access to standard network datasets:

  • Stanford Network Analysis Project
  • Network Repository

Examples

The examples/ directory contains:

  • graph_generator_example.py - Generate and visualize various graph embeddings
  • random_regular_example.py - Random regular graph analysis with GraphEm
  • real_world_datasets_example.py - Work with real world datasets (based on Facebook, arXiv, and Wikipedia data)
  • graphem_notebook.ipynb - Interactive Jupyter notebook with examples and visualizations

Testing

GraphEm includes a comprehensive unit test suite that validates all core functionality using the built-in graph generators.

Running Tests

To run the full test suite:

python -m pytest tests/

For verbose output:

python -m pytest tests/ -v

Test Coverage

The test suite covers:

  • Graph Generators (test_generators.py): All built-in graph generators including Erdős-Rényi, Barabási-Albert, Watts-Strogatz, random regular, geometric, caveman, and stochastic block models
  • Graph Embedder (test_embedder.py): Core embedding functionality, layout algorithms, different dimensions, and large graph handling
  • Influence Maximization (test_influence.py): NDLib integration, seed selection, and influence estimation

Test Requirements

Tests require the same dependencies as GraphEm plus:

  • pytest (for running tests)
  • ndlib (for influence maximization tests)

All tests use deterministic seeds for reproducible results.

Benchmarking

Run comprehensive benchmarks:

python run_benchmarks.py

Generates performance tables and correlation analysis in Markdown and LaTeX formats.

Documentation

Full API documentation is available here.

Contributing

Quick start: See CONTRIBUTING.md for essential guidelines.

Detailed guide: contributing documentation for development setup, testing, and contribution guidelines.

Citation

If you use GraphEm in research, please cite our work arXiv

BibTeX:

@misc{kolpakov-rivin-2025fast,
  title={Fast Geometric Embedding for Node Influence Maximization},
  author={Kolpakov, Alexander and Rivin, Igor},
  year={2025},
  eprint={2506.07435},
  archivePrefix={arXiv},
  primaryClass={cs.SI},
  url={https://arxiv.org/abs/2506.07435}
}

APA Style:

Kolpakov, A., & Rivin, I. (2025). Fast Geometric Embedding for Node Influence Maximization. arXiv preprint arXiv:2506.07435.

License

MIT

About

Graph embedding for centrality measures approximation

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%