xlab-ub/saber

SABER: A SQL-Compatible Semantic Document Processing System Based on Extended Relational Algebra

SABER is a research system that exposes multiple semantic document processing frameworks (LOTUS, DocETL, Palimpzest) behind a unified SQL-compatible interface.

Installation

Development Installation

git clone https://github.com/xlab-ub/saber.git
cd saber
pip install -e ".[all]"

Handling Dependency Conflicts

If you encounter dependency conflicts, create a dedicated conda environment and run the force-installation script:

conda create -n saber python=3.12 -y
conda activate saber
git clone https://github.com/xlab-ub/saber.git
cd saber
./scripts/install_all_force.sh

Running Examples

Semantic Operations Examples

Semantic WHERE

Filter rows based on semantic conditions rather than exact matches.

python examples/semantic_ops_examples/semantic_where.py
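As a conceptual sketch only (this is not SABER's API and does not reproduce the script above), a semantic WHERE can be pictured as a row filter whose predicate is a natural-language judgment. The names `semantic_where`, `looks_positive`, and `POSITIVE_HINTS` are hypothetical, and the keyword check stands in for what would normally be a language-model call.

```python
from typing import Callable

Row = dict[str, str]


def semantic_where(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Keep the rows for which the (normally model-backed) predicate holds."""
    return [row for row in rows if predicate(row)]


# Hypothetical stand-in for a judgment such as "the review is positive".
# A real system would ask a language model instead of matching keywords.
POSITIVE_HINTS = {"great", "love", "excellent"}


def looks_positive(row: Row) -> bool:
    words = set(row["review"].lower().split())
    return bool(words & POSITIVE_HINTS)


reviews = [
    {"id": "1", "review": "I love this phone"},
    {"id": "2", "review": "Battery died after a week"},
    {"id": "3", "review": "Excellent screen and great camera"},
]

kept = semantic_where(reviews, looks_positive)
print([row["id"] for row in kept])  # ['1', '3']
```

None of the kept rows contains the literal word "positive", which is the difference from an exact-match filter.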

Semantic SELECT

Extract and transform columns using semantic understanding and natural language instructions.

python examples/semantic_ops_examples/semantic_select.py

Semantic JOIN

Join tables based on semantic relationships rather than exact key matches.

python examples/semantic_ops_examples/semantic_join.py
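The idea can be sketched, under stated assumptions, as a nested-loop join whose match condition is an arbitrary pairwise predicate instead of key equality. Everything below is illustrative: `semantic_join` and `mentions_product` are hypothetical names, the substring test is a stand-in for a model-backed judgment, and the code is not taken from SABER.

```python
from itertools import product
from typing import Callable

Row = dict[str, object]


def semantic_join(
    left: list[Row], right: list[Row], matches: Callable[[Row, Row], bool]
) -> list[tuple[Row, Row]]:
    """Pair every left row with every right row that the predicate accepts."""
    return [(l, r) for l, r in product(left, right) if matches(l, r)]


# Hypothetical stand-in for "this ticket is about this product".
def mentions_product(ticket: Row, item: Row) -> bool:
    return str(item["name"]).lower() in str(ticket["text"]).lower()


tickets = [
    {"ticket_id": 1, "text": "My Aurora laptop will not boot"},
    {"ticket_id": 2, "text": "Question about the Nimbus headphones warranty"},
    {"ticket_id": 3, "text": "General billing question"},
]
products = [
    {"sku": "A-1", "name": "Aurora"},
    {"sku": "N-7", "name": "Nimbus"},
]

pairs = semantic_join(tickets, products, mentions_product)
print([(t["ticket_id"], p["sku"]) for t, p in pairs])  # [(1, 'A-1'), (2, 'N-7')]
```

The two tables share no key column; the pairing comes entirely from the predicate. A nested loop evaluates the predicate once per pair, so with a model-backed predicate the cost grows with the product of the table sizes.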

Semantic GROUP BY

Group records by semantic similarity or conceptual categories.

python examples/semantic_ops_examples/semantic_group_by.py
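One way to picture grouping by conceptual category, as a sketch rather than SABER's implementation, is to assign each record a label and bucket records by that label. The names `semantic_group_by`, `topic_of`, and `TOPIC_KEYWORDS` are hypothetical, and the keyword lookup stands in for a model-backed classifier.

```python
from collections import defaultdict
from typing import Callable

Row = dict[str, object]


def semantic_group_by(rows: list[Row], label_of: Callable[[Row], str]) -> dict[str, list[Row]]:
    """Bucket rows by the label that the (normally model-backed) classifier assigns."""
    groups: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        groups[label_of(row)].append(row)
    return dict(groups)


# Hypothetical stand-in classifier; a real system would ask a language model.
TOPIC_KEYWORDS = {
    "billing": {"invoice", "refund", "charge"},
    "shipping": {"delivery", "package", "courier"},
}


def topic_of(row: Row) -> str:
    words = set(str(row["text"]).lower().split())
    for topic, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return topic
    return "other"


messages = [
    {"id": 1, "text": "Please refund the duplicate charge"},
    {"id": 2, "text": "My package never arrived"},
    {"id": 3, "text": "Where is my invoice"},
    {"id": 4, "text": "How do I reset my password"},
]

groups = semantic_group_by(messages, topic_of)
counts = {topic: len(rows) for topic, rows in groups.items()}
print(counts)  # {'billing': 2, 'shipping': 1, 'other': 1}
```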

Semantic AGGREGATION

Perform aggregations with semantic understanding of the data.

python examples/semantic_ops_examples/semantic_aggregation.py

Semantic ORDER BY

Sort results based on semantic criteria like relevance, similarity, or conceptual ordering.

python examples/semantic_ops_examples/semantic_order_by.py

Semantic DISTINCT

Remove duplicates based on semantic similarity rather than exact matches.

python examples/semantic_ops_examples/semantic_distinct.py
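A minimal sketch of the idea, assuming a pairwise equivalence test is available: keep a value only if no previously kept value is judged equivalent to it. The names `semantic_distinct` and `same_entity` are hypothetical, and the punctuation-and-case normalization is a deliberately simple stand-in for a model-backed similarity judgment; it is not how SABER decides equivalence.

```python
from typing import Callable


def semantic_distinct(values: list[str], same: Callable[[str, str], bool]) -> list[str]:
    """Greedy de-duplication: keep the first representative of each equivalence group."""
    kept: list[str] = []
    for value in values:
        if not any(same(value, earlier) for earlier in kept):
            kept.append(value)
    return kept


# Hypothetical stand-in for "these two strings name the same entity".
def same_entity(a: str, b: str) -> bool:
    def normalize(s: str) -> str:
        return "".join(ch for ch in s.lower() if ch.isalnum())

    return normalize(a) == normalize(b)


names = ["IBM", "I.B.M.", "ibm", "Microsoft", "microsoft "]
print(semantic_distinct(names, same_entity))  # ['IBM', 'Microsoft']
```

An exact-match DISTINCT would return all five strings here, since no two are byte-identical.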

Semantic INTERSECT (ALL) and EXCEPT (ALL)

Perform set operations (INTERSECT, EXCEPT, and their ALL variants) that match rows by semantic relationship rather than exact equality.

# Semantic INTERSECT - Find semantically overlapping records
python examples/semantic_ops_examples/semantic_intersect.py
python examples/semantic_ops_examples/semantic_intersect_all.py

# Semantic EXCEPT - Find records in the first input with no semantic match in the second
python examples/semantic_ops_examples/semantic_except.py
python examples/semantic_ops_examples/semantic_except_all.py
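The difference between the plain and ALL variants follows standard SQL semantics: the plain forms return distinct rows, while INTERSECT ALL keeps each row min(m, n) times and EXCEPT ALL keeps it max(0, m - n) times, where m and n are its counts in the two inputs. The sketch below shows this with Python's `collections.Counter`. The `canon` function is a hypothetical stand-in that maps semantically equivalent strings to one key; it is an assumption for illustration, not SABER's matching logic.

```python
from collections import Counter


# Hypothetical stand-in for semantic equivalence: collapse case and whitespace.
def canon(text: str) -> str:
    return " ".join(text.lower().split())


left = ["Heart attack", "heart  attack", "HEART ATTACK", "Stroke", "Asthma"]
right = ["heart attack", "Heart Attack", "STROKE"]

left_counts = Counter(map(canon, left))    # heart attack: 3, stroke: 1, asthma: 1
right_counts = Counter(map(canon, right))  # heart attack: 2, stroke: 1

# Set semantics: each matching row appears once.
intersect = sorted(set(left_counts) & set(right_counts))
except_ = sorted(set(left_counts) - set(right_counts))

# Bag semantics: Counter's & takes the minimum count, - subtracts and drops non-positives.
intersect_all = sorted((left_counts & right_counts).elements())
except_all = sorted((left_counts - right_counts).elements())

print(intersect)      # ['heart attack', 'stroke']
print(intersect_all)  # ['heart attack', 'heart attack', 'stroke']
print(except_)        # ['asthma']
print(except_all)     # ['asthma', 'heart attack']
```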

Unified Query Examples

Backend-Agnostic Semantic Query Rewriting

Demonstrates how SABER automatically rewrites backend-agnostic semantic queries to run on different semantic data processing systems (LOTUS, DocETL, Palimpzest) without requiring users to modify their code.

python examples/unified_query_examples/unified_query.py
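The general pattern behind backend-agnostic rewriting can be sketched as a registry that maps a backend name to a function translating one neutral operator description into a backend-specific plan. This is an illustration of the dispatch idea only: `SemanticFilter`, `make_rewriter`, `REWRITERS`, and `rewrite` are hypothetical names, and the plan dictionaries are invented placeholders that do not mirror the real APIs of LOTUS, DocETL, Palimpzest, or SABER.

```python
from dataclasses import dataclass
from typing import Callable

Plan = dict[str, str]


@dataclass(frozen=True)
class SemanticFilter:
    """Backend-neutral description of one semantic predicate."""

    column: str
    instruction: str


def make_rewriter(backend: str) -> Callable[[SemanticFilter], Plan]:
    """Build a placeholder rewriter; a real one would emit backend-specific calls."""

    def rewriter(op: SemanticFilter) -> Plan:
        return {
            "backend": backend,
            "operator": "filter",
            "column": op.column,
            "instruction": op.instruction,
        }

    return rewriter


REWRITERS = {name: make_rewriter(name) for name in ("lotus", "docetl", "palimpzest")}


def rewrite(op: SemanticFilter, backend: str) -> Plan:
    """Translate the neutral operator for the chosen backend."""
    if backend not in REWRITERS:
        raise ValueError(f"unknown backend: {backend}")
    return REWRITERS[backend](op)


query = SemanticFilter(column="abstract", instruction="the paper is about databases")
plans = [rewrite(query, backend) for backend in REWRITERS]
for plan in plans:
    print(plan)
```

The user-facing object (`query`) is written once; only the registry lookup changes per backend.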

Citation

If you find this code useful, please consider citing our paper:

@misc{lee2025sabersqlcompatiblesemanticdocument,
      title={SABER: A SQL-Compatible Semantic Document Processing System Based on Extended Relational Algebra}, 
      author={Changjae Lee and Zhuoyue Zhao and Jinjun Xiong},
      year={2025},
      eprint={2509.00277},
      archivePrefix={arXiv},
      primaryClass={cs.DB},
      url={https://arxiv.org/abs/2509.00277}, 
}
