SABER is a research system that integrates multiple semantic document processing frameworks (LOTUS, DocETL, Palimpzest) with a unified SQL-compatible interface.
git clone https://github.com/xlab-ub/saber.git
cd saber
pip install -e ".[all]"

If you encounter dependency conflicts, use a conda environment and the force-installation script:
conda create -n saber python=3.12 -y
conda activate saber
git clone https://github.com/xlab-ub/saber.git
cd saber
./scripts/install_all_force.sh

Filter rows based on semantic conditions rather than exact matches.
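Conceptually, a semantic filter asks a language model whether a natural-language condition holds for each row and keeps the rows where it does. The following is a minimal, self-contained Python sketch of that idea, not SABER's API. `sem_where` is a hypothetical helper, and `toy_judge` is a hypothetical keyword-based stand-in for the model call.

```python
from typing import Callable

Row = dict[str, str]


def sem_where(rows: list[Row], condition: str,
              judge: Callable[[Row, str], bool]) -> list[Row]:
    """Keep the rows for which `judge` decides the natural-language condition holds."""
    return [row for row in rows if judge(row, condition)]


def toy_judge(row: Row, condition: str) -> bool:
    """Hypothetical stand-in for an LLM call: a keyword check that ignores `condition`."""
    positive_words = {"love", "great", "excellent", "wonderful"}
    text = row["review"].lower()
    return any(word in text for word in positive_words)


reviews = [
    {"id": "1", "review": "I love this phone"},
    {"id": "2", "review": "Battery died in a day"},
    {"id": "3", "review": "Excellent camera"},
]
kept = sem_where(reviews, "the review expresses a positive opinion", toy_judge)
print([row["id"] for row in kept])  # ['1', '3']
```

Swapping `toy_judge` for a real model call changes the cost (one call per row) but not the shape of the operator.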
python examples/semantic_ops_examples/semantic_where.py

Extract and transform columns using semantic understanding and natural language instructions.
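A semantic projection of this kind can be pictured as deriving a new column by applying an instruction to every row. The sketch below is illustrative only and assumes hypothetical names: `sem_select` is not SABER's API, and `toy_extract` replaces the model with a regular expression that finds a four-digit year.

```python
import re
from typing import Callable

Row = dict[str, str]


def sem_select(rows: list[Row], instruction: str, new_column: str,
               extract: Callable[[Row, str], str]) -> list[Row]:
    """Add `new_column` to each row, filled by applying `instruction` via `extract`."""
    return [{**row, new_column: extract(row, instruction)} for row in rows]


def toy_extract(row: Row, instruction: str) -> str:
    """Hypothetical stand-in for an LLM call: regex for a year; ignores `instruction`."""
    match = re.search(r"\b(?:19|20)\d{2}\b", row["abstract"])
    return match.group(0) if match else ""


papers = [
    {"title": "Paper A", "abstract": "Presented in 2021 at a database venue."},
    {"title": "Paper B", "abstract": "No date is mentioned here."},
]
result = sem_select(papers, "Extract the publication year", "year", toy_extract)
print([row["year"] for row in result])  # ['2021', '']
```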
python examples/semantic_ops_examples/semantic_select.py

Join tables based on semantic relationships rather than exact key matches.
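The simplest way to picture a semantic join is a nested loop that asks a judge whether each (left, right) pair satisfies a natural-language condition. This Python sketch is a hypothetical illustration, not SABER's implementation. `sem_join` and `toy_judge` are invented names, and `TOY_KNOWLEDGE` is a hand-written lookup table standing in for a model.

```python
from typing import Callable

Row = dict[str, str]


def sem_join(left: list[Row], right: list[Row], condition: str,
             judge: Callable[[Row, Row, str], bool]) -> list[Row]:
    """Naive nested-loop semantic join: one judge call per (left, right) pair."""
    return [{**l_row, **r_row}
            for l_row in left
            for r_row in right
            if judge(l_row, r_row, condition)]


# Hypothetical stand-in for a model's world knowledge.
TOY_KNOWLEDGE = {("laptop", "Computers"), ("monitor", "Computers"),
                 ("sneakers", "Footwear")}


def toy_judge(l_row: Row, r_row: Row, condition: str) -> bool:
    """Ignores `condition` and looks the pair up in TOY_KNOWLEDGE."""
    return (l_row["product"], r_row["category"]) in TOY_KNOWLEDGE


products = [{"product": "laptop"}, {"product": "sneakers"}, {"product": "monitor"}]
categories = [{"category": "Computers"}, {"category": "Footwear"}]
joined = sem_join(products, categories, "the product belongs to the category", toy_judge)
print([(r["product"], r["category"]) for r in joined])
# [('laptop', 'Computers'), ('sneakers', 'Footwear'), ('monitor', 'Computers')]
```

The nested loop makes |left| × |right| judge calls, which is why practical semantic join implementations generally try to prune candidate pairs before invoking a model.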
python examples/semantic_ops_examples/semantic_join.py

Group records by semantic similarity or conceptual categories.
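One way to group by conceptual category is to classify each record into one of a known set of labels and then bucket by label. The sketch assumes the labels are given up front; discovering groups by similarity (clustering) is a different approach that it does not show. `sem_group_by`, `toy_classify`, and `TOY_KEYWORDS` are hypothetical, with keyword overlap standing in for a model classifier.

```python
from collections import defaultdict
from typing import Callable

Row = dict[str, str]


def sem_group_by(rows: list[Row], column: str, labels: list[str],
                 classify: Callable[[str, list[str]], str]) -> dict[str, list[Row]]:
    """Assign each row to one label chosen by `classify`, then group rows by label."""
    groups: dict[str, list[Row]] = defaultdict(list)
    for row in rows:
        groups[classify(row[column], labels)].append(row)
    return dict(groups)


# Hypothetical keyword lists standing in for a model classifier.
TOY_KEYWORDS = {"sports": {"match", "goal", "team"},
                "finance": {"stock", "market", "earnings"}}


def toy_classify(text: str, labels: list[str]) -> str:
    """Pick the label whose keywords overlap most with the text (first label on ties)."""
    words = set(text.lower().split())
    return max(labels, key=lambda label: len(words & TOY_KEYWORDS[label]))


headlines = [{"title": "Team wins the match"},
             {"title": "Stock market rallies"},
             {"title": "Late goal seals title"}]
grouped = sem_group_by(headlines, "title", ["sports", "finance"], toy_classify)
print({label: len(rows) for label, rows in grouped.items()})  # {'sports': 2, 'finance': 1}
```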
python examples/semantic_ops_examples/semantic_group_by.py

Perform aggregations with semantic understanding of the data.
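A common pattern for aggregating more text than a model can read at once is a hierarchical reduce: combine small batches, then combine the partial results, until one answer remains. This is a general technique shown for illustration, not a description of how SABER aggregates. `sem_aggregate` and `toy_reduce` are hypothetical names; `toy_reduce` takes a union of comma-separated topics in place of a model call.

```python
from typing import Callable


def sem_aggregate(texts: list[str], instruction: str,
                  reduce_fn: Callable[[list[str], str], str],
                  batch_size: int = 2) -> str:
    """Hierarchical reduce: combine batches of texts, then batches of the results,
    until a single answer remains."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    if not texts:
        return ""
    layer = texts
    while True:
        layer = [reduce_fn(layer[i:i + batch_size], instruction)
                 for i in range(0, len(layer), batch_size)]
        if len(layer) == 1:
            return layer[0]


def toy_reduce(chunk: list[str], instruction: str) -> str:
    """Hypothetical stand-in for an LLM call: union of comma-separated topics."""
    topics = {t.strip() for text in chunk for t in text.split(",") if t.strip()}
    return ", ".join(sorted(topics))


complaints = ["shipping", "battery, shipping", "screen", "battery"]
print(sem_aggregate(complaints, "List the distinct complaint topics", toy_reduce))
# battery, screen, shipping
```

The pattern only works when the reduce step can consume its own outputs, as the topic union here can.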
python examples/semantic_ops_examples/semantic_aggregation.py

Sort results based on semantic criteria like relevance, similarity, or conceptual ordering.
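A score-based semantic sort rates each item once against the criterion and then sorts by that rating. The sketch below is hypothetical: `sem_order_by` is not SABER's API, and `toy_score` uses shared-word counts where a real system would ask a model for a relevance score.

```python
from typing import Callable


def sem_order_by(texts: list[str], criterion: str,
                 score: Callable[[str, str], float],
                 descending: bool = True) -> list[str]:
    """Score each text once against the criterion, then sort by that score."""
    return sorted(texts, key=lambda text: score(text, criterion), reverse=descending)


def toy_score(text: str, criterion: str) -> float:
    """Hypothetical stand-in for an LLM relevance score: shared-word count."""
    return float(len(set(text.lower().split()) & set(criterion.lower().split())))


docs = ["join algorithms for databases",
        "cooking pasta at home",
        "query optimization in databases"]
ranked = sem_order_by(docs, "relevance to database query optimization", toy_score)
print(ranked[0])  # query optimization in databases
```

Scoring needs one model call per item. Sorting with pairwise model comparisons instead would need O(n log n) calls but avoids calibrating absolute scores.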
python examples/semantic_ops_examples/semantic_order_by.py

Remove duplicates based on semantic similarity rather than exact matches.
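Semantic deduplication can be sketched greedily: keep a value unless it is judged equivalent to one already kept. In this hypothetical example, `sem_distinct` is an invented helper and `toy_same` compares company names after stripping punctuation and legal suffixes, standing in for a model's equivalence judgment.

```python
import string
from typing import Callable


def sem_distinct(values: list[str], same: Callable[[str, str], bool]) -> list[str]:
    """Greedy dedup: keep a value unless it matches one already kept."""
    kept: list[str] = []
    for value in values:
        if not any(same(value, k) for k in kept):
            kept.append(value)
    return kept


# Hypothetical normalization rules standing in for a model's equivalence check.
SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc"}


def _normalize(name: str) -> str:
    cleaned = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)


def toy_same(a: str, b: str) -> bool:
    return _normalize(a) == _normalize(b)


companies = ["Apple Inc.", "apple", "Microsoft Corporation", "Microsoft", "Google LLC"]
print(sem_distinct(companies, toy_same))
# ['Apple Inc.', 'Microsoft Corporation', 'Google LLC']
```

If the equivalence judgment is not transitive, as model judgments may not be, the greedy result can depend on input order.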
python examples/semantic_ops_examples/semantic_distinct.py

Perform semantic set operations (INTERSECT, EXCEPT) based on semantic relationships.
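By analogy with SQL, a semantic INTERSECT keeps left values that have a semantic match on the right, and a semantic EXCEPT keeps those that have none. This sketch covers only the deduplicated (non-ALL) forms and is a hypothetical illustration, not SABER's semantics. `sem_intersect`, `sem_except`, and `toy_same` are invented names, with a small synonym table standing in for a model.

```python
from typing import Callable

Same = Callable[[str, str], bool]


def _has_match(value: str, others: list[str], same: Same) -> bool:
    return any(same(value, other) for other in others)


def sem_intersect(left: list[str], right: list[str], same: Same) -> list[str]:
    """Distinct left values with at least one semantic match in right."""
    result: list[str] = []
    for value in left:
        if _has_match(value, right, same) and not _has_match(value, result, same):
            result.append(value)
    return result


def sem_except(left: list[str], right: list[str], same: Same) -> list[str]:
    """Distinct left values with no semantic match in right."""
    result: list[str] = []
    for value in left:
        if not _has_match(value, right, same) and not _has_match(value, result, same):
            result.append(value)
    return result


# Hypothetical synonym table standing in for a model's equivalence check.
SYNONYMS = [{"car", "automobile"}, {"film", "movie"}]


def toy_same(a: str, b: str) -> bool:
    a, b = a.lower(), b.lower()
    return a == b or any(a in group and b in group for group in SYNONYMS)


left = ["car", "movie", "bicycle", "automobile"]
right = ["automobile", "film"]
print(sem_intersect(left, right, toy_same))  # ['car', 'movie']
print(sem_except(left, right, toy_same))     # ['bicycle']
```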
# Semantic INTERSECT - Find semantically overlapping records
python examples/semantic_ops_examples/semantic_intersect.py
python examples/semantic_ops_examples/semantic_intersect_all.py
# Semantic EXCEPT - Find semantically different records
python examples/semantic_ops_examples/semantic_except.py
python examples/semantic_ops_examples/semantic_except_all.py

Demonstrates how SABER automatically rewrites backend-free semantic queries to work with different Semantic Data Processing Systems (LOTUS, DocETL, Palimpzest) without requiring users to modify their code.
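To illustrate only the general idea of query rewriting, the sketch below translates one backend-free predicate into two placeholder styles, single-brace `{column}` and Jinja-style `{{ input.column }}`. Everything here is an assumption for illustration: the `[column]` notation, the style names, and `rewrite` are hypothetical and are not SABER's query syntax or the actual template formats of LOTUS, DocETL, or Palimpzest.

```python
import re
from typing import Callable

# Hypothetical backend-free notation: columns are written as [column].
COLUMN_REF = re.compile(r"\[(\w+)\]")

# Hypothetical rewriters, one per placeholder style a backend might expect.
REWRITERS: dict[str, Callable[[str], str]] = {
    "single_brace": lambda p: COLUMN_REF.sub(r"{\1}", p),
    "jinja": lambda p: COLUMN_REF.sub(r"{{ input.\1 }}", p),
}


def rewrite(predicate: str, backend_style: str) -> str:
    """Translate one backend-free predicate into a backend's template syntax."""
    if backend_style not in REWRITERS:
        raise ValueError(f"unknown backend style: {backend_style}")
    return REWRITERS[backend_style](predicate)


query = "[abstract] is about databases"
for style in REWRITERS:
    print(style, "->", rewrite(query, style))
# single_brace -> {abstract} is about databases
# jinja -> {{ input.abstract }} is about databases
```

Keeping the user-facing query in one neutral notation and pushing backend differences into per-backend rewriters is what lets the same query run unchanged against different engines.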
python examples/unified_query_examples/unified_query.py

If you find this code useful, please consider citing our paper:
@misc{lee2025sabersqlcompatiblesemanticdocument,
title={SABER: A SQL-Compatible Semantic Document Processing System Based on Extended Relational Algebra},
author={Changjae Lee and Zhuoyue Zhao and Jinjun Xiong},
year={2025},
eprint={2509.00277},
archivePrefix={arXiv},
primaryClass={cs.DB},
url={https://arxiv.org/abs/2509.00277},
}