This document provides a reference for the key classes and functions in the AI4Org codebase.
Wrapper for the TinyLlama causal language model.
Initializes the generator model.
model_name: Hugging Face model identifier.
device: 'cuda' or 'cpu'.
Generates text based on the prompt.
Wrapper for the DistilBERT binary classifier.
Initializes the discriminator model.
Returns the probability (0.0 to 1.0) of the positive class.
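As an illustration of how a positive-class probability can be derived from a binary classifier's raw outputs, here is a minimal sketch. It assumes the classifier emits two logits and that index 1 is the positive class; the reference does not state either, and the function name `positive_class_probability` is hypothetical, not part of the AI4Org codebase.

```python
import math
from typing import Sequence


def positive_class_probability(logits: Sequence[float], positive_index: int = 1) -> float:
    """Apply a softmax to the class logits and return the positive-class probability.

    Assumption: `positive_index` marks the positive class (index 1 here).
    """
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(value - largest) for value in logits]
    return exps[positive_index] / sum(exps)


if __name__ == "__main__":
    # Equal logits give an even split between the two classes.
    print(positive_class_probability([0.0, 0.0]))
    # A much larger positive logit pushes the probability toward 1.0.
    print(positive_class_probability([-2.0, 3.0]))
```

The result always lies between 0.0 and 1.0, which matches the range described above.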
Handles semantic search.
Retrieves the top-k most relevant document chunks for the query.
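Top-k retrieval can be sketched as ranking chunks by the cosine similarity between their embeddings and the query embedding. The sketch below assumes the embeddings are already computed and passed in as plain lists of floats. The embedding model, the similarity measure, and the names `cosine_similarity` and `top_k_chunks` are illustrative assumptions, not the codebase's actual API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best first, with their scores."""
    scored = [
        (chunk, cosine_similarity(query_vec, vec))
        for chunk, vec in zip(chunks, chunk_vecs)
    ]
    # Sort by score in descending order and keep the first k entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan like this is fine for small collections; a larger corpus would typically use a vector index instead.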
Orchestrates the data generation process.
Executes the full pipeline on the input file.
Removes noise, headers, and footers from the text.
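One way to approach this kind of cleaning is sketched below. It uses heuristics that are assumptions for illustration only: page-number lines are dropped, lines repeated at least `min_repeats` times are treated as running headers or footers, and whitespace runs are collapsed. The function name `clean_text` and its parameter are hypothetical; the actual rules in the codebase may differ.

```python
import re
from collections import Counter

# Matches lines such as "3", "Page 3", or "Page 3 of 10" (case-insensitive).
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)


def clean_text(text: str, min_repeats: int = 3) -> str:
    """Drop page numbers and frequently repeated lines, then normalize whitespace."""
    lines = [line.strip() for line in text.splitlines()]
    # Count how often each non-empty line occurs; frequent repeats are
    # assumed to be running headers or footers.
    counts = Counter(line for line in lines if line)
    kept = []
    for line in lines:
        if not line:
            continue
        if PAGE_NUMBER.match(line):
            continue
        if counts[line] >= min_repeats:
            continue
        # Collapse internal runs of whitespace to a single space.
        kept.append(re.sub(r"\s+", " ", line))
    return "\n".join(kept)
```

Note that the page-number pattern also drops any line consisting only of digits, so a standalone figure such as a year on its own line would be lost under this heuristic.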
Splits text into overlapping chunks while respecting sentence boundaries.
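The following sketch shows one possible form of sentence-aware chunking with overlap. It assumes a naive regex sentence splitter, a character budget per chunk (`max_chars`), and an overlap measured in whole sentences (`overlap_sentences`). These names and defaults are hypothetical; the codebase may measure size in tokens or use a different splitter.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_text(text: str, max_chars: int = 200, overlap_sentences: int = 1) -> List[str]:
    """Group whole sentences into chunks, repeating the last sentences of each
    chunk at the start of the next one."""
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # Close the current chunk and carry its tail over as overlap.
            chunks.append(" ".join(current))
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Sentences are never split, so `max_chars` acts as a soft limit: a chunk can exceed it when a single sentence, or the carried-over overlap plus one new sentence, is longer than the budget.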