Steps to Create a Synthetic Data Pipeline

A pipeline is set up as a flow by declaring its graph nodes and edges in a YAML configuration.
Example: glaive code assistant.

Basic steps:

  • Create a sub-directory under tasks for your use case.
  • Create a graph_config.yaml for your pipeline (nodes, edges, models, etc).
  • Create a task_executor.py for any custom logic or processing.
  • Execute with python main.py --task <your_task> ...
  • Results are stored in output.json in your sub-directory.
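The first three steps can be sketched as a small scaffolding script. This is only an illustration under stated assumptions: `scaffold_task` is a hypothetical helper, not part of the project, and the files it writes are empty placeholders. The real schema for graph_config.yaml is described in the Graph Configuration Guide.

```python
import tempfile
from pathlib import Path


# Hypothetical helper (not part of the project): it only creates placeholder
# files in the layout described in the steps above.
def scaffold_task(repo_root: Path, task_name: str) -> Path:
    """Create tasks/<task_name>/ with placeholder pipeline files."""
    task_dir = repo_root / "tasks" / task_name
    task_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder contents only; no real configuration schema is assumed here.
    placeholders = {
        "graph_config.yaml": "# Define nodes, edges, models, etc. here.\n",
        "task_executor.py": "# Custom logic or processing goes here.\n",
    }
    for name, text in placeholders.items():
        path = task_dir / name
        if not path.exists():  # never overwrite existing work
            path.write_text(text, encoding="utf-8")
    return task_dir


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        created = scaffold_task(Path(tmp), "my_task")
        print(sorted(p.name for p in created.iterdir()))
```

In a real checkout, you would pass the repository root instead of a temporary directory and then fill in both files by hand.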

Resumable Execution:

If a run fails, the process can shut down gracefully and later resume from the point of interruption. To enable resumable execution, pass --resume True when running your command, for example: python main.py --task <your_task> ... --resume True.
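If you launch runs from a script, the command above can be assembled programmatically. The sketch below is a convenience wrapper under stated assumptions: `build_command` and its `extra_args` parameter are hypothetical names, and `extra_args` stands in for whatever other arguments your run needs (the "..." above), which are not spelled out here.

```python
from typing import List, Sequence


# Hypothetical helper (not part of the project): it only assembles the
# command line shown in the text above.
def build_command(
    task: str,
    extra_args: Sequence[str] = (),
    resume: bool = False,
) -> List[str]:
    """Return the argument list for python main.py --task <task> ..."""
    cmd = ["python", "main.py", "--task", task, *extra_args]
    if resume:
        # The flag takes an explicit value, as in the text: --resume True
        cmd += ["--resume", "True"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it.
    print(" ".join(build_command("<your_task>", resume=True)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the directory that contains main.py. After an interrupted run, rerun with the same arguments and `resume=True`.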

See the Graph Configuration Guide for detailed schema, examples, and best practices for defining graphs, tasks, and processors.