Deleted job causes unrelated datasets to appear in lineage graph #3086
srimathithangaraj
started this conversation in
General
Replies: 0 comments
After deleting a job that connects two parts of a lineage graph, datasets and jobs from the disconnected part still appear in lineage queries. This happens because the lineage traversal uses the deleted job's I/O mappings even though the job itself is hidden.
Steps to Reproduce
Create a lineage chain:
d1 → job1 → d2 → job3 → d3 → job2 → d4
Where:
job1 consumes d1 and produces d2
job3 consumes d2 and produces d3
job2 consumes d3 and produces d4
Delete job3
Query lineage for d1:
Expected Behavior
After deleting job3, the lineage for d1 should only show the directly connected portion:
d1 → job1 → d2
Since job3 (which connects d2 to d3) is deleted, there should be no path to d3, job2, or d4.
Actual Behavior
The lineage for d1 incorrectly includes:
d1 → job1 → d2
d3 → job2 → d4
Problem: d3, job2, and d4 appear in the graph even though:
job3 (the only connection from d2 to d3) is deleted
There's no visible path explaining how these nodes are related to d1
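To make the difference between the actual and expected results concrete, here is a minimal in-memory sketch of the chain above. It is not the project's real traversal code: `JOB_IO`, `DELETED_JOBS`, `lineage`, and the `skip_deleted_io` flag are hypothetical names made up for illustration. It assumes the behavior described in this report, where a deleted job is hidden from the result but its input/output mappings are still followed.

```python
from collections import deque

# Hypothetical model of the chain from this report:
# each job maps to (input datasets, output datasets).
JOB_IO = {
    "job1": ({"d1"}, {"d2"}),
    "job3": ({"d2"}, {"d3"}),
    "job2": ({"d3"}, {"d4"}),
}
DELETED_JOBS = {"job3"}


def lineage(start, skip_deleted_io):
    """Return the visible nodes (datasets and jobs) reachable from a dataset."""
    seen_datasets = {start}
    visible = {start}
    queue = deque([start])
    while queue:
        dataset = queue.popleft()
        for job, (inputs, outputs) in JOB_IO.items():
            touched = inputs | outputs
            if dataset not in touched:
                continue
            deleted = job in DELETED_JOBS
            if deleted and skip_deleted_io:
                # Expected behavior: a deleted job contributes no edges.
                continue
            if not deleted:
                # Deleted jobs are hidden from the result either way.
                visible.add(job)
            for neighbour in touched - seen_datasets:
                seen_datasets.add(neighbour)
                visible.add(neighbour)
                queue.append(neighbour)
    return visible


# Assumed buggy behavior: job3 is hidden, but its I/O mappings are followed.
actual = lineage("d1", skip_deleted_io=False)
# Expected behavior: job3's I/O mappings are ignored once it is deleted.
expected = lineage("d1", skip_deleted_io=True)

print(sorted(actual))
print(sorted(expected))
```

With `skip_deleted_io=False` the result contains d3, job2, and d4 but not job3, which matches the disconnected fragment described above. With `skip_deleted_io=True` the traversal stops at d2.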