Replies: 10 comments
-
Hey! This is expected behavior from both the OpenLineage spec and Marquez, but you have a better option than A [...]
The better option: JobEvent
We verified this works end-to-end. For your view chain, e.g.:
{
  "eventTime": "2024-01-01T10:00:00.000Z",
  "job": { "namespace": "your-namespace", "name": "view_a" },
  "inputs": [{ "namespace": "your-namespace", "name": "raw_table" }],
  "outputs": [{ "namespace": "your-namespace", "name": "view_a_dataset" }],
  "producer": "https://your-system",
  "schemaURL": "https://openlineage.io/spec/2-0-0/OpenLineage.json#/definitions/JobEvent"
}
{
  "eventTime": "2024-01-01T10:01:00.000Z",
  "job": { "namespace": "your-namespace", "name": "view_b" },
  "inputs": [{ "namespace": "your-namespace", "name": "view_a_dataset" }],
  "outputs": [{ "namespace": "your-namespace", "name": "view_b_dataset" }],
  "producer": "https://your-system",
  "schemaURL": "https://openlineage.io/spec/2-0-0/OpenLineage.json#/definitions/JobEvent"
}
The key detail is the [...]
The lineage is then queryable from any node: [...]
Lineage in OpenLineage is always expressed through a job that connects inputs to outputs. But you don't need runs, [...]
Looking ahead: first-class static lineage support
While [...]
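For a longer chain of views, the per-view JobEvents can be generated rather than written by hand. The following is a minimal sketch, not a tested integration: the helper names `job_event` and `chain_events` are hypothetical, the `<view>_dataset` naming simply mirrors the example events in this reply, and a single event time is reused for every view to keep the sketch short.

```python
import json

# Placeholder values taken from the example events in this reply.
PRODUCER = "https://your-system"
JOB_EVENT_SCHEMA = (
    "https://openlineage.io/spec/2-0-0/OpenLineage.json#/definitions/JobEvent"
)


def job_event(namespace, view, inputs, event_time):
    """Build one JobEvent dict for a view (hypothetical helper).

    A JobEvent carries a job plus inputs and outputs, but no run.
    """
    return {
        "eventTime": event_time,
        "job": {"namespace": namespace, "name": view},
        "inputs": [{"namespace": namespace, "name": name} for name in inputs],
        "outputs": [{"namespace": namespace, "name": f"{view}_dataset"}],
        "producer": PRODUCER,
        "schemaURL": JOB_EVENT_SCHEMA,
    }


def chain_events(namespace, source, views, event_time):
    """Return one JobEvent per view; each view reads the previous output."""
    events = []
    upstream = source
    for view in views:
        events.append(job_event(namespace, view, [upstream], event_time))
        upstream = f"{view}_dataset"
    return events


events = chain_events(
    "your-namespace", "raw_table", ["view_a", "view_b"], "2024-01-01T10:00:00.000Z"
)
print(json.dumps(events, indent=2))
```

Each resulting dict would then be sent to Marquez as its own request. To my knowledge the endpoint that accepts OpenLineage events is `POST /api/v1/lineage`, but check that against the Marquez version you run.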
-
Hey! I've run several tests and have not been able to obtain the column lineage using JobEvents: { [...] }
The JSON contains the input schemas, the output schema, and the column lineage, but the following request in Marquez returns an empty graph: [...]
The view column lineage links do not appear. Even though the event contains the input and output schemas, Marquez does not show them.
However, when we use this RunEvent: { [...] }
the Marquez request http://localhost:3000/api/v1/column-lineage?nodeId=dataset:book04:books_sales_details returns a graph.
In the Marquez UI, the view column lineage links are available, and all the datasets have their schema.
As you can see, the only differences between the JobEvent JSON and the RunEvent JSON are the schemaURL, the event type, the run, and the namespace; the input and output schemas and the column lineage of the views are the same. Is there something wrong with the JobEvent I am creating? Thanks!
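For readers trying to reproduce this, here is a rough sketch of how an output dataset with a `columnLineage` facet and the matching Marquez query URL could be built. It is not the event used in the tests above: the helper names, the `books` input dataset, and the `title` field are hypothetical, and the facet schema version in `FACET_SCHEMA` is an assumption to check against the OpenLineage spec. Only the namespace `book04`, the dataset `books_sales_details`, and the query URL come from this comment.

```python
from urllib.parse import urlencode

PRODUCER = "https://your-system"  # placeholder
# Assumed facet schema URL; verify the version against the OpenLineage spec.
FACET_SCHEMA = (
    "https://openlineage.io/spec/facets/1-0-1/"
    "ColumnLineageDatasetFacet.json#/$defs/ColumnLineageDatasetFacet"
)


def column_lineage_facet(mapping):
    """Build a columnLineage facet (hypothetical helper).

    mapping: {output_field: [(namespace, dataset, input_field), ...]}
    """
    return {
        "_producer": PRODUCER,
        "_schemaURL": FACET_SCHEMA,
        "fields": {
            out_field: {
                "inputFields": [
                    {"namespace": ns, "name": ds, "field": in_field}
                    for ns, ds, in_field in sources
                ]
            }
            for out_field, sources in mapping.items()
        },
    }


def column_lineage_url(base_url, namespace, dataset):
    """Build the column-lineage query URL for a dataset node."""
    # Keep ':' unescaped so the nodeId reads dataset:<namespace>:<name>.
    query = urlencode({"nodeId": f"dataset:{namespace}:{dataset}"}, safe=":")
    return f"{base_url}/api/v1/column-lineage?{query}"


# Hypothetical output dataset: one column derived from one input column.
output_dataset = {
    "namespace": "book04",
    "name": "books_sales_details",
    "facets": {
        "columnLineage": column_lineage_facet(
            {"title": [("book04", "books", "title")]}
        )
    },
}

url = column_lineage_url("http://localhost:3000", "book04", "books_sales_details")
print(url)
```

The same `output_dataset` dict can be placed in the `outputs` list of either a JobEvent or a RunEvent, which makes it easy to compare how Marquez handles the two event types.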
-
Good catch. I will do some tests.
-
Hi! Any news on this? Thanks.
-
Yup, sorry. It might be a bug. Which version of Marquez are you using?
-
Hi, I'm using version 0.51.1.
-
Hi, I have tried version 0.50.0 and the results have been the same.
-
Yup, I'm investigating this. I'll try to publish a fix together with the next version.
-
Check this version: https://github.com/ilum-cloud/marquez, or the Docker image ilum/marquez:0.54.0
-
This version works properly. Thanks!!!!
-
Hi!
We are trying to do static modeling of a data virtualization system with many views that depend on each other using DatasetEvent. When we send those DatasetEvents to Marquez, we cannot see the lineage of the views; however, if we use RunEvent, we do see the lineage.
It surprises us that despite being a static model, we need to add Jobs and Runs to be able to visualize the lineage we want in Marquez.
Is this normal from OpenLineage's point of view? Or is it something that Marquez demands because it is not completely adapted to a static lineage?
Thanks!