fix: escape Rich markup in node strings and use rich_format for catal...#5436
fix: escape Rich markup in node strings and use rich_format for catal...#5436Br1an67 wants to merge 2 commits intokedro-org:mainfrom
Conversation
ElenaKhaustova
left a comment
There was a problem hiding this comment.
Thank you @Br1an67 for your contribution!
I made a first pass, and the DataCatalog fix looks good to me.
However, the Node fix has a fundamental design flaw (see the comment), and some tests need to be updated as now they are causing CI failures: for example, tests/pipeline/test_pipeline.py test_full should fail as not updated for escaped brackets.
| def __str__(self) -> str: | ||
| def _set_to_str(xset: set | list[str]) -> str: | ||
| return f"[{';'.join(xset)}]" | ||
| return f"\\[{';'.join(xset)}\\]" |
There was a problem hiding this comment.
I think this might leak Rich syntax into all non-Rich consumers.
Node.__str__ is a general-purpose method used in many non-Rich contexts:
-
File log handlers —
Node.run()at line 531 logs"Running node: %s", str(self). Non-Rich file handlers will write Running node:identity(\[in]) -> \[out](with literal backslashes) to log files. -
Pipeline.describe(names_only=False)at pipeline.py:510 — returns a plain string that includesstr(node). Users and non-Rich log consumers will see escaped brackets. -
Any user code calling str(node) — debugging, notebooks, error messages, etc.
Here we should probably follow the same approach as with DataCatalog: keep __str__ returning [in] and handle Rich escaping at the log call site.
…og logging - Escape brackets in Node.__str__ to prevent RichHandler from interpreting them as markup tags, which was causing brackets to be swallowed in logs - Use rich_format extra attribute in DataCatalog load/save logging instead of manually adding markup to the message, preventing markup tags from leaking into non-Rich handlers like file logs - Remove _use_rich_markup attribute and unused imports from data_catalog.py - Update tests to expect escaped brackets in node string representation Signed-off-by: Br1an67 <932039080@qq.com>
Signed-off-by: Br1an67 <932039080@qq.com>
cbfa9bf to
5e22563
Compare
Fixes #5199
Description
This PR fixes two issues with Rich logging integration that were causing unexpected behavior in console and file logs:
Console log issue: Brackets around node inputs/outputs (e.g.,
[companies]) were being swallowed by RichHandler because they were interpreted as Rich markup tags.File log issue: Rich markup tags like
[dark_orange]X_train[/dark_orange]were leaking into file logs because the markup was being added directly to the log message, which was then sent to all handlers including non-Rich file handlers.Development notes
Changes made:
kedro/pipeline/node.py: Escape brackets inNode.__str__using\[and\]so that RichHandler renders them as literal brackets instead of interpreting them as markup.kedro/io/data_catalog.py:rich_formatextra attribute instead of manually adding markup to log messages_use_rich_markupattribute and unused imports (_format_rich,_has_rich_handler)rich_formatmechanism is handled by Kedro's customRichHandler, which formats only the specified arguments with colors, while non-Rich handlers receive the unformatted messageTests updated:
tests/pipeline/test_node.py: Updated to expect escaped brackets in node string representationtests/framework/project/test_logging.py: Updated test to reflect the new approach where markup is never in the message itselfHow it works:
Node.__str__returnsidentity(\[in\]) -> \[out\], RichHandler withmarkup=Truerenders this asidentity([in]) -> [out](correct display)extra={"rich_format": ["dark_orange"]}, only RichHandler applies the color formatting, while file handlers receive the plain dataset nameChecklist
RELEASE.mdfile