Skip to content

fix: escape Rich markup in node strings and use rich_format for catal...#5436

Open
Br1an67 wants to merge 2 commits intokedro-org:mainfrom
Br1an67:fix/issue-5199-rich-markup-escaping-2
Open

fix: escape Rich markup in node strings and use rich_format for catal...#5436
Br1an67 wants to merge 2 commits intokedro-org:mainfrom
Br1an67:fix/issue-5199-rich-markup-escaping-2

Conversation

@Br1an67
Copy link

@Br1an67 Br1an67 commented Mar 9, 2026

Fixes #5199

Description

This PR fixes two issues with Rich logging integration that were causing unexpected behavior in console and file logs:

  1. Console log issue: Brackets around node inputs/outputs (e.g., [companies]) were being swallowed by RichHandler because they were interpreted as Rich markup tags.

  2. File log issue: Rich markup tags like [dark_orange]X_train[/dark_orange] were leaking into file logs because the markup was being added directly to the log message, which was then sent to all handlers including non-Rich file handlers.

Development notes

Changes made:

  1. kedro/pipeline/node.py: Escape brackets in Node.__str__ using \[ and \] so that RichHandler renders them as literal brackets instead of interpreting them as markup.

  2. kedro/io/data_catalog.py:

    • Use the rich_format extra attribute instead of manually adding markup to log messages
    • Remove the _use_rich_markup attribute and unused imports (_format_rich, _has_rich_handler)
    • The rich_format mechanism is handled by Kedro's custom RichHandler, which formats only the specified arguments with colors, while non-Rich handlers receive the unformatted message
  3. Tests updated:

    • tests/pipeline/test_node.py: Updated to expect escaped brackets in node string representation
    • tests/framework/project/test_logging.py: Updated test to reflect the new approach where markup is never in the message itself

How it works:

  • When Node.__str__ returns identity(\[in\]) -> \[out\], RichHandler with markup=True renders this as identity([in]) -> [out] (correct display)
  • When DataCatalog logs use extra={"rich_format": ["dark_orange"]}, only RichHandler applies the color formatting, while file handlers receive the plain dataset name

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team
 kedro/io/data_catalog.py                | 11 ++++-------
 kedro/pipeline/node.py                  |  2 +-
 tests/framework/project/test_logging.py |  5 +++--
 tests/pipeline/test_node.py             | 22 +++++++++++-----------
 4 files changed, 19 insertions(+), 21 deletions(-)

Copy link
Contributor

@ElenaKhaustova ElenaKhaustova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Br1an67 for your contribution!

I made a first pass, and the DataCatalog fix looks good to me.

However, the Node fix has a fundamental design flaw (see the comment), and some tests need to be updated as now they are causing CI failures: for example, tests/pipeline/test_pipeline.py test_full should fail as not updated for escaped brackets.

def __str__(self) -> str:
def _set_to_str(xset: set | list[str]) -> str:
return f"[{';'.join(xset)}]"
return f"\\[{';'.join(xset)}\\]"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might leak Rich syntax into all non-Rich consumers.

Node.__str__ is a general-purpose method used in many non-Rich contexts:

  • File log handlers — Node.run() at line 531 logs "Running node: %s", str(self). Non-Rich file handlers will write Running node: identity(\[in]) -> \[out] (with literal backslashes) to log files.

  • Pipeline.describe(names_only=False) at pipeline.py:510 — returns a plain string that includes str(node). Users and non-Rich log consumers will see escaped brackets.

  • Any user code calling str(node) — debugging, notebooks, error messages, etc.

Here we should probably follow the same approach as with DataCatalog: keep __str__ returning [in] and handle Rich escaping at the log call site.

Br1an67 added 2 commits March 18, 2026 07:49
…og logging

- Escape brackets in Node.__str__ to prevent RichHandler from interpreting
  them as markup tags, which was causing brackets to be swallowed in logs
- Use rich_format extra attribute in DataCatalog load/save logging instead
  of manually adding markup to the message, preventing markup tags from
  leaking into non-Rich handlers like file logs
- Remove _use_rich_markup attribute and unused imports from data_catalog.py
- Update tests to expect escaped brackets in node string representation

Signed-off-by: Br1an67 <932039080@qq.com>
@Br1an67 Br1an67 force-pushed the fix/issue-5199-rich-markup-escaping-2 branch 2 times, most recently from cbfa9bf to 5e22563 Compare March 18, 2026 00:27
@noklam noklam self-requested a review March 18, 2026 17:17
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#4496

Is this a regression from the original fix? Or is this a new issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rich logging integration mishandles Kedro node brackets and markup escaping

3 participants