Fragility: Dependency on exact PyArrow error message format in CSVHandler #48

@debanshd

Description

The CSVHandler in src/croissant_maker/handlers/csv_handler.py uses a regular expression, _ARROW_COL_RE, to extract the column index and target type from PyArrow CSV conversion errors:

```python
_ARROW_COL_RE = re.compile(r"In CSV column #(\d+): CSV conversion error to (\w+)")
```

This regex depends on the exact wording of PyArrow's error messages. If a future PyArrow release rewords them, the regex will stop matching, which could cause silent failures or incorrect error reports to the user. A more robust approach would be to use structured error information, if PyArrow exposes any, or to provide a fallback for when the pattern does not match.
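Here is a minimal sketch of the fallback idea. It assumes the handler needs only a column index and a target type. The function tries the strict pattern first, then a looser one. If neither matches, it returns the raw message instead of dropping the error. The names `parse_arrow_conversion_error`, `ConversionErrorInfo`, and `_LOOSE_RE` are hypothetical. The loose pattern is a guess at tolerating minor rewording, not a documented PyArrow format.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Strict pattern, as currently used by CSVHandler.
_STRICT_RE = re.compile(r"In CSV column #(\d+): CSV conversion error to (\w+)")

# Hypothetical looser fallback: case-insensitive, allows extra text between the
# column number and the conversion clause, and treats the target type as optional.
_LOOSE_RE = re.compile(
    r"column\s*#?(\d+)(?:.*?(?:conversion error|convert) to (\w+))?",
    re.IGNORECASE,
)


@dataclass
class ConversionErrorInfo:
    column_index: Optional[int]  # None when no column number could be recovered
    target_type: Optional[str]   # None when the type could not be recovered
    raw_message: str             # always kept so the user still sees PyArrow's text


def parse_arrow_conversion_error(message: str) -> ConversionErrorInfo:
    """Extract column/type details, degrading gracefully instead of failing silently."""
    for pattern in (_STRICT_RE, _LOOSE_RE):
        m = pattern.search(message)
        if m:
            return ConversionErrorInfo(int(m.group(1)), m.group(2), message)
    # Nothing matched: return the raw message so the caller can report it verbatim.
    return ConversionErrorInfo(None, None, message)
```

In the handler, this would be called from the `except pyarrow.ArrowInvalid` branch. When `column_index` is `None`, the handler reports `raw_message` as-is. A unit test could complement it. The test would trigger a real conversion error, for example `pyarrow.csv.read_csv` with `ConvertOptions(column_types={...: pyarrow.int64()})` on a non-numeric value, and assert that the strict pattern still matches. That test would then flag a format change whenever PyArrow is upgraded.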

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
