Strings encased in quotes are not properly unescaped/trimmed #79

@domints

Example feed: https://otwartedane.metropoliagzm.pl/dataset/rozklady-jazdy-i-lokalizacja-przystankow-gtfs-wersja-rozszerzona/resource/290298ce-944b-4744-8f92-29ab2b786a33

Essentially, the CSV deserializer does not properly handle strings that are enclosed in quotation marks ("). I saw this was a problem with colors in version 1.7; in this 3.0 beta colors are fine, but the same problem now affects the block_id field.
It may not always make sense to have quotation marks within an ID (it is an ID, after all), but the GTFS docs say:

ID - An ID field value is an internal ID, not intended to be shown to riders, and is a sequence of any UTF-8 characters.
Using only printable ASCII characters is recommended.

So an ID can technically contain them. Also, Busman, a scheduling system widely used in Poland, seems to enclose every string in quotation marks, which breaks this lib.

I'd suggest treating every string-like field as a string and, if it is enclosed in quotation marks, unquoting it properly. Doesn't this lib reference any well-known, well-tested CSV deserialization library?
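
To make the expected behaviour concrete, here is a minimal sketch in Python. It is not this lib's code, and the sample rows and the `read_rows` helper are made up for illustration; it only shows how a standard CSV reader treats quoted fields: the enclosing quotes are stripped and a doubled quote inside a quoted field becomes a single literal quote.

```python
import csv
import io

# Hypothetical trips.txt excerpt (not taken from the example feed).
# Every string field is quoted, as Busman-style exports seem to do.
SAMPLE = (
    'route_id,service_id,trip_id,block_id\r\n'
    '"R1","WD","T1","B""7"\r\n'
    '"R1","WD","T2","B8"\r\n'
)


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row.

    The csv module strips the enclosing quotes and turns a doubled
    quote ("") inside a quoted field into a single literal quote.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


rows = read_rows(SAMPLE)
print(rows[0]["block_id"])  # B"7
print(rows[1]["block_id"])  # B8
```

With this handling, a quoted block_id and an unquoted one compare equal, and an ID that really does contain a quotation mark survives the round trip.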
