
Comma-delimited data within json element #22

@hatmandu

Thanks for your fantastically useful json2csv script - I've been using it to parse data from OpenLibrary dumps. It's working very well, even though the OL data is very inconsistently structured. One question, though, if I may...

In a case where a value contains several comma-separated items, e.g.:

{"subjects": ["Books and reading -- Fiction.", "Storytelling -- Fiction.", "Death -- Fiction.", "Jews -- Germany -- History -- 1933-1945 -- Fiction."]}

json2csv appears to strip out the commas within the value, so the four different subjects all get merged into one. It comes out like this for -k subjects:

[Books and reading -- Fiction. Storytelling -- Fiction. Death -- Fiction. Jews -- Germany -- History -- 1933-1945 -- Fiction.]

Is there a straightforward way to get it to preserve those multiple items within a value? (I don't need them as separate fields in the CSV, but would like to preserve the distinction within the 'subjects' field, if you see what I mean - so they could be delimited by something other than a comma.)

(I tried using the -d flag to set a different field delimiter, e.g. semicolon, but it still stripped out the commas as above.)

Edit: another example...
"subject_places": ["United States", "China"]
comes out as
[United States China]
so there is no practical way to split the merged values apart again automatically, alas.
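
One possible workaround, sketched here as an assumption and not as anything json2csv itself provides: pre-process the input so that each list of strings is joined into a single string with a different delimiter (here `|`) before it reaches json2csv. The function names `join_list_values` and `flatten_lines` are hypothetical, and the sketch assumes one JSON object per line and only touches top-level lists whose items are all strings.

```python
import json


def join_list_values(record, sep="|"):
    """Return a copy of record with top-level lists of strings joined by sep."""
    out = {}
    for key, value in record.items():
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            # e.g. ["United States", "China"] becomes "United States|China"
            out[key] = sep.join(value)
        else:
            # Leave scalars, nested objects, and mixed lists untouched.
            out[key] = value
    return out


def flatten_lines(lines, sep="|"):
    """Yield one re-serialized JSON line per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(join_list_values(json.loads(line), sep))


sample = ['{"subject_places": ["United States", "China"]}']
for flattened in flatten_lines(sample):
    print(flattened)  # {"subject_places": "United States|China"}
```

The flattened lines could then be passed to json2csv with `-k subject_places` as before. The delimiter would need to be a character that never occurs in the data itself.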
