Malformed Data in Courses JSON #688

@noschiff

Our script that populates full-courses.json takes the data as-is from the API, which can cause problems when our code tries to use these courses.

First, many courses have catalogWhenOffered values that don't match the format we expect, such as "Fall or Spring." or "Fall, spring." Second, some strings contain NBSPs (non-breaking spaces), particularly in the catalogComments field. NBSPs are almost never a good idea to use. At best, removing them will avoid formatting issues; at worst, nothing will change.
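To get a sense of how widespread the two problems are, something like the following could be run over the courses. This is only a rough sketch: the `RawCourse` type, the `findIrregularities` helper, and especially the "expected" pattern (capitalized seasons separated by ", " with a trailing period) are my assumptions, not what our validation code actually checks.

```typescript
// Hypothetical shape: a course is treated as a plain bag of fields.
type RawCourse = Record<string, unknown>;

const NBSP = '\u00A0';

// ASSUMPTION: the format our code expects looks like "Fall, Spring."
// (capitalized seasons, ", " separators, trailing period).
const EXPECTED_WHEN_OFFERED =
  /^(Fall|Spring|Summer|Winter)(, (Fall|Spring|Summer|Winter))*\.$/;

// Returns a human-readable list of problems found in one course.
function findIrregularities(course: RawCourse): string[] {
  const problems: string[] = [];

  // Flag every top-level string field that contains an NBSP.
  for (const field of Object.keys(course)) {
    const value = course[field];
    if (typeof value === 'string' && value.indexOf(NBSP) !== -1) {
      problems.push(field + ' contains an NBSP');
    }
  }

  // Flag catalogWhenOffered values that don't match the assumed format.
  const when = course['catalogWhenOffered'];
  if (typeof when === 'string' && !EXPECTED_WHEN_OFFERED.test(when)) {
    problems.push('unexpected catalogWhenOffered: ' + JSON.stringify(when));
  }

  return problems;
}
```

Running this over every entry and printing the non-empty results would show which values we would actually need to handle.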

I see two primary paths for fixing this. We can either add "cleaning" to our courses-json-generator.ts script to (attempt to) fix the semesters offered and replace NBSPs with spaces, or we can make our semester-verification code accept "weirder" data. I strongly prefer the former, at least for the NBSPs, because any code that uses the semesters offered can then do its job without checking for irregularities. We could also write a script that goes through full-courses.json afterwards to clean the data, but it seems much simpler to do it as we get the data from the API.
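For the NBSP half of the first option, the cleaning step could be as small as the sketch below. `replaceNbsp` is a hypothetical helper, and where exactly it would be called inside courses-json-generator.ts is an assumption; the idea is just to run it on each course object before it is written out.

```typescript
// Recursively replaces every NBSP (U+00A0) found in string values with a
// regular space. Non-string values (numbers, booleans, null) pass through
// unchanged; arrays and plain objects are copied, not mutated.
function replaceNbsp<T>(value: T): T {
  if (typeof value === 'string') {
    return value.replace(/\u00A0/g, ' ') as unknown as T;
  }
  if (Array.isArray(value)) {
    return value.map(item => replaceNbsp(item)) as unknown as T;
  }
  if (value !== null && typeof value === 'object') {
    const source = value as unknown as Record<string, unknown>;
    const cleaned: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      cleaned[key] = replaceNbsp(source[key]);
    }
    return cleaned as unknown as T;
  }
  return value;
}
```

Because it walks the whole object, it would cover catalogComments and any other field that happens to contain an NBSP, without listing fields by name.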

One issue with trying to fix the semesters in the JSON is that there are some values that we can't handle yet, such as 7-week courses. This is good info for the user to have (I believe it's used in the bottom bar?), so I wouldn't want to turn Fall (weeks 1-7) into Fall, but the additional information about the weeks breaks our semester validation. I think we could solve this by storing a separate field containing just the season in full-courses.json while keeping the current catalogWhenOffered field for the user to see. That way, we can provide useful information without having to constantly handle bad data throughout the code. We should, of course, still clean catalogWhenOffered as best we can to fix capitalization and make it adhere to a standard overall.
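A rough sketch of the separate-field idea, assuming the new field is called `seasons` (the name, the `extractSeasons` and `withSeasons` helpers, and the list of four seasons are all placeholders I made up):

```typescript
// ASSUMPTION: these are the only seasons we care about.
const SEASONS = ['Fall', 'Spring', 'Summer', 'Winter'] as const;
type Season = typeof SEASONS[number];

// Returns every season mentioned in the raw string, matched
// case-insensitively and reported in the fixed order of SEASONS.
// Extra details such as "(weeks 1-7)" are simply ignored.
function extractSeasons(catalogWhenOffered: string): Season[] {
  const lower = catalogWhenOffered.toLowerCase();
  return SEASONS.filter(season =>
    new RegExp('\\b' + season.toLowerCase() + '\\b').test(lower)
  );
}

// Hypothetical minimal course shape: only the field used here.
interface HasWhenOffered {
  catalogWhenOffered?: string | null;
}

// Adds the derived `seasons` field and leaves catalogWhenOffered untouched,
// so the user-facing text keeps details like the 7-week information.
function withSeasons<T extends HasWhenOffered>(course: T): T & { seasons: Season[] } {
  return { ...course, seasons: extractSeasons(course.catalogWhenOffered || '') };
}
```

With this, "Fall or Spring.", "Fall, spring." and "Fall (weeks 1-7)" would all yield usable seasons for validation, while the original string is still available for display.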

If we make this change, we need to be careful to deal with courses already in a user's plan. It seems that courses already in a user's plan won't show the updated seasons after being changed in full-courses.json.

Before fixing the lowercase "s" in "spring":
[image: example course]

After fixing it, the course that was already in the plan is unchanged, but a newly added copy is correct:
[image: new]
The warning on the fixed course is there because it is a duplicate.

Labels: bug (Something isn't working)