Improve metadata processing #2

@VladimirShitov

Description

  • In the Stephenson dataset:
    • Column Smoker has the value Not_known, which should be NA
    • Column Status_on_day_collection has levels with only 1 or 2 observations; these should be merged or set to NA (alternatively, the column can be dropped, since Status_on_day_collection_summary exists)
    • Column Outcome has the level unknown, which should be set to NA
  • In the COPD dataset:
    • Column Bronchodilator_Use has small categories, which should be renamed or merged
  • In the HLCA dataset:
    • Column cause_of_death has many small categories, which should be grouped
    • Column lung_condition has some small categories
    • Column lung_condition has the category Healthy (tumor adjacent), which should be renamed to Healthy, with the tumor information stored elsewhere
    • Column lung_condition contains COVID severity levels; it might be better to group them together (as is done in disease) and move severity to a separate column
    • Column sequencing_platform has small categories and one category that is a mixture of two categories
    • Column smoking_status has a small category hist of marijuana use, which should probably be merged with active
    • Column assay has small categories
    • Column sex has the category unknown, which should be set to NA or predicted
    • Column self_reported_ethnicity has small categories; it might make sense not to use it at all
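
Most of the items above fall into three operations: setting placeholder labels to NA, renaming a category, and merging small categories. A minimal pandas sketch of how these could look, assuming the metadata is available as a DataFrame (for example the obs table of an AnnData object). The helper names set_to_na and merge_small_categories, the min_count threshold, and the "other" label are hypothetical, and the toy values are made up for illustration:

```python
import pandas as pd


def set_to_na(series, placeholders):
    """Replace placeholder labels such as "unknown" with missing values."""
    values = series.astype("object")
    # mask() sets entries to NaN wherever the condition is True
    return values.mask(values.isin(placeholders))


def merge_small_categories(series, min_count, other_label="other"):
    """Collapse categories with fewer than `min_count` observations into one label."""
    values = series.astype("object")
    counts = values.value_counts()  # missing values are not counted
    small = counts[counts < min_count].index
    return values.mask(values.isin(small), other_label)


# Toy data for illustration only, not taken from any of the datasets.
obs = pd.DataFrame({
    "Smoker": ["Yes", "No", "Not_known", "No"],
    "lung_condition": ["Healthy", "Healthy (tumor adjacent)", "Healthy", "COPD"],
})

# Placeholder label -> NA
obs["Smoker"] = set_to_na(obs["Smoker"], ["Not_known"])

# Rename a category, then merge whatever is still rare
obs["lung_condition"] = obs["lung_condition"].replace(
    {"Healthy (tumor adjacent)": "Healthy"}
)
obs["lung_condition"] = merge_small_categories(obs["lung_condition"], min_count=2)
```

The threshold for what counts as a small category would need to be chosen per dataset. Merging into a single catch-all label is only one option; for columns such as cause_of_death, a manual mapping to broader groups may be more meaningful.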
