
BDQ Review - Actions and Comments: Test Suite #319

@Tasilee

Description


This is a list of actionable items from the 2025-12-31 BDQ Review for @Tasilee, @ArthurChapman, @tucotuco, @chicoreus and @ymgan to discuss and address before resubmission of BDQ.

Because Issue #309 was too large, action items have been extracted and moved into sub-issues. This is one of several sub-issues from #309.

Basics: The base documents are under https://github.com/tdwg/bdq/tree/master/tg2/_build_review.

A number of issues may be duplicated across topics and are listed at the bottom of #309.

Test Suite (Paul and Lee)

  1. (Addressed in Draft User's Guide) Sophie Parmelon (p. 18 Line 657) Provide more use cases (or more exhaustive ones?) Comment: AC: Partially addressed by adding some words to the User's Guide and a note on how to request a new Use Case. Does this need to be added elsewhere as well (Implementer's Guide)? Lee: I think the UG is the place.
  2. (Done - No change - explanation below) Sophie Parmelon (p. 19 Line 662) ISSUE_COORDINATES_CENTEROFCOUNTRY
    1. Should there be more terms in the “Issue” type? (e.g. issue_coordinates_centerof[administrativeregion] or issue_coordinates_centerof[grid], the latter being a common issue within gridded datasets)
      1. rukaya: specify
      2. melissa: a common issue in Taiwan is that many records are located in the center of the mountains, but they are not of mountain species. This might be because some foreign collectors gathered data without georeferencing it, so they just chose a central location as the coordinates.
      3. all: explain why this test is necessary + add other geographical issues tests
        Comment: AC: All this is doable, but there are many difficulties. We could discuss, but I think we just need a few words on the difficulty of doing this. Center of Grid would be easiest. Probably suggest that these could be passed to future tests, i.e. Supplementary. Lee: In the BDQ Tests and Assertions document: “‘Data quality’ is an evolving field, and there will always be ‘one more test’ that could be conceived. To remain an effective standard, BDQ focuses on a suite of tests that have reached community consensus. By mapping these tests across key Darwin Core classes and terms, we provide high coverage for the data that researchers use most, while leaving the door open for future versions to incorporate new innovations.”
  3. (Done - addressed in Supplementary Document - note added to Draft User's Guide and elsewhere) Sophie Parmelon (p. 19 Line 676) AMENDMENT_DATEIDENTIFIED_STANDARDIZED
    1. are non-latin alphabets taken into account?
    2. The same question applies to unusual date values (e.g. French Revolution months and days). AC: Note that this is addressed in Section 4.1 of the BDQ Supplementary Information document: "Countries and researchers have changed from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on the date 1918-02-14, the British Empire in 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805."
  4. (Done - addressed in Supplementary Document - note added to Draft User's Guide and elsewhere) Sophie Parmelon (p. 19 Line 680) AMENDMENT_EVENT_FROM_EVENTDATE
    1. are non-latin alphabets taken into account?
    2. The same question applies to unusual date values (e.g. French Revolution months and days). AC: Note that this is addressed in Section 4.1 of the BDQ Supplementary Information document: "Countries and researchers have changed from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on the ISO date 1918-02-14, the British Empire in 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805."
  5. (Done - addressed in Supplementary Document - note added to Draft User's Guide and elsewhere) Sophie Parmelon (p. 19 Line 684) VALIDATION_MONTH_STANDARD
    1. are non-latin alphabets taken into account?
    2. The same question applies to unusual date values (e.g. French Revolution months and days). AC: Note that this is addressed in Section 4.1 of the BDQ Supplementary Information document: "Countries and researchers have changed from the Julian calendar to the Gregorian calendar at different times. For example, Russia adopted the Gregorian Calendar on the ISO date 1918-02-14, the British Empire in 1752-09-14, different regions in France between 1582 and 1760, with France also adopting the French Republican Calendar 1793-1805."
    3. Not sure how uppercase Latin (Roman numeral) months are managed (common in botany and entomology in particular, e.g. 2025-IX-04); the BDQ example is [dwc:month="10": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:month is in range"],
      [dwc:month="v": Response.status=RUN_HAS_RESULT, Response.result=NOT_COMPLIANT, Response.comment="dwc:month is ambiguous as "v" or "5""]
    4. But might uppercase Latin (Roman numeral) months be consistently interpreted as Indo-Arabic numerals?
    5. Edit: answered here: https://github.com/FilteredPush/event_date_qc/blob/f224e5a1e6db81bc6ca725f520dd06a71fcfb54e/src/test/java/org/filteredpush/qc/date/DwcEventDQTest.java#L747
    6. (although I’d put zeros before months 1-9 to comply with the ISO 8601 specification)
  6. Sophie Parmelon (p. 19 Line 698) VALIDATION_DAY_STANDARD
    1. (I’d put zeros before days 1-9 to comply with the ISO 8601 specification)
  7. Sophie Parmelon (p. 20 Line 702) VALIDATION_BASISOFRECORD_STANDARD
    1. I feel there should also be an ISSUE_BASISOFRECORD_OCCURRENCE term for when BoR is “occurrence” (meaning we don’t know whether the occurrence is a specimen/literature record or a living individual).
    2. Cross-check this test against validation of other terms' values? (e.g. establishmentMeans)
    3. general discussion about the usefulness of BoR
  8. (DONE) Rukaya Johaadien (p. 13 Line 444) Why is VALIDATION_SCIENTIFICNAME_FOUND (also AMENDMENT_SCIENTIFICNAME_FROM_SCIENTIFICNAMEID, VALIDATION_TAXON_NOTEMPTY) in bdq:Spatial-Temporal_Patterns? OK, I think I see what’s happened: you sometimes have tests which are perhaps ‘core’ and should always be run no matter the use case? For example, some location checks such as VALIDATION_DECIMALLATITUDE_INRANGE are in bdq:Taxon-Management, which just doesn’t make sense. Maybe you should make an explicit ‘core’ use case category instead. Response: #310
  9. (Addressed in Draft User's Guide) From Minutes (p. 37 Line 1338) Stephen: unclear how to construct (new) use cases. Comment: AC: A new section has been added to the User's Guide on requesting a new Use Case via the BDQ Maintenance Group.
  10. (Addressed by explanation - see comment below) Rukaya Johaadien (p. 13 Line 452) MEASURE_EVENTDATE_DURATIONINSECONDS - this is weird; all the other measures are of other things, so why have this? Response: #312 @chicoreus wrote: This is a very simple and CORE test. It allows the user to identify which records have enough temporal precision for their needs. Some uses, such as phenology, require the temporal precision of observations/gatherings to be about one day; others, such as examining patterns of global change over decades, can accept a much more relaxed temporal precision, perhaps a year or better. It is an ideal example of a measure on a single record. In this respect it is unusual among the tests that we have defined, but in the broader scheme of possible tests it is a very good exemplar of a single-record measure.
  11. Rukaya Johaadien (p. 13 Line 454) In the Notes for VALIDATION_BASISOFRECORD_STANDARD: “... This Test must return NOT_COMPLIANT if there is leading or trailing whitespace or there are leading or trailing non-printing characters.” It is a bit odd to have this only in the Notes and not in the ExpectedResponse or Description.
  12. Rukaya Johaadien (p. 13 Line 458) Examples for VALIDATION_OCCURRENCEID_NOTEMPTY should not mention GUID structure, because iNat URLs do not contain GUIDs: - [dwc:occurrenceID="https://www.inaturalist.org/observations/43047701": Response.status=RUN_HAS_RESULT, Response.result=COMPLIANT, Response.comment="dwc:occurrenceID conforms to GUID structure"]...
  13. Rukaya Johaadien (p. 13 Line 463) AMENDMENT_MONTH_STANDARDIZED: The Link to Specification Source Code mentions “Internals of recognized string values (roman numerals, month names and abbreviations in multiple languages) use a combination of event_date_qc's DateUtils.cleanMonth()”, and the unit tests in FilteredPush do indeed check month abbreviations and month names in multiple languages, but these requirements are not mentioned in the Notes or Description. At the moment it is a bit confusing: should the test propose these as amendments or not? If it should, that belongs in the Notes and Description. It is also odd that the example gives “October” as NOT_AMENDED when the test says it should be AMENDED: “the value of dwc:month if it can be unambiguously interpreted as an integer between 1 and 12 inclusive” [dwc:month="October": Response.status=NOT_AMENDED, Response.result=, Response.comment="dwc:month contains an uninterpretable value"]. I can see why that choice was made, but it is disingenuous, rather impractical and divorced from the real world. DwC defines month as “The integer month in which the dwc:Event occurred”, so why not interpret October as 10?
  14. Rukaya Johaadien (p. 14 Line 492) My one comment regarding structure is that I think Response.status should be:
    1. AMENDED - Proposed standardized/corrected value
    2. NOT_AMENDED - No unambiguous amendment proposed
    3. FILLED_IN - Populated a missing value
    4. EXTERNAL_PREREQUISITES_NOT_MET - External service unavailable
    5. INTERNAL_PREREQUISITES_NOT_MET - Input missing/invalid for the test
    6. AMBIGUOUS - Inputs produce ambiguous outcome (no amendment)
    7. COMPLIANT - (with Response.result as None/empty)
    8. NOT_COMPLIANT - (with Response.result as None/empty)
    9. POTENTIAL_ISSUE - (with Response.result holding the potential issue, Response.comment should have any extra info in)
    10. NOT_ISSUE - (with Response.result as None/empty or with details of why it’s not an issue)
      i.e. do not include RUN_HAS_RESULT (Completed run with a result), because what’s the point? It doesn’t fit in anyway, and it makes programmatic implementation slightly harder.
  15. Rukaya Johaadien (p. 15 Line 515) Recommend that ISSUE_COORDINATES_CENTEROFCOUNTRY have a more descriptive POTENTIAL_ISSUE comment, e.g. instead of “coordinates fall within buffered distance in the bdq:sourceAuthority for dwc:countryCode”, say “coordinates fall within buffered distance in the bdq:sourceAuthority for dwc:countryCode, which may mean coordinates have been generalised”.
  16. @ArthurChapman Add "References" to the Tests and to the "Key to Vocabulary Terms"
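
Regarding items 2 and 15: at its core, the center-of-country check is a distance comparison, so the same logic could be pointed at the center of an administrative region or a grid cell. Below is a minimal Python sketch, assuming a caller-supplied centroid and buffer (hypothetical stand-ins for the values in the bdq:sourceAuthority) and a spherical Earth. The function name issue_coordinates_near_centroid is illustrative only and is not the BDQ or FilteredPush implementation. It also shows the more descriptive POTENTIAL_ISSUE comment suggested in item 15.

```python
import math
from dataclasses import dataclass

EARTH_MEAN_RADIUS_KM = 6371.0088  # mean Earth radius; sphere approximation


@dataclass
class Response:
    status: str
    result: str
    comment: str


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points, treating Earth as a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_MEAN_RADIUS_KM * math.asin(math.sqrt(a))


def issue_coordinates_near_centroid(lat: float, lon: float,
                                    centroid: tuple[float, float],
                                    buffer_km: float) -> Response:
    """Hypothetical sketch: flag coordinates within buffer_km of a supplied centroid.

    centroid and buffer_km are caller-supplied assumptions, not values taken
    from the bdq:sourceAuthority used by ISSUE_COORDINATES_CENTEROFCOUNTRY.
    """
    distance = haversine_km(lat, lon, centroid[0], centroid[1])
    if distance <= buffer_km:
        comment = (f"coordinates fall within {buffer_km} km of the centroid "
                   f"({distance:.1f} km away), which may mean the coordinates "
                   "have been generalised")
        return Response("RUN_HAS_RESULT", "POTENTIAL_ISSUE", comment)
    return Response("RUN_HAS_RESULT", "NOT_ISSUE",
                    f"coordinates are {distance:.1f} km from the centroid")
```

Passing a grid-cell center instead of a country centroid would give the issue_coordinates_centerof[grid] variant raised in item 2, without changing the function.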
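
Regarding items 5 and 13: a minimal Python sketch of a month standardization that interprets integers, English month names and three-letter abbreviations, and uppercase Roman numerals. It assumes (as item 5, point 4 asks) that uppercase Roman numerals can be interpreted consistently, while lowercase values such as "v" stay ambiguous, matching the spirit of the VALIDATION_MONTH_STANDARD example. The lookup tables and the name standardize_month are hypothetical; event_date_qc's DateUtils.cleanMonth() handles month names in multiple languages and is the reference for actual behaviour. A None result corresponds to what an amendment would report as NOT_AMENDED.

```python
import re
from typing import Optional

# Hypothetical lookup tables for illustration only.
ROMAN_MONTHS = {
    "I": 1, "II": 2, "III": 3, "IV": 4, "V": 5, "VI": 6,
    "VII": 7, "VIII": 8, "IX": 9, "X": 10, "XI": 11, "XII": 12,
}
ENGLISH_MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]


def standardize_month(value: str) -> Optional[int]:
    """Return an integer month 1-12 if value is unambiguous, otherwise None."""
    text = value.strip()
    if re.fullmatch(r"\d{1,2}", text):
        month = int(text)
        return month if 1 <= month <= 12 else None
    # Only uppercase Roman numerals are accepted; lowercase "v" stays ambiguous.
    if text in ROMAN_MONTHS:
        return ROMAN_MONTHS[text]
    # Full English names or three-letter abbreviations, with an optional trailing dot.
    lowered = text.lower().rstrip(".")
    for number, name in enumerate(ENGLISH_MONTHS, start=1):
        if lowered in (name, name[:3]):
            return number
    return None
```

For an ISO 8601 dwc:eventDate, the zero-padding mentioned in items 5 and 6 is a separate formatting step on the integer, e.g. f"{9:02d}" gives "09".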
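
Regarding item 10: a minimal Python sketch of how a single-record duration measure lets a user filter records by temporal precision. It assumes the measure counts whole days from the start of the first day to the end of the last day (so one day is 86400 seconds), and it supports only YYYY, YYYY-MM and YYYY-MM-DD values and ranges of these joined by "/"; times, time zones, abbreviated range ends and leap seconds are ignored. The function names are hypothetical, and the BDQ specification for MEASURE_EVENTDATE_DURATIONINSECONDS governs the actual edge cases.

```python
import calendar
from datetime import date

SECONDS_PER_DAY = 86_400


def _day_bounds(value: str) -> tuple[date, date]:
    """First and last calendar day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    parts = [int(p) for p in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError(f"unsupported date form: {value!r}")


def eventdate_duration_seconds(event_date: str) -> int:
    """Hypothetical sketch: whole-day duration of a date or date range, in seconds."""
    start_text, _, end_text = event_date.strip().partition("/")
    first, _ = _day_bounds(start_text)
    _, last = _day_bounds(end_text or start_text)
    if last < first:
        raise ValueError("interval end precedes start")
    return ((last - first).days + 1) * SECONDS_PER_DAY
```

A phenology study, for example, might keep only records where eventdate_duration_seconds(value) <= SECONDS_PER_DAY, while a study of change over decades might accept anything up to about a year.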
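
Regarding item 11: the whitespace rule in the Notes of VALIDATION_BASISOFRECORD_STANDARD can be expressed as a check applied before the vocabulary lookup. A minimal Python sketch, assuming an illustrative subset of Darwin Core basisOfRecord values, exact case-sensitive matching, and that whitespace-only input counts as empty. The Response fields mirror the status/result/comment structure used in the BDQ examples quoted in item 5; the function names are hypothetical.

```python
from dataclasses import dataclass

# Illustrative subset of Darwin Core basisOfRecord values (an assumption here);
# the test itself uses the vocabulary named as its bdq:sourceAuthority.
BASIS_OF_RECORD_VALUES = {
    "PreservedSpecimen", "FossilSpecimen", "LivingSpecimen", "MaterialSample",
    "MaterialCitation", "HumanObservation", "MachineObservation", "Occurrence",
}


@dataclass
class Response:
    status: str
    result: str
    comment: str


def has_edge_padding(value: str) -> bool:
    """True if a non-empty value starts or ends with whitespace or a non-printing character."""
    return (value != value.strip()
            or not value[0].isprintable()
            or not value[-1].isprintable())


def validation_basisofrecord_standard(value: str) -> Response:
    """Hypothetical sketch of the leading/trailing character rule plus a vocabulary lookup."""
    if value.strip() == "":
        # Treating empty or whitespace-only input as bdq:Empty is an assumption here.
        return Response("INTERNAL_PREREQUISITES_NOT_MET", "",
                        "dwc:basisOfRecord is empty")
    if has_edge_padding(value):
        return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                        "dwc:basisOfRecord has leading or trailing whitespace "
                        "or non-printing characters")
    if value in BASIS_OF_RECORD_VALUES:
        return Response("RUN_HAS_RESULT", "COMPLIANT",
                        "dwc:basisOfRecord is a recognized value")
    return Response("RUN_HAS_RESULT", "NOT_COMPLIANT",
                    "dwc:basisOfRecord is not a recognized value")
```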
