FACTS Common Attributes: national ingest → public-facts (#299) #319
Open
kpdavi wants to merge 2 commits into
…b#299)

Ingest the USFS FACTS Common Attributes layer (Forest Service Activity Tracking System) as a single national dataset merged from all 9 regions (01-06, 08, 09, 10), downloaded from the FS EDW on 2026-06-24.

Pipeline (catalog/facts/k8s/common-attributes-2026-06/):
- stage-raw.yaml: set up the public-facts bucket + mirror 9 GeoPackage zips to raw/
- schema-check.yaml: verify schema consistency across regions (all 9 identical, 109 fields) + ADMIN_REGION reliability (1 value/file, 01..10)
- merge.yaml: convert+merge → national GeoParquet (7,324,720 rows), reproject NAD83→EPSG:4326, row-group-size 2000 (stoi-safe)
- *-hex.yaml: H3 res 10 (parents 9,8,0); chunk-size 10000 × 733 completions = full coverage of 7.32M rows (NOT the generator default 1000, which would have dropped ~5.3M features)
- *-repartition.yaml: merge chunks → hex/ by h0 (120Gi RAM + PVC scratch)
- *-pmtiles.yaml: PVC-backed; curated 21-field tile subset (tile-accurate boettiger-lab#283)
- gen_stac.py/_codes.json: STAC generator (passes lint-stac-categorical & lint-stac-pmtiles-fields)

Region queryability uses the native ADMIN_REGION (verified 01..10, one value per source file); no separate region column is stamped. STAC/README published to NRP S3 (public-facts); registered public-facts as a top-level child of the root catalog. license = public-domain.

Sync:
- sync-public-facts.yaml: MinIO private mirror (new bucket)
- source-sync-facts.yaml + scope: source.coop (public-domain, license-clear); cboettig/facts repo creation pending (see new-repos.md)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
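The chunk-size arithmetic in the hex step can be sanity-checked with a short sketch. It assumes each completion processes one contiguous chunk of `chunk-size` rows, which is an assumption about the generator rather than something stated in this PR. The function names are hypothetical.

```python
TOTAL_ROWS = 7_324_720  # row count reported for the merged national GeoParquet


def completions_needed(total_rows: int, chunk_size: int) -> int:
    """Smallest number of completions whose chunks cover every row."""
    # Integer ceiling division avoids float rounding surprises.
    return (total_rows + chunk_size - 1) // chunk_size


def uncovered_rows(total_rows: int, chunk_size: int, completions: int) -> int:
    """Rows left unprocessed when chunk_size * completions falls short."""
    return max(0, total_rows - chunk_size * completions)


if __name__ == "__main__":
    print(completions_needed(TOTAL_ROWS, 10_000))   # 733
    print(uncovered_rows(TOTAL_ROWS, 10_000, 733))  # 0
    print(completions_needed(TOTAL_ROWS, 1_000))    # 7325
```

Under that assumption, 733 completions at chunk-size 10000 cover all rows, while chunk-size 1000 would need 7325 completions for the same coverage.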
…value sets

verify-stac CI flagged ingested values '0' and 'U' missing from the declared values arrays (HARD values-incomplete). Declared sets now match ingested data. Republished to s3://public-facts/common-attributes-2026-06/stac-collection.json.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
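For reviewers unfamiliar with the values-incomplete failure, the check amounts to a set difference between ingested and declared values. The sketch below is not the verify-stac implementation. `missing_values` is a hypothetical helper, and the sample lists are illustrative only, apart from '0' and 'U', which are the values the CI reported.

```python
from typing import Iterable, List


def missing_values(declared: Iterable[str], ingested: Iterable[str]) -> List[str]:
    """Return ingested values that are absent from the declared set, sorted."""
    return sorted(set(ingested) - set(declared))


if __name__ == "__main__":
    # Hypothetical declared/ingested lists for one categorical field.
    declared = ["1", "N", "Y"]
    ingested = ["0", "1", "N", "U", "Y"]
    print(missing_values(declared, ingested))  # ['0', 'U']
```

An empty result means the declared values array is complete for that field.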
Closes #299.
Ingests the USFS FACTS Common Attributes layer as a single national dataset merged from all 9 USFS regions (01–06, 08, 09, 10), downloaded from the FS EDW on 2026-06-24. Data, STAC, and README are already published to NRP S3 (`public-facts`) and registered in the root catalog; this PR commits the pipeline manifests + sync wiring.

## Result (live on `public-facts`)

- `common-attributes-2026-06.parquet`: 7,324,720 rows, EPSG:4326 (reprojected from NAD83), `ADMIN_REGION` 01–10
- `common-attributes-2026-06.pmtiles`: source-layer `common-attributes-2026-06`, 21 curated tile fields (tile-accurate per #283)
- `license: public-domain`; passes `lint-stac-categorical` + `lint-stac-pmtiles-fields` on the live URL; root → `public-facts` → dataset traversal verified

## Key decisions (recorded on #299)

- `ADMIN_REGION` (verified one value per source file, exactly 01–10): no separate column stamped
- `--row-group-size 2000` to avoid the DuckDB httpfs `stoi` crash

## Pipeline (`catalog/facts/k8s/common-attributes-2026-06/`)

`stage-raw` → `schema-check` → `merge` → `hex` → `repartition` (+ `pmtiles`), plus `gen_stac.py` / `_codes.json` (STAC generator).

## Sync

- `sync-public-facts.yaml`: MinIO private mirror (new bucket)
- `source-sync-facts.yaml` + scope/cron-config: source.coop (public-domain, license-clear). The `cboettig/facts` repo must be created in the source.coop web UI before the weekly cron mirrors it (see `new-repos.md`).

🤖 Generated with Claude Code
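As a rough illustration of what the `schema-check` step verifies (identical field lists across regional files, and exactly one `ADMIN_REGION` value per file), here is a minimal sketch over already-extracted metadata. It is not the logic in `schema-check.yaml`. `check_regions`, the file names, and the field lists are hypothetical stand-ins.

```python
from typing import Dict, List, Set


def check_regions(
    schemas: Dict[str, List[str]],
    region_values: Dict[str, Set[str]],
) -> List[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems: List[str] = []
    # Use the first file's field list as the reference schema.
    reference_file, reference_fields = next(iter(schemas.items()))
    for name, fields in schemas.items():
        if fields != reference_fields:
            problems.append(f"{name}: schema differs from {reference_file}")
    # Each source file should carry exactly one ADMIN_REGION value.
    for name, values in region_values.items():
        if len(values) != 1:
            problems.append(
                f"{name}: expected 1 ADMIN_REGION value, found {len(values)}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical two-file example with a shortened field list.
    schemas = {
        "region01.gpkg": ["ADMIN_REGION", "ACTIVITY"],
        "region02.gpkg": ["ADMIN_REGION", "ACTIVITY"],
    }
    region_values = {"region01.gpkg": {"01"}, "region02.gpkg": {"02"}}
    print(check_regions(schemas, region_values))  # []
```

A passing check of this kind is what justifies relying on the native `ADMIN_REGION` column instead of stamping a separate region column during the merge.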