Conversation
import os

# place to save data locally
data_folder = "/mnt/c/Users/tputta/OneDrive - Toole Design/Desktop/SSPF/FARS"
It probably doesn't matter, but since this is a public repo, do you think that you should put your local filepath in here? Maybe instead make a data folder either in the top level repo directory or just in this folder, and save it there?
Since it is a one-time thing I didn't bother creating a data folder and adding it to gitignore.
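For illustration, the reviewer's suggestion could look something like the sketch below (the `data` folder name is hypothetical; it would also need a matching `.gitignore` entry):

```python
import os

# Hypothetical repo-local alternative to hard-coding a personal OneDrive path:
# keep downloads in a "data" folder next to this script and gitignore it so it
# never reaches the public repo.
repo_dir = os.path.dirname(os.path.abspath(__file__)) if "__file__" in globals() else os.getcwd()
data_folder = os.path.join(repo_dir, "data")
os.makedirs(data_folder, exist_ok=True)  # safe to re-run; no error if it exists
```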
if not os.path.exists(f"{data_folder}/{yr}"):
    os.mkdir(f"{data_folder}/{yr}")
request.urlretrieve(f"https://static.nhtsa.gov/nhtsa/downloads/FARS/{yr}/National/FARS{yr}NationalCSV.zip", f"{data_folder}/{yr}/FARS{yr}NationalCSV.zip")
yr += 1
Why are we incrementing yr here? Won't it just go to the next value once it goes back to the top of the loop?
I started with a while loop and this is a vestige of that which I forgot to remove. It doesn't harm anything in this case, but I will remove it.
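To illustrate the point above: with a `for` loop, `range()` advances the year on every pass, so the leftover `yr += 1` simply disappears. This sketch only builds the download plan (the year range and folder are hypothetical) and leaves the actual retrieval to the caller:

```python
def fars_download_plan(start_yr, end_yr, data_folder="data/FARS"):
    """Return (year, url, destination) tuples for each FARS year.

    range() advances yr on every iteration, so no manual `yr += 1` is needed.
    """
    base = "https://static.nhtsa.gov/nhtsa/downloads/FARS"
    return [
        (yr,
         f"{base}/{yr}/National/FARS{yr}NationalCSV.zip",
         f"{data_folder}/{yr}/FARS{yr}NationalCSV.zip")
        for yr in range(start_yr, end_yr + 1)
    ]

plan = fars_download_plan(2015, 2019)
```

Each `(yr, url, dest)` entry can then be fetched with `urllib.request.urlretrieve(url, dest)` after creating the year folder.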
from io import StringIO

# get environment variables
load_dotenv("rds_conn_vars.env")
Is there documentation somewhere that says how to create this? If not, should we add it here?
The .env file itself is not very different from how we define variables in a shell script. There are plenty of resources online for the proper syntax, and it is pretty straightforward, so I don't think we need to add any additional documentation for this.
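For anyone landing here without context, such a file is just shell-style `KEY=VALUE` pairs; a sketch (the variable names below are illustrative, not necessarily what the scripts actually read):

```shell
# rds_conn_vars.env -- plain KEY=VALUE pairs read by load_dotenv()
# (variable names are illustrative; match whatever names the scripts expect)
PGHOST=your-instance.us-east-1.rds.amazonaws.com
PGPORT=5432
PGDATABASE=sspf
PGUSER=your_user
PGPASSWORD=your_password
```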
As new years of FARS data come out, what would be the workflow for updating these scripts? Can year-range variables be pulled out to the top of files or separated into adjacent JSON or .py files as global variables to simplify updates?
Assuming that the URL format stays the same, maybe just test to see what the highest year with a valid URL is?
Yes, we could set it up to be more dynamic, but since this is going to be no more than a once-a-year update, I just kept it simple. All we need is to just change the
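If the year range ever were pulled out as suggested above, one minimal sketch could be a tiny shared module (the module and constant names here are hypothetical):

```python
# fars_config.py -- hypothetical single place to edit when a new FARS year lands
START_YEAR = 2017
END_YEAR = 2021

def table_suffix():
    """Suffix shared by the versioned table names, e.g. fars_processed_2017_2021."""
    return f"{START_YEAR}_{END_YEAR}"
```

The download and SQL-generation scripts would then import `START_YEAR` and `END_YEAR` instead of each hard-coding the range.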
SELECT
    st_case,
    year,
    array_agg(per_typ::INT) AS per_typ
Might be overkill, but it could be worth replicating the logic shown in Table 3-39 on pg 560 of https://crashstats.nhtsa.dot.gov/Api/Public/ViewPublication/813417, since the per_typ codes change little year to year.
Also, according to that table, we might want to assign 'motor vehicle' to the 'Driver' (1) and 'Passenger' types (2 and 9), and remove 3 as it's listed as 'Other non-occupant'.
FYI, I ran the numbers: removing 3 from and adding 9 to the 'motor vehicle' assignment removes all instances of 'other' and assigns them to 'motor vehicle'. For 2015-2019 this was 167 crashes (0.10% of all crashes), and for 2017-2021 it was 133 crashes (0.08% of all crashes). So pretty minor, but still might be worth changing.
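The suggested regrouping can be expressed as a small lookup (codes taken from the comments above: 1 = Driver, 2 and 9 = Passenger types, 3 = 'Other non-occupant'); this Python sketch is illustrative only, not the repo's actual implementation:

```python
# Codes treated as motor-vehicle occupants under the suggestion: add 9, drop 3.
MOTOR_VEHICLE_PER_TYPS = {1, 2, 9}

def regroup(per_typ):
    """Map a FARS per_typ code to the suggested high-level grouping."""
    return "motor vehicle" if per_typ in MOTOR_VEHICLE_PER_TYPS else "other"
```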
)
;

ALTER TABLE automated.fars_processed_{start_yr}_{end_yr} ADD pkey SERIAL PRIMARY KEY;
Since the unique ID is composed of st_case, which is the 2-letter state code + a sequential case number (which resets each year), and crash_year, should the primary key be (st_case, crash_year)?
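A sketch of the composite-key alternative raised above (table and column names follow the surrounding scripts and may need adjusting):

```sql
-- Replace the surrogate SERIAL key with the natural composite key.
ALTER TABLE automated.fars_processed_{start_yr}_{end_yr}
    ADD PRIMARY KEY (st_case, crash_year);
```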
cur = conn.cursor()

query = f"""
CREATE TABLE static.fars_processed_2015_2019_backup (LIKE static.fars_processed INCLUDING ALL);
If you use INCLUDING ALL when making the table, will that make static.fars_processed_2015_2019_backup a dependent and create an issue if static.fars_processed is deleted when the new dataset is uploaded in the next script?
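For context, PostgreSQL's `LIKE` clause copies only the table definition, not the rows, so a backup made this way also needs an explicit data copy; a sketch of the full pattern:

```sql
-- LIKE copies structure (INCLUDING ALL also copies defaults, constraints,
-- and indexes); the rows still have to be copied separately.
CREATE TABLE static.fars_processed_2015_2019_backup
    (LIKE static.fars_processed INCLUDING ALL);
INSERT INTO static.fars_processed_2015_2019_backup
    SELECT * FROM static.fars_processed;
```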
# Export env vars
export $(grep -v '^#' rds_conn_vars.env | xargs)

ogr2ogr \
Is this script copying things from our server to the SSPF server just meant for testing, or is it meant to be how it's going to be done going forward?
@theja Okay, I finished my review. In general it's good, but I had a few specific questions that I left comments about.
Force-pushed from 14f6d57 to 462b150
@tariqshihadah @Jacob816
Here is the code for updating the FARS data in the SSPF database. I have not replaced the final table yet, but the updated FARS table is saved to
scratch._tmp_fars_processed_2017_2021. Once you have a chance to look at this, I can copy it over to swap out the static.fars_processed table.