Update FEMA NRI analysis with new version of the data #879
pmarchand1 wants to merge 1 commit into develop
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request updates the exploratory data analysis of FEMA's National Risk Index for San Francisco, incorporating a newer version of the data and providing a more detailed explanation of the earthquake risk estimation methodology. It also introduces a new analysis that aggregates earthquake risk data by San Francisco neighborhoods, correlating it with renter occupancy rates to offer a more localized and socially nuanced perspective on earthquake vulnerability.
Code Review
This pull request significantly updates the EDA of FEMA earthquake risk index for San Francisco. The eda_fema_nri.Rmd document now includes more detailed documentation on the Hazus model for earthquake EAL estimation, introduces new per-capita and per-building-value loss ratios (EALP_ratio, EALB_ratio), and refines the commentary. A new R Markdown document, nri_neighborhoods.Rmd, has been added to aggregate the NRI data by San Francisco neighborhoods, integrate renter/owner occupancy data, and visualize the relationship between building loss ratio and renter percentage. The review comments point out several potential division-by-zero issues in the calculations for EALP_ratio, EALB_ratio, rent_pct, and EALB_PCT that could lead to Inf or NaN values, and suggest making the neighborhood data filename more maintainable.
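For context on the division-by-zero concern: R does not raise an error when dividing doubles by zero; it returns `Inf` or `NaN`. The sketch below illustrates this in base R. `safe_ratio` is a hypothetical helper that does not appear in the PR, and returning `NA` instead of the reviewer's suggested `0` is an assumption made here so that undefined ratios are not mistaken for zero risk.

```r
# Division by zero on doubles yields special values, not an error.
1 / 0   # Inf
0 / 0   # NaN

# Hypothetical helper (not part of the PR): return NA wherever the
# denominator is not strictly positive, otherwise the plain ratio.
safe_ratio <- function(num, den) {
  ifelse(den > 0, num / den, NA_real_)
}

# Second and third elements have a zero denominator, so they become NA.
ratios <- safe_ratio(c(5, 0, 2), c(10, 0, 0))
print(ratios)
```

Whether `0` or `NA` is the better fill value depends on how downstream plots and summaries should treat tracts without exposure; that choice is left to the author.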
    ### Expected damage to people

    ```{r, fig.width = 10}
    nri <- mutate(nri, EALP_ratio = ERQK_EALP / ERQK_EXPP)
The calculation of EALP_ratio could result in division by zero if ERQK_EXPP is 0 for any census tract (e.g., for parks). This would produce Inf values (or NaN when ERQK_EALP is also 0), which might cause issues in plotting or subsequent analyses. It's safer to handle this case explicitly.
Suggested change:

    nri <- mutate(nri, EALP_ratio = if_else(ERQK_EXPP > 0, ERQK_EALP / ERQK_EXPP, 0))

    ### Expected damage to buildings

    ```{r, fig.width = 10}
    nri <- mutate(nri, EALB_ratio = ERQK_EALB / ERQK_EXPB)
The calculation of EALB_ratio could result in division by zero if ERQK_EXPB (building exposure value) is 0 for any census tract. This would produce Inf values (or NaN when ERQK_EALB is also 0). To prevent potential issues in subsequent steps, it's better to handle this case explicitly.
Suggested change:

    nri <- mutate(nri, EALB_ratio = if_else(ERQK_EXPB > 0, ERQK_EALB / ERQK_EXPB, 0))

    The file matching each census tract to one of 42 analysis neighborhoods was downloaded from DataSF ([link](https://data.sfgov.org/Geographic-Locations-and-Boundaries/Analysis-Neighborhoods-2020-census-tracts-assigned/sevw-6tgi)).

    ```{r, message = FALSE}
    neigh <- read_csv("Analysis_Neighborhoods_-_2020_census_tracts_assigned_to_neighborhoods_20251118.csv")
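The review summary mentions making the neighborhood data filename more maintainable, but the specific suggestion is not shown here. One possible approach, sketched below under that caveat, is to keep the dated export name in a single variable near the top of the document. The variable name `neigh_file` and the existence check are assumptions for illustration, not the reviewer's or author's code.

```r
# Assumed convention (not from the PR): store the dated DataSF export name
# in one place, so a refreshed download only requires editing this line.
neigh_file <- "Analysis_Neighborhoods_-_2020_census_tracts_assigned_to_neighborhoods_20251118.csv"

# Read the file only if it is present, so the sketch also runs without the data.
if (file.exists(neigh_file)) {
  neigh <- readr::read_csv(neigh_file, show_col_types = FALSE)
} else {
  message("Neighborhood file not found: ", neigh_file)
}
```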
    "own", "rent")) %>%
    select(TRACT, variable, estimate) %>%
    pivot_wider(names_from = variable, values_from = estimate) %>%
    mutate(rent_pct = rent / (rent + own) * 100)
The calculation for rent_pct could result in division by zero if rent + own is 0 (i.e., a tract with no households). Although you filter for population, this doesn't guarantee households are present. It's safer to add a check to prevent NaN values.
Suggested change:

    mutate(rent_pct = if_else(rent + own > 0, rent / (rent + own) * 100, 0))

    nri_neigh <- inner_join(nri, neigh) %>%
    inner_join(own_rent) %>%
    select(TRACT, neighborhood, POPULATION, RENT_PCT = rent_pct, ERQK_EALB, ERQK_EXPB) %>%
    mutate(EALB_PCT = ERQK_EALB / ERQK_EXPB * 100)
    group_by(neighborhood) %>%
    summarize(TOTAL_POP = sum(POPULATION),
    RENT_PCT = weighted.mean(RENT_PCT, POPULATION),
    EALB_PCT = sum(ERQK_EALB) / sum(ERQK_EXPB) * 100)
When summarizing by neighborhood, sum(ERQK_EXPB) could be zero if a neighborhood consists only of tracts with no building value (e.g., a neighborhood that is entirely a park). The division would then yield NaN or Inf rather than a valid percentage (R does not raise an error here). It's safer to add a check for this possibility.
Suggested change:

    EALB_PCT = if_else(sum(ERQK_EXPB) > 0, sum(ERQK_EALB) / sum(ERQK_EXPB) * 100, 0))
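To see how the guarded aggregation behaves, here is a minimal sketch on made-up data. The `toy` table and its values are invented for illustration; only the column names come from the PR. It uses `NA_real_` instead of the suggested `0` as the fallback, which is an assumption, so that a neighborhood with no building exposure is shown as undefined rather than as zero loss.

```r
library(dplyr)

# Invented example data: neighborhood "B" has no building exposure at all.
toy <- tibble(
  neighborhood = c("A", "A", "B"),
  ERQK_EALB    = c(10, 30, 0),
  ERQK_EXPB    = c(1000, 3000, 0)
)

# Aggregate losses and exposure per neighborhood before dividing,
# and fall back to NA when the summed exposure is zero.
toy_neigh <- toy %>%
  group_by(neighborhood) %>%
  summarize(
    EALB_PCT = if_else(sum(ERQK_EXPB) > 0,
                       sum(ERQK_EALB) / sum(ERQK_EXPB) * 100,
                       NA_real_)
  )

print(toy_neigh)
```

Here neighborhood A evaluates to a loss of 1 percent of building value (40 out of 4000), while B is reported as `NA`. Swapping `NA_real_` for `0` reproduces the reviewer's variant.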
No description provided.