This project explores the relationship between SAT performance and student poverty in California high schools. Poverty is approximated by the percentage of students eligible for free or reduced-price meals (FRPM). Using the provided datasets, the project analyzes correlations between SAT scores and this poverty indicator and visualizes the relationships with scatter plots and histograms.

The analysis uses two main datasets:
- SAT Scores Data (2014): Contains SAT performance details, including the number of test takers and average scores in reading, writing, and math.
- FRPM Data (2014): Contains the percentage of students eligible for free or reduced-price meals, a common proxy for poverty.

The script performs the following steps:
- Loading Data:
  - SAT and FRPM data are loaded into separate pandas DataFrames.
- Filtering SAT Data:
  - Schools with fewer than 20 SAT test takers are excluded from the analysis to ensure the reliability of the results.
- Warmup Analysis:
  - Correlation between SAT reading and writing scores is visualized using a scatter plot.
  - Correlation between SAT writing and math scores is similarly visualized.
- Merging Datasets:
  - The SAT and FRPM datasets are merged on the `cds` column, which uniquely identifies each school.
- Analyzing Relationship Between Poverty and SAT Scores:
  - A scatter plot is created to analyze the relationship between the percentage of students eligible for FRPM and the percentage of students scoring above 1500 on the SAT.
  - A trendline is added to visualize the overall trend in the data.
- Distribution Analysis:
  - Histograms are plotted to visualize the distribution of SAT scores and FRPM eligibility across schools.
  - A comparison of SAT scores is made between schools with less than 10% FRPM eligibility and those with 10% or more.

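
The filtering, merging, and group-comparison steps above could be sketched roughly as follows. This is a minimal illustration on made-up data, not the project's actual script: only the `cds` key comes from the description above, while the other column names (`num_test_takers`, `pct_above_1500`, `pct_frpm`) and all values are hypothetical placeholders.

```python
import pandas as pd

# Tiny synthetic stand-ins for the two datasets. Every column name except
# `cds` is a hypothetical placeholder, not the real file schema.
sat = pd.DataFrame({
    "cds": [1, 2, 3, 4],
    "num_test_takers": [150, 12, 80, 45],
    "pct_above_1500": [72.0, 50.0, 35.0, 20.0],
})
frpm = pd.DataFrame({
    "cds": [1, 2, 3, 5],
    "pct_frpm": [8.0, 40.0, 55.0, 90.0],
})

MIN_TEST_TAKERS = 20  # threshold stated in the filtering step

# Keep only schools with at least 20 SAT test takers.
sat_filtered = sat[sat["num_test_takers"] >= MIN_TEST_TAKERS]

# Join on the school identifier. An inner join is an assumption here:
# schools present in only one dataset are dropped.
merged = sat_filtered.merge(frpm, on="cds", how="inner")

# Label each school as below 10% FRPM eligibility or at 10% and above.
merged["frpm_group"] = merged["pct_frpm"].lt(10).map(
    {True: "under 10%", False: "10% or more"}
)

# Average share of high scorers in each FRPM group.
group_means = merged.groupby("frpm_group")["pct_above_1500"].mean()

print(merged)
print(group_means)
```

An inner join is assumed in this sketch, so schools that appear in only one dataset are dropped; the actual script may treat unmatched schools differently.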
The following visualizations are generated:
- Scatter Plots:
  - Reading vs. Writing SAT Scores: Highlights the strong correlation between these two scores.
- Scatter Plot with Trendline:
  - Poverty vs. SAT Scores: Shows the relationship between poverty levels (FRPM eligibility) and the percentage of students scoring 1500 or higher on the SAT.
- Histograms:
  - SAT Scores Distribution: Visualizes the distribution of SAT scores across schools.

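
A scatter plot with a trendline and a histogram of this kind might be produced along the following lines. This is only a sketch with made-up values: the variable names are hypothetical, and the trendline here is a least-squares straight line from `numpy.polyfit`, which may or may not match the method used in the project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Synthetic values for illustration only; not taken from the real datasets.
pct_frpm = np.array([5.0, 20.0, 40.0, 60.0, 80.0])
pct_above_1500 = np.array([75.0, 60.0, 45.0, 30.0, 15.0])

# Fit a straight line (degree-1 polynomial) to serve as the trendline.
slope, intercept = np.polyfit(pct_frpm, pct_above_1500, 1)

fig, (ax_scatter, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: scatter plot of the raw points plus the fitted line.
ax_scatter.scatter(pct_frpm, pct_above_1500)
xs = np.linspace(pct_frpm.min(), pct_frpm.max(), 100)
ax_scatter.plot(xs, slope * xs + intercept, color="red")
ax_scatter.set_xlabel("% students eligible for FRPM")
ax_scatter.set_ylabel("% students scoring above 1500")
ax_scatter.set_title("Poverty vs. SAT Scores")

# Right panel: distribution of FRPM eligibility across schools.
ax_hist.hist(pct_frpm, bins=5)
ax_hist.set_xlabel("% students eligible for FRPM")
ax_hist.set_ylabel("Number of schools")
ax_hist.set_title("FRPM Eligibility Distribution")

fig.tight_layout()
```

In a notebook the figure is displayed inline; in a script, calling `plt.show()` (with an interactive backend) or `fig.savefig(...)` would be needed to see or keep it.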
To run the analysis locally:
- Clone the Repository:

      git clone <repository_url>
      cd <repository_folder>

- Install Required Libraries:

      pip install pandas matplotlib numpy

- Run the Analysis:
  - Open the script or notebook containing the analysis in your preferred Python IDE (e.g., Jupyter Notebook) and execute the code.

The analysis suggests a negative relationship between poverty levels and SAT performance in California high schools. Schools with higher percentages of students eligible for FRPM tend to have lower percentages of students scoring above 1500 on the SAT.
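
One way to put a number on a relationship like this is the Pearson correlation coefficient. The sketch below uses NumPy's `corrcoef` on made-up values (the variable names and data are hypothetical and are not results from this project); a value close to -1 would indicate a strong negative linear relationship.

```python
import numpy as np

# Synthetic values for illustration only; not taken from the real datasets.
pct_frpm = np.array([5.0, 20.0, 40.0, 60.0, 80.0])
pct_above_1500 = np.array([75.0, 60.0, 45.0, 30.0, 15.0])

# corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is
# the Pearson correlation between the two variables.
r = np.corrcoef(pct_frpm, pct_above_1500)[0, 1]

print(f"Pearson r = {r:.2f}")
```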