GITHUB PAGE
https://ballaghjoshua.github.io/
Last Updated 11/24/2020
Last year, the Louisiana Department of Education released a statewide results one-pager noting positive trends in statistics meant to measure the strength of Louisiana's public school system. From 2012 to 2019, statewide ACT scores and graduation rates trended upward (with the exception of 2019 ACT scores, which saw a small dip).
The Department of Education states that Louisiana is well on track to meet its 2025 education goals, which are centered around raising the standards for 'A' rated schools. By 2025, such schools will have:
Given the Department's optimism at positive statewide trends in recent years, we ask:
That is, are certain groups, either geographic or demographic, improving more than others? If so, which ones? What variables might correlate with greater improvement?
Seeking to answer this question, we have analyzed data made available by the Louisiana Department of Education, as well as some economic data from the US Department of Agriculture. We will examine School Performance Scores (SPS) and ACT Scores in recent years across each of Louisiana's 64 parishes; we will then examine ACT scores over the years across a few key demographic subgroups. In doing so, we hope to determine whether or not any group has been improving at a greater rate than any other group in recent years.
See the project assignment on the course webpage here: https://nmattei.github.io/cmps3160/projects/FinalTutorial/
US Department of Agriculture - data on parish attributes: https://www.louisianabelieves.com/resources/library/data-center
Louisiana Department of Education - data on school metrics and student attributes: https://www.louisianabelieves.com/resources/library/data-center
US Census - data used to map Louisiana using Geopandas: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
I. Setup
A. Import Libraries
B. Loading/Tidying Parish Attribute Data
II. SPS Exploration
A. Loading/Tidying SPS Data
B. SPS Through the Years
C. Analyzing SPS By School
III. ACT Exploration
A. Loading/Tidying ACT Data
B. ACT Through the Years
C. ACT Correlation with Parish Characteristics
D. ACT Distributions
IV. Subgroup Exploration
A. Loading/Tidying Subgroup Data
B. Visualizing the Achievement Gap
V. Conclusion
A. Summary of Findings
B. Limitations and Further Work
In this project, we utilize the following libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as geo
from geopandas import GeoDataFrame
In this section we will create two dataframes for later use in our analysis: 1) parishes_geo, a GeoDataFrame that maps Louisiana by parish, and 2) parishAttributes, a dataframe that stores data from the USDA and the Department of Education on each parish's median household income (2018) and percentage of students classified as economically disadvantaged (2020).
Though these datasets are not from the same year, they are the most recent data we could find.
There are 64 parishes in Louisiana, but there are more school systems than that. Why is this? Well, every parish has its own school system, but not every school system belongs to a parish. Special schools that operate apart from a parish's school system, such as magnet schools like the Louisiana School for Math, Science, and the Arts (LSMSA, of which Josh is an alumnus!), are coded as their own school systems in the state's datasets. For the most part, our parish-by-parish analysis will only consider schools within a parish school system. Though LSMSA is in Natchitoches, it will not appear as part of Natchitoches's data!
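As a rough illustration of what restricting the analysis to parish school systems means, the sketch below keeps only system names ending in "Parish". That naming rule is an assumption for illustration, not the method this notebook uses: here, non-parish systems simply end up without matching parish attributes after the merges in Section II. The toy table reuses a few system names that appear later in this tutorial.

```python
import pandas as pd

# Toy stand-in for a table keyed by school system (names taken from later tables)
systems = pd.DataFrame({
    "School System": [
        "Acadia Parish",
        "A.E. Phillips Laboratory School",
        "Zachary Community School District",
        "Allen Parish",
    ],
})

# Assumption: parish school systems are exactly those whose name ends in "Parish"
parish_only = systems[systems["School System"].str.endswith("Parish")]
print(parish_only["School System"].tolist())
```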
As per the Louisiana Board of Elementary and Secondary Education, a student is determined to be “Economically Disadvantaged” if (s)he meets any one of the following criteria:
i. Is eligible for Louisiana’s food assistance program for low-income families.
ii. Is eligible for Louisiana’s disaster food assistance program.
iii. Is eligible for Louisiana’s program for assistance to needy families with children to assist parents in becoming self-sufficient.
iv. Is eligible for Louisiana’s healthcare program for families and individuals with limited financial resources.
v. Is eligible for reduced price meals based on the latest available data.
vi. Is an English Language Learner.
vii. Is identified as homeless or migrant pursuant to the McKinney-Vento Homeless Children and Youth Assistance Act and the Migrant Education Program within the Elementary and Secondary Education Act.
viii. Is incarcerated with the Office of Juvenile Justice or in an adult facility.
ix. Has been placed into custody of the state.
# Load in shapefile for geographic analysis later
parishes_geo = geo.read_file('cb_2018_22_unsd_500k/cb_2018_22_unsd_500k.shp')
# Strip the trailing ' School District' (16 characters) so names read like 'Bossier Parish'
parishes_geo['NAME'] = parishes_geo['NAME'].str[:-16]
# A look at our GeoFrame
parishes_geo
| STATEFP | UNSDLEA | AFFGEOID | GEOID | NAME | LSAD | ALAND | AWATER | geometry |
---|---|---|---|---|---|---|---|---|---|
0 | 22 | 00270 | 9700000US2200270 | 2200270 | Bossier Parish | 00 | 2176379782 | 70298833 | POLYGON ((-93.84522 32.95043, -93.84380 32.952... |
1 | 22 | 00360 | 9700000US2200360 | 2200360 | Caldwell Parish | 00 | 1371231229 | 29249902 | POLYGON ((-92.31191 32.07630, -92.31222 32.146... |
2 | 22 | 01710 | 9700000US2201710 | 2201710 | Tensas Parish | 00 | 1561204508 | 99471076 | POLYGON ((-91.57961 31.87189, -91.57569 31.875... |
3 | 22 | 01680 | 9700000US2201680 | 2201680 | Tangipahoa Parish | 00 | 2049488101 | 136678798 | POLYGON ((-90.56735 30.78548, -90.56717 30.824... |
4 | 22 | 02010 | 9700000US2202010 | 2202010 | Winn Parish | 00 | 2460711980 | 17448075 | POLYGON ((-92.97603 31.71242, -92.97302 31.720... |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
64 | 22 | 00180 | 9700000US2200180 | 2200180 | Beauregard Parish | 00 | 2997555226 | 21945878 | POLYGON ((-93.73827 30.40396, -93.73563 30.406... |
65 | 22 | 00120 | 9700000US2200120 | 2200120 | Assumption Parish | 00 | 877029986 | 67099711 | POLYGON ((-91.25988 29.98798, -91.25939 30.000... |
66 | 22 | 00039 | 9700000US2200039 | 2200039 | Zachary Community | 00 | 218168795 | 8034994 | POLYGON ((-91.31649 30.59000, -91.31528 30.594... |
67 | 22 | 00390 | 9700000US2200390 | 2200390 | Cameron Parish | 00 | 3327796107 | 1688162100 | POLYGON ((-93.92921 29.80295, -93.92799 29.809... |
68 | 22 | 01500 | 9700000US2201500 | 2201500 | St. James Parish | 00 | 625594223 | 42156577 | POLYGON ((-90.96369 30.06645, -90.93560 30.085... |
69 rows × 9 columns
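The slice `str[:-16]` used above removes the final 16 characters of each name. Assuming the raw Census names take the form "Bossier Parish School District" (consistent with "Zachary Community" appearing in the output), those 16 characters are " School District". A slightly more explicit sketch uses pandas' `Series.str.removesuffix` (pandas 1.4 and later). It strips the suffix only when it is actually present, so a name without it is not cut short:

```python
import pandas as pd

# Assumed raw Census names; " School District" is 16 characters long
names = pd.Series([
    "Bossier Parish School District",
    "Zachary Community School District",
])

print(len(" School District"))

# Positional slicing, as in the notebook
sliced = names.str[:-16]

# Suffix-aware alternative: removes the text only if the name ends with it
stripped = names.str.removesuffix(" School District")

print(sliced.tolist())
print(stripped.tolist())
```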
Though this dataset contains many attributes, we are primarily interested in the % of Economically Disadvantaged Students in each Parish.
# load up 2020 Student Attributes data from the Department of Education
Students20Df = pd.read_excel('feb-2020-multi-stats-(total-by-site-and-school-system).xlsx')
# clean students attributes data:
# drop the six title/header rows at the top of the sheet, then renumber from 0
Students20Df = Students20Df.iloc[6:].reset_index(drop=True)
# rename the columns
Students20Df.columns = ['School System','School System Name','Sites Reporting','Total Enrollment',\
'% Female','% Male','American Indian','Asian','Black','Hispanic','Hawaiian/Pacific Islander',\
'White','Multiple Races (Non-Hispanic)','Minority','% Fully English Proficient',\
'% Limited English Proficiency','Infants (Sp Ed)','Pre-School (Sp Ed)','Pre-K (Reg Ed)',\
'Kindegarten','Grade 1','Grade 2','Grade 3','Grade 4','Grade 5','Grade 6','Grade 7',\
'Grade 8','Grade 9','Grade T9','Grade 10','Grade 11','Grade 12','Extension Academy',\
'% Economically Disadvantaged (2020)']
# Later, we will merge Students20Df with the USDA income data, matching on 'School System Name'
# Need to fix De Soto Parish and La Salle Parish to be consistent with naming schema in later dataframes.
Students20Df[Students20Df['School System Name'] == 'LaSalle Parish']
# De Soto at index 15, La Salle at index 29
Students20Df.at[15, 'School System Name'] = 'De Soto Parish'
Students20Df.at[29, 'School System Name'] = 'La Salle Parish'
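Writing to hard-coded row positions (15 and 29) works for this particular file, but it would silently edit the wrong rows if the spreadsheet ever changed. A position-independent alternative maps old names to new ones with `Series.replace`. 'LaSalle Parish' is the spelling the lookup above searches for. 'DeSoto Parish' is an *assumed* raw spelling of the other name and should be checked against the file.

```python
import pandas as pd

# Toy stand-in for Students20Df; 'DeSoto Parish' is an assumed raw spelling
students = pd.DataFrame({
    "School System Name": ["Acadia Parish", "DeSoto Parish", "LaSalle Parish"],
})

# Map each raw spelling to the naming scheme used by the other datasets
name_fixes = {
    "DeSoto Parish": "De Soto Parish",
    "LaSalle Parish": "La Salle Parish",
}
students["School System Name"] = students["School System Name"].replace(name_fixes)

print(students["School System Name"].tolist())
```

With a dict argument, `replace` matches whole cell values exactly, so names that are not in the mapping are left alone.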
This dataset includes unemployment data by parish for several years, but we are interested in Median Household Income by parish, for which the dataset only includes 2018.
# Employment and Income data from the USDA.
employment_df = pd.read_excel('UnemploymentReport.xlsx')
# The year columns are unemployment rates as a percent.
# The first row contains the numbers for the state as a whole; each row after that observes a single parish.
employment_df.columns = employment_df.loc[0] # Assign columns proper names
employment_df = employment_df[1:66] # Drop the row now used as column names; keep the state row plus the 64 parish rows
# To get rid of some annoying warnings that clutter the notebook:
pd.options.mode.chained_assignment = None # default='warn'
employment_df.reset_index(drop = True, inplace = True) # Reset our index
employment_df['Median Household Income (2018)'] = employment_df['Median Household Income (2018)'].astype('float64') # set the type to float for doing math later
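# Drop the last 4 characters of each name (the trailing state suffix) so the strings match the Students 2020 dataset
employment_df['Name'] = employment_df['Name'].str[:-4]
# The slice is only meant for parish rows, so restore the statewide row's name
employment_df.at[0, 'Name'] = 'Louisiana'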
This dataframe includes all of the data from the 2020 Student Attributes dataset, as well as 2018 Median Household Income by Parish from the USDA. We will use it for our Parish-level analysis of SPS and ACT Scores.
parishAttributes = pd.merge(employment_df.filter(['Name', 'Median Household Income (2018)', '% of State Median HH Income']), Students20Df,
how='left',
left_on=['Name'],
right_on = ['School System Name'])
# Fix up some dtypes for doing math later
parishAttributes['% Economically Disadvantaged (2020)'] = pd.to_numeric(parishAttributes['% Economically Disadvantaged (2020)'])
# Now let's see our new dataframe!
parishAttributes
| Name | Median Household Income (2018) | % of State Median HH Income | School System | School System Name | Sites Reporting | Total Enrollment | % Female | % Male | American Indian | ... | Grade 6 | Grade 7 | Grade 8 | Grade 9 | Grade T9 | Grade 10 | Grade 11 | Grade 12 | Extension Academy | % Economically Disadvantaged (2020) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Louisiana | 48021.0 | 1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | Acadia Parish | 40484.0 | 0.843048 | 001 | Acadia Parish | 32 | 9738 | 0.489628 | 0.510372 | 15 | ... | 729 | 732 | 667 | 675 | 96 | 653 | 533 | 622 | 0 | 0.707127 |
2 | Allen Parish | 44395.0 | 0.924491 | 002 | Allen Parish | 12 | 4159 | 0.47848 | 0.52152 | 36 | ... | 332 | 349 | 306 | 270 | 28 | 253 | 274 | 264 | 0 | 0.682856 |
3 | Ascension Parish | 77758.0 | 1.61925 | 003 | Ascension Parish | 29 | 23253 | 0.483637 | 0.516363 | 55 | ... | 1729 | 1860 | 1772 | 1701 | 115 | 1768 | 1678 | 1361 | 0 | 0.568099 |
4 | Assumption Parish | 48120.0 | 1.00206 | 004 | Assumption Parish | 9 | 3292 | 0.500304 | 0.499696 | 9 | ... | 224 | 254 | 234 | 211 | 49 | 237 | 205 | 235 | 0 | 0.717801 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
60 | Webster Parish | 35070.0 | 0.730305 | 060 | Webster Parish | 14 | 5989 | 0.480715 | 0.519285 | 5 | ... | 488 | 492 | 446 | 451 | 5 | 455 | 400 | 395 | 0 | 0.706462 |
61 | West Baton Rouge Parish | 58205.0 | 1.21207 | 061 | West Baton Rouge Parish | 9 | 3962 | 0.487885 | 0.512115 | 7 | ... | 304 | 268 | 267 | 271 | 24 | 272 | 233 | 209 | 0 | 0.701413 |
62 | West Carroll Parish | 39332.0 | 0.819058 | 062 | West Carroll Parish | 5 | 1972 | 0.479209 | 0.520791 | 0 | ... | 166 | 154 | 148 | 145 | 1 | 138 | 134 | 127 | 0 | 0.717546 |
63 | West Feliciana Parish | 60296.0 | 1.25562 | 063 | West Feliciana Parish | 5 | 2212 | 0.49141 | 0.50859 | 3 | ... | 149 | 161 | 156 | 172 | 15 | 157 | 146 | 140 | 0 | 0.466998 |
64 | Winn Parish | 40133.0 | 0.835739 | 064 | Winn Parish | 6 | 2111 | 0.480815 | 0.519185 | 4 | ... | 164 | 164 | 152 | 157 | 30 | 152 | 142 | 136 | 0 | 0.747513 |
65 rows × 38 columns
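Because this is a left merge keyed on names, any parish whose spelling differs between the two sources would survive the merge with NaN in every Department of Education column, and no error would be raised. One way to catch this is pandas' `indicator=True` option, which adds a `_merge` column recording whether each row found a match. The sketch below demonstrates it on toy data; the income values are placeholders, not real figures.

```python
import pandas as pd

# Toy versions of the two inputs (numbers are placeholders, not real data)
income = pd.DataFrame({"Name": ["Louisiana", "Acadia Parish", "DeSoto Parish"],
                       "Median Household Income (2018)": [1.0, 2.0, 3.0]})
students = pd.DataFrame({"School System Name": ["Acadia Parish", "De Soto Parish"],
                         "% Economically Disadvantaged (2020)": [0.5, 0.6]})

merged = pd.merge(income, students, how="left",
                  left_on="Name", right_on="School System Name",
                  indicator=True)

# Rows marked 'left_only' found no partner in the student data
unmatched = merged.loc[merged["_merge"] == "left_only", "Name"].tolist()
print(unmatched)
```

On the real data, the statewide 'Louisiana' row is expected to appear in this list, as the NaN-filled first row of the table above shows. Any other name would point to a spelling mismatch.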
Where the data allow, we would like to perform a more granular, school-by-school analysis. Here we load another sheet from the same 2020 Student Attributes workbook from which we created Students20Df, this time using the school as the observational unit instead of the parish.
schoolAttributes = pd.read_excel('feb-2020-multi-stats-(total-by-site-and-school-system).xlsx', sheet_name = 'Total by Site')
# Name columns
schoolAttributes.columns = ['School System','School System Name','SIS Submit Site Code', 'Federal Reporting Site Code','Site Name','Total Enrollment',\
'% Female','% Male','American Indian','Asian','Black','Hispanic','Hawaiian/Pacific Islander',\
'White','Multiple Races (Non-Hispanic)','Minority','% Fully English Proficient',\
'% Limited English Proficiency','Infants (Sp Ed)','Pre-School (Sp Ed)','Pre-K (Reg Ed)',\
'Kindegarten','Grade 1','Grade 2','Grade 3','Grade 4','Grade 5','Grade 6','Grade 7',\
'Grade 8','Grade 9','Grade T9','Grade 10','Grade 11','Grade 12','Extension Academy',\
'% Economically Disadvantaged (2020)', 'Nonprofit Organization', 'Charter Type', 'School System Roll Up Type', 'Parish Code']
schoolAttributes = schoolAttributes[6:1406] # drop empty rows
# Fix up some dtypes for doing math later
schoolAttributes['% Economically Disadvantaged (2020)'] = pd.to_numeric(schoolAttributes['% Economically Disadvantaged (2020)'])
# Take a look at our new dataframe!
schoolAttributes
| School System | School System Name | SIS Submit Site Code | Federal Reporting Site Code | Site Name | Total Enrollment | % Female | % Male | American Indian | Asian | ... | Grade T9 | Grade 10 | Grade 11 | Grade 12 | Extension Academy | % Economically Disadvantaged (2020) | Nonprofit Organization | Charter Type | School System Roll Up Type | Parish Code |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6 | 001 | Acadia Parish | 001001 | 001001 | Armstrong Middle School | 313 | 0.482428 | 0.517572 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0.830671 | NaN | NaN | NaN | 01 |
7 | 001 | Acadia Parish | 001002 | 001002 | Branch Elementary School | 314 | 0.509554 | 0.490446 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.566879 | NaN | NaN | NaN | 01 |
8 | 001 | Acadia Parish | 001003 | 001003 | Central Rayne Kindergarten School | 221 | 0.41629 | 0.58371 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 0.796380 | NaN | NaN | NaN | 01 |
9 | 001 | Acadia Parish | 001004 | 001004 | Church Point Elementary School | 577 | 0.457539 | 0.542461 | 3 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0.887348 | NaN | NaN | NaN | 01 |
10 | 001 | Acadia Parish | 001005 | 001005 | Church Point High School | 527 | 0.481973 | 0.518027 | 0 | 1 | ... | 30 | 119 | 116 | 124 | 0 | 0.654649 | NaN | NaN | NaN | 01 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1401 | R36 | Orleans Parish | WBZ001 | WBZ001 | McDonogh 35 Senior High School | 168 | 0.625 | 0.375 | 0 | 0 | ... | 6 | 0 | 0 | 0 | 0 | 0.886905 | InspireNOLA Charter Schools | Type 1 | R36 | 36 |
1402 | R36 | Orleans Parish | WC2001 | WC2001 | Opportunities Academy | 69 | 0.289855 | 0.710145 | 0 | 4 | ... | 0 | 0 | 0 | 69 | 0 | 0.913043 | Collegiate Academies | Type 1 | R36 | 36 |
1403 | R36 | Orleans Parish | WC3001 | WC3001 | IDEA Oscar Dunn | 207 | 0.487923 | 0.512077 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0.971014 | IDEA Public Schools | Type 1 | R36 | 36 |
1404 | CHA | Type 2 Charters | WJ5001 | WJ5001 | Collegiate Baton Rouge | 399 | 0.516291 | 0.483709 | 13 | 1 | ... | 0 | 153 | 113 | 0 | 0 | 0.922306 | Collegiate Academies | Type 2 | CHA | 17 |
1405 | CHA | Type 2 Charters | WZ8001 | WZ8001 | GEO Prep Mid-City of Greater Baton Rouge | 707 | 0.506365 | 0.493635 | 2 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0.916549 | GEO Academies EBR | Type 2 | CHA | 17 |
1400 rows × 41 columns
For the first section of our exploration, we will examine School Performance Scores (SPS). As the Department of Education explains, SPS are derived from a combination of metrics that differ for elementary, middle, and high schools. For example, for high schools, SPS are dependent on:
We understand that SPS is a function of the Louisiana Department of Education's own standards. Still, we find a reasonable question for the beginning of our analysis to be: by the state's own standards, have any parishes been improving at a greater rate than others? If so, does this improvement have anything to do with median household income or the percentage of the student body that is economically disadvantaged?
The Louisiana Department of Education makes SPS data available from 1999 onward. We will load the separate datasets covering 2012 through 2019, then extract the data we want and warehouse it in a single dataframe for analysis, grouped by school system.
sps_2012_df = pd.read_excel('2012-school-performance-scores.xlsx')
sps_2013_2017_df = pd.read_excel('2013-2017-school-performance-score-summary.xlsx', sheet_name = '2013 to 2017 SPS Summary')
sps_2018_2019_df = pd.read_excel('2019-school-performance-scores.xlsx')
sps_2012_df.columns = sps_2012_df.loc[1] # Assign our column names, pulling them from Row 1
sps_2012_df = sps_2012_df[2:] # cut off the rows at top filled with NaNs and non-values
sps_2012_df.reset_index(drop = True, inplace = True)
sps_2012_df
1 | Site Code | School | District | School Type (Elementary, Middle, High, Combination) | 2012 Letter Grade | 2012 Baseline School Performance Score | Top Gain School (Yes/No) | 2012 Growth Goal | 2012 Growth School Performance Score Actual | Point Gain from 2011 Baseline Performance Score to 2012 Growth Performance Score | Point Change 2011 to 2012 | Percent Change 2011 to 2012 | Point Change 2008 to 2012 | Percent Change 2008 to 2012 | 2008 Baseline School Performance Score | 2009 Baseline School Performance Score | 2010 Baseline School Performance Score | 2011 Baseline School Performance Score | Selective Admissions or Alternative School |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 001001 | Armstrong Middle School | Acadia Parish | Elementary/Middle School | D | 76.7 | No | 84.7 | 79.2 | 4.5 | 2 | 2.67738 | 0.8 | 1.05402 | 75.9 | 76.3 | 77 | 74.7 | |
1 | 001002 | Branch Elementary School | Acadia Parish | Elementary/Middle School | C | 100.4 | No | 108.4 | 98.3 | -4.5 | -2.4 | -2.33463 | -1.1 | -1.08374 | 101.5 | 101.9 | 101.8 | 102.8 | |
2 | 001003 | Central Rayne Kindergarten School | Acadia Parish | Elementary/Middle School | D | 89.3 | No | 98.6 | 79.9 | -8.8 | 0.6 | 0.676437 | 6.6 | 7.98065 | 82.7 | 85 | 83.2 | 88.7 | |
3 | 001004 | Church Point Elementary School | Acadia Parish | Elementary/Middle School | D | 78 | No | 88.8 | 80.8 | 2 | -0.8 | -1.01523 | -2 | -2.5 | 80 | 80.9 | 82.7 | 78.8 | |
4 | 001005 | Church Point High School | Acadia Parish | Combination School | C | 101.2 | Yes | 96.6 | 109.3 | 22.7 | 17.6 | 21.0526 | 17.3 | 20.6198 | 83.9 | 90.1 | 90.1 | 83.6 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1298 | 399002 | Arthur Ashe Charter School | RSD-FirstLine Schools, Inc. | Elementary/Middle School | D | 82 | No | 92 | 82.4 | 0.4 | 0 | 0 | NaN | NaN | NaN | 67.2 | 83.8 | 82 | |
1299 | 399003 | Joseph S. Clark Preparatory High School | RSD-FirstLine Schools, Inc. | High School | T | 55.8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | |
1300 | 399004 | John Dibert Community School | RSD-FirstLine Schools, Inc. | Elementary/Middle School | T | 73.8 | Yes | 74.7 | 88.6 | 23.9 | 9.1 | 14.0649 | NaN | NaN | NaN | NaN | NaN | 64.7 | |
1301 | A02002 | Riverside Alternative High School | Office of Juvenile Justice | Combination School | F | 38.2 | Yes | 29.3 | 39.2 | 19.9 | 23.2 | 154.667 | NaN | NaN | NaN | NaN | 29.3 | 15 | |
1302 | A02003 | Southside Alternative High School | Office of Juvenile Justice | Combination School | F | 42 | Yes | 32.7 | 47.2 | 24.5 | 26.5 | 170.968 | NaN | NaN | NaN | NaN | 25.6 | 15.5 | |
1303 rows × 19 columns
sps_2013_2017_df.columns = sps_2013_2017_df.loc[0] # Assign our column names, pulling them from Row 0
sps_2013_2017_df = sps_2013_2017_df[1:] # cut off the row at top filled with NaNs and non-values
#some of these column names have annoying whitespace tacked on - let's fix that. Also, let's make the naming scheme consistent with the 2018-2019 df.
sps_2013_2017_df.rename(columns = {'2017 Annual SPS ':'2017 SPS',
'2016 Annual SPS ':'2016 SPS',
'2015 Annual SPS ':'2015 SPS',
'2014 Annual SPS ':'2014 SPS',
'2013 Annual SPS ':'2013 SPS'},
errors = 'raise',
inplace = True)
sps_2013_2017_df.reset_index(drop = True, inplace = True)
sps_2013_2017_df
| Site Code | School | District | School Type (Elementary, Middle, High, Combination) | 2017\n Letter Grade | 2017 SPS | 2016\n Letter Grade | 2016 SPS | 2015\n Letter Grade | 2015 SPS | 2014\n Letter Grade | 2014 SPS | 2013\n Letter Grade | 2013 SPS | Point Difference Between 2013 to 2017 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 001001 | Armstrong Middle School | Acadia Parish | Elementary/Middle School | C | 64.3 | D | 66.5 | D | 53.5 | D | 64.5 | D | 67.1 | -2.8 |
1 | 001002 | Branch Elementary School | Acadia Parish | Elementary/Middle School | A | 104.2 | A | 102.7 | B | 99.8 | A | 100.2 | B | 94.6 | 9.6 |
2 | 001003 | Central Rayne Kindergarten School | Acadia Parish | Elementary/Middle School | D | 62.7 | B | 89.9 | C | 73.5 | C | 83.9 | C | 73.3 | -10.6 |
3 | 001004 | Church Point Elementary School | Acadia Parish | Elementary/Middle School | D | 62.4 | C | 67.8 | D | 60.9 | C | 77.3 | C | 72.9 | -10.5 |
4 | 001005 | Church Point High School | Acadia Parish | High School | B | 90 | C | 82.3 | C | 79.5 | C | 71.7 | C | 80.6 | 9.4 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1327 | WAL001 | JS Clark Leadership Academy | JS Clark Leadership Academy | Combination School | F | 41.1 | D | 63 | C | 71.7 | D | 68 | D | 54.2 | -13.1 |
1328 | WAR001 | Tangi Academy | Tangipahoa Charter School Association | Elementary/Middle School | D | 58 | D | 58.8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1329 | WAU001 | GEO Prep Academy of Greater Baton Rouge | GEO Prep Academy of Greater Baton Rouge | Elementary/Middle School | C | 77.2 | C | 77.2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1330 | WAV001 | Democracy Prep Louisiana Charter School | Recovery School District - Baton Rouge | Elementary/Middle School | C | 67.4 | C | 67.4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1331 | WAX001 | Baton Rouge College Prep | Recovery School District - Baton Rouge | Elementary/Middle School | C | 71.7 | C | 71.7 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1332 rows × 15 columns
sps_2018_2019_df.columns = sps_2018_2019_df.loc[2] # Assign our column names, pulling them from Row 2
sps_2018_2019_df = sps_2018_2019_df[3:1270] # cut off the rows at top and bottom filled with NaNs and non-values
sps_2018_2019_df.rename(columns = {'2019 Letter Grade ': '2019 Letter Grade', '2019 SPS ' : '2019 SPS', '2018 Letter Grade ': '2018 Letter Grade', '2018 SPS ':'2018 SPS'}, errors = 'raise', inplace = True) #some of these column names have annoying whitespace tacked on - let's fix that
sps_2018_2019_df.reset_index(drop = True, inplace = True) # reset the index for convenience
sps_2018_2019_df # And let's see our beautiful, tidied-up table!
2 | Site Code | School | School System | School Type (Elementary, Middle, High, Combination) | 2019 Letter Grade | 2019 SPS | 2018 Letter Grade | 2018 SPS | 2019 K8 & High School Assessment Letter Grade Equivalent | 2019 K8 & High School Progress Letter Grade Equivalent | ... | 2018 K8 & High School \nProgress Index | 2018 K8 Assessment Index | 2018 K8 Progress Index | 2018 Dropout Credit Accumulation Index | 2018 High School Assessment Index | 2018 High School Progress Index | 2018 ACT Index | Strength of Diploma (Graduation Index) (2016-2017 Cohort) | Cohort Graduation Rate Index (Points Earned for Cohort Graduation Rate) (2016-2017 Cohort) | Cohort Graduation Rate (Actual Graduation Rate) (2016-2017 Cohort) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 001001 | Armstrong Middle School | Acadia Parish | Elementary/Middle School | C | 61.8 | D | 53.7 | D | C | ... | 67.5 | 43.6 | 67.5 | 126.3 | NaN | NaN | NaN | NaN | NaN | NaN |
1 | 001002 | Branch Elementary School | Acadia Parish | Elementary/Middle School | A | 92 | B | 87.8 | B | A | ... | 101.7 | 79.6 | 101.7 | 133.8 | NaN | NaN | NaN | NaN | NaN | NaN |
2 | 001003 | Central Rayne Kindergarten School | Acadia Parish | Elementary/Middle School | C | 68.5 | C | 73.9 | D | A | ... | 102.9 | 64.2 | 102.9 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | 001004 | Church Point Elementary School | Acadia Parish | Elementary/Middle School | C | 61.9 | C | 63.7 | D | B | ... | 77.6 | 59 | 77.6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | 001005 | Church Point High School | Acadia Parish | High School | B | 77.1 | B | 80.7 | D | C | ... | 71.8 | NaN | NaN | NaN | 62.9 | 71.8 | 62.1 | 93.9 | 99.4 | 89.5 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1262 | WBR001 | Athlos Academy of Jefferson Parish | Athlos Academy of Jefferson Parish | Elementary/Middle School | F | 43.6 | NaN | NaN | F | D | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1263 | WBU001 | Rosenwald Collegiate Academy | Orleans Parish | High School | B | 84.5 | NaN | NaN | C | A | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1264 | WBV001 | Dwight D. Eisenhower Charter School | Orleans Parish | Elementary/Middle School | C | 63.8 | NaN | NaN | F | A | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1265 | WJ5001 | Collegiate Baton Rouge | Collegiate Baton Rouge | High School | C | 64.1 | C | 64.4 | F | A | ... | 82.9 | NaN | NaN | NaN | 45.9 | 82.9 | NaN | NaN | NaN | NaN |
1266 | WZ8001 | GEO Prep Mid-City of Greater Baton Rouge | GEO Prep Mid-City of Greater Baton Rouge | Elementary/Middle School | T | 56 | T | 51 | F | A | ... | 92.2 | 30.4 | 92.2 | 134.2 | NaN | NaN | NaN | NaN | NaN | NaN |
1267 rows × 44 columns
Now that we have loaded and cleaned up the SPS datasets, let's combine the data we want into a single dataframe grouped by school system. Once grouped, each SPS column holds the MEAN SPS across the schools in that system. So '2018 SPS' for a given parish is the unweighted average of its schools' 2018 scores.
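To make "mean SPS" concrete: `groupby(...).mean()` averages the school-level scores within each system, giving every school equal weight regardless of enrollment. The toy example below shows this; the school names, system names, and scores are hypothetical. It also passes `numeric_only=True`, which pandas 2.0 and later require when a text column such as `School` is present.

```python
import pandas as pd

# Hypothetical school-level table (names and scores are illustrative only)
schools = pd.DataFrame({
    "School": ["School A", "School B", "School C"],
    "School System": ["Parish X", "Parish X", "Parish Y"],
    "2019 SPS": [60.0, 90.0, 70.0],
})

# Each system's value is the unweighted mean of its schools' scores
system_means = schools.groupby("School System").mean(numeric_only=True)
print(system_means)
```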
# create DF from 2018-2019 data
sps_df = sps_2018_2019_df.filter(['School','School System','2019 SPS', '2018 SPS'])
# Add SPS data from 2013-2017
sps_df = pd.merge(sps_df, sps_2013_2017_df.filter(['School',
'District',
'2017 SPS',
'2016 SPS',
'2015 SPS',
'2014 SPS',
'2013 SPS']),
how = 'left',
left_on = ['School', 'School System'],
right_on = ['School', 'District'])
# Get rid of the redundant District column now that we're done with the join
del sps_df['District']
# Add SPS data from 2012
sps_df = pd.merge(sps_df, sps_2012_df.filter(['School', 'District', '2012 Baseline School Performance Score']),
how = 'left',
left_on = ['School', 'School System'],
right_on = ['School', 'District'])
# A little more tidying
sps_df.rename(columns = {'2012 Baseline School Performance Score':'2012 SPS'}, errors = 'raise', inplace = True)
del sps_df['District']
# Cast every SPS column (2012-2019) to float for the math below
sps_cols = [f'{year} SPS' for year in range(2012, 2020)]
sps_df = sps_df.astype({col: 'float64' for col in sps_cols})
# before we group by school system, save the ungrouped df elsewhere
byschool_sps_df = sps_df
# And now, let's group by school system!
sps_df = sps_df.groupby('School System').mean(numeric_only=True) # numeric_only skips the text 'School' column (required in pandas 2.0+)
sps_df.reset_index(inplace = True)
sps_df
| School System | 2019 SPS | 2018 SPS | 2017 SPS | 2016 SPS | 2015 SPS | 2014 SPS | 2013 SPS | 2012 SPS |
---|---|---|---|---|---|---|---|---|---|
0 | A.E. Phillips Laboratory School | 105.600000 | 108.300000 | 126.900000 | 126.700000 | 123.200000 | 125.600000 | 126.300000 | 149.500000 |
1 | Acadia Parish | 78.534615 | 78.338462 | 87.650000 | 92.511538 | 85.311538 | 86.592308 | 83.223077 | 98.426923 |
2 | Acadiana Renaissance Charter Academy | 91.600000 | 89.700000 | 106.600000 | 102.700000 | 94.400000 | NaN | NaN | NaN |
3 | Advantage Charter Academy | 53.700000 | 53.300000 | 68.000000 | 54.400000 | 54.800000 | NaN | NaN | NaN |
4 | Allen Parish | 84.745455 | 84.054545 | 96.500000 | 97.318182 | 92.854545 | 92.763636 | 91.536364 | 112.390909 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
107 | West Carroll Parish | 75.780000 | 74.900000 | 86.240000 | 88.820000 | 86.760000 | 85.660000 | 85.220000 | 105.060000 |
108 | West Feliciana Parish | 86.725000 | 84.775000 | 95.325000 | 100.300000 | 102.050000 | 98.950000 | 96.875000 | 120.775000 |
109 | Willow Charter Academy | 52.000000 | 54.200000 | 44.600000 | 42.800000 | 39.000000 | NaN | NaN | NaN |
110 | Winn Parish | 77.400000 | 78.750000 | 82.716667 | 84.550000 | 87.783333 | 80.466667 | 85.766667 | 102.925000 |
111 | Zachary Community School District | 92.150000 | 89.283333 | 106.700000 | 108.466667 | 112.150000 | 109.600000 | 108.966667 | 131.800000 |
112 rows × 9 columns
To examine change over time, we use the average rate of change over the period of our analysis, which for this section is 2015-2019. For a function f with values at the start point a and end point b of a period of time, the average rate of change is:
(f(b) - f(a)) / (b - a)
For our purposes, a = 2015, b = 2019, and f is a parish's mean SPS.
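As a worked check of the formula, the sketch below applies it to the 2015 and 2019 mean SPS values shown for Acadia Parish and Allen Parish in the grouped table above. It uses the same vectorized column arithmetic the notebook uses.

```python
import pandas as pd

# 2015 and 2019 mean SPS for two parishes, copied from the grouped SPS table
df = pd.DataFrame({
    "School System": ["Acadia Parish", "Allen Parish"],
    "2015 SPS": [85.311538, 92.854545],
    "2019 SPS": [78.534615, 84.745455],
})

# Average rate of change: (f(b) - f(a)) / (b - a) with a = 2015, b = 2019
a, b = 2015, 2019
df["Avg. Rate of Change"] = (df["2019 SPS"] - df["2015 SPS"]) / (b - a)
print(df[["School System", "Avg. Rate of Change"]])
```

Both values come out negative. Acadia's mean SPS fell by about 1.69 points per year over this window, and Allen's fell by about 2.03 points per year.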
The parish attributes that will serve as independent variables in this section are Median Household Income (2018) and % of Students Economically Disadvantaged (2020).
# Merge our attributes dataframe and our SPS dataframe together so that we can compare values within the two
sps_df = pd.merge(sps_df, parishAttributes.filter(['Name', 'Median Household Income (2018)', '% Economically Disadvantaged (2020)']),
how = 'left',
left_on = ['School System'],
right_on = ['Name'])
# calculate avg. rate of change since 2015
sps_df['Avg. Rate of Change'] = (sps_df['2019 SPS'] - sps_df['2015 SPS']) / (2019-2015)
# Rate of Change as a function of median household income
sns.lmplot(x='Median Household Income (2018)',y='Avg. Rate of Change',data = sps_df,fit_reg=True)
ax = plt.gca()
ax.set_title("Median Household Income vs Avg Rate of Change in Mean SPS \n (by Parish, 2015-2019)")
print(sps_df['Median Household Income (2018)'].corr(sps_df['Avg. Rate of Change']))
# Rate of change as function of % of economically disadvantaged students
sns.lmplot(x='% Economically Disadvantaged (2020)',y='Avg. Rate of Change',data = sps_df,fit_reg=True)
ax = plt.gca()
ax.set_title("% Economically Disadvantaged vs Avg. Rate of Change in Mean SPS \n (by Parish, 2015-2019)")
print(sps_df['% Economically Disadvantaged (2020)'].corr(sps_df['Avg. Rate of Change']))
-0.19993419029085938
0.2192257638859381
As you can see in the above plots, there is virtually no relationship between Avg. Rate of Change in SPS and either of our variables. This may suggest that wealth of a parish or its student body do not have an effect on how much their overall SPS has changed during this four year period.
However, it is entirely possible that analysis at the parish level is not granular enough to determine a relationship between changes in SPS over time and the % of a given student body that is economically disadvantaged. There exist a range of schools in any given parish, some of which may have significantly greater performance than others, and some of which may have many more economically disadvantaged students than others.
Despite the lack of strong relationships in the above section, it is common wisdom that the wealthier a school district, the better its schools perform. Here we will examine the relationship between each parish's 2019 mean SPS and our independent variables to see if this wisdom holds true for Louisiana.
# 2019 SPS as a function of median household income
sns.lmplot(x='Median Household Income (2018)',y='2019 SPS',data = sps_df,fit_reg=True)
ax = plt.gca()
ax.set_title("Median Household Income vs 2019 Mean SPS \n (by Parish)")
print(sps_df['Median Household Income (2018)'].corr(sps_df['2019 SPS']))
# 2019 SPS as a function of % economically disadvantaged
sns.lmplot(x='% Economically Disadvantaged (2020)',y='2019 SPS',data = sps_df,fit_reg=True)
ax = plt.gca()
ax.set_title("% Economically Disadvantaged vs 2019 Mean SPS \n (by Parish)")
print(sps_df['% Economically Disadvantaged (2020)'].corr(sps_df['2019 SPS']))
0.5572120061451805 -0.7728332692548592
The above plots show much stronger relationships than we found before. The correlation is r ≈ 0.56 between median household income and mean SPS, and r ≈ -0.77 between the percentage of economically disadvantaged students and mean SPS. The latter is the strongest relationship we have found so far.
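A note on reading these numbers: `Series.corr` defaults to Pearson's r, which is a unitless coefficient between -1 and 1 rather than a percentage. A common companion statistic is r², the share of variance in one variable that a linear fit on the other accounts for. The sketch below simply squares the two coefficients printed above. The variable names are ours.

```python
# Pearson r values printed above (2019 mean SPS vs. each parish attribute)
r_income = 0.5572120061451805
r_disadvantaged = -0.7728332692548592

# r squared: fraction of variance in 2019 mean SPS linearly associated with each attribute
r2_income = r_income ** 2                # about 0.31
r2_disadvantaged = r_disadvantaged ** 2  # about 0.60

print(f"income: r = {r_income:.2f}, r^2 = {r2_income:.2f}")
print(f"% disadvantaged: r = {r_disadvantaged:.2f}, r^2 = {r2_disadvantaged:.2f}")
```

Read this way, a linear fit on median household income accounts for roughly 31% of the parish-to-parish variance in 2019 mean SPS. The share of disadvantaged students accounts for roughly 60%. This is a statistical association, not a causal effect.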
At this point, we can state that parishes with wealthier student bodies tend to have a higher mean SPS. However, this DOES NOT INDICATE that a parish with a wealthier student body has seen greater change in SPS than a parish with more economically disadvantaged students.
It seems that, at the parish level, there is not a strong relationship between the average rate of change in SPS and either of our independent variables. However, we also have school-level data on both SPS and the percentage of each school's student body that is economically disadvantaged. As we stated in section II.B., wealth and performance may vary considerably among the schools within a given school system. Our parish-level analysis may therefore have been obscured by a problem of granularity.
Here, we will conduct the most granular analysis possible, at the school level, to determine whether the percentage of economically disadvantaged students in a school's student body has any relationship to how that school's SPS changed from 2015 to 2019.
# merge for analysis
byschool_sps_df = pd.merge(byschool_sps_df, schoolAttributes.filter(['Site Name', '% Economically Disadvantaged (2020)']),
how = 'left',
left_on = ['School'],
right_on = ['Site Name'])
# add rate of change column
byschool_sps_df['Avg. Rate of Change'] = (byschool_sps_df['2019 SPS'] - byschool_sps_df['2015 SPS']) / (2019-2015)
byschool_sps_df
| | School | School System | 2019 SPS | 2018 SPS | 2017 SPS | 2016 SPS | 2015 SPS | 2014 SPS | 2013 SPS | 2012 SPS | Site Name | % Economically Disadvantaged (2020) | Avg. Rate of Change |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Armstrong Middle School | Acadia Parish | 61.8 | 53.7 | 64.3 | 66.5 | 53.5 | 64.5 | 67.1 | 76.7 | Armstrong Middle School | 0.830671 | 2.075 |
1 | Branch Elementary School | Acadia Parish | 92.0 | 87.8 | 104.2 | 102.7 | 99.8 | 100.2 | 94.6 | 100.4 | Branch Elementary School | 0.566879 | -1.950 |
2 | Central Rayne Kindergarten School | Acadia Parish | 68.5 | 73.9 | 62.7 | 89.9 | 73.5 | 83.9 | 73.3 | 89.3 | Central Rayne Kindergarten School | 0.796380 | -1.250 |
3 | Church Point Elementary School | Acadia Parish | 61.9 | 63.7 | 62.4 | 67.8 | 60.9 | 77.3 | 72.9 | 78.0 | Church Point Elementary School | 0.887348 | 0.250 |
4 | Church Point High School | Acadia Parish | 77.1 | 80.7 | 90.0 | 82.3 | 79.5 | 71.7 | 80.6 | 101.2 | Church Point High School | 0.654649 | -0.600 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1320 | Athlos Academy of Jefferson Parish | Athlos Academy of Jefferson Parish | 43.6 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Athlos Academy of Jefferson Parish | 0.873327 | NaN |
1321 | Rosenwald Collegiate Academy | Orleans Parish | 84.5 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Rosenwald Collegiate Academy | 0.906250 | NaN |
1322 | Dwight D. Eisenhower Charter School | Orleans Parish | 63.8 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Dwight D. Eisenhower Charter School | 0.907463 | NaN |
1323 | Collegiate Baton Rouge | Collegiate Baton Rouge | 64.1 | 64.4 | NaN | NaN | NaN | NaN | NaN | NaN | Collegiate Baton Rouge | 0.922306 | NaN |
1324 | GEO Prep Mid-City of Greater Baton Rouge | GEO Prep Mid-City of Greater Baton Rouge | 56.0 | 51.0 | NaN | NaN | NaN | NaN | NaN | NaN | GEO Prep Mid-City of Greater Baton Rouge | 0.916549 | NaN |
1325 rows × 13 columns
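The Avg. Rate of Change column uses only the two endpoint years: (2019 SPS - 2015 SPS) / 4. For Armstrong Middle School, that is (61.8 - 53.5) / 4 = 2.075 points per year, which matches the table. Because the intermediate years are ignored, a noisy endpoint can dominate the result. One alternative, which is not what this analysis uses, is the least-squares slope across all five years. The sketch below computes both for the Armstrong row. The function names are hypothetical.

```python
def endpoint_rate(years, values):
    """Average yearly change using only the first and last observation."""
    return (values[-1] - values[0]) / (years[-1] - years[0])

def ols_slope(years, values):
    """Least-squares slope of values on years, skipping NaN values."""
    pairs = [(x, y) for x, y in zip(years, values) if y == y]  # y == y is False only for NaN
    if len(pairs) < 2:
        return float('nan')
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    den = sum((x - mean_x) ** 2 for x, _ in pairs)
    return num / den

# Armstrong Middle School (Acadia Parish), SPS from the table above, 2015 -> 2019
years = [2015, 2016, 2017, 2018, 2019]
armstrong = [53.5, 66.5, 64.3, 53.7, 61.8]

print(round(endpoint_rate(years, armstrong), 3))  # 2.075
print(round(ols_slope(years, armstrong), 3))      # 0.38
```

The endpoint rate suggests Armstrong gained about 2 points per year. The five-year trend is much flatter, largely because 2015 was its lowest year in the window.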
# Plot it out!
sns.lmplot(x='% Economically Disadvantaged (2020)',y='Avg. Rate of Change',data = byschool_sps_df,fit_reg=True)
ax = plt.gca()
ax.set_title("% Economically Disadvantaged vs Avg Rate of Change in SPS \n (by School, 2015-2019)")
byschool_sps_df['% Economically Disadvantaged (2020)'].corr(byschool_sps_df['Avg. Rate of Change'])
0.21967140759273363
As the plot above confirms, there is only a weak positive correlation (r ≈ 0.22) between a school's percentage of economically disadvantaged students and the rate of change in its SPS. This is almost exactly the value we found at the parish level. Our prior analysis therefore does not appear to have been obscured by aggregating schools by parish; the relationship is weak at both levels.
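It is worth checking how many schools actually enter that correlation. Schools without a 2015 SPS (such as the newer charters at the bottom of the table) have a NaN rate of change, and `Series.corr` drops those pairs without comment. A rank-based (Spearman) correlation is also less sensitive to a few extreme schools.

The helper below is a hypothetical sketch. It computes Spearman's rho as Pearson's r on ranks, so it depends only on pandas.

```python
import pandas as pd

def describe_correlation(df, x, y):
    """Return (n, pearson_r, spearman_rho) using only rows where both x and y are present."""
    pair = df[[x, y]].dropna()
    pearson = pair[x].corr(pair[y])
    # Spearman's rho is Pearson's r computed on the ranks (ties get average ranks)
    spearman = pair[x].rank().corr(pair[y].rank())
    return len(pair), pearson, spearman

# Hypothetical usage on the school-level frame from above:
# n, r, rho = describe_correlation(byschool_sps_df,
#                                  '% Economically Disadvantaged (2020)',
#                                  'Avg. Rate of Change')
```

If rho is close to r, the weak linear relationship is not an artifact of a handful of outlying schools.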
In this section, we explore how ACT scores have changed over time in Louisiana. We evaluate them primarily at the parish level, to examine whether the changes seen there reflect particular attributes of each parish. In doing so, we hope to answer our question: are the overall changes uniform across parishes, or are parishes seeing different amounts of change depending on their make-up? Specifically, we compare the average rate of change in ACT scores with each parish's median household income and with its percentage of economically disadvantaged students. This lets us see whether a parish with a high percentage of economically disadvantaged students is seeing a greater (or lesser) increase in ACT scores than parishes with fewer economically disadvantaged students.
Our reasoning for examining ACT scores is to address problems of granularity. We have decided to examine ACT scores in addition to SPS because ACT scores are a subcomponent of SPS. SPS also includes a school assessment component that relies on many qualitative judgments, whereas ACT scores are purely quantitative. We suspect that the relationships between parish attributes and the performance metric in question will be similar for ACT and SPS, but we still find the analysis worth conducting in order to confirm our hunch.
The Louisiana Department of Education makes ACT data available (https://www.louisianabelieves.com/resources/library/high-school-performance). Each year is a separate dataset. We will load the datasets going back to 2015, then compile them into a single dataframe indexed by parish (school system).
## First, load up the separate datasets for each year (2015 - 2019)
# load up ACT19df
ACT19df = pd.read_excel('act-scores-class-of-2019.xlsx')
# clean up ACT19df
# first, we will rename the columns
ACT19df.columns = ['School System Code','School System','2019 Student Count','2019 Average ACT']
# get rid of the first few rows, which are unnecessary to us
for x in range(0, 4):
    ACT19df = ACT19df.reindex(ACT19df.index.drop(0)).reset_index(drop=True)
# setting School System as index column
ACT19df.set_index(["School System"], inplace = True, drop = True)
# load up ACT18df
ACT18df = pd.read_excel('act-class-of-2018.xlsx')
# clean up ACT18df
# first, we will rename the columns
ACT18df.columns = ['School System Code','School System','2018 Student Count','2018 Average ACT']
# get rid of the first few rows, which are unnecessary to us
for x in range(0, 4):
    ACT18df = ACT18df.reindex(ACT18df.index.drop(0)).reset_index(drop=True)
# setting School System as index column
ACT18df.set_index(["School System"], inplace = True, drop = True)
# load up ACT17df
ACT17df = pd.read_excel('act-scores---class-of-2017.xlsx')
# clean up ACT17df
# first, we will rename the columns
ACT17df.columns = ['School System Code','School System','2017 Student Count','2017 Average ACT']
# get rid of the first few rows, which are unnecessary to us
for x in range(0, 5):
    ACT17df = ACT17df.reindex(ACT17df.index.drop(0)).reset_index(drop=True)
# setting School System as index column
ACT17df.set_index(["School System"], inplace = True, drop = True)
# load up ACT16df
ACT16df = pd.read_excel('act-best-composite-scores-for-2015-2016-seniors-by-parish-1.xlsx')
# clean up ACT16df
# first, we will rename the columns
ACT16df.columns = ['School System Code','School System','2016 Student Count','2016 Average ACT']
# get rid of the first few rows, which are unnecessary to us
for x in range(0, 5):
    ACT16df = ACT16df.reindex(ACT16df.index.drop(0)).reset_index(drop=True)
# setting School System as index column
ACT16df.set_index(["School System"], inplace = True, drop = True)
# load up ACT15df
ACT15df = pd.read_excel('act-best-composite-scores-for-2014-2015-seniors-by-parish.xlsx')
# clean up ACT15df
# get rid of the first few rows, which are unnecessary to us
for x in range(0, 7):
    ACT15df = ACT15df.reindex(ACT15df.index.drop(0)).reset_index(drop=True)
ACT15df.drop(columns=['Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6','Unnamed: 7',\
'Unnamed: 8', 'Unnamed: 9', 'Unnamed: 10','Unnamed: 11',\
'Unnamed: 12', 'Unnamed: 13', 'Unnamed: 14','Unnamed: 15','Unnamed: 16'], inplace=True)
# first, we will rename the columns
ACT15df.columns = ['School System Code','School System','2015 Student Count','2015 Average ACT']
# setting School System as index column
ACT15df.set_index(["School System"], inplace = True, drop = True)
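The five load-and-clean blocks above repeat the same steps, differing only in the year and the number of title rows (4, 4, 5, 5 and 7). A hypothetical helper could take the raw frame returned by `pd.read_excel` and do the cleaning in one place. It assumes, as the 2015 block implies, that the four data columns come first in each sheet. `iloc` slicing replaces the drop-and-reindex loop and drops the same leading rows.

```python
import pandas as pd

def clean_act_sheet(raw: pd.DataFrame, year: int, header_rows: int) -> pd.DataFrame:
    """Tidy one year's ACT sheet (hypothetical helper).

    raw: the DataFrame returned by pd.read_excel for that year's file.
    header_rows: number of leading title rows to drop before the data starts.
    """
    # drop the title rows and keep only the first four columns (extra 'Unnamed' columns go away)
    df = raw.iloc[header_rows:, :4].reset_index(drop=True)
    df.columns = ['School System Code', 'School System',
                  f'{year} Student Count', f'{year} Average ACT']
    return df.set_index('School System')

# Hypothetical usage with the files loaded above:
# ACT19df = clean_act_sheet(pd.read_excel('act-scores-class-of-2019.xlsx'), 2019, header_rows=4)
# ACT15df = clean_act_sheet(pd.read_excel('act-best-composite-scores-for-2014-2015-seniors-by-parish.xlsx'), 2015, header_rows=7)
```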
Now that we have read in all the separate years, we will compile them into a single dataframe:
# Now we will merge all of the separate years together into one dataframe
ACTdf = pd.merge(ACT19df, ACT18df, how='left', left_on=['School System','School System Code'], right_on = ['School System','School System Code'])
ACTdf = pd.merge(ACTdf, ACT17df, how='left', left_on=['School System','School System Code'], right_on = ['School System','School System Code'])
ACTdf = pd.merge(ACTdf, ACT16df, how='left', left_on=['School System','School System Code'], right_on = ['School System','School System Code'])
ACTdf = pd.merge(ACTdf, ACT15df, how='left', left_on=['School System','School System Code'], right_on = ['School System','School System Code'])
# We will drop columns that we do not need
ACTdf.drop(columns=['2019 Student Count','2018 Student Count',\
'2017 Student Count','2016 Student Count',\
'2015 Student Count'], inplace=True)
# Cast ACT scores to numeric so that we can do computations with them later
ACTdf['2019 Average ACT'] = pd.to_numeric(ACTdf['2019 Average ACT'])
ACTdf['2018 Average ACT'] = pd.to_numeric(ACTdf['2018 Average ACT'])
ACTdf['2017 Average ACT'] = pd.to_numeric(ACTdf['2017 Average ACT'])
ACTdf['2016 Average ACT'] = pd.to_numeric(ACTdf['2016 Average ACT'])
ACTdf['2015 Average ACT'] = pd.to_numeric(ACTdf['2015 Average ACT'])
# calculate the fractional change from 2015 to 2019 for each parish (stored as '% Change'; e.g. -0.026 means -2.6%)
ACTdf['% Change'] = (ACTdf['2019 Average ACT'] - ACTdf['2015 Average ACT']) / ACTdf['2015 Average ACT']
ACTdf.reset_index(inplace = True)
# call the dataframe
ACTdf
| | School System | School System Code | 2019 Average ACT | 2018 Average ACT | 2017 Average ACT | 2016 Average ACT | 2015 Average ACT | % Change |
---|---|---|---|---|---|---|---|---|
0 | LOUISIANA STATE TOTAL\n(INCLUDES PUBLIC SCHOOL... | LA | 18.9 | 19.3 | 19.6 | 19.5 | 19.4 | -0.025773 |
1 | Acadia Parish | 001 | 18.0 | 18.8 | 18.5 | 18.9 | 17.9 | 0.005587 |
2 | Allen Parish | 002 | 19.2 | 18.9 | 19.6 | 19.3 | 19.3 | -0.005181 |
3 | Ascension Parish | 003 | 20.6 | 20.3 | 20.3 | 20.4 | 20.6 | 0.000000 |
4 | Assumption Parish | 004 | 17.7 | 18.3 | 18.8 | 18.3 | 17.4 | 0.017241 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
67 | Zachary Community School District | 067 | 21.1 | 21.9 | 21.4 | 21.4 | 20.7 | 0.019324 |
68 | City of Baker School District | 068 | 15.1 | 16.2 | 17.3 | 17.0 | 17.5 | -0.137143 |
69 | Central Community School District | 069 | 20.4 | 20.9 | 21.2 | 21.2 | 21.1 | -0.033175 |
70 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
71 | Recovery School District-Baton Rouge | RBR | 14.3 | 14.6 | NaN | NaN | NaN | NaN |
72 rows × 8 columns
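The chained merges and the five `pd.to_numeric` casts above can likewise be folded into a loop. This sketch assumes 'School System' has been turned back into an ordinary column (for example with `ACT19df.reset_index()`), so that both merge keys are columns. `combine_act_years` is a hypothetical name. As in the original, every year is left-merged onto the first frame.

```python
from functools import reduce
import pandas as pd

KEYS = ['School System', 'School System Code']

def combine_act_years(frames, years):
    """Left-merge yearly ACT tables onto the first one and keep only the average columns.

    Assumes each frame has 'School System' and 'School System Code' as ordinary
    columns plus a '<year> Average ACT' column.
    """
    merged = reduce(lambda left, right: pd.merge(left, right, how='left', on=KEYS), frames)
    avg_cols = [f'{y} Average ACT' for y in years]
    merged = merged[KEYS + avg_cols].copy()
    for col in avg_cols:
        merged[col] = pd.to_numeric(merged[col])  # cast object columns so arithmetic works
    return merged

# Hypothetical usage:
# frames = [df.reset_index() for df in (ACT19df, ACT18df, ACT17df, ACT16df, ACT15df)]
# ACTdf = combine_act_years(frames, [2019, 2018, 2017, 2016, 2015])
```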
We will first investigate how 2019 ACT scores relate to our two parish attributes: median household income (2018) and % of economically disadvantaged students (2020).
# Merge our attributes dataframe and our ACT dataframe together so that we can compare values within the two
ACTdf = pd.merge(ACTdf, parishAttributes.filter(['Name', 'Median Household Income (2018)', '% Economically Disadvantaged (2020)']),
how = 'left',
left_on = ['School System'],
right_on = ['Name'])
# as function of median household income
sns.lmplot(x='Median Household Income (2018)',y='2019 Average ACT',data = ACTdf,fit_reg=True)
ax = plt.gca()
ax.set_title("Median Household Income vs 2019 Mean ACT Score \n (by Parish)")
print(ACTdf['Median Household Income (2018)'].corr(ACTdf['2019 Average ACT']))
# as function of % of economically disadvantaged students
sns.lmplot(x='% Economically Disadvantaged (2020)',y='2019 Average ACT',data = ACTdf,fit_reg=True)
ax = plt.gca()
ax.set_title("% Economically Disadvantaged vs 2019 Mean ACT Score \n (by Parish)")
print(ACTdf['% Economically Disadvantaged (2020)'].corr(ACTdf['2019 Average ACT']))
0.7496603810911924 -0.8630099056051078
As with 2019 mean SPS in section II, 2019 mean ACT score has a strong relationship with both independent variables. Notably, the correlation between median household income and mean ACT score (r ≈ 0.75) is much stronger than the correlation between median household income and mean SPS (r ≈ 0.56).
Additionally, the correlation between the percentage of economically disadvantaged students in a parish and its 2019 mean ACT score is the strongest we have found so far (r ≈ -0.86). That is, parishes in which a greater share of students are economically disadvantaged tend to have lower mean ACT scores.
# calculate avg. rate of change since 2015
ACTdf['Avg. Rate of Change'] = (ACTdf['2019 Average ACT'] - ACTdf['2015 Average ACT']) / (2019-2015)
# Rate of Change as a function of median household income
sns.lmplot(x='Median Household Income (2018)',y='Avg. Rate of Change',data = ACTdf,fit_reg=True)
ax = plt.gca()
ax.set_title("Median Household Income vs Avg Rate of Change in Mean ACT Score \n (by Parish, 2015-2019)")
print(ACTdf['Median Household Income (2018)'].corr(ACTdf['Avg. Rate of Change']))
# Rate of change as function of % of economically disadvantaged students
sns.lmplot(x='% Economically Disadvantaged (2020)',y='Avg. Rate of Change',data = ACTdf,fit_reg=True)
ax = plt.gca()
ax.set_title("% Economically Disadvantaged vs Avg. Rate of Change in Mean ACT Score \n (by Parish, 2015-2019)")
print(ACTdf['% Economically Disadvantaged (2020)'].corr(ACTdf['Avg. Rate of Change']))
0.305848217063877 -0.19494227557562813
From the figures above, we see only weak correlations between the change in a parish's ACT scores and either of our independent variables: r ≈ 0.31 for median household income and r ≈ -0.19 for % economically disadvantaged. Does this mean that poorer parishes are experiencing the same change in ACT scores as wealthier parishes? Possibly, but we can't be sure.
This lack of relationships seems consistent with our findings in our analysis of SPS in section II.B.
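The plots alone cannot tell us whether correlations this weak are distinguishable from zero. A permutation test is one assumption-light way to check. Shuffle one variable many times, and count how often the shuffled |r| is at least as large as the observed one. The sketch below is a minimal pure-Python version. In practice, x and y would be the two columns with NaN rows removed. The number of permutations and the seed are arbitrary choices.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for Pearson's r (x and y must not contain NaN)."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # break any real pairing between x and y
        if abs(pearson(x, y_shuffled)) >= observed:
            hits += 1
    # add-one correction so the p-value is never exactly zero
    return (hits + 1) / (n_perm + 1)

# Hypothetical usage:
# pair = ACTdf[['Median Household Income (2018)', 'Avg. Rate of Change']].dropna()
# p = permutation_p_value(pair.iloc[:, 0].tolist(), pair.iloc[:, 1].tolist())
```

A small p-value would indicate that even a weak correlation is unlikely to arise from random pairing alone. A small p-value says nothing about the correlation being large or causal.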
So far, we have considered parishes as discrete units. However, despite having distinct school systems, parishes are not necessarily entirely discrete. Certain industries, groups of people, and cultures are distributed all over the state; their geographies cross parish borders. In order to understand regional trends in ACT scores, we will visualize the strongest relationship we have found so far on maps of Louisiana's parishes.
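The cell below fixes two parish names by row position (rows 16 and 30), which breaks silently if the row order ever changes. A position-independent alternative is a replacement map, plus a check for names that still fail to match the geoframe's NAME column. The spellings in the map are assumptions, taken from the note on the subgroup data later in this notebook ("DeSoto Parish", "LaSalle Parish"). Extend the map with whatever `unique()` reveals.

```python
import pandas as pd

# Assumed alternate spellings -> the spelling the merges in this notebook expect
NAME_FIXES = {'DeSoto Parish': 'De Soto Parish', 'LaSalle Parish': 'La Salle Parish'}

def normalize_parish_names(df, col='School System'):
    """Return a copy of df with known spelling variants in `col` replaced."""
    out = df.copy()
    out[col] = out[col].replace(NAME_FIXES)  # exact whole-value matches only
    return out

def unmatched_names(names, reference):
    """Names that would get no geometry in a left merge against `reference`."""
    return set(names) - set(reference)

# Hypothetical usage:
# ACTdf = normalize_parish_names(ACTdf)
# print(unmatched_names(ACTdf['School System'].dropna(), parishes_geo['NAME']))
```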
# We need to fix the naming scheme for De Soto Parish and La Salle Parish before we merge with the GeoDataFrame
ACTdf['School System'].unique()
# DeSoto in row 16, La Salle in row 30
ACTdf.at[16,'School System'] = 'De Soto Parish'
ACTdf.at[30, 'School System'] = 'La Salle Parish'
# create a dataframe that contains both ACT data and geoframe data
ACTgeo = pd.merge(ACTdf, parishes_geo, how='left', left_on=['School System'], right_on = ['NAME'])
ACTgeo = GeoDataFrame(ACTgeo)
# create a dataframe that contains both Attributes data and geoframe data
AttributesGeo = pd.merge(parishAttributes, parishes_geo, how='left', left_on=['Name'], right_on = ['NAME'])
AttributesGeo = GeoDataFrame(AttributesGeo)
# tidy the data so that we can showcase it on the map
AttributesGeo['% Economically Disadvantaged (2020)'] = pd.to_numeric(AttributesGeo['% Economically Disadvantaged (2020)'],errors='coerce')
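# NOTE: parishes with a missing value (if any) are filled with 0, so they plot at the low end of the color scale
AttributesGeo['% Economically Disadvantaged (2020)'] = AttributesGeo['% Economically Disadvantaged (2020)'].fillna(0)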
# no vmin/vmax is passed to .plot() below, so each map scales its colors to its own data range
# create figure and axes for Matplotlib
fig, (ax1,ax2) = plt.subplots(ncols=2, figsize=(10, 8))
# plot the geoframe data
ACTgeo.plot(column='2019 Average ACT', cmap='YlGnBu', linewidth=0.8, ax=ax1, edgecolor = '0.8', legend=True, legend_kwds = {'label': "2019 Average ACT", 'orientation': "horizontal"})
AttributesGeo.plot(column='% Economically Disadvantaged (2020)', cmap='YlGnBu_r', linewidth=0.8, ax=ax2, edgecolor = '0.8', legend=True, legend_kwds = {'label': "% Economically Disadvantaged Students", 'orientation': "horizontal"})
plt.show()
The maps above show that parishes with low ACT scores tend to have high percentages of economically disadvantaged students. Seeing this in map form matters because it keeps us from treating parishes as discrete objects: regional trends span multiple parishes. For example, the northeast corner of the state contains a cluster of parishes with low ACT scores and high percentages of economically disadvantaged students. The southwest corner, by comparison, appears to have higher average ACT scores and fewer economically disadvantaged students.
Lastly, for extra context, let's take a look at the distribution of ACT scores by school system in each year.
del ACTdf['% Economically Disadvantaged (2020)']
del ACTdf['Median Household Income (2018)']
del ACTdf['Avg. Rate of Change']
ax = ACTdf[ACTdf.columns.difference(['% Change'])].plot.kde(figsize=(12, 8),title="Average ACT Scores by School System (2015-2019)")
The density plot suggests that the distribution of school-system average ACT scores has become increasingly centered as the years progress from 2015 to 2019.
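To put a number on that visual impression, one can compare the mean and standard deviation of school-system averages in each year. A shrinking standard deviation would support the "more centered" reading. This hypothetical helper also drops the statewide total row (code 'LA'), which ACTdf still contains and which is not a school system.

```python
import pandas as pd

def spread_by_year(df, years, exclude_codes=('LA',)):
    """Mean and standard deviation of school-system average ACT per year.

    Rows whose 'School System Code' is in exclude_codes (e.g. the statewide total)
    are dropped; NaN scores are skipped by mean/std.
    """
    systems = df[~df['School System Code'].isin(list(exclude_codes))]
    cols = [f'{y} Average ACT' for y in years]
    return systems[cols].agg(['mean', 'std']).T  # one row per year

# Hypothetical usage:
# print(spread_by_year(ACTdf, range(2015, 2020)))
```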
In this section, we will examine, on a parish-by-parish basis, the ACT scores of three key subgroups: Black or African American students, students with disabilities, and economically disadvantaged students. Our reasoning is twofold: 1) these subgroups' statewide performance is typically below the statewide mean, and 2) we are missing data on other subgroups prior to 2018.
Our main method for understanding disparity is computing the achievement gap within each parish. We define the achievement gap as SUBGROUP MEAN ACT SCORE - PARISH MEAN ACT SCORE, so a negative gap means the subgroup scores below its parish's overall mean.
The subgroup data cover 2012-2018. However, our overall parish averages (ACTdf) only go back to 2015, so achievement gaps can only be computed for 2015-2018.
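Written out with the sign used in the code later in this section (subgroup minus parish, so a below-average subgroup has a negative gap), the gap for subgroup g in parish p and year t is shown below. A worked case uses Black or African American students in Acadia Parish in 2018, with values from the completed table at the end of this section.

```latex
\text{Gap}_{g,p,t} = \overline{\text{ACT}}_{g,p,t} - \overline{\text{ACT}}_{p,t}

\text{Gap}_{\text{Black},\,\text{Acadia},\,2018} = 16.4 - 18.8 = -2.4
```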
# loading subgroup data for 2012 to 2018
subACTdf = pd.read_excel('2012-2018-state-district-act-subgroup-performance (2).xlsx')
# clean the subgroup data
for x in range(0, 4):
subACTdf = subACTdf.reindex(subACTdf.index.drop(0)).reset_index(drop=True)
# now, I'll replace the '~' with NaNs
subACTdf = subACTdf.replace('~', np.nan)
subACTdf.columns = ['Code','School System','1','2','3','4','5','6','7','8','9','10','11','12','13','14','15',\
'16','17','18','19','20','21','22','23','24','25','26','27','28','29','30',\
'31','32','33','34','35','36','37','38','39','40','41','42','43','44','45',\
'46','47','48','49','50','51','52','53','54','55','56','57','58','59','60',\
'61','62','63','64','65','66','67','68','69']
subACTdf.drop(['2','3','5','6','8','9','11','12','14','15','17','18','20','21','23','24','26',\
'27','29','30','32','33','35','36','38','39','41','42','44','45','47','48','50',\
'51','53','54','56','57','59','60','62','63','64','65','66','67','68','69'],axis=1,inplace=True)
subACTdf.columns = ['Code','School System','Black or African American (18)','Students With Disabilities (18)','Economically Disadvantaged (18)',\
'Black or African American (17)','Students With Disabilities (17)','Economically Disadvantaged (17)',\
'Black or African American (16)','Students With Disabilities (16)','Economically Disadvantaged (16)',\
'Black or African American (15)','Students With Disabilities (15)','Economically Disadvantaged (15)',\
'Black or African American (14)','Students With Disabilities (14)','Economically Disadvantaged (14)',\
'Black or African American (13)','Students With Disabilities (13)','Economically Disadvantaged (13)',\
'Black or African American (12)','Students With Disabilities (12)','Economically Disadvantaged (12)']
blackStudents = pd.DataFrame([['2012',subACTdf.iloc[0]['Black or African American (12)']], \
['2013', subACTdf.iloc[0]['Black or African American (13)']], \
['2014', subACTdf.iloc[0]['Black or African American (14)']], \
['2015', subACTdf.iloc[0]['Black or African American (15)']], \
['2016', subACTdf.iloc[0]['Black or African American (16)']], \
['2017', subACTdf.iloc[0]['Black or African American (17)']], \
['2018', subACTdf.iloc[0]['Black or African American (18)']],
],columns=['Year','Black or African American'])
blackStudents = blackStudents.set_index('Year')
disabilityStudents = pd.DataFrame([['2012',subACTdf.iloc[0]['Students With Disabilities (12)']], \
['2013', subACTdf.iloc[0]['Students With Disabilities (13)']], \
['2014', subACTdf.iloc[0]['Students With Disabilities (14)']], \
['2015', subACTdf.iloc[0]['Students With Disabilities (15)']], \
['2016', subACTdf.iloc[0]['Students With Disabilities (16)']], \
['2017', subACTdf.iloc[0]['Students With Disabilities (17)']], \
['2018', subACTdf.iloc[0]['Students With Disabilities (18)']],
],columns=['Year','Students with Disabilities'])
disabilityStudents = disabilityStudents.set_index('Year')
disadvStudents = pd.DataFrame([['2012',subACTdf.iloc[0]['Economically Disadvantaged (12)']], \
['2013', subACTdf.iloc[0]['Economically Disadvantaged (13)']], \
['2014', subACTdf.iloc[0]['Economically Disadvantaged (14)']], \
['2015', subACTdf.iloc[0]['Economically Disadvantaged (15)']], \
['2016', subACTdf.iloc[0]['Economically Disadvantaged (16)']], \
['2017', subACTdf.iloc[0]['Economically Disadvantaged (17)']], \
['2018', subACTdf.iloc[0]['Economically Disadvantaged (18)']],
],columns=['Year','Economically Disadvantaged'])
disadvStudents = disadvStudents.set_index('Year')
blackDis = pd.merge(disabilityStudents, blackStudents, left_index=True, right_index=True)
ACTbySubgroup = pd.merge(blackDis, disadvStudents, left_index=True, right_index=True)
# statewide overall average ACT for 2012-2018, in the same year order as the rows above
Overall_Average = [20.3,19.5,19.2,19.4,19.5,19.6,19.3]
ACTbySubgroup['Overall Average'] = Overall_Average
ACTbySubgroup = ACTbySubgroup.astype(float)
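The three near-identical DataFrame constructions above differ only in the subgroup name. A hypothetical helper can build the same Year x subgroup table from the statewide row in one pass. Column labels here follow subACTdf's capitalization ('Students With Disabilities'), which differs slightly from the label used above.

```python
import pandas as pd

SUBGROUPS = ['Black or African American', 'Students With Disabilities', 'Economically Disadvantaged']

def statewide_by_subgroup(state_row, years):
    """Build a Year x subgroup table of floats from one row of subACTdf.

    state_row is assumed to have columns named like 'Black or African American (12)',
    i.e. subgroup name plus a two-digit year suffix, as renamed above.
    """
    data = {g: [state_row[f'{g} ({y % 100:02d})'] for y in years] for g in SUBGROUPS}
    index = pd.Index([str(y) for y in years], name='Year')
    return pd.DataFrame(data, index=index).astype(float)

# Hypothetical usage:
# ACTbySubgroup = statewide_by_subgroup(subACTdf.iloc[0], range(2012, 2019))
# ACTbySubgroup['Overall Average'] = Overall_Average
```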
We'll need to add a year column in order to analyze the data as a time series. Let's split this DF into one dataframe per year, then change the observational unit. Instead of a school district, our observational unit will be a school district in a given year. That means there will be separate rows for Orleans Parish in 2013, 2014, 2015, etc.
# NOTE: once again, De Soto and La Salle parishes are named differently across datasets.
# In our ACTdf they are "De Soto Parish" and "La Salle Parish"
# It hasn't been consistent across all df's we've looked at. In this dataset, they appear to be stored as "DeSoto Parish" and "LaSalle Parish."
subACTdf[subACTdf['School System'] == 'DeSoto Parish']
subACTdf[subACTdf['School System'] == 'LaSalle Parish']
# De Soto at index 16, La Salle at index 30
subACTdf.at[16, 'School System'] = 'De Soto Parish'
subACTdf.at[30, 'School System'] = 'La Salle Parish'
# data for 2018
subACTdf_2018 = subACTdf.filter(['Code', 'School System', 'Black or African American (18)', 'Students With Disabilities (18)', 'Economically Disadvantaged (18)'])
subACTdf_2018['Year'] = 2018 # Add year column
subACTdf_2018.rename(columns = {'Black or African American (18)':'Black or African American', 'Students With Disabilities (18)':'Students With Disabilities', 'Economically Disadvantaged (18)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2018 = subACTdf_2018[['Year','Code', 'School System', 'Black or African American', 'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
# data for 2017
subACTdf_2017 = subACTdf.filter(['Code', 'School System', 'Black or African American (17)', 'Students With Disabilities (17)', 'Economically Disadvantaged (17)'])
subACTdf_2017['Year'] = 2017 # Add year column
subACTdf_2017.rename(columns = {'Black or African American (17)':'Black or African American', 'Students With Disabilities (17)':'Students With Disabilities', 'Economically Disadvantaged (17)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2017 = subACTdf_2017[['Year','Code', 'School System', 'Black or African American',
'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
# data for 2016
subACTdf_2016 = subACTdf.filter(['Code', 'School System', 'Black or African American (16)', 'Students With Disabilities (16)', 'Economically Disadvantaged (16)'])
subACTdf_2016['Year'] = 2016 # Add year column
subACTdf_2016.rename(columns = {'Black or African American (16)':'Black or African American', 'Students With Disabilities (16)':'Students With Disabilities', 'Economically Disadvantaged (16)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2016 = subACTdf_2016[['Year','Code', 'School System', 'Black or African American',
'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
# data for 2015
subACTdf_2015 = subACTdf.filter(['Code', 'School System', 'Black or African American (15)', 'Students With Disabilities (15)', 'Economically Disadvantaged (15)'])
subACTdf_2015['Year'] = 2015 # Add year column
subACTdf_2015.rename(columns = {'Black or African American (15)':'Black or African American', 'Students With Disabilities (15)':'Students With Disabilities', 'Economically Disadvantaged (15)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2015 = subACTdf_2015[['Year','Code', 'School System', 'Black or African American',
'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
# data for 2014
subACTdf_2014 = subACTdf.filter(['Code', 'School System', 'Black or African American (14)', 'Students With Disabilities (14)', 'Economically Disadvantaged (14)'])
subACTdf_2014['Year'] = 2014 # Add year column
subACTdf_2014.rename(columns = {'Black or African American (14)':'Black or African American', 'Students With Disabilities (14)':'Students With Disabilities', 'Economically Disadvantaged (14)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2014 = subACTdf_2014[['Year','Code', 'School System', 'Black or African American',
'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
# data for 2013
subACTdf_2013 = subACTdf.filter(['Code', 'School System', 'Black or African American (13)', 'Students With Disabilities (13)', 'Economically Disadvantaged (13)'])
subACTdf_2013['Year'] = 2013 # Add year column
subACTdf_2013.rename(columns = {'Black or African American (13)':'Black or African American', 'Students With Disabilities (13)':'Students With Disabilities', 'Economically Disadvantaged (13)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2013 = subACTdf_2013[['Year','Code', 'School System', 'Black or African American',
'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
# data for 2012
subACTdf_2012 = subACTdf.filter(['Code', 'School System', 'Black or African American (12)', 'Students With Disabilities (12)', 'Economically Disadvantaged (12)'])
subACTdf_2012['Year'] = 2012 # Add year column
subACTdf_2012.rename(columns = {'Black or African American (12)':'Black or African American', 'Students With Disabilities (12)':'Students With Disabilities', 'Economically Disadvantaged (12)':'Economically Disadvantaged'}, inplace = True) # Rename columns
subACTdf_2012 = subACTdf_2012[['Year','Code', 'School System', 'Black or African American',
'Students With Disabilities', 'Economically Disadvantaged']] # Rearrange columns, moving Year to front
Now let's concatenate them all together into a new dataframe called subgroups_df:
subgroups_df = pd.concat([subACTdf_2018, subACTdf_2017, subACTdf_2016, subACTdf_2015, subACTdf_2014, subACTdf_2013, subACTdf_2012])
subgroups_df
| | Year | Code | School System | Black or African American | Students With Disabilities | Economically Disadvantaged |
---|---|---|---|---|---|---|
0 | 2018 | LA | Louisiana Statewide | 17.3 | 15 | 17.8 |
1 | 2018 | 001 | Acadia Parish | 16.4 | 15.4 | 17.6 |
2 | 2018 | 002 | Allen Parish | 17.7 | 13.9 | 18.3 |
3 | 2018 | 003 | Ascension Parish | 17.9 | 15.1 | 18.3 |
4 | 2018 | 004 | Assumption Parish | 16.2 | 13.7 | 17 |
... | ... | ... | ... | ... | ... | ... |
67 | 2012 | 067 | Zachary Community School District | 19.4 | NaN | 19.3 |
68 | 2012 | 068 | City of Baker School District | 17.8 | NaN | 17.6 |
69 | 2012 | 069 | Central Community School District | 19.7 | NaN | 20.6 |
70 | 2012 | 017+RBR | East Baton Rouge Parish-EBR and RSD | 18.5 | 16.2 | 18.6 |
71 | 2012 | RBR | Recovery School District-Baton Rouge | 16.2 | NaN | 16.8 |
504 rows × 6 columns
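The seven per-year blocks and the concatenation above follow one pattern. A hypothetical loop can produce the same long-format frame: one row per school system per year, with the year-suffixed subgroup columns renamed to plain subgroup names.

```python
import pandas as pd

SUBGROUPS = ['Black or African American', 'Students With Disabilities', 'Economically Disadvantaged']

def subgroups_long(wide, years):
    """Stack one block of subgroup columns per year into (Year, Code, School System, subgroups...) rows."""
    pieces = []
    for y in years:
        # map e.g. 'Black or African American (18)' -> 'Black or African American'
        renames = {f'{g} ({y % 100:02d})': g for g in SUBGROUPS}
        piece = wide[['Code', 'School System', *renames]].rename(columns=renames)
        piece.insert(0, 'Year', y)  # Year becomes the first column
        pieces.append(piece)
    return pd.concat(pieces)

# Hypothetical usage, same year order as above:
# subgroups_df = subgroups_long(subACTdf, range(2018, 2011, -1))
```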
# computing the achievement gap by year and parish
# compiling total scores for data for 2019
total2019df = ACTdf[["School System","2019 Average ACT"]].copy()
total2019df['Year'] = 2019 # Add year column
total2019df = total2019df.rename(columns = {'2019 Average ACT':'Overall Average ACT in Parish'})
total2019df
# compiling total scores for data for 2018
total2018df = ACTdf[["School System","2018 Average ACT"]].copy()
total2018df['Year'] = 2018 # Add year column
total2018df = total2018df.rename(columns = {'2018 Average ACT':'Overall Average ACT in Parish'})
total2018df
# compiling total scores for data for 2017
total2017df = ACTdf[["School System","2017 Average ACT"]].copy()
total2017df['Year'] = 2017 # Add year column
total2017df = total2017df.rename(columns = {'2017 Average ACT':'Overall Average ACT in Parish'})
total2017df
# compiling total scores for data for 2016
total2016df = ACTdf[["School System","2016 Average ACT"]].copy()
total2016df['Year'] = 2016 # Add year column
total2016df = total2016df.rename(columns = {'2016 Average ACT':'Overall Average ACT in Parish'})
total2016df
# compiling total scores for data for 2015
total2015df = ACTdf[["School System","2015 Average ACT"]].copy()
total2015df['Year'] = 2015 # Add year column
total2015df = total2015df.rename(columns = {'2015 Average ACT':'Overall Average ACT in Parish'})
total2015df
# 2019 is omitted because the subgroup data end in 2018
totalScoresdf = pd.concat([total2018df,total2017df,total2016df,total2015df])
# perform merge with subgroups df
subgroups_df = subgroups_df.merge(totalScoresdf, on=['School System', 'Year'])
# cast columns to numeric
subgroups_df["Black or African American"] = pd.to_numeric(subgroups_df["Black or African American"])
subgroups_df["Students With Disabilities"] = pd.to_numeric(subgroups_df["Students With Disabilities"])
subgroups_df["Economically Disadvantaged"] = pd.to_numeric(subgroups_df["Economically Disadvantaged"])
subgroups_df["Overall Average ACT in Parish"] = pd.to_numeric(subgroups_df["Overall Average ACT in Parish"])
# define achievement gap columns
subgroups_df['Black or African American Student Achievement Gap'] = subgroups_df['Black or African American'] - subgroups_df['Overall Average ACT in Parish']
subgroups_df['Students With Disabilities Achievement Gap'] = subgroups_df['Students With Disabilities'] - subgroups_df['Overall Average ACT in Parish']
subgroups_df['Economically Disadvantaged Achievement Gap'] = subgroups_df['Economically Disadvantaged'] - subgroups_df['Overall Average ACT in Parish']
# A look at our completed dataframe!
subgroups_df
| | Year | Code | School System | Black or African American | Students With Disabilities | Economically Disadvantaged | Overall Average ACT in Parish | Black or African American Student Achievement Gap | Students With Disabilities Achievement Gap | Economically Disadvantaged Achievement Gap |
---|---|---|---|---|---|---|---|---|---|---|
0 | 2018 | 001 | Acadia Parish | 16.4 | 15.4 | 17.6 | 18.8 | -2.4 | -3.4 | -1.2 |
1 | 2018 | 002 | Allen Parish | 17.7 | 13.9 | 18.3 | 18.9 | -1.2 | -5.0 | -0.6 |
2 | 2018 | 003 | Ascension Parish | 17.9 | 15.1 | 18.3 | 20.3 | -2.4 | -5.2 | -2.0 |
3 | 2018 | 004 | Assumption Parish | 16.2 | 13.7 | 17.0 | 18.3 | -2.1 | -4.6 | -1.3 |
4 | 2018 | 005 | Avoyelles Parish | 15.7 | NaN | 16.6 | 17.4 | -1.7 | NaN | -0.8 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
271 | 2015 | 066 | City of Bogalusa School District | 16.6 | NaN | 17.1 | 17.2 | -0.6 | NaN | -0.1 |
272 | 2015 | 067 | Zachary Community School District | 19.2 | 16.7 | 18.2 | 20.7 | -1.5 | -4.0 | -2.5 |
273 | 2015 | 068 | City of Baker School District | 17.4 | NaN | 17.3 | 17.5 | -0.1 | NaN | -0.2 |
274 | 2015 | 069 | Central Community School District | 18.4 | NaN | 20.2 | 21.1 | -2.7 | NaN | -0.9 |
275 | 2015 | RBR | Recovery School District-Baton Rouge | 14.2 | NaN | 14.4 | NaN | NaN | NaN | NaN |
276 rows × 10 columns
# Statewide overall average ACT score, one value per year (row) of ACTbySubgroup
Overall_Average = [20.3,19.5,19.2,19.4,19.5,19.6,19.3]
ACTbySubgroup['Overall Average'] = Overall_Average
ACTbySubgroup = ACTbySubgroup.astype(float)
ACTbySubgroup.plot(y=["Students with Disabilities", "Black or African American","Economically Disadvantaged","Overall Average"])
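[Figure: average ACT score by year (x-axis) for Students with Disabilities, Black or African American, Economically Disadvantaged, and the Overall Average]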
In the graph above, we can see that statewide, the trend for each of these subgroups largely mirrors the overall average trend. We do see a large drop after 2012 (the year the state mandated that all students take the ACT, not just those planning to attend college). After that, there is a fairly steady upward trend until 2017, followed by a drop in 2018. We are unsure what happened between 2017 and 2018 to cause this drop.
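To put numbers on the trend described above, one can take year-over-year differences of the statewide overall averages. This is a minimal sketch. It assumes that the seven values in `Overall_Average` correspond to the years 2012 through 2018 in order. That mapping is inferred from the narrative and is not stated in the notebook.

```python
import pandas as pd

# Statewide overall averages copied from Overall_Average above.
# Assumption: the seven values are for 2012 through 2018, in order.
overall = pd.Series([20.3, 19.5, 19.2, 19.4, 19.5, 19.6, 19.3],
                    index=range(2012, 2019), name="Overall Average")

# Change from the previous year; rounding removes floating-point noise
yoy_change = overall.diff().round(1)
print(yoy_change)
```

Under that assumption, the largest single drop is from 2012 to 2013 (-0.8), with a further small decline in 2014. Small gains follow through 2017, and then there is a 0.3-point dip in 2018.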
Now we will move on to visualizing the achievement gap over time for a particular subgroup: economically disadvantaged students.
# Keep only the 2018 rows and attach each parish's % economically disadvantaged
subgroups_2018 = subgroups_df[subgroups_df['Year'] == 2018]
subgroups_2018 = pd.merge(subgroups_2018, parishAttributes.filter(['Name', '% Economically Disadvantaged (2020)']),
how = 'left',
left_on = ['School System'],
right_on = ['Name'])
# Keep only the 2015 rows and attach each parish's % economically disadvantaged
subgroups_2015 = subgroups_df[subgroups_df['Year'] == 2015]
subgroups_2015 = pd.merge(subgroups_2015, parishAttributes.filter(['Name', '% Economically Disadvantaged (2020)']),
how = 'left',
left_on = ['School System'],
right_on = ['Name'])
# Create GeoDataFrames
subgroups_2018_geo = pd.merge(subgroups_2018, parishes_geo, how = 'left', left_on = ['School System'], right_on = ['NAME'])
subgroups_2018_geo = GeoDataFrame(subgroups_2018_geo)
subgroups_2015_geo = pd.merge(subgroups_2015, parishes_geo, how = 'left', left_on = ['School System'], right_on = ['NAME'])
subgroups_2015_geo = GeoDataFrame(subgroups_2015_geo)
# Use one color range for both maps so a given color means the same gap in 2015 and 2018
gap_col = 'Economically Disadvantaged Achievement Gap'
vmin = min(subgroups_2015_geo[gap_col].min(), subgroups_2018_geo[gap_col].min())
vmax = max(subgroups_2015_geo[gap_col].max(), subgroups_2018_geo[gap_col].max())
# create figure and axes for Matplotlib (left: 2015, right: 2018)
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(10, 8))
# Plot each GeoDataFrame as a choropleth on its own axis, sharing vmin/vmax
subgroups_2018_geo.plot(column=gap_col, cmap='YlGnBu_r', vmin=vmin, vmax=vmax, linewidth=0.8, ax=ax2, edgecolor='0.8', legend=True, legend_kwds={'label': "Achievement Gap Between \n Economically Disadvantaged Students' Mean Score \n and Parish Mean Score \n(2018)", 'orientation': "horizontal"})
subgroups_2015_geo.plot(column=gap_col, cmap='YlGnBu_r', vmin=vmin, vmax=vmax, linewidth=0.8, ax=ax1, edgecolor='0.8', legend=True, legend_kwds={'label': "Achievement Gap Between \n Economically Disadvantaged Students' Mean Score \n and Parish Mean Score \n(2015)", 'orientation': "horizontal"})
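[Figure: choropleth maps of the economically disadvantaged achievement gap by parish, 2015 (left) and 2018 (right)]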
As the maps show, some parishes saw their achievement gaps for this subgroup shrink drastically between 2015 and 2018. However, not all parishes saw such changes, especially parishes with an extremely high percentage of economically disadvantaged students. (As a thought experiment: if 80% of students in a parish are economically disadvantaged, the gap between their mean score and the parish's mean score will be virtually nonexistent unless the remaining 20% of students all score extremely high.)
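The thought experiment can be made precise. Assume a parish's mean score is the student-weighted average of the subgroup and everyone else. Let p be the subgroup share, s the subgroup mean, and r the mean of all other students. The gap is then s - (p*s + (1 - p)*r) = -(1 - p)*(r - s), so the raw score difference is scaled down by a factor of (1 - p).

This is a sketch under those assumptions. `achievement_gap` is a hypothetical helper, not part of the notebook, and the means 17.0 and 20.0 are made-up illustration values.

```python
def achievement_gap(share_subgroup, subgroup_mean, rest_mean):
    """Subgroup mean minus overall mean, assuming the overall mean is the
    student-weighted average of the subgroup and all other students."""
    overall_mean = share_subgroup * subgroup_mean + (1 - share_subgroup) * rest_mean
    return subgroup_mean - overall_mean

# Hypothetical parishes: the subgroup averages 17.0 and all other students average 20.0
for share in (0.2, 0.5, 0.8):
    gap = achievement_gap(share, 17.0, 20.0)
    print(f"{share:.0%} economically disadvantaged -> gap {gap:+.1f}")
```

With the same 3-point raw difference, the measured gap falls from -2.4 at 20% to -0.6 at 80%. For this reason, the gap alone understates disparities in parishes where most students belong to the subgroup.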
To illustrate how widely parishes' situations vary, look at the tables below. Morehouse Parish has seen a slight decrease in its average ACT score (17.9 in 2015 to 17.6 in 2018) but has made larger gains in closing the achievement gap for economically disadvantaged students (from -0.8 to 0.0). La Salle Parish's average score ended 2018 close to where it was in 2015, but its achievement gap for economically disadvantaged students widened from -0.3 in 2015 to -1.5 in 2018, peaking at -1.8 in 2017. Vernon Parish (Josh's home parish!) has seen a fairly constant average score and achievement gaps that fluctuate with no real pattern.
# Look at Morehouse Parish from 2015 to 2018
morehouse_df = subgroups_df[subgroups_df['School System'] == 'Morehouse Parish']
morehouse_df
| | Year | Code | School System | Black or African American | Students With Disabilities | Economically Disadvantaged | Overall Average ACT in Parish | Black or African American Student Achievement Gap | Students With Disabilities Achievement Gap | Economically Disadvantaged Achievement Gap |
---|---|---|---|---|---|---|---|---|---|---|
33 | 2018 | 034 | Morehouse Parish | 16.4 | 13.9 | 17.6 | 17.6 | -1.2 | -3.7 | 0.0 |
102 | 2017 | 034 | Morehouse Parish | 16.3 | 13.8 | 17.1 | 17.7 | -1.4 | -3.9 | -0.6 |
171 | 2016 | 034 | Morehouse Parish | 16.4 | 14.8 | 16.4 | 17.1 | -0.7 | -2.3 | -0.7 |
240 | 2015 | 034 | Morehouse Parish | 16.7 | NaN | 17.1 | 17.9 | -1.2 | NaN | -0.8 |
# Look at La Salle Parish from 2015 to 2018
lasalle_df = subgroups_df[subgroups_df['School System'] == 'La Salle Parish']
lasalle_df
| | Year | Code | School System | Black or African American | Students With Disabilities | Economically Disadvantaged | Overall Average ACT in Parish | Black or African American Student Achievement Gap | Students With Disabilities Achievement Gap | Economically Disadvantaged Achievement Gap |
---|---|---|---|---|---|---|---|---|---|---|
29 | 2018 | 030 | La Salle Parish | 16.0 | NaN | 17.5 | 19.0 | -3.0 | NaN | -1.5 |
98 | 2017 | 030 | La Salle Parish | 17.8 | NaN | 18.6 | 20.4 | -2.6 | NaN | -1.8 |
167 | 2016 | 030 | La Salle Parish | 14.7 | NaN | 19.0 | 19.9 | -5.2 | NaN | -0.9 |
236 | 2015 | 030 | La Salle Parish | 17.2 | NaN | 18.5 | 18.8 | -1.6 | NaN | -0.3 |
# Look at Vernon Parish from 2015 to 2018
vernon_df = subgroups_df[subgroups_df['School System'] == 'Vernon Parish']
vernon_df
| | Year | Code | School System | Black or African American | Students With Disabilities | Economically Disadvantaged | Overall Average ACT in Parish | Black or African American Student Achievement Gap | Students With Disabilities Achievement Gap | Economically Disadvantaged Achievement Gap |
---|---|---|---|---|---|---|---|---|---|---|
56 | 2018 | 058 | Vernon Parish | 18.5 | 16.4 | 18.8 | 20.0 | -1.5 | -3.6 | -1.2 |
125 | 2017 | 058 | Vernon Parish | 18.3 | 16.0 | 19.2 | 20.2 | -1.9 | -4.2 | -1.0 |
194 | 2016 | 058 | Vernon Parish | 18.3 | 16.9 | 19.2 | 20.7 | -2.4 | -3.8 | -1.5 |
263 | 2015 | 058 | Vernon Parish | 18.1 | NaN | 19.1 | 20.4 | -2.3 | NaN | -1.3 |
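The parish-by-parish comparison in these three tables can be computed directly as the 2018 value minus the 2015 value for each column. The sketch below does this with a pivot. `change_between` is a hypothetical helper, not part of the original notebook, and it assumes one row per (School System, Year), as in `subgroups_df`. The demo values are copied from the Morehouse, La Salle, and Vernon tables in this section.

```python
import pandas as pd

def change_between(df, start, end, cols):
    """Per-parish change (value in `end` minus value in `start`) for each column in `cols`.

    Hypothetical helper; expects one row per (School System, Year).
    """
    both_years = df[df["Year"].isin([start, end])]
    wide = both_years.pivot(index="School System", columns="Year", values=cols)
    # wide has (column, year) MultiIndex columns; subtract the start year from the end year
    return pd.DataFrame({c: wide[(c, end)] - wide[(c, start)] for c in cols}).round(1)

# Values copied from the Morehouse, La Salle, and Vernon tables in this section
demo = pd.DataFrame({
    "School System": ["Morehouse Parish", "Morehouse Parish", "La Salle Parish",
                      "La Salle Parish", "Vernon Parish", "Vernon Parish"],
    "Year": [2015, 2018, 2015, 2018, 2015, 2018],
    "Overall Average ACT in Parish": [17.9, 17.6, 18.8, 19.0, 20.4, 20.0],
    "Economically Disadvantaged Achievement Gap": [-0.8, 0.0, -0.3, -1.5, -1.3, -1.2],
})

cols = ["Overall Average ACT in Parish", "Economically Disadvantaged Achievement Gap"]
print(change_between(demo, 2015, 2018, cols))
```

Because the gaps here are negative, a positive change in the gap column means the gap narrowed. Applied to the full `subgroups_df` and sorted on the gap column, the same helper could rank every parish by how much its gap changed. That usage is untested here.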
In our project, we hoped to discover whether the overall upward trend in Louisiana's school metrics was uniform across the board. If it was not, we wanted to know which students (whether in a particular group of parishes or a particular demographic group) were driving the trend.
In exploring the available data, we found no evidence that any particular subset of Louisiana students is driving recent statewide trends in school performance scores or ACT scores, whether students are grouped by parish or by demographic characteristics. We were, however, able to examine the relationship between current performance and parish attributes. So while we cannot draw meaningful conclusions about whether certain parish attributes predict how much a parish improves, we did observe a significant correlation between a parish's SPS/ACT scores and both its median household income and its percentage of economically disadvantaged students.
Furthermore, looking at subgroups within the state, we saw that the trends of certain subgroups mirrored the overall state trend. Going further, our achievement gap analysis suggests that achievement gaps may be decreasing across the state, but we cannot be sure, because this analysis is highly sensitive to individual parish conditions.