PythonCity School Analysis

Schools that spend between 585-615 dollars per student have a better percent passing reading, percent passing math, and percent overall passing rate than schools that spend more than 615 dollars per student. Also, those schools that spend less per student are smaller schools with 2,000 students or less.
It is also found that the schools with 2000 students or less have a higher percent passing reading, percent passing math, and percent overall passing rate than larger schools that have 2000 – 5000 students. Furthermore, the smaller the school, the less they spend per student.
With that being said,Charter schools have a higher percent passing reading, percent passing math, and percent overall passing rate than District schools. Thus being the top five performing schools based on percent passing rates, whereas the bottom five performing schools based on percent passing rate are District schools.
In all 15 schools, the Average Reading Score is higher than the Average Math Score.
Lasly, for the top 5 performing schools, the percent passing math is about 20 percent lower than the percent passing reading.

# Dependencies
import pandas as pd
import numpy as np

# The path to our csv file
schools_complete = "Resources/schools_complete.csv"
students_complete = "Resources/students_complete.csv"

# create dataframe using mapping
schools_complete_df = pd.read_csv(schools_complete)
students_complete_df = pd.read_csv(students_complete)
schools_complete_df.head()
students_complete_df.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Student ID	name	gender	grade	school	reading_score	math_score
0	0	Paul Bradley	M	9th	Huang High School	66	79
1	1	Victor Smith	M	12th	Huang High School	94	61
2	2	Kevin Rodriguez	M	12th	Huang High School	90	60
3	3	Dr. Richard Scott	M	12th	Huang High School	67	58
4	4	Bonnie Ray	F	9th	Huang High School	97	84

District Summary

# finding total schools
total_schools = len(schools_complete_df['School ID'])

# The total number of students
total_students = schools_complete_df['size'].sum()

# Get the total budget
total_budget = schools_complete_df['budget'].sum()

# Get the Average Math Score
average_math_score = round(students_complete_df['math_score'].mean(),6)

# Get the Average Reading Score
average_reading_score = round(students_complete_df['reading_score'].mean(),5)

# Get the percentage of students passing math
students_passing_math = students_complete_df.loc[students_complete_df["math_score"] > 70,:]
percent_passing_math = round(float(students_passing_math['math_score'].count()/total_students)*100,6)

# Get the percentage of students passing reading
students_passing_reading = students_complete_df.loc[students_complete_df["reading_score"] >70,:]
percent_passing_reading = round(float(students_passing_reading["reading_score"].count()/total_students)*100,6)

# Get the overall passing rate for math and reading
overall_passing_rate =round((percent_passing_math + percent_passing_reading)/2,6)

# Create a disctric Summary Dataframe 
summary_df = pd.DataFrame({"Total Schools":[total_schools],
                          "Total Students":[total_students],
                          "Total Budget":[total_budget],
                          "Average Math Score":[average_math_score],
                          "Average Reading Score":[average_reading_score],
                          "Percent Passing Math":[percent_passing_math],
                          "Percent Passing Reading":[percent_passing_reading],
                          "Percent Overall Passing Rate":[overall_passing_rate]})
# summary_df
# to put it in the right format
district_summary_df = pd.DataFrame(summary_df, columns=["Total Schools",
                                                        "Total Students",
                                                        "Total Budget",
                                                        "Average Math Score",
                                                        "Average Reading Score",
                                                        "Percent Passing Math",
                                                        "Percent Passing Reading",
                                                       "Percent Overall Passing Rate"])
district_summary_df["Total Students"] = district_summary_df["Total Students"].map('{:,}'.format)
district_summary_df["Total Budget"] = district_summary_df["Total Budget"].map('${:,.2f}'.format)

district_summary_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Total Schools	Total Students	Total Budget	Average Math Score	Average Reading Score	Percent Passing Math	Percent Passing Reading	Percent Overall Passing Rate
0	15	39,170	$24,649,428.00	78.985371	81.87784	72.392137	82.971662	77.6819

School Summary

# Change header name from the dataframe from name to school
schools_complete_df = schools_complete_df.rename(columns={"name": "school"})
# schools_complete_df.columns

# Merge the two data frames together on = school 
schools_complete_data_df = pd.merge(schools_complete_df, students_complete_df, on='school')
#testing
# schools_complete_data_df.head()

# count number of students in each school
student_count = schools_complete_data_df['school'].value_counts()
# testing
# student_count.head()

# changing the school type into a str
school_type = schools_complete_data_df.groupby('school')['type'].unique()
school_type = school_type.str[0]
#testing
# school_type

# total budget for each school
budget_per_school = schools_complete_data_df.groupby('school')['budget'].count()
# budget_per_school.head()
# testing
# budget_per_school

# budget for each student
budget_per_student = schools_complete_df.set_index('school')['budget']/schools_complete_df.set_index('school')['size']
# budget_per_student.head()
# testing
# budget_per_student

# Average math and readin scores for each school
school_average_math = round(schools_complete_data_df.groupby('school')['math_score'].mean(),2)
school_average_reading = round(schools_complete_data_df.groupby('school')['reading_score'].mean(),2)

# dataframe with reading and math passing only scores
passing_df = schools_complete_data_df.loc[(schools_complete_data_df['math_score'] >=70) 
                                          & (schools_complete_data_df['reading_score'] >=70)]
passing_math_df = schools_complete_data_df.loc[(schools_complete_data_df['math_score'] >=70)]
passing_reading_df = schools_complete_data_df.loc[(schools_complete_data_df['reading_score'] >=70)]

# testing
# passing_math_df.head()
# passing_reading_df.head()

# percentage for students passing math and reading 
percent_passing_math = round((passing_math_df.groupby('school')['math_score'].count()/student_count)*100,1)
percent_passing_reading = round((passing_reading_df.groupby('school')['reading_score'].count()/student_count)*100,1)
# percent_passing_reading
# percent_passing_math

# overall passing percentage
overall_passing_percent = round((percent_passing_math + percent_passing_reading)/2, 2)
# overall_passing_percent

# school summary dataframe

school_summary_df = pd.DataFrame({"School Type":school_type,
                                "Total Students":student_count,
                                "Total School Budget":budget_per_school,
                                "Budget Per Student":budget_per_student,
                                "Average Math Score": school_average_math,
                                "Average Reading Score": school_average_reading,
                                "% Passing Math": percent_passing_math,
                                "% Passing Reading": percent_passing_reading,
                                "% Overall Passing Rate":overall_passing_percent},columns=["School Type",
                                                        "Total Students",
                                                        "Total School Budget",
                                                        "Budget Per Student",
                                                        "Average Math Score",
                                                        "Average Reading Score",
                                                        "% Passing Math",
                                                       "% Passing Reading",
                                                        "% Overall Passing Rate"])
#testing
# school_summary_df.head()
school_summary_df["Total Students"] = school_summary_df["Total Students"].map('{:,}'.format)
school_summary_df["Total School Budget"] = school_summary_df["Total School Budget"].map('${:,.2f}'.format)
school_summary_df["Budget Per Student"] = school_summary_df["Budget Per Student"].map('${:,.2f}'.format)
school_summary_df

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	School Type	Total Students	Total School Budget	Budget Per Student	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Bailey High School	District	4,976	$4,976.00	$628.00	77.05	81.03	66.7	81.9	74.30
Cabrera High School	Charter	1,858	$1,858.00	$582.00	83.06	83.98	94.1	97.0	95.55
Figueroa High School	District	2,949	$2,949.00	$639.00	76.71	81.16	66.0	80.7	73.35
Ford High School	District	2,739	$2,739.00	$644.00	77.10	80.75	68.3	79.3	73.80
Griffin High School	Charter	1,468	$1,468.00	$625.00	83.35	83.82	93.4	97.1	95.25
Hernandez High School	District	4,635	$4,635.00	$652.00	77.29	80.93	66.8	80.9	73.85
Holden High School	Charter	427	$427.00	$581.00	83.80	83.81	92.5	96.3	94.40
Huang High School	District	2,917	$2,917.00	$655.00	76.63	81.18	65.7	81.3	73.50
Johnson High School	District	4,761	$4,761.00	$650.00	77.07	80.97	66.1	81.2	73.65
Pena High School	Charter	962	$962.00	$609.00	83.84	84.04	94.6	95.9	95.25
Rodriguez High School	District	3,999	$3,999.00	$637.00	76.84	80.74	66.4	80.2	73.30
Shelton High School	Charter	1,761	$1,761.00	$600.00	83.36	83.73	93.9	95.9	94.90
Thomas High School	Charter	1,635	$1,635.00	$638.00	83.42	83.85	93.3	97.3	95.30
Wilson High School	Charter	2,283	$2,283.00	$578.00	83.27	83.99	93.9	96.5	95.20
Wright High School	Charter	1,800	$1,800.00	$583.00	83.68	83.96	93.3	96.6	94.95

Top Performing Schools by Passing Rate

# the top schools ( only top 5 )
top_performing_schools= school_summary_df.sort_values("% Overall Passing Rate", ascending=False, inplace=False)
top_performing_schools.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	School Type	Total Students	Total School Budget	Budget Per Student	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Cabrera High School	Charter	1,858	$1,858.00	$582.00	83.06	83.98	94.1	97.0	95.55
Thomas High School	Charter	1,635	$1,635.00	$638.00	83.42	83.85	93.3	97.3	95.30
Griffin High School	Charter	1,468	$1,468.00	$625.00	83.35	83.82	93.4	97.1	95.25
Pena High School	Charter	962	$962.00	$609.00	83.84	84.04	94.6	95.9	95.25
Wilson High School	Charter	2,283	$2,283.00	$578.00	83.27	83.99	93.9	96.5	95.20

Bottom Performing Schools (By Passing Rate)

# only showing bottom 5
Bottom_performing_schools= school_summary_df.sort_values("% Overall Passing Rate", ascending=True, inplace=False)
Bottom_performing_schools.head()

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	School Type	Total Students	Total School Budget	Budget Per Student	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
Rodriguez High School	District	3,999	$3,999.00	$637.00	76.84	80.74	66.4	80.2	73.30
Figueroa High School	District	2,949	$2,949.00	$639.00	76.71	81.16	66.0	80.7	73.35
Huang High School	District	2,917	$2,917.00	$655.00	76.63	81.18	65.7	81.3	73.50
Johnson High School	District	4,761	$4,761.00	$650.00	77.07	80.97	66.1	81.2	73.65
Ford High School	District	2,739	$2,739.00	$644.00	77.10	80.75	68.3	79.3	73.80

Math Scores by Grade

from pandas.api.types import CategoricalDtype
#Reset the grade order in the original students data frame "students_complete_df". 
students_complete_df["grade"] = students_complete_df['grade'].astype(CategoricalDtype(["9th", "10th","11th","12th"]))
# students_complete_df

math_scores_grade = round(students_complete_df.pivot_table(index="school", columns="grade", values="math_score"),2)
math_scores_grade.index.name = None 
math_scores_grade

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

grade	9th	10th	11th	12th
Bailey High School	77.08	77.00	77.52	76.49
Cabrera High School	83.09	83.15	82.77	83.28
Figueroa High School	76.40	76.54	76.88	77.15
Ford High School	77.36	77.67	76.92	76.18
Griffin High School	82.04	84.23	83.84	83.36
Hernandez High School	77.44	77.34	77.14	77.19
Holden High School	83.79	83.43	85.00	82.86
Huang High School	77.03	75.91	76.45	77.23
Johnson High School	77.19	76.69	77.49	76.86
Pena High School	83.63	83.37	84.33	84.12
Rodriguez High School	76.86	76.61	76.40	77.69
Shelton High School	83.42	82.92	83.38	83.78
Thomas High School	83.59	83.09	83.50	83.50
Wilson High School	83.09	83.72	83.20	83.04
Wright High School	83.26	84.01	83.84	83.64

Reading scores by grade

#Reset the grade order in the original students data frame "students_complete_df". 
students_complete_df['grade'] = students_complete_df['grade'].astype(CategoricalDtype(["9th", "10th","11th","12th"]))
# students_complete_df

reading_scores_grade = round(students_complete_df.pivot_table(index="school", columns="grade", values="reading_score"),2)
reading_scores_grade.index.name = None
reading_scores_grade

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

grade	9th	10th	11th	12th
Bailey High School	81.30	80.91	80.95	80.91
Cabrera High School	83.68	84.25	83.79	84.29
Figueroa High School	81.20	81.41	80.64	81.38
Ford High School	80.63	81.26	80.40	80.66
Griffin High School	83.37	83.71	84.29	84.01
Hernandez High School	80.87	80.66	81.40	80.86
Holden High School	83.68	83.32	83.82	84.70
Huang High School	81.29	81.51	81.42	80.31
Johnson High School	81.26	80.77	80.62	81.23
Pena High School	83.81	83.61	84.34	84.59
Rodriguez High School	80.99	80.63	80.86	80.38
Shelton High School	84.12	83.44	84.37	82.78
Thomas High School	83.73	84.25	83.59	83.83
Wilson High School	83.94	84.02	83.76	84.32
Wright High School	83.83	83.81	84.16	84.07

Scores by School Spending

# Create the bins ( school spending )
spending_bins = [0, 585, 615, 645, 675]

# Names for the bins
spending_range = ["<$585", "$585-615", "$615-645", "$645-675"]

#schools_spending_df = school_summary_df
school_summary_df["Spending Ranges(Per Student)"] = pd.cut(budget_per_student, bins=spending_bins, labels=spending_range)
school_summary_df.head()

spending_math_score = school_summary_df.groupby(["Spending Ranges(Per Student)"])["Average Math Score"].mean()
spending_reading_score = school_summary_df.groupby(["Spending Ranges(Per Student)"])["Average Reading Score"].mean()
spending_passing_math = school_summary_df.groupby(["Spending Ranges(Per Student)"])["% Passing Math"].mean()
spending_passing_reading = school_summary_df.groupby(["Spending Ranges(Per Student)"])['% Passing Reading'].mean()
overall_passing_rate = (spending_passing_math + spending_passing_reading)/2

# Creating dataframe
scores_by_spending = pd.DataFrame ({"Average Math Score": spending_math_score,
    "Average Reading Score": spending_reading_score,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "Overall Passing Rate": overall_passing_rate})
scores_by_spending

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	% Passing Math	% Passing Reading	Average Math Score	Average Reading Score	Overall Passing Rate
Spending Ranges(Per Student)
<$585	93.450000	96.600000	83.452500	83.935000	95.025000
$585-615	94.250000	95.900000	83.600000	83.885000	95.075000
$615-645	75.683333	86.083333	79.078333	81.891667	80.883333
$645-675	66.200000	81.133333	76.996667	81.026667	73.666667

Scores by Schools Size

# Create the bins ( school size)
size_bins = [0, 1000, 2000, 5000]

# Names for the bins
group_names = ["small (<1000)", "Medium(1000-2000)", "Large(2000-5000)"]

school_size_df = school_summary_df
school_size_df["School Size"] = pd.cut(student_count, bins=size_bins, labels=group_names)
school_size_df

spending_math_score = school_size_df.groupby(["School Size"])["Average Math Score"].mean()
spending_reading_score = school_size_df.groupby(["School Size"])["Average Reading Score"].mean()
spending_passing_math = school_size_df.groupby(["School Size"])["% Passing Math"].mean()
spending_passing_reading = school_size_df.groupby(["School Size"])['% Passing Reading'].mean()
overall_passing_rate = (spending_passing_math + spending_passing_reading)/2

# Creating dataframe
scores_by_size = pd.DataFrame ({"Average Math Score": spending_math_score,
    "Average Reading Score": spending_reading_score,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "Overall Passing Rate": overall_passing_rate})
scores_by_size

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	% Passing Math	% Passing Reading	Average Math Score	Average Reading Score	Overall Passing Rate
School Size
small (<1000)	93.5500	96.10	83.820	83.92500	94.82500
Medium(1000-2000)	93.6000	96.78	83.374	83.86800	95.19000
Large(2000-5000)	69.9875	82.75	77.745	81.34375	76.36875

Scores by School Type

#* Repeat the above breakdown, but this time group schools based on school type (Charter vs. District)*
scores_school_type = school_summary_df[["School Type","Average Math Score",
                                        "Average Reading Score", 
                                        "% Passing Math", 
                                        "% Passing Reading", 
                                        "% Overall Passing Rate"]]
scores_school_type = scores_school_type.groupby('School Type').mean()
scores_school_type

.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

</style>

	Average Math Score	Average Reading Score	% Passing Math	% Passing Reading	% Overall Passing Rate
School Type
Charter	83.472500	83.897500	93.625000	96.575000	95.100000
District	76.955714	80.965714	66.571429	80.785714	73.678571

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Resources		Resources
PyCityAnalysis.ipynb		PyCityAnalysis.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PythonCity School Analysis

District Summary

School Summary

Top Performing Schools by Passing Rate

Bottom Performing Schools (By Passing Rate)

Math Scores by Grade

Reading scores by grade

Scores by School Spending

Scores by Schools Size

Scores by School Type

About

Releases

Packages

Languages

vst17/Python-Pandas

Folders and files

Latest commit

History

Repository files navigation

PythonCity School Analysis

District Summary

School Summary

Top Performing Schools by Passing Rate

Bottom Performing Schools (By Passing Rate)

Math Scores by Grade

Reading scores by grade

Scores by School Spending

Scores by Schools Size

Scores by School Type

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages