Skip to content

PyCity Schools is a project that consists using Jupiter notebook to analyze large data.

Notifications You must be signed in to change notification settings

vst17/Python-Pandas

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

PythonCity School Analysis

  1. Schools that spend between 585-615 dollars per student have a better percent passing reading, percent passing math, and percent overall passing rate than schools that spend more than 615 dollars per student. Also, those schools that spend less per student are smaller schools with 2,000 students or less.
  2. It is also found that the schools with 2000 students or less have a higher percent passing reading, percent passing math, and percent overall passing rate than larger schools that have 2000 – 5000 students. Furthermore, the smaller the school, the less they spend per student.
  3. With that being said,Charter schools have a higher percent passing reading, percent passing math, and percent overall passing rate than District schools. Thus being the top five performing schools based on percent passing rates, whereas the bottom five performing schools based on percent passing rate are District schools.
  4. In all 15 schools, the Average Reading Score is higher than the Average Math Score.
  5. Lasly, for the top 5 performing schools, the percent passing math is about 20 percent lower than the percent passing reading.
# Dependencies
import pandas as pd
import numpy as np
# The path to our csv file
schools_complete = "Resources/schools_complete.csv"
students_complete = "Resources/students_complete.csv"
# create dataframe using mapping
schools_complete_df = pd.read_csv(schools_complete)
students_complete_df = pd.read_csv(students_complete)
schools_complete_df.head()
students_complete_df.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Student ID name gender grade school reading_score math_score
0 0 Paul Bradley M 9th Huang High School 66 79
1 1 Victor Smith M 12th Huang High School 94 61
2 2 Kevin Rodriguez M 12th Huang High School 90 60
3 3 Dr. Richard Scott M 12th Huang High School 67 58
4 4 Bonnie Ray F 9th Huang High School 97 84

District Summary

# finding total schools
total_schools = len(schools_complete_df['School ID'])

# The total number of students
total_students = schools_complete_df['size'].sum()

# Get the total budget
total_budget = schools_complete_df['budget'].sum()

# Get the Average Math Score
average_math_score = round(students_complete_df['math_score'].mean(),6)

# Get the Average Reading Score
average_reading_score = round(students_complete_df['reading_score'].mean(),5)

# Get the percentage of students passing math
students_passing_math = students_complete_df.loc[students_complete_df["math_score"] > 70,:]
percent_passing_math = round(float(students_passing_math['math_score'].count()/total_students)*100,6)

# Get the percentage of students passing reading
students_passing_reading = students_complete_df.loc[students_complete_df["reading_score"] >70,:]
percent_passing_reading = round(float(students_passing_reading["reading_score"].count()/total_students)*100,6)

# Get the overall passing rate for math and reading
overall_passing_rate =round((percent_passing_math + percent_passing_reading)/2,6)

# Create a disctric Summary Dataframe 
summary_df = pd.DataFrame({"Total Schools":[total_schools],
                          "Total Students":[total_students],
                          "Total Budget":[total_budget],
                          "Average Math Score":[average_math_score],
                          "Average Reading Score":[average_reading_score],
                          "Percent Passing Math":[percent_passing_math],
                          "Percent Passing Reading":[percent_passing_reading],
                          "Percent Overall Passing Rate":[overall_passing_rate]})
# summary_df
# to put it in the right format
district_summary_df = pd.DataFrame(summary_df, columns=["Total Schools",
                                                        "Total Students",
                                                        "Total Budget",
                                                        "Average Math Score",
                                                        "Average Reading Score",
                                                        "Percent Passing Math",
                                                        "Percent Passing Reading",
                                                       "Percent Overall Passing Rate"])
district_summary_df["Total Students"] = district_summary_df["Total Students"].map('{:,}'.format)
district_summary_df["Total Budget"] = district_summary_df["Total Budget"].map('${:,.2f}'.format)

district_summary_df
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Total Schools Total Students Total Budget Average Math Score Average Reading Score Percent Passing Math Percent Passing Reading Percent Overall Passing Rate
0 15 39,170 $24,649,428.00 78.985371 81.87784 72.392137 82.971662 77.6819

School Summary

# Change header name from the dataframe from name to school
schools_complete_df = schools_complete_df.rename(columns={"name": "school"})
# schools_complete_df.columns
# Merge the two data frames together on = school 
schools_complete_data_df = pd.merge(schools_complete_df, students_complete_df, on='school')
#testing
# schools_complete_data_df.head()
# count number of students in each school
student_count = schools_complete_data_df['school'].value_counts()
# testing
# student_count.head()
# changing the school type into a str
school_type = schools_complete_data_df.groupby('school')['type'].unique()
school_type = school_type.str[0]
#testing
# school_type
# total budget for each school
budget_per_school = schools_complete_data_df.groupby('school')['budget'].count()
# budget_per_school.head()
# testing
# budget_per_school
# budget for each student
budget_per_student = schools_complete_df.set_index('school')['budget']/schools_complete_df.set_index('school')['size']
# budget_per_student.head()
# testing
# budget_per_student
# Average math and readin scores for each school
school_average_math = round(schools_complete_data_df.groupby('school')['math_score'].mean(),2)
school_average_reading = round(schools_complete_data_df.groupby('school')['reading_score'].mean(),2)
# dataframe with reading and math passing only scores
passing_df = schools_complete_data_df.loc[(schools_complete_data_df['math_score'] >=70) 
                                          & (schools_complete_data_df['reading_score'] >=70)]
passing_math_df = schools_complete_data_df.loc[(schools_complete_data_df['math_score'] >=70)]
passing_reading_df = schools_complete_data_df.loc[(schools_complete_data_df['reading_score'] >=70)]

# testing
# passing_math_df.head()
# passing_reading_df.head()
# percentage for students passing math and reading 
percent_passing_math = round((passing_math_df.groupby('school')['math_score'].count()/student_count)*100,1)
percent_passing_reading = round((passing_reading_df.groupby('school')['reading_score'].count()/student_count)*100,1)
# percent_passing_reading
# percent_passing_math
# overall passing percentage
overall_passing_percent = round((percent_passing_math + percent_passing_reading)/2, 2)
# overall_passing_percent
# school summary dataframe

school_summary_df = pd.DataFrame({"School Type":school_type,
                                "Total Students":student_count,
                                "Total School Budget":budget_per_school,
                                "Budget Per Student":budget_per_student,
                                "Average Math Score": school_average_math,
                                "Average Reading Score": school_average_reading,
                                "% Passing Math": percent_passing_math,
                                "% Passing Reading": percent_passing_reading,
                                "% Overall Passing Rate":overall_passing_percent},columns=["School Type",
                                                        "Total Students",
                                                        "Total School Budget",
                                                        "Budget Per Student",
                                                        "Average Math Score",
                                                        "Average Reading Score",
                                                        "% Passing Math",
                                                       "% Passing Reading",
                                                        "% Overall Passing Rate"])
#testing
# school_summary_df.head()
school_summary_df["Total Students"] = school_summary_df["Total Students"].map('{:,}'.format)
school_summary_df["Total School Budget"] = school_summary_df["Total School Budget"].map('${:,.2f}'.format)
school_summary_df["Budget Per Student"] = school_summary_df["Budget Per Student"].map('${:,.2f}'.format)
school_summary_df
 
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
School Type Total Students Total School Budget Budget Per Student Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Bailey High School District 4,976 $4,976.00 $628.00 77.05 81.03 66.7 81.9 74.30
Cabrera High School Charter 1,858 $1,858.00 $582.00 83.06 83.98 94.1 97.0 95.55
Figueroa High School District 2,949 $2,949.00 $639.00 76.71 81.16 66.0 80.7 73.35
Ford High School District 2,739 $2,739.00 $644.00 77.10 80.75 68.3 79.3 73.80
Griffin High School Charter 1,468 $1,468.00 $625.00 83.35 83.82 93.4 97.1 95.25
Hernandez High School District 4,635 $4,635.00 $652.00 77.29 80.93 66.8 80.9 73.85
Holden High School Charter 427 $427.00 $581.00 83.80 83.81 92.5 96.3 94.40
Huang High School District 2,917 $2,917.00 $655.00 76.63 81.18 65.7 81.3 73.50
Johnson High School District 4,761 $4,761.00 $650.00 77.07 80.97 66.1 81.2 73.65
Pena High School Charter 962 $962.00 $609.00 83.84 84.04 94.6 95.9 95.25
Rodriguez High School District 3,999 $3,999.00 $637.00 76.84 80.74 66.4 80.2 73.30
Shelton High School Charter 1,761 $1,761.00 $600.00 83.36 83.73 93.9 95.9 94.90
Thomas High School Charter 1,635 $1,635.00 $638.00 83.42 83.85 93.3 97.3 95.30
Wilson High School Charter 2,283 $2,283.00 $578.00 83.27 83.99 93.9 96.5 95.20
Wright High School Charter 1,800 $1,800.00 $583.00 83.68 83.96 93.3 96.6 94.95

Top Performing Schools by Passing Rate

# the top schools ( only top 5 )
top_performing_schools= school_summary_df.sort_values("% Overall Passing Rate", ascending=False, inplace=False)
top_performing_schools.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
School Type Total Students Total School Budget Budget Per Student Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Cabrera High School Charter 1,858 $1,858.00 $582.00 83.06 83.98 94.1 97.0 95.55
Thomas High School Charter 1,635 $1,635.00 $638.00 83.42 83.85 93.3 97.3 95.30
Griffin High School Charter 1,468 $1,468.00 $625.00 83.35 83.82 93.4 97.1 95.25
Pena High School Charter 962 $962.00 $609.00 83.84 84.04 94.6 95.9 95.25
Wilson High School Charter 2,283 $2,283.00 $578.00 83.27 83.99 93.9 96.5 95.20

Bottom Performing Schools (By Passing Rate)

# only showing bottom 5
Bottom_performing_schools= school_summary_df.sort_values("% Overall Passing Rate", ascending=True, inplace=False)
Bottom_performing_schools.head()
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
School Type Total Students Total School Budget Budget Per Student Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
Rodriguez High School District 3,999 $3,999.00 $637.00 76.84 80.74 66.4 80.2 73.30
Figueroa High School District 2,949 $2,949.00 $639.00 76.71 81.16 66.0 80.7 73.35
Huang High School District 2,917 $2,917.00 $655.00 76.63 81.18 65.7 81.3 73.50
Johnson High School District 4,761 $4,761.00 $650.00 77.07 80.97 66.1 81.2 73.65
Ford High School District 2,739 $2,739.00 $644.00 77.10 80.75 68.3 79.3 73.80

Math Scores by Grade

from pandas.api.types import CategoricalDtype
#Reset the grade order in the original students data frame "students_complete_df". 
students_complete_df["grade"] = students_complete_df['grade'].astype(CategoricalDtype(["9th", "10th","11th","12th"]))
# students_complete_df

math_scores_grade = round(students_complete_df.pivot_table(index="school", columns="grade", values="math_score"),2)
math_scores_grade.index.name = None 
math_scores_grade
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
grade 9th 10th 11th 12th
Bailey High School 77.08 77.00 77.52 76.49
Cabrera High School 83.09 83.15 82.77 83.28
Figueroa High School 76.40 76.54 76.88 77.15
Ford High School 77.36 77.67 76.92 76.18
Griffin High School 82.04 84.23 83.84 83.36
Hernandez High School 77.44 77.34 77.14 77.19
Holden High School 83.79 83.43 85.00 82.86
Huang High School 77.03 75.91 76.45 77.23
Johnson High School 77.19 76.69 77.49 76.86
Pena High School 83.63 83.37 84.33 84.12
Rodriguez High School 76.86 76.61 76.40 77.69
Shelton High School 83.42 82.92 83.38 83.78
Thomas High School 83.59 83.09 83.50 83.50
Wilson High School 83.09 83.72 83.20 83.04
Wright High School 83.26 84.01 83.84 83.64

Reading scores by grade

#Reset the grade order in the original students data frame "students_complete_df". 
students_complete_df['grade'] = students_complete_df['grade'].astype(CategoricalDtype(["9th", "10th","11th","12th"]))
# students_complete_df

reading_scores_grade = round(students_complete_df.pivot_table(index="school", columns="grade", values="reading_score"),2)
reading_scores_grade.index.name = None
reading_scores_grade
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
grade 9th 10th 11th 12th
Bailey High School 81.30 80.91 80.95 80.91
Cabrera High School 83.68 84.25 83.79 84.29
Figueroa High School 81.20 81.41 80.64 81.38
Ford High School 80.63 81.26 80.40 80.66
Griffin High School 83.37 83.71 84.29 84.01
Hernandez High School 80.87 80.66 81.40 80.86
Holden High School 83.68 83.32 83.82 84.70
Huang High School 81.29 81.51 81.42 80.31
Johnson High School 81.26 80.77 80.62 81.23
Pena High School 83.81 83.61 84.34 84.59
Rodriguez High School 80.99 80.63 80.86 80.38
Shelton High School 84.12 83.44 84.37 82.78
Thomas High School 83.73 84.25 83.59 83.83
Wilson High School 83.94 84.02 83.76 84.32
Wright High School 83.83 83.81 84.16 84.07

Scores by School Spending

# Create the bins ( school spending )
spending_bins = [0, 585, 615, 645, 675]

# Names for the bins
spending_range = ["<$585", "$585-615", "$615-645", "$645-675"]

#schools_spending_df = school_summary_df
school_summary_df["Spending Ranges(Per Student)"] = pd.cut(budget_per_student, bins=spending_bins, labels=spending_range)
school_summary_df.head()

spending_math_score = school_summary_df.groupby(["Spending Ranges(Per Student)"])["Average Math Score"].mean()
spending_reading_score = school_summary_df.groupby(["Spending Ranges(Per Student)"])["Average Reading Score"].mean()
spending_passing_math = school_summary_df.groupby(["Spending Ranges(Per Student)"])["% Passing Math"].mean()
spending_passing_reading = school_summary_df.groupby(["Spending Ranges(Per Student)"])['% Passing Reading'].mean()
overall_passing_rate = (spending_passing_math + spending_passing_reading)/2

# Creating dataframe
scores_by_spending = pd.DataFrame ({"Average Math Score": spending_math_score,
    "Average Reading Score": spending_reading_score,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "Overall Passing Rate": overall_passing_rate})
scores_by_spending
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
% Passing Math % Passing Reading Average Math Score Average Reading Score Overall Passing Rate
Spending Ranges(Per Student)
<$585 93.450000 96.600000 83.452500 83.935000 95.025000
$585-615 94.250000 95.900000 83.600000 83.885000 95.075000
$615-645 75.683333 86.083333 79.078333 81.891667 80.883333
$645-675 66.200000 81.133333 76.996667 81.026667 73.666667

Scores by Schools Size

# Create the bins ( school size)
size_bins = [0, 1000, 2000, 5000]

# Names for the bins
group_names = ["small (<1000)", "Medium(1000-2000)", "Large(2000-5000)"]

school_size_df = school_summary_df
school_size_df["School Size"] = pd.cut(student_count, bins=size_bins, labels=group_names)
school_size_df

spending_math_score = school_size_df.groupby(["School Size"])["Average Math Score"].mean()
spending_reading_score = school_size_df.groupby(["School Size"])["Average Reading Score"].mean()
spending_passing_math = school_size_df.groupby(["School Size"])["% Passing Math"].mean()
spending_passing_reading = school_size_df.groupby(["School Size"])['% Passing Reading'].mean()
overall_passing_rate = (spending_passing_math + spending_passing_reading)/2

# Creating dataframe
scores_by_size = pd.DataFrame ({"Average Math Score": spending_math_score,
    "Average Reading Score": spending_reading_score,
    "% Passing Math": spending_passing_math,
    "% Passing Reading": spending_passing_reading,
    "Overall Passing Rate": overall_passing_rate})
scores_by_size
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
% Passing Math % Passing Reading Average Math Score Average Reading Score Overall Passing Rate
School Size
small (<1000) 93.5500 96.10 83.820 83.92500 94.82500
Medium(1000-2000) 93.6000 96.78 83.374 83.86800 95.19000
Large(2000-5000) 69.9875 82.75 77.745 81.34375 76.36875

Scores by School Type

#* Repeat the above breakdown, but this time group schools based on school type (Charter vs. District)*
scores_school_type = school_summary_df[["School Type","Average Math Score",
                                        "Average Reading Score", 
                                        "% Passing Math", 
                                        "% Passing Reading", 
                                        "% Overall Passing Rate"]]
scores_school_type = scores_school_type.groupby('School Type').mean()
scores_school_type
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
Average Math Score Average Reading Score % Passing Math % Passing Reading % Overall Passing Rate
School Type
Charter 83.472500 83.897500 93.625000 96.575000 95.100000
District 76.955714 80.965714 66.571429 80.785714 73.678571

About

PyCity Schools is a project that consists using Jupiter notebook to analyze large data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published