TIMSS 2019 Modeling

Sources

Trends in International Mathematics and Science Study (TIMSS 2019)

2019 TIMSS Database

The following analysis uses publicly available data from the official TIMSS website. The code is written to analyze all of the available data, but the scope of this repository is limited to 10 countries to keep storage requirements and run time manageable.

Purpose

Questions of Interest

  1. How does a student's environment at home, in the classroom, and at school affect academic understanding?
  2. Are there specific teacher behaviors that lead to improved understanding in specific disciplines?
  3. What can students and teachers do to improve student academic understanding?

Scope of Purpose

The project covers a variety of analyses of the collected data, including:

  • prediction of student scores based on student attitudes, demographics, and school characteristics
  • prediction of student scores based on teacher attitudes and practices
  • recommendation engine for additional study problems for a given student (or group of students from a school or country)

Methods

Packages

  • pandas
  • numpy
  • pyreadstat
  • matplotlib
  • seaborn
  • glob
  • random
  • re
  • statsmodels
  • sklearn
  • xgboost

Data Gathering and Processing Technique

  • gather and organize data from a variety of SPSS files (glob, pandas.read_spss); a sketch of this pipeline follows the list
  • gather item information and metadata (pandas.read_excel)
  • drop unnecessary/irrelevant columns (pandas.drop)
  • simplify and combine assessment scores for easier application
  • average student assessment scores to a single number
  • explore data structure (pandas.info(), pandas.describe())
  • visualize basic relationships (matplotlib, seaborn)
  • record initial observation and plan for data preparation
  • perform data preparation on each dataset
    • drop information irrelevant to that specific dataset (pandas.drop)
    • convert datatypes for appropriate analysis (pandas.astype)
    • give columns more descriptive names (pandas.rename)
    • perform any necessary transformations (pandas.apply)
    • average columns into like groups (pandas.groupby)
    • join data from other tables when helpful to the analysis (pandas.join)
  • store processed data (pandas.to_csv)
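
A minimal sketch of the gathering and processing pipeline described above, assuming hypothetical file locations and column names (the plausible-value columns and the ITSEX rename below are illustrative, not the exact variables used in the repository):

```python
from glob import glob

import pandas as pd

DATA_DIR = "data"                                   # hypothetical folder of TIMSS SPSS files
student_files = glob(f"{DATA_DIR}/*.sav")           # gather every SPSS file in the folder

# Read each SPSS file (pandas.read_spss uses pyreadstat) and stack the results.
students = pd.concat(
    [pd.read_spss(path) for path in student_files],
    ignore_index=True,
)

# Illustrative cleanup: rename to descriptive names, average the assessment
# scores into a single number per student, and drop the original score columns.
score_cols = [f"PV{i}MAT" for i in range(1, 6)]     # placeholder plausible-value names
students = students.rename(columns={"ITSEX": "sex"})
students["avg_math_score"] = students[score_cols].mean(axis=1)
students = students.drop(columns=score_cols)

# Store the processed data for the modeling steps.
students.to_csv("processed_students.csv", index=False)
```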

Data Models

Regression

  • identify variables of interest (the full workflow is sketched after this list)
  • create dummy variables for string objects (pandas.get_dummies)
  • split the data for training and testing (sklearn.model_selection.train_test_split)
  • create and train the linear regression models (statsmodels.api.OLS)
  • explore relative weights of variables in regression models
  • create predictions and confidence intervals for test dataset (statsmodels.api.OLS.predict, statsmodels.api.OLS.get_prediction)
  • evaluate effectiveness of models and confidence intervals
  • visualize models (seaborn, matplotlib)
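
The workflow above might look like the following sketch, continuing from the processed student table; the feature columns are hypothetical placeholders rather than the exact variables used in the repository:

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

students = pd.read_csv("processed_students.csv")    # output of the processing step

# Hypothetical target and feature columns for illustration.
y = students["avg_math_score"]
X = pd.get_dummies(
    students[["sex", "books_at_home", "like_math"]], drop_first=True
)
X = sm.add_constant(X.astype(float))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the linear model and inspect the relative weights of the variables.
ols = sm.OLS(y_train, X_train).fit()
print(ols.summary())

# Point predictions and 80% intervals for the held-out students.
pred = ols.get_prediction(X_test)
intervals = pred.summary_frame(alpha=0.2)            # 80% lower/upper bounds
print(intervals[["mean", "obs_ci_lower", "obs_ci_upper"]].head())
```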

Evaluation

Most models accounted for roughly 50% of the variation in average test scores for the entity of interest (student, school, or teacher). This is reasonable given that each model draws on variables from only one domain; for example, school and teacher variables were not included in the student regression model, so no single model could account for every factor contributing to the achievement of a student, school, or teacher.

All 80% confidence intervals achieved approximately their nominal coverage, meaning the score ranges generated by the models contain the observed scores for roughly 80% of the test data. The remaining 20% fall outside the intervals because they represent extreme or unlikely results; for example, most students score around 500, so it is difficult to generate representative models for students who score closer to 300 or 700 (the relative extremes).
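
Continuing the hypothetical names from the regression sketch above, the 80% coverage claim can be checked by counting how often the observed test scores fall inside the generated intervals:

```python
# Fraction of held-out students whose observed score lies inside the 80% interval.
lower = intervals["obs_ci_lower"].to_numpy()
upper = intervals["obs_ci_upper"].to_numpy()
observed = y_test.to_numpy()

coverage = ((observed >= lower) & (observed <= upper)).mean()
print(f"Empirical coverage of the 80% intervals: {coverage:.1%}")
```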

Classification

  • identify and gather variables of interest (the workflow is sketched after this list)
  • split the data for training and testing (sklearn.model_selection.train_test_split)
  • create base models
    • sklearn.ensemble.RandomForestClassifier
    • xgboost.XGBClassifier
    • sklearn.neighbors.KNeighborsClassifier
  • set up parameter grids for optimization
  • fit and optimize base model with grid search of parameters (sklearn.model_selection.GridSearchCV)
  • display results and accuracy (sklearn.metrics.confusion_matrix)
  • compare similarity between different models
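
A minimal sketch of the classification workflow, assuming a processed school table and a hypothetical target column for the socio-economic composition label (the parameter grids are illustrative and the k-nearest-neighbors model is omitted for brevity):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBClassifier

schools = pd.read_csv("processed_schools.csv")       # hypothetical processed table

# Encode the "More Affluent" / "Neither ..." / "More Disadvantaged" labels as integers.
labels = LabelEncoder().fit(schools["socio_economic_composition"])
y = labels.transform(schools["socio_economic_composition"])
X = pd.get_dummies(schools.drop(columns=["socio_economic_composition"]))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit and optimize each base model with a grid search over its parameters.
candidates = {
    "random_forest": GridSearchCV(
        RandomForestClassifier(random_state=42),
        {"n_estimators": [100, 300], "max_depth": [None, 10]},
        cv=5,
    ),
    "xgboost": GridSearchCV(
        XGBClassifier(eval_metric="mlogloss"),
        {"n_estimators": [100, 300], "max_depth": [3, 6]},
        cv=5,
    ),
}

for name, grid in candidates.items():
    grid.fit(X_train, y_train)
    preds = grid.predict(X_test)
    print(name, accuracy_score(y_test, preds))
    print(confusion_matrix(y_test, preds))
```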

Evaluation

The relatively vague nature of the school socio-economic background classifications left some interpretation issues for the models. For example, there is no clear objective boundary between "More Affluent" and "Neither More Affluent nor More Disadvantaged", so it is unclear when the model should switch between those two classifications.

In the end, the optimized XGBoost classifier had the highest accuracy (54%) and the fewest significant errors, misidentifying "More Affluent" as "More Disadvantaged" (or vice versa) only 22 times out of 236 total test values.
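
The count of significant errors can be read directly off the confusion matrix. Continuing the sketch above, it is the sum of the two cells where the extreme classes are confused with each other (the class strings must match the questionnaire wording exactly, so this lookup is illustrative):

```python
cm = confusion_matrix(y_test, candidates["xgboost"].predict(X_test))

# Positions of the two extreme classes in the encoder's label ordering.
affluent = list(labels.classes_).index("More Affluent")
disadvantaged = list(labels.classes_).index("More Disadvantaged")

significant_errors = cm[affluent, disadvantaged] + cm[disadvantaged, affluent]
print(f"Significant errors: {significant_errors} out of {cm.sum()} test schools")
```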

Overall, the model could be improved by including relevant averages for student demographics within each school (immigration and academic language data, home resources) to provide a richer set of inputs on which to base the classification.

Recommendation

  • gather assessment item information
  • drop irrelevant information to construct user_item dataframe
  • create a collection of helper functions (a simplified sketch follows this list) to:
    • identify similar users (find_similar_users)
    • gather specific item information (get_item_info)
    • gather items assessed by a user (get_user_items)
    • create ranked user-to-user recommendations for new items (user_user_recs)
    • implement FunkSVD to estimate score on previously unknown items (FunkSVD)
    • estimate scores for unassessed items for a user (user_item_predict)
    • implement custom sigmoid activation function for smoother predictions (sig)
  • test recommendation engine
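
A simplified sketch of how two of the helper functions (find_similar_users and user_user_recs) might operate on the user_item dataframe; the actual implementations in the repository may differ:

```python
import pandas as pd

# user_item: rows = students, columns = assessment items, values = scores,
# NaN where the student was not assessed on the item (hypothetical layout).

def get_user_items(user_id, user_item):
    """Items the user has already been assessed on."""
    row = user_item.loc[user_id]
    return row[row.notna()].index.tolist()

def find_similar_users(user_id, user_item):
    """Rank other users by how many assessed items they share with user_id."""
    seen = user_item.loc[user_id].notna()
    overlap = user_item.notna().loc[:, seen].sum(axis=1)
    return overlap.drop(user_id).sort_values(ascending=False).index.tolist()

def user_user_recs(user_id, user_item, n_recs=10):
    """Recommend unseen items from similar users, ranked by prevalence then difficulty."""
    seen = set(get_user_items(user_id, user_item))
    pool = {}
    for other in find_similar_users(user_id, user_item):
        for item in get_user_items(other, user_item):
            if item not in seen:
                pool[item] = True
        if len(pool) >= n_recs * 3:                  # gather a candidate pool, then rank
            break
    pool = list(pool)
    ranking = pd.DataFrame({
        "prevalence": user_item[pool].notna().sum(),  # how many users assessed the item
        "difficulty": user_item[pool].mean(),         # average assessment score
    })
    ranking = ranking.sort_values(["prevalence", "difficulty"], ascending=[False, True])
    return ranking.head(n_recs).index.tolist()
```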

Evaluation

The recommendation system uses a user's assessment history to find users with similar experience. It then gathers assessment items those similar users have encountered and ranks them by prevalence (the number of users who assessed the item) and difficulty (the average assessment score). The system selects 10 items to recommend to the user as new study items.

With the additional analysis of FunkSVD and a sigmoid activation function, the recommendations estimate the user's performance on the recommended assessment items. The user can then decide how to best use their resources to prepare for future assessments. A user could focus on reviewing items with high predictions (items with a high chance of success), low predictions (items with a low chance of success), or items with an unclear prediction (items on which a user could have more influence on the outcome).
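
A stripped-down version of the FunkSVD and sigmoid pieces might look like the sketch below; the latent dimension, learning rate, and iteration count are illustrative, and scores are assumed to be scaled to the 0-1 range:

```python
import numpy as np

def sig(x):
    """Sigmoid used to squash raw dot products into the 0-1 score range."""
    return 1.0 / (1.0 + np.exp(-x))

def FunkSVD(ratings, latent_features=8, learning_rate=0.005, iters=100):
    """Factor a user-item score matrix (NaN = unknown) into user and item matrices."""
    n_users, n_items = ratings.shape
    rng = np.random.default_rng(42)
    user_mat = rng.normal(scale=0.1, size=(n_users, latent_features))
    item_mat = rng.normal(scale=0.1, size=(latent_features, n_items))

    known = np.argwhere(~np.isnan(ratings))           # only train on observed scores
    for _ in range(iters):
        for i, j in known:
            pred = sig(user_mat[i] @ item_mat[:, j])
            err = ratings[i, j] - pred
            grad = err * pred * (1 - pred)            # chain rule through the sigmoid
            user_step = learning_rate * grad * item_mat[:, j]
            item_step = learning_rate * grad * user_mat[i]
            user_mat[i] += user_step
            item_mat[:, j] += item_step
    return user_mat, item_mat

def user_item_predict(user_idx, item_idx, user_mat, item_mat):
    """Estimate the score on an item the user has not been assessed on."""
    return sig(user_mat[user_idx] @ item_mat[:, item_idx])
```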

The recommended items are well grounded in numerical evidence, but the predicted scores are less reliable. Because FunkSVD estimates the unknown scores from a random sample of 99 other students, the predictions for the 10 recommended assessment items can vary significantly depending on the sample; in the comparison tests that were run, 3 of the 10 items received significantly different predictions. Users should therefore treat the predicted scores as rough guidance when deciding which problems to study first.
