Objective:
- Predict the winner of international matches. The prediction target is either "Win / Lose / Draw" or the goal difference.
- Apply the model to predict the results of the FIFA World Cup 2018.
Data: The data are assembled from multiple sources. Most come from Kaggle; the rest come from the FIFA website and EA games.
Feature Engineering: To determine which team is more likely to win a match, I came up with 4 main groups of features based on my own knowledge of the game:
- head-to-head match history between the 2 teams
- recent performance of each team (10 most recent matches), aka "form"
- betting odds before the match
- squad strength (from the FIFA video game)
The feature list reflects these factors.
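
As an illustration of the "form" group, here is a minimal sketch of how form differences over the last 10 matches could be computed. It is not the project's actual code: the match tuple layout, the `team_form` and `form_diff` helpers, and the sample data are assumptions made for the example. Only the `form_diff_*` feature names come from the feature list.

```python
from datetime import date

# Hypothetical match records: (match_date, home_team, away_team, home_goals, away_goals)
MATCHES = [
    (date(2018, 3, 1), "FR", "DE", 2, 1),
    (date(2018, 3, 5), "DE", "BR", 0, 0),
    (date(2018, 3, 9), "BR", "FR", 1, 3),
    (date(2018, 3, 12), "FR", "DE", 1, 1),
]


def team_form(matches, team, before, n=10):
    """Summarize a team's last `n` matches played strictly before `before`."""
    played = sorted(
        (m for m in matches if m[0] < before and team in (m[1], m[2])),
        key=lambda m: m[0],
    )[-n:]
    form = {"goalF": 0, "goalA": 0, "win": 0, "draw": 0}
    for _, home, _away, home_goals, away_goals in played:
        # Look at the score from this team's point of view.
        if team == home:
            scored, conceded = home_goals, away_goals
        else:
            scored, conceded = away_goals, home_goals
        form["goalF"] += scored
        form["goalA"] += conceded
        form["win"] += int(scored > conceded)
        form["draw"] += int(scored == conceded)
    return form


def form_diff(matches, team_1, team_2, before, n=10):
    """Form features as differences: team_1 minus team_2."""
    f1 = team_form(matches, team_1, before, n)
    f2 = team_form(matches, team_2, before, n)
    return {f"form_diff_{key}": f1[key] - f2[key] for key in f1}


features = form_diff(MATCHES, "FR", "DE", before=date(2018, 6, 1))
print(features)
# {'form_diff_goalF': 4, 'form_diff_goalA': 0, 'form_diff_win': 2, 'form_diff_draw': -1}
```

Filtering on `before` keeps the match being predicted (and anything after it) out of its own features, which matters when the same history is used to build training rows.
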
Lifecycle
See the Full Report for more insight into this project. The report contains:
- Exploratory Data Analysis: correlations, importance of features to results, interesting hypotheses
- Methodology: how I carried out this project and which experiments I ran
- Models: baseline model, logistic regression, random forest, gradient boosting tree, AdaBoost tree, neural network
- Evaluation Criteria: F1, 10-fold cross-validation accuracy
- Results and Conclusion
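
The two evaluation criteria can be sketched without any library, to make clear what is being measured. This is an illustrative sketch only: the helper names `macro_f1` and `k_fold_indices` are hypothetical, and macro averaging over the three classes is an assumption, since the report summary only says "F1".

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption here)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


y_true = ["win", "win", "lose", "draw", "lose", "win"]
y_pred = ["win", "lose", "lose", "draw", "win", "win"]
score = macro_f1(y_true, y_pred)
folds = list(k_fold_indices(25, k=10))
print(round(score, 4), [len(test) for _, test in folds])
```

In a 10-fold run, a model is trained on each `train_idx` and scored on the matching `test_idx`, and the ten accuracies are averaged. Shuffling the rows before splitting is usually advisable when the dataset is sorted by date or tournament.
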
- EDA: exploratory data analysis
- LE: saved label encoder
- data: complete dataset
- save_model: saved machine learning models after training
The dataset covers all international matches from 2000 to 2018, including results, betting odds, rankings, and squad strengths. The sources are numbered as referenced in the feature table:
1. FIFA World Cup 2018
2. International match 1872 - 2018
3. FIFA Ranking through Time
4. Bet Odd
5. Bet Odd 2
6. Squad Strength - Sofia
7. Squad Strength - FIFA index
- *difference: team1 - team2
- *form: performance in 10 recent matches
Feature Name | Description | Source |
---|---|---|
team_1 | Nation code (e.g. US, NZ) | 1 & 2 |
team_2 | Nation code (e.g. US, NZ) | 1 & 2 |
date | Date of match (yyyy-mm-dd) | 1 & 2 |
tournament | Tournament type (e.g. Friendly, EURO, AFC, FIFA WC) | 1 & 2 |
h_win_diff | Head2Head: win difference | 2 |
h_draw | Head2Head: number of draws | 2 |
form_diff_goalF | Form: difference in "Goal For" | 2 |
form_diff_goalA | Form: difference in "Goal Against" | 2 |
form_diff_win | Form: difference in number of wins | 2 |
form_diff_draw | Form: difference in number of draws | 2 |
odd_diff_win | Betting Odd: difference in betting odds for a win | 4 & 5 |
odd_draw | Betting Odd: betting odds for a draw | 4 & 5 |
game_diff_rank | Squad Strength: difference in FIFA Rank | 3 |
game_diff_ovr | Squad Strength: difference in Overall Strength | 6 |
game_diff_attk | Squad Strength: difference in Attack Strength | 6 |
game_diff_mid | Squad Strength: difference in Midfield Strength | 6 |
game_diff_def | Squad Strength: difference in Defense Strength | 6 |
game_diff_prestige | Squad Strength: difference in prestige | 6 |
game_diff_age11 | Squad Strength: difference in age of 11 starting players | 6 |
game_diff_ageAll | Squad Strength: difference in age of all players | 6 |
game_diff_bup_speed | Squad Strength: difference in Build Up Play Speed | 6 |
game_diff_bup_pass | Squad Strength: difference in Build Up Play Passing | 6 |
game_diff_cc_pass | Squad Strength: difference in Chance Creation Passing | 6 |
game_diff_cc_cross | Squad Strength: difference in Chance Creation Crossing | 6 |
game_diff_cc_shoot | Squad Strength: difference in Chance Creation Shooting | 6 |
game_diff_def_press | Squad Strength: difference in Defense Pressure | 6 |
game_diff_def_aggr | Squad Strength: difference in Defense Aggression | 6 |
game_diff_def_teamwidth | Squad Strength: difference in Defense Team Width | 6 |
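
To make the head-to-head rows of the table concrete, the following sketch derives `h_win_diff` and `h_draw` from a list of past results. The `head_to_head` helper and the `(home, away, home_goals, away_goals)` tuple format are hypothetical, and reading `h_win_diff` as "wins of team_1 minus wins of team_2" is an assumption based on the note that differences are taken as team1 - team2.

```python
def head_to_head(results, team_1, team_2):
    """Return h_win_diff and h_draw from past meetings of the two teams.

    `results` holds (home, away, home_goals, away_goals) tuples (hypothetical format).
    """
    wins_1 = wins_2 = draws = 0
    for home, away, home_goals, away_goals in results:
        if {home, away} != {team_1, team_2}:
            continue  # not a meeting between these two teams
        if home_goals == away_goals:
            draws += 1
        elif (home_goals > away_goals) == (home == team_1):
            wins_1 += 1  # team_1 won, either at home or away
        else:
            wins_2 += 1
    return {"h_win_diff": wins_1 - wins_2, "h_draw": draws}


PAST = [
    ("US", "NZ", 2, 0),
    ("NZ", "US", 1, 1),
    ("NZ", "US", 0, 3),
    ("US", "MX", 1, 2),  # ignored: different opponent
    ("NZ", "US", 2, 1),
]
h2h = head_to_head(PAST, "US", "NZ")
print(h2h)  # {'h_win_diff': 1, 'h_draw': 1}
```

Swapping the two team arguments flips the sign of `h_win_diff` and leaves `h_draw` unchanged, which is the behaviour one would expect from a difference feature next to a symmetric count.
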
python experiment1-W-D-L.py
python experiment2-GoalDiff.py
python experiment3-WorldCup.py
- A machine learning framework for sport result prediction
- t-test definition
- Confusion Matrix Multi-Label example
- Precision-Recall Multi-Label example
- ROC curve example
- Model evaluation
- Tuning the hyper-parameters of an estimator
- Validation curves
- Understand Bet odd format
- EURO 2016 bet odd
Complete
- Add prediction for Matchday 2
- Add feature Importance
- Add feature of squad and player info
- Build a web crawler for each team's squad
- Build a web crawler for FIFA game players
- Add a simple classification based on "bet odd".
- Add feature group 1
- Add h_win_diff, h_draw
- Add rank_diff, title_diff
- Add feature group 2
- Add feature group 3
- Simple EDA and a small story
- Add feature group 4
- Prepare framework for running classifiers
- Add evaluation metrics and plot
- Add accuracy, precision, recall, F1
- Add ROC curves
- Build a dataset without player rating and squad value
- Generate data and perform prediction for EURO 2016; OK, now my story is more interesting
- Create more data: "teamA vs teamB -> win" is equivalent to "teamB vs teamA -> lose"
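
The last item, creating more data by viewing each match from the other team's side, could look roughly like the sketch below. The `mirror` helper, the `result` column name, and the sample row are assumptions for illustration. The sketch relies on the naming used in the feature list, where every team1 - team2 difference contains `_diff` and symmetric features such as `h_draw` and `odd_draw` do not.

```python
FLIP_LABEL = {"win": "lose", "lose": "win", "draw": "draw"}


def mirror(row):
    """Return the same match seen from team_2's side."""
    out = dict(row)
    out["team_1"], out["team_2"] = row["team_2"], row["team_1"]
    out["result"] = FLIP_LABEL[row["result"]]  # "result" is a hypothetical label column
    for key, value in row.items():
        if "_diff" in key:
            out[key] = -value  # team1 - team2 becomes team2 - team1
    return out


rows = [
    {
        "team_1": "US",
        "team_2": "NZ",
        "h_win_diff": 1,
        "h_draw": 1,
        "form_diff_goalF": 4,
        "odd_draw": 3.2,
        "result": "win",
    }
]
augmented = rows + [mirror(r) for r in rows]
print(augmented[1])
```

If mirrored rows are used together with cross-validation, it is probably safer to keep a match and its mirror in the same fold, since otherwise the test fold contains near-copies of training rows.
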