EECS731Project2

This project analyzes the Shakespeare plays data set downloaded from https://www.kaggle.com/kingburrito666/shakespeare-plays.

Part 1

The first part analyzes the data set with pandas. We count the number of distinct plays, the number of players in each play, the number of play lines spoken by each player in each play, and similar statistics.
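The counts described above can be sketched with a few pandas group-by calls. This is a minimal illustration, not the notebook's actual code: the column names `Play`, `Player`, and `PlayerLine` are assumptions about the CSV layout, and the tiny in-memory frame stands in for Shakespeare_data.csv.

```python
import pandas as pd

# Hypothetical stand-in for Shakespeare_data.csv; column names are assumed.
df = pd.DataFrame({
    "Play": ["Hamlet", "Hamlet", "Hamlet", "Macbeth", "Macbeth"],
    "Player": ["HAMLET", "HAMLET", "HORATIO", "MACBETH", "MACBETH"],
    "PlayerLine": [
        "To be, or not to be",
        "The rest is silence",
        "Good night sweet prince",
        "Is this a dagger",
        "Out, out, brief candle",
    ],
})

# Number of distinct plays in the data set.
num_plays = df["Play"].nunique()

# Number of distinct players in each play.
players_per_play = df.groupby("Play")["Player"].nunique()

# Number of play lines for each player in each play.
lines_per_player = df.groupby(["Play", "Player"]).size()

print(num_plays)
print(players_per_play)
print(lines_per_player)
```

With the real file, the frame would instead come from `pd.read_csv("Shakespeare_data.csv")`, and rows without a player may need to be dropped first.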

Part 2

The second part focuses on classifying the player based on the features in the other columns. We propose four different models for this classification task.

Challenge and idea

The main challenge in this task is that most columns of the data set contain string data, so we first convert the string data into integer data. For the play lines, we propose converting each line into two numeric features: the number of words and the average number of characters per word.
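The conversion described above might look like the following sketch. It assumes the same hypothetical column names (`Play`, `PlayerLine`), splits words on whitespace (so punctuation counts toward word length), and uses `pd.factorize` as one possible way to map strings to integers; the notebook may do this differently.

```python
import pandas as pd

def line_features(line):
    """Return (word count, average characters per word) for one play line."""
    words = line.split()
    if not words:
        return 0, 0.0
    return len(words), sum(len(w) for w in words) / len(words)

# Hypothetical sample rows; column names are assumed.
df = pd.DataFrame({
    "Play": ["Hamlet", "Macbeth"],
    "PlayerLine": ["To be or not", "Is this dagger"],
})

# Turn each play line into two numeric features.
features = [line_features(line) for line in df["PlayerLine"]]
df["WordCount"] = [f[0] for f in features]
df["AvgWordLen"] = [f[1] for f in features]

# Encode a string column as integer codes, in order of first appearance.
df["PlayCode"] = pd.factorize(df["Play"])[0]

print(df)
```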

Classification Models

The first model is a decision tree. The second model is a random forest. The third model is a support vector machine. The fourth model is a non-Bayesian regressor.
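As a rough illustration of how three of these models can be trained and compared, the sketch below uses scikit-learn on synthetic data. Everything here is an assumption for demonstration: the synthetic features stand in for the encoded columns, the hyperparameters are library defaults, and the SVM kernel is the library default rather than the one used in the notebook. The fourth model is omitted because the text does not identify a specific estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-class data standing in for the encoded Shakespeare features.
X, y = make_classification(
    n_samples=300,
    n_features=4,
    n_informative=3,
    n_redundant=0,
    n_classes=3,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical model settings; defaults are used for illustration only.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "svm": SVC(),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

The accuracies printed here come from synthetic data and say nothing about the results reported for the Shakespeare data set.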

Conclusion

As the results show, the random forest achieves the best performance with a classification rate of 0.65, and the non-Bayesian regressor achieves the worst with a classification rate of 0.05. We also found that the support vector machine with a linear kernel cannot be applied to the multi-group classification problem.

For details, please refer to the code in Shakespeares.ipynb.

