Skip to content

xionggj001/EECS731Project2

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

EECS731Project2

Analyzing Shakespeare Play data set downloaded from https://www.kaggle.com/kingburrito666/shakespeare-plays.

Part 1

The first part mainly focuses on analyzing the data set by pandas. We count the number of different plays, the number of players in each play, the number of playlines of each player in each play and etc.

Part 2

The second part focuses on the classification of player based on the features in other columns. We propose four diffirent models to do this classification task.

Challenge and idea

There is a challenge for this task is the string data in most columns of this data set. So we first convert those string data into int data. We propose to transfer the playline into the number of words and average number of characters in each word for each player.

Classification Models

The first model is decision tree. The second model is random forrest. The third model is support vector machine. The fourth model is non-bayesian regressor.

Conclusion

As the simulation shows, random forrest achieves the best performance at a classification rate of 0.65 and non-bayesian regressor achieves the worst performance at a classification rate of 0.05. We also found that support vector machine with linear kernel cannot be applied to multi-group classification problme.

For the details, please refer to the code.

About

Analyzing Shakespeare Play data set

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published