Building a Knowledge Graph about Games, Requirements, and Purchase Sources
Project Demo: The youtube video about our project demo can be found here: https://www.youtube.com/watch?v=wa_C4xqBmjo
Project Presentation Slides: The project presentation slides can be found here: Slides
Project Report: The project report can be found here: Report
The Games Knowledge Graph that we built looks something like this:
We crawled different types of information from multiple data sources, as listed below:
- Game information was crawled from IGDB.com
- Information about the system specifications required to play a particular game, as well as its cheapest purchase source, was crawled from G2A.com
- The details of all the CPUs and GPUs were crawled from Techpowerup.com
- The baseline performance scores for the CPUs and GPUs were crawled from Passmark.com
The code for all these crawling tasks can be found here
We performed multiple entity resolution tasks, as listed below:
- The first entity resolution task was mapping the games crawled from IGDB to the games crawled from G2A. This mapping was necessary to enrich the games with information such as the device specifications and the cheapest purchase source. Code can be found here
- The second entity linking task was mapping the CPU and GPU information from G2A to the CPU and GPU information crawled from Techpowerup. The entity linking code for the CPU can be found here. Code for GPU linking can be found here
- Code for linking the CPU information from Techpowerup to the benchmark score from Passmark can be found here. Similar code for the GPU can be found here
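Game-title matching of the kind used in the first task can be approximated with simple string similarity. Below is a minimal sketch using the standard library's `difflib`; the `normalize` helper and the 0.85 threshold are illustrative assumptions, not the project's actual matcher:

```python
from difflib import SequenceMatcher

def normalize(title):
    """Lowercase and strip punctuation so edition suffixes compare cleanly."""
    return "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace()).strip()

def best_match(igdb_title, g2a_titles, threshold=0.85):
    """Return the G2A title most similar to the IGDB title, or None below threshold."""
    scored = [(SequenceMatcher(None, normalize(igdb_title), normalize(t)).ratio(), t)
              for t in g2a_titles]
    score, title = max(scored)
    return title if score >= threshold else None

g2a = ["The Witcher 3: Wild Hunt GOTY", "Dark Souls III", "Hades"]
print(best_match("The Witcher 3: Wild Hunt", g2a))  # → The Witcher 3: Wild Hunt GOTY
```

A real pipeline would also block on release year or publisher to avoid comparing every pair of titles.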
We designed our ontology and identified seven classes. We inherited some of the classes and properties from schema.org and customized others according to our needs. The entire ontology file can be found here
Since CPUs and GPUs have many features (both numerical and categorical), it is hard to implement a comparison function manually. We found benchmark performance scores for some of the CPUs and GPUs online at https://www.cpubenchmark.net and https://www.videocardbenchmark.net, respectively. We used these scores as ground truth and trained a Random Forest regressor to predict the benchmark score for both CPUs and GPUs.
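A sketch of this training setup with scikit-learn, using synthetic features and a synthetic target in place of the crawled data (the feature ranges and the score function below are invented purely for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical GPU feature matrix: [pixel shaders, ROPs, GPU clock (MHz),
# memory clock (MHz), TMUs] -- columns are illustrative, not the exact schema.
rng = np.random.default_rng(0)
X = rng.uniform([128, 8, 600, 1000, 8], [10752, 128, 2500, 10000, 336], size=(500, 5))
# Synthetic "benchmark score" standing in for the PassMark ground truth.
y = X @ np.array([0.5, 20.0, 2.0, 0.3, 5.0]) + rng.normal(0, 100, 500)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(round(model.score(X, y), 3))  # training R^2
```

With the real data, the same `feature_importances_` attribute of the fitted model yields rankings like the ones listed below.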
The most important GPU features that determine the G3D Mark score are as follows:
- Pixel Shader
- ROPs
- GPU Clock Speed
- Memory Clock Speed
- TMUs
The most important CPU features that determine the CPU Mark score are as follows:
- Process
- TDP
- Socket
- Number of Cores
- Clock Speed
The code snippet for building the KG can be found here, and example queries we used for querying our KG can be found here.
We evaluated the entity resolution tasks by manually labelling 100 random samples for each task and using them as a test set. The metrics for the ER tasks are shown in Table 1 below.
We evaluated our CPU and GPU comparison models on two types of tasks. The first is a regression task, where we predict the benchmark score for a given CPU or GPU; this is evaluated using the R² metric. The second is a classification task, where we classify a given pair of CPUs or GPUs into two classes: the first class indicates that the first item is better than or similar to the second, and the second class indicates that it is not. This task is evaluated using the F1 measure. The results can be found in Table 2 and Table 3 below.
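The pairwise classification labels can be derived directly from predicted scores. A minimal sketch of that evaluation follows; the 5% similarity tolerance and the toy score pairs are assumptions, not the report's settings:

```python
from sklearn.metrics import f1_score

def compare(score_a, score_b, tol=0.05):
    """Label 1 if score_a is better than, or within tol (relative) of, score_b."""
    return 1 if score_a >= score_b * (1 - tol) else 0

# Toy predicted-score pairs with hand labels, just to show the F1 evaluation.
pairs = [(9500, 4000), (4000, 9500), (5000, 5100), (1200, 3000)]
y_true = [1, 0, 1, 0]
y_pred = [compare(a, b) for a, b in pairs]
print(f1_score(y_true, y_pred))  # → 1.0
```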
We compute the embeddings for each game node using pre-trained fastText embeddings, as shown in the figure below:
- First, we calculate the embeddings for the game's components like name, description, genre, theme, and game mode.
- Then, the game node embedding is created by the weighted average of the individual game components' embeddings. The weights were determined heuristically to build a game recommendation system.
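The weighted-average step can be sketched as follows. The component weights here are placeholders for the heuristically chosen values, and the toy 2-dimensional vectors stand in for real fastText outputs:

```python
import numpy as np

# Placeholder weights -- the report chose these heuristically; values invented.
WEIGHTS = {"name": 0.3, "description": 0.3, "genre": 0.2, "theme": 0.1, "game_mode": 0.1}

def game_embedding(component_vecs, weights=WEIGHTS):
    """Weighted average of per-component embeddings (e.g. fastText sentence
    vectors) into a single game-node embedding. Missing components are skipped
    and the weights renormalized over the components present."""
    total = sum(weights[k] for k in component_vecs)
    acc = sum(weights[k] * np.asarray(v) for k, v in component_vecs.items())
    return acc / total

print(game_embedding({"name": [1.0, 0.0], "genre": [0.0, 1.0]}))  # → [0.6 0.4]
```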
We built a personalized game recommendation system using the game node embeddings.
For a given source game and a user device, to recommend the top-5 similar games, we follow the steps below,
- First, we keep only the games with a rating >= 80 (excluding the source game).
- We then apply a second filter to retain only the games that the user can play on their device.
- Finally, we rank those filtered games by the cosine similarity score between their game node embedding and the source game node embedding and display the top-5 recommendations.
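The three filter-and-rank steps above can be sketched as follows; the dictionary layout and the `playable` flag are assumptions for illustration, not the project's data model:

```python
import numpy as np

def recommend(source_id, games, k=5):
    """games: {id: {"rating": int, "playable": bool, "emb": np.ndarray}}.
    Filter by rating and playability, then rank by cosine similarity."""
    src = games[source_id]["emb"]
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    cands = [(cos(src, g["emb"]), gid) for gid, g in games.items()
             if gid != source_id and g["rating"] >= 80 and g["playable"]]
    return [gid for _, gid in sorted(cands, reverse=True)[:k]]

games = {
    "src": {"rating": 90, "playable": True,  "emb": np.array([1.0, 0.0])},
    "a":   {"rating": 85, "playable": True,  "emb": np.array([0.9, 0.1])},
    "b":   {"rating": 95, "playable": False, "emb": np.array([1.0, 0.0])},
    "c":   {"rating": 60, "playable": True,  "emb": np.array([1.0, 0.0])},
    "d":   {"rating": 88, "playable": True,  "emb": np.array([0.0, 1.0])},
}
print(recommend("src", games))  # "b" fails playability, "c" fails rating
```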
We created a Flask web application connected to an Apache Jena graph database and used SPARQLWrapper for querying. The logical flow of using our system is as follows:
- First, the user enters their device configuration, and the backend maps the processor and graphics card to their respective benchmark scores.
- Then, the user can search our knowledge base using 14 attributes and can also filter for games supported by their system. The Game page displays all the attributes of the game and links to the cheapest seller for that game. It also links to the Top-5 recommended game pages.
- The user can also use the visualization page to visualize the various properties of the Game class and other classes using Plotly.js.
The user interface of our web application is shown in the figures below.