One way to compare movies and decide which is the better choice for you is to use websites built for this purpose, backed by appropriate information retrieval methods.
In this phase of the project, we begin our journey towards building an information retrieval system for the IMDb website. We crawl the required data from IMDb and do some preprocessing on it. IMDb has one of the richest movie datasets (with ratings, comments, actors, and more).
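As a rough illustration of the kind of preprocessing mentioned above, here is a minimal sketch. The function name `preprocess`, the tiny `STOPWORDS` set, and the exact cleaning steps are assumptions made for this example only; the project's own files define what is actually required.

```python
import re
import string
from typing import List

# Hypothetical stopword list for illustration; the project may supply its own.
STOPWORDS = {"a", "an", "the", "and", "of", "in", "is", "to"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> List[str]:
    """Strip HTML tags, lowercase, remove punctuation, and drop stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)   # remove leftover HTML tags
    text = text.lower()                    # normalize case
    text = text.translate(_PUNCT_TABLE)    # punctuation becomes whitespace
    return [tok for tok in text.split() if tok not in STOPWORDS]
```

For example, `preprocess("The <b>Godfather</b> is a classic, of course!")` yields the tokens `godfather`, `classic`, and `course`.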
You should create a private repository in your personal GitHub account to push your answers and to let the TAs track your work. Please choose "Use this template" in this repository and then choose "Create a new repository". Make sure your repository is private.
To pull new changes and files from our main repository into your own repository, add this repository as a remote:
git remote add template [URL of the template repo]
Then you can run git fetch whenever you want to get the latest changes:
git fetch --all
The project contains two main modules: Logic and UI. The Logic module performs the main tasks of the project, and the UI module provides a user interface for interacting with the system. In each task, you will be asked to implement part or all of a file in one of these modules. Please read the comments in each file and on the functions inside it to understand what you need to do.
You can find raw crawled data for IMDb movies (which you should have crawled yourself in Phase 1) here.
Please create a new issue whenever you find a problem or have a suggestion regarding this project. You can also create PRs for issues with the "student" label.
- First, download all the required files by following these steps:
- Find the file links at 'Logic/tests/file links'.
- Download every link listed in that file.
- Place each file in the location specified for it in the text file.
- After that, run the Python file test_phase1.py to test everything in the first phase.
- To run the UI for the first phase, run these two commands in the terminal (PowerShell syntax; replace the path with the location of your own clone):
$env:PYTHONPATH += ";E:\MIR\Project\MIR-Project"
streamlit run UI\main.py
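
The two commands above are specific to PowerShell and to one machine's path. A portable alternative can be sketched in Python, under a few assumptions: the helper names `build_env` and `run_ui` are made up for this example, the project root is passed in by the caller, and Streamlit is installed so that `python -m streamlit run` works.

```python
import os
import subprocess
import sys
from pathlib import Path


def build_env(project_root, base_env=None):
    """Return a copy of the environment with project_root added to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    root = str(Path(project_root))
    # Split the existing PYTHONPATH using the platform's separator (; or :).
    parts = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if root not in parts:
        parts.append(root)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def run_ui(project_root):
    """Launch UI/main.py with Streamlit from the given project root."""
    root = Path(project_root)
    cmd = [sys.executable, "-m", "streamlit", "run", str(root / "UI" / "main.py")]
    return subprocess.call(cmd, env=build_env(root), cwd=root)
```

Calling `run_ui(".")` from the root of your clone should then behave like the two commands, without editing a hard-coded path.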