Toolkit for Op.GG data mining, including crawling pages. This project has an educational purpose: you might be better off using the Riot API than reusing data from aggregator websites like Op.GG.
- Install Node (LTS should be fine). Make sure `node` and `npm` are accessible via the `PATH` environment variable.
- Clone the repository, then navigate with the command prompt to the project root directory.
- Use `npm install` to install dependencies.
- Run it (multiple options):
  - You can use `npm run cli:ts` to run it as TypeScript (`ts-node` mode); passing params should look like: `npm run cli:ts --- --help`.
  - You can compile it (to JavaScript) by running `npm run build` once, then you can use `npm run cli:js` in a similar fashion as above.
- (Optional) Use `npm link` (with admin privileges) to make the tool available as `opgg --help`.
# To collect games for a certain user; outputs `games.json`
opgg history euw Azzapp
# To collect data for all users (infinite process), see `cache` folder; stop with Ctrl+C
opgg spider euw Azzapp
# and to continue after crash/stopping
opgg spider continue
- Fix wiki scraper:
  - Shen & Kennen names are bugged/empty
  - Nunu is named differently in OpGG static data
- progress bars
- handle URLs
  - regex: `/(?:(\w+)\.)?op\.gg\/summoners?\/(?:(\w+)\/)?(?:userName=)?([^?#\/\s]*)/i` handles well:
    - `op.gg/summoners/euw/Azzapp`
    - `https://www.op.gg/summoners/euw/Azzapp`
    - `https://euw.op.gg/summoner/userName=AgainPsychoX`
    - `https://www.op.gg/summoners/euw/Azzapp/matches/ewOhykeZdeeskvBSovvxqie5BuF8-a1Z515jCKtAw2I%3D/1686681922000`
- distribute work over multiple proxies to avoid 429 Too Many Requests
- ...
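
To illustrate the URL handling item, here is a minimal sketch of how the regex listed above could be turned into a region and user name. The names `parseOpggUrl` and `ParsedOpggUrl` are hypothetical and not part of the toolkit. Treating `www` as "no region" and lowercasing the region are assumptions made for this example only.

```typescript
// Hypothetical result shape, for illustration only.
interface ParsedOpggUrl {
  region: string | undefined;
  userName: string;
}

// The regex from the list above, unchanged.
const OPGG_URL_REGEX =
  /(?:(\w+)\.)?op\.gg\/summoners?\/(?:(\w+)\/)?(?:userName=)?([^?#\/\s]*)/i;

// Hypothetical helper: returns undefined when the input is not an Op.GG summoner URL.
function parseOpggUrl(url: string): ParsedOpggUrl | undefined {
  const match = OPGG_URL_REGEX.exec(url);
  if (!match) return undefined;

  const subdomain: string | undefined = match[1];
  const pathRegion: string | undefined = match[2];
  const rawName: string | undefined = match[3];
  if (!rawName) return undefined; // the regex allows an empty name; reject it here

  // Newer URLs carry the region in the path, older ones in the subdomain.
  // Assumption: a `www` subdomain never denotes a region.
  const region =
    pathRegion ??
    (subdomain && subdomain.toLowerCase() !== 'www' ? subdomain : undefined);

  // The name is returned as captured (still URL-encoded if the URL was).
  return { region: region?.toLowerCase(), userName: rawName };
}

console.log(parseOpggUrl('https://euw.op.gg/summoner/userName=AgainPsychoX'));
// { region: 'euw', userName: 'AgainPsychoX' }
```

Because the last capture group stops at `/`, `?`, `#` or whitespace, trailing path segments such as `/matches/...` are ignored and only the summoner name is returned.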