This project is a Chrome Extension similar to Selenium IDE designed to automate web scraping processes.
It is a work in progress and part of my thesis, "Herramienta Interactiva Para Automatizar Los Procesos De Extraccion De Información Web" ("Interactive Tool for Automating Web Information Extraction Processes").
Unfortunately, I was not able to reuse the part of Selenium IDE that interacts with the page without spending too much time trying to understand how it all worked. I discovered that, for the browser extension, they tried to reuse some code from the main Selenium project and were importing that code with something called closure-loader, which made the setup quite difficult for me.
Either way, I didn't want my project to become bloated, so I thought it would be better to implement the interactions myself. It might be a good idea to create a library to do just that!
You should have Node.js and yarn installed.
Once you have those installed, install the dependencies by running `yarn install`.
To build the project, run `yarn run build`.
To install the extension, go to your Chrome Extensions tab, click "Load unpacked", and select the `dist` folder that was generated in the build step.
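
Put together, the setup steps above amount to roughly the following commands, run from the project root. The two version checks are an added convenience for confirming the prerequisites and are not part of the original instructions.

```shell
# Run from the project root (the folder that contains package.json).

# Confirm the prerequisites are available (added check, optional).
node --version
yarn --version

# Install the dependencies.
yarn install

# Build the extension; the output is the dist folder to load in Chrome.
yarn run build
```

After the build finishes, the `dist` folder is the one to pick in the "Load unpacked" dialog.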
- Improve commands UX to remove
- Switch to manifest v3
- Improve webpack config file
- Migrate project to TypeScript
- Allow commands to be dragged and dropped
- Add property on command to set the name of the data being extracted
- Add reducers to change command status
- Add visual cues to show that the extraction is running
- Add play, pause and stop functionality
- Show errors on UI
- Add command autocomplete
- Add preview result
- Add way to run single command
- Add recipes to Redux state
- Change Command Panel to be Recipe panel
- Create default test recipe
- Redesign command parameters to be in only one component
- Fix current tab not refreshing
- Add export feature
- Add recipe URL inputs
- Add button to clear output
- Add CSV support to export
- Show field name on command list
- Add import / export recipe
- Write tests
- Improve select element on document
- Allow adding more recipes
- Show matches count
- Transform extracted data (Compute URL relative paths)
- Implement ability to detect a selector shared by multiple elements
- Highlight elements being selected by query
- Show sample of data extracted below selector parameters
- Change current tab selection mechanism
- Allow commands to be skipped
- Add button to add current URL to recipe inputs