scientific-purpose-harvester

Crawl scientific webpages for relevant papers. The easiest way to start your journey in the scientific jungle.

  • The SPH Team

You can try the SPH here.

This service and repository are built for educational purposes only!

The Scientific-Purpose-Harvester (SPH) aims to provide a dynamic way of crawling scientific webpages for content that is relevant to your questions.

Landing-Page

Vision

Search Google Scholar for the best scientific results for your question through an easy-to-use graphical user interface.
In the long run, we might connect additional data sources such as https://dblp.uni-trier.de/ (conferences not yet peer reviewed...).

How to start

Video Introduction

Get started quickly with a video tutorial for the SPH! (Click the image to go to YouTube.)

Introduction

Online

The easiest way to access the SPH.

Simply open the SPH, hosted by a member of the SPH Team. This website uses the Svelte version of the SPH.

Offline (Locally)

  1. Clone the repo
git clone https://github.com/SimonScapan/scientific-purpose-harvester.git
  2. Navigate into harvester
cd harvester
  3. Uncomment lines 22-24 in api.py
  4. Run api.py to start the harvester
python api.py
  5. Open the local website in your browser
  6. Shut down the local website by pressing CTRL+C in your terminal

How to use

  1. Enter your question.
  2. Hit the search button and wait for results.
  3. Get a quick overview of the best scientific papers for your question. Follow a link to go directly to the paper.

Used technology / Interesting Facts

  • ScraperAPI allows us to crawl Google Scholar (or other websites) without getting blacklisted.
    • A free plan of ScraperAPI is used. It allows 1000 free requests per month.
    • If there is a problem with the API key in use:
      • Get your own free API key on the ScraperAPI website.
      • Replace the given API key with your personal API key in the harvester_scholar.py file (line 34).
  • Svelte allows us to use the Python file within the website.
  • The papers are ranked by citation count.
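
To make the two central ideas above more concrete, here is a minimal sketch of how a Google Scholar query could be routed through ScraperAPI and how results could be ranked by citation count. It is not the code from harvester_scholar.py: the function names, the `citations` key, and the placeholder `YOUR_API_KEY` are hypothetical, and the endpoint is assumed to be ScraperAPI's basic GET form (`api_key` and `url` query parameters). The sketch only builds URLs and sorts data, so it sends no requests and uses none of your monthly quota.

```python
from urllib.parse import urlencode

# Assumption: ScraperAPI's basic GET endpoint; check their docs before relying on it.
SCRAPERAPI_ENDPOINT = "http://api.scraperapi.com/"


def scholar_search_url(question: str) -> str:
    """Build a Google Scholar search URL for a free-text question."""
    return "https://scholar.google.com/scholar?" + urlencode({"q": question})


def build_scraperapi_url(api_key: str, target_url: str) -> str:
    """Wrap target_url so that the request is routed through ScraperAPI."""
    query = urlencode({"api_key": api_key, "url": target_url})
    return f"{SCRAPERAPI_ENDPOINT}?{query}"


def rank_by_citations(papers: list[dict]) -> list[dict]:
    """Return the papers sorted by citation count, most cited first.

    Papers without a (hypothetical) 'citations' key are treated as uncited.
    """
    return sorted(papers, key=lambda paper: paper.get("citations", 0), reverse=True)


if __name__ == "__main__":
    # "YOUR_API_KEY" is a placeholder; use your personal ScraperAPI key here.
    target = scholar_search_url("graph neural networks")
    print(build_scraperapi_url("YOUR_API_KEY", target))

    # Made-up example data, only to show the ranking step.
    example = [
        {"title": "Paper A", "citations": 3},
        {"title": "Paper B", "citations": 10},
        {"title": "Paper C"},
    ]
    for paper in rank_by_citations(example):
        print(paper["title"], paper.get("citations", 0))
```

Keeping the API key as a function argument, instead of a hard-coded value, would also make it easier to swap in your own key without editing a specific line.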

Future Extensions

Here are some ideas for future extensions. Feel free to fork this project and add some of these, or your own ideas!

  • Free-text-based NLP training --> Q&A pair generation --> feed fancy flash cards with content
  • Build a network of the cited articles. Who cited whom? Where are the connections?
  • Build an integration with more scientific search engines, like IEEE, arXiv, ...
  • Generate abstracts for each paper with the help of NLP
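
As a starting point for the citation network idea, here is a small sketch of how "who cited whom" could be stored and queried. Nothing like this exists in the SPH yet: the function names and the `(citing, cited)` pair format are hypothetical, and the example titles are made up.

```python
from collections import defaultdict


def build_cited_by(citation_pairs):
    """Map each cited paper to the set of papers that cite it.

    citation_pairs is an iterable of (citing, cited) title pairs
    (a hypothetical input format, not something the SPH produces today).
    """
    cited_by = defaultdict(set)
    for citing, cited in citation_pairs:
        cited_by[cited].add(citing)
    return dict(cited_by)


def most_cited(cited_by):
    """Return (paper, number_of_citing_papers) pairs, most cited first."""
    counts = [(paper, len(citers)) for paper, citers in cited_by.items()]
    # Sort by descending count, then by title so that ties are stable.
    return sorted(counts, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    pairs = [
        ("Paper B", "Paper A"),
        ("Paper C", "Paper A"),
        ("Paper C", "Paper B"),
    ]
    graph = build_cited_by(pairs)
    for paper, count in most_cited(graph):
        print(f"{paper}: cited by {count}")
```

A plain dictionary of sets is enough for small result lists; a dedicated graph library might be worth considering once connections between many papers need to be explored.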

Thank you

Thank you for using the SPH. If you have questions, feel free to reach out to the SPH Team: