
Unable to resolve "./scraperDrivers/allDrivers.js" #1

Closed
rosevinnur opened this issue Aug 16, 2018 · 1 comment

Comments


rosevinnur commented Aug 16, 2018

Unable to resolve "./scraperDrivers/allDrivers.js" from "utils/api.js"

Owner

agilgur5 commented Aug 17, 2018

Yes, I haven't finished documenting this yet, as there's still a lot of WIP (the repo is less than a week old). However, per the .gitignore located in the utils/ directory and the last commit in that directory, site-specific scraping code is intentionally not committed.

You can either find some of this code elsewhere or write scraping code for your site of choice yourself. I was planning to eventually split the scraping API/spec out into a separate library/spec that folks can implement scraper plugins against (i.e. just return data formatted a certain way and it will work). A site-specific scraper currently does not take much code to implement; the one I'm using is ~30 LoC, see the example below.

I'll document this at a later point, but if you want to use this immediately, here's some reference code below. Note that the metadata per item and the "api"/layout of different sites can vary widely, so the spec is also a WIP.

For now, allDrivers.js is just a wrapper file that looks like:

import * as site1Driver from './site1Driver.js'
import * as site2Driver from './site2Driver.js'

const drivers = {
  site1Driver,
  site2Driver
}

export default drivers
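For context on how utils/api.js might consume this registry, here's a minimal, hypothetical sketch. The driver names and the lookup helper are illustrative assumptions based on the wrapper above, not the repo's actual code:

```javascript
// Hypothetical sketch: looking a scraper driver up by name from a
// registry shaped like the `drivers` object exported by allDrivers.js.
// Driver names and contents here are placeholders.
const drivers = {
  site1Driver: { latestURL: 'http://site1/?page=' },
  site2Driver: { latestURL: 'http://site2/?page=' }
}

function getDriver (name) {
  const driver = drivers[name]
  if (!driver) {
    // fail loudly rather than letting callers hit undefined later
    throw new Error('Unknown scraper driver: ' + name)
  }
  return driver
}

// e.g. build the URL for page 2 of site1's "latest" listing
const url = getDriver('site1Driver').latestURL + 2
```

Keeping the registry as a plain object means adding a site is just one new driver file plus one line in allDrivers.js.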

and a site-specific scraper driver might look like:

export const latestURL = 'http://sitelinkhere/?page='

export function getLatest ($) {
  return $('.classname')
    .map((index, el) => ({
      // site-specific selectors; placeholders shown here
      title: $(el).find('titleselector').text(),
      cover: $(el).find('img').attr('src'),
      link: $(el).find('a').attr('href'),
      release: $(el).find('releaseselector').text()
    }))
    .get()
}

export function getChapters ($) {
  const title = $('selector1').text().trim()
  const chapters = $('selector2')
    .map((index, el) => ({
      link: $(el).attr('href').replace('//', 'http://'),
      // fall back to '0' when no chapter number is found
      // (match() returns null on no match, so guard before indexing)
      title: ($(el).text().match(/[0-9]+/) || ['0'])[0]
    }))
    .get()
  const tags = $('selector3')
    .map((index, el) => $(el).text().trim())
    .get()
  const summary = $('.some-classname').text().trim()
  return { title, chapters, tags, summary }
}

export function getPages ($) {
  return $('selector1')
    .map((index, el) =>
      $(el).attr('value').replace('//', 'http://')
    )
    .get()
}

export function getImage ($) {
  return $('imgselector').attr('src')
}
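Since the spec is still a WIP, a hedged sketch of what "implementing the scraper spec" could mean is a simple conformance check: a driver module exposes a `latestURL` string plus the four functions shown above. The required export names are taken from the example; everything else here is an assumption:

```javascript
// Hedged sketch: check that a scraper driver module implements the
// interface implied by the example driver above. The export names come
// from that example; the actual spec is still a WIP.
const REQUIRED_FUNCTIONS = ['getLatest', 'getChapters', 'getPages', 'getImage']

function validateDriver (driver) {
  const missing = REQUIRED_FUNCTIONS.filter(
    (name) => typeof driver[name] !== 'function'
  )
  if (typeof driver.latestURL !== 'string') {
    missing.push('latestURL')
  }
  return { valid: missing.length === 0, missing }
}

// A stub driver that satisfies the assumed interface
const stubDriver = {
  latestURL: 'http://example/?page=',
  getLatest: ($) => [],
  getChapters: ($) => ({ title: '', chapters: [], tags: [], summary: '' }),
  getPages: ($) => [],
  getImage: ($) => ''
}
```

A check like this could run when drivers are registered, so a plugin that returns data in the wrong shape fails fast instead of breaking the reader UI.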

In its WIP state, this repo currently only works as a plain latest-chapter reader client for a single site, with no bells or whistles (i.e. no storage/caching beyond the network layer's, no search, etc.).

@agilgur5 agilgur5 changed the title Unable to resolve Unable to resolve "./scraperDrivers/allDrivers.js" May 22, 2019
@agilgur5 agilgur5 changed the title Unable to resolve "./scraperDrivers/allDrivers.js" Unable to resolve "./scraperDrivers/allDrivers.js" Dec 3, 2024