Adding in Greenhouse.io & hire.withgoogle.com #48

Open
rpmullig opened this issue Oct 18, 2018 · 1 comment

Comments

@rpmullig

Sorry, I'm new to open source projects, but I wanted to jump in on this one as I'm actively job hunting.

I have found that when I run Google searches like "site:hire.withgoogle.com" and "site:greenhouse.io" along with a position and a location in the query, I can find employers that use those services. However, Google often indexes these postings rather slowly, so many jobs have already been taken down by the time they show up in the results.
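As a rough illustration of the searches described above, here is a minimal Python sketch that composes a `site:` query from a domain, a position, and a location. The function names and the example position and location are made up for illustration; only the two domains come from this issue.

```python
from urllib.parse import urlencode


def build_site_query(site, position, location):
    """Compose a search string restricted to one hosting domain."""
    return f"site:{site} {position} {location}"


def build_search_url(query):
    """Turn the query into a Google search URL (for opening in a browser)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Hypothetical position and location, used only as an example.
    for site in ("hire.withgoogle.com", "greenhouse.io"):
        query = build_site_query(site, "data engineer", "Chicago")
        print(build_search_url(query))
```

This only builds the URL for a manual search; it does not fetch or parse any results.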

My thought is that those two sites could be crawled and scraped on a weekly basis or so for new job postings from their respective employer clients.

Thoughts? Thanks for reading!

@karllhughes
Member

Hey @rpmullig, thanks for hopping in!

It's probably possible to crawl Greenhouse and Google hiring results, but doing it is outside the scope of this project. Scrapers are not trivial to build, and I would prefer not to bundle and maintain one within this project for the following reasons:

  • Because you're not using a documented API, you have to update the scraper as often as their website changes.
  • Scraping is usually against these sites' terms of service, so there are legal issues.
  • Scraping can require complex user agent and authentication spoofing. Again, there are legal considerations here, but also complex technical requirements that would require a lot of maintenance.

To give you some insight into the internals, this project uses documented APIs and XML feeds given by job boards for the express purpose of data distribution. In other words, this project works with existing data that is meant to be accessed programmatically.
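To make the contrast with scraping concrete, here is a minimal Python sketch of reading job listings from an XML feed that a job board publishes for distribution. The feed layout (`<jobs>`, `<job>`, `<title>`, and so on) and the sample data are assumptions made up for this example; this is not JobsToMail's actual code or any real board's schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents; a real feed would be fetched from a URL
# documented by the job board.
SAMPLE_FEED = """
<jobs>
  <job>
    <title>Backend Engineer</title>
    <company>Example Co</company>
    <location>Chicago, IL</location>
    <url>https://example.com/jobs/1</url>
  </job>
  <job>
    <title>Data Analyst</title>
    <company>Sample Inc</company>
    <location>Remote</location>
    <url>https://example.com/jobs/2</url>
  </job>
</jobs>
""".strip()

FIELDS = ("title", "company", "location", "url")


def parse_jobs(feed_xml):
    """Return one dict per <job> element, with missing fields as empty strings."""
    root = ET.fromstring(feed_xml)
    return [
        {field: (node.findtext(field) or "").strip() for field in FIELDS}
        for node in root.iter("job")
    ]


if __name__ == "__main__":
    for job in parse_jobs(SAMPLE_FEED):
        print(f"{job['title']} at {job['company']} ({job['location']})")
```

Because the feed's structure is documented and intended for programmatic access, a parser like this only needs to change when the provider versions its format, not every time a web page is redesigned.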

If Greenhouse and Google Hiring offer APIs (and you can get the requisite access to them), we can explore integrating them with JobsToMail. Otherwise, I'm afraid we won't be able to add them 😞
