AWS-based data store for DoL and other public record data on employers, for CDM

ResearchActionDesign/migrant-employer-data-hub

Migrant Employer Data Hub

Source code for a Django-, PostgreSQL-, and AWS Lambda-based tool for scraping data from seasonaljobs.dol.gov, deduplicating employer records, and importing data from other sources.

Code (c) Research Action Design, LLC. Originally produced for Centro de los Derechos del Migrante, Inc.

Released under the GPL v3 license; see the LICENSE file for the full license text.

Creating migrations

Create a migration with Alembic by running:

python -m alembic revision --autogenerate -m "<MESSAGE>"

Apply pending migrations to the database by running:

python -m alembic upgrade head
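
The two commands above can also be driven from a small Python helper, for example inside a release script. This is a minimal sketch and not part of the repository: the names `revision_cmd`, `upgrade_cmd`, and `run` are hypothetical, and it assumes Alembic is installed in the active environment and that its configuration can be found from the current working directory.

```python
import subprocess
import sys


def revision_cmd(message: str) -> list[str]:
    """Build the autogenerate command shown above (hypothetical helper)."""
    if not message.strip():
        raise ValueError("a migration message is required")
    return [sys.executable, "-m", "alembic",
            "revision", "--autogenerate", "-m", message]


def upgrade_cmd(target: str = "head") -> list[str]:
    """Build the upgrade command; "head" is the newest revision."""
    return [sys.executable, "-m", "alembic", "upgrade", target]


def run(cmd: list[str]) -> None:
    """Run a command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

Calling `run(upgrade_cmd())` is then equivalent to the upgrade command above. Creating a revision and applying it are kept as separate calls so that an autogenerated migration can be reviewed before it touches the database.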

Deploy to AWS SAM

The lambda function is built within the container specified by lambda.Dockerfile.

Initial deploy

Build the lambda function with sam build.

On initial deploy you will need to do the following:

  1. Work around the circular dependency between the S3 bucket and its Lambda event notification; for possible approaches, see https://aws.amazon.com/blogs/mt/resolving-circular-dependency-in-provisioning-of-amazon-s3-buckets-with-aws-lambda-event-notifications/
  2. If the initial deployment fails, the stack will be stuck in a ROLLBACK state. Run sam delete before retrying the initial deployment.
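
Step 2 can be checked from a script before retrying. The sketch below assumes the AWS CLI and the SAM CLI are installed; the stack name is whatever you pass in (the repository's actual stack name is not stated here), and the helper names are hypothetical. It relies on the CloudFormation behavior that a stack whose first creation failed ends in `ROLLBACK_COMPLETE` or `ROLLBACK_FAILED` and then has to be deleted before it can be created again.

```python
import subprocess

# Statuses in which a stack that failed its first creation must be deleted.
DELETE_REQUIRED = {"ROLLBACK_COMPLETE", "ROLLBACK_FAILED"}


def must_delete_before_retry(status: str) -> bool:
    """True if the reported stack status means 'delete, then redeploy'."""
    return status.strip() in DELETE_REQUIRED


def stack_status(stack_name: str, profile: str = "cdm") -> str:
    """Ask the AWS CLI for the current CloudFormation status of a stack."""
    result = subprocess.run(
        ["aws", "cloudformation", "describe-stacks",
         "--stack-name", stack_name,
         "--query", "Stacks[0].StackStatus",
         "--output", "text",
         "--profile", profile],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def delete_if_rolled_back(stack_name: str, profile: str = "cdm") -> bool:
    """Run `sam delete` only when the stack is stuck after a failed create."""
    if must_delete_before_retry(stack_status(stack_name, profile)):
        subprocess.run(
            ["sam", "delete", "--stack-name", stack_name, "--profile", profile],
            check=True,
        )
        return True
    return False
```

Note that `UPDATE_ROLLBACK_COMPLETE` is deliberately not in the set: it follows a failed update of an existing stack, which can simply be redeployed.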

Subsequent deployments

  1. Build the lambda function with sam build.
  2. Run sam deploy --profile cdm
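
The two steps above can be chained so that a failed build never reaches the deploy step. This is a minimal sketch, assuming the SAM CLI is on the PATH; `deploy_steps` and `deploy` are hypothetical names, and the `runner` parameter exists only so the sequence can be exercised without touching AWS.

```python
import subprocess


def deploy_steps(profile: str = "cdm") -> list[list[str]]:
    """Return the build and deploy commands in the order they must run."""
    return [
        ["sam", "build"],
        ["sam", "deploy", "--profile", profile],
    ]


def deploy(profile: str = "cdm", runner=subprocess.run) -> None:
    """Run each step in order; check=True stops at the first failure."""
    for cmd in deploy_steps(profile):
        runner(cmd, check=True)
```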

Note: The CloudFormation template does not reliably set up the S3 bucket triggers. You may find it easier to add them through the AWS console UI.
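
As an alternative to the console, the trigger could be attached with boto3. The following is a sketch under assumptions, not the project's deployment code: the bucket name and function ARN are placeholders you would supply, the event type `s3:ObjectCreated:*` is a guess at what the scraper needs, and `lambda_trigger_config` and `attach_trigger` are hypothetical helper names.

```python
def lambda_trigger_config(function_arn, events=("s3:ObjectCreated:*",), prefix=None):
    """Build the NotificationConfiguration dict for one Lambda trigger."""
    entry = {"LambdaFunctionArn": function_arn, "Events": list(events)}
    if prefix:
        # Optional: only fire for object keys that start with this prefix.
        entry["Filter"] = {
            "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
        }
    return {"LambdaFunctionConfigurations": [entry]}


def attach_trigger(bucket, function_arn, profile="cdm"):
    """Apply the configuration to a bucket (requires boto3 and credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    session = boto3.Session(profile_name=profile)
    session.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=lambda_trigger_config(function_arn),
    )
```

Two caveats apply: `put_bucket_notification_configuration` replaces the bucket's entire notification configuration rather than appending to it, and S3 must already have permission to invoke the function, otherwise the call is rejected.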

Running the dedupe UI locally

  1. Start Postgres by running docker-compose up
  2. Run pipenv run python interactive_dedupe_session.py
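
If the two steps are scripted, the second one can fail when the database container is still starting. The sketch below waits for the port to accept connections first. It assumes the compose file publishes Postgres on `localhost:5432`, which is not confirmed by this README; `wait_for_port` and `start_dedupe_ui` are hypothetical names.

```python
import socket
import subprocess
import time


def wait_for_port(host="localhost", port=5432, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def start_dedupe_ui():
    """Start the containers in the background, wait, then launch the session."""
    subprocess.run(["docker-compose", "up", "-d"], check=True)
    if not wait_for_port():
        raise RuntimeError("Postgres port did not open in time")
    subprocess.run(
        ["pipenv", "run", "python", "interactive_dedupe_session.py"], check=True
    )
```

An open port only shows that something is listening; Postgres may still need a moment before it accepts queries, so a short retry in the application itself remains useful.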
