Configure S3 to store Harvester Records #4335

Open
6 tasks
btylerburton opened this issue May 26, 2023 · 2 comments
Labels
H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0

Comments

@btylerburton
Contributor

btylerburton commented May 26, 2023

User Story

In order to perform operations on the data records reliably, datagov wants an interface to interact with S3 or the localstack equivalent.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

sourceId = UUID generated by controller on creation of new harvest source
jobId = UUID generated by controller when a new job is initiated
recordId = UUID generated by extract service to track status of record within pipeline
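
As a minimal sketch of how these three identifiers could be combined into the `<feature>/<sourceId>/<jobId>/<recordId>` prefix used in the criteria below, assuming the IDs arrive as UUID strings. The function names `build_record_key` and `build_job_prefix` are hypothetical, and the trailing slash on the job prefix is an assumption made so that a prefix listing only matches keys inside that job:

```python
import uuid


def _normalize_uuid(label: str, value: object) -> str:
    """Return the canonical string form of a UUID, or raise ValueError."""
    try:
        return str(uuid.UUID(str(value)))
    except ValueError as exc:
        raise ValueError(f"{label} is not a valid UUID: {value!r}") from exc


def build_job_prefix(feature: str, source_id: str, job_id: str) -> str:
    """Build '<feature>/<sourceId>/<jobId>/' (trailing slash is an assumption)."""
    if not feature or "/" in feature:
        raise ValueError("feature must be a non-empty string without '/'")
    source = _normalize_uuid("sourceId", source_id)
    job = _normalize_uuid("jobId", job_id)
    return f"{feature}/{source}/{job}/"


def build_record_key(feature: str, source_id: str, job_id: str, record_id: str) -> str:
    """Build '<feature>/<sourceId>/<jobId>/<recordId>'."""
    record = _normalize_uuid("recordId", record_id)
    return build_job_prefix(feature, source_id, job_id) + record
```

Validating the UUIDs up front keeps malformed identifiers out of the bucket, where they would be hard to find again by prefix.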

  • GIVEN I would like to save the extracted record from a harvest source
    AND I have a prefix defined by the schema <feature>/<sourceId>/<jobId>/<recordId>
    THEN I want a utility to PUT that object using that prefix

  • GIVEN I would like to retrieve a previously saved record from S3
    AND I have a prefix defined by the schema <feature>/<sourceId>/<jobId>/<recordId>
    THEN I want a utility to GET the object associated with that prefix

  • GIVEN I would like to delete a previously saved record
    AND I have a prefix defined by the schema <feature>/<sourceId>/<jobId>/<recordId>
    THEN I want a utility to DELETE the object associated with that prefix

  • GIVEN I would like to query the count of added/updated/deleted records with a <jobId>
    AND I have a prefix defined by the schema <feature>/<sourceId>/<jobId>
    THEN I want a utility to GET the objects associated with that <jobId> and return them.
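
The last criterion could be sketched roughly as follows. This assumes a boto3-style S3 client is passed in (so the same code can point at S3 or localstack) and uses the `list_objects_v2` paginator so that jobs with more than one page of keys are fully covered. The function names and the `bucket` parameter are hypothetical. How added/updated/deleted status is encoded is not specified in this issue, so the sketch only lists and returns the objects; a total count is `len()` of the result:

```python
from typing import Any, Dict, Iterator


def iter_job_keys(client: Any, bucket: str, job_prefix: str) -> Iterator[str]:
    """Yield every object key stored under '<feature>/<sourceId>/<jobId>'."""
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=job_prefix):
        # Pages with no matches have no "Contents" entry.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def get_job_records(client: Any, bucket: str, job_prefix: str) -> Dict[str, bytes]:
    """Return a mapping of key -> raw object bytes for one job."""
    records: Dict[str, bytes] = {}
    for key in iter_job_keys(client, bucket, job_prefix):
        response = client.get_object(Bucket=bucket, Key=key)
        records[key] = response["Body"].read()
    return records
```

Fetching every body is only reasonable for small jobs; if only the count is needed, iterating `iter_job_keys` avoids downloading the records.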

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Data.gov would like all Boto / S3 references contained within a single module so that any upgrade to the service is applied everywhere at once.
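
One way to keep every Boto reference in a single place is a small client factory that the rest of the code calls instead of importing `boto3` directly. The sketch below is illustrative only: the `S3_ENDPOINT_URL` variable name and both function names are assumptions, not existing project settings. When the endpoint variable is set (for example to a localstack URL), the client targets that endpoint; otherwise boto3 falls back to its normal AWS resolution.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_s3_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Translate environment variables into boto3.client keyword arguments."""
    env = os.environ if env is None else env
    settings: Dict[str, str] = {}
    endpoint = env.get("S3_ENDPOINT_URL")  # hypothetical variable name
    if endpoint:
        settings["endpoint_url"] = endpoint
    region = env.get("AWS_DEFAULT_REGION")
    if region:
        settings["region_name"] = region
    return settings


def make_s3_client(env: Optional[Mapping[str, str]] = None) -> Any:
    """Create the S3 client; intended to be the only place boto3 is imported."""
    import boto3  # imported lazily so callers never depend on boto3 directly

    return boto3.client("s3", **resolve_s3_settings(env))
```

Because the settings are resolved separately from client creation, they can be checked in unit tests without boto3 or network access.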

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • Create a module to interact with the S3 client
  • Create helper methods to abstract away all details except name (<feature>/<sourceId>/<jobId>/<recordId>) and value (the record to store)
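
A rough sketch of what such helper methods might look like, with callers supplying only a name and a value. The `RecordStore` class is hypothetical, it expects a boto3-style S3 client to be injected, and it assumes records are JSON-serializable; a different serialization would be needed if records are stored as raw XML or bytes.

```python
import json
from typing import Any


class RecordStore:
    """Hide bucket and client details behind name/value helpers."""

    def __init__(self, client: Any, bucket: str) -> None:
        self._client = client  # boto3-style S3 client (real or localstack)
        self._bucket = bucket

    def put(self, name: str, value: Any) -> None:
        """Store a record under '<feature>/<sourceId>/<jobId>/<recordId>'."""
        body = json.dumps(value).encode("utf-8")
        self._client.put_object(Bucket=self._bucket, Key=name, Body=body)

    def get(self, name: str) -> Any:
        """Fetch and decode a previously stored record."""
        response = self._client.get_object(Bucket=self._bucket, Key=name)
        return json.loads(response["Body"].read())

    def delete(self, name: str) -> None:
        """Remove the record stored under the given name."""
        self._client.delete_object(Bucket=self._bucket, Key=name)
```

Injecting the client keeps the helpers testable against an in-memory fake and leaves client construction to one place in the module.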
@btylerburton btylerburton moved this to 📟 Sprint Backlog [7] in data.gov team board May 26, 2023
@btylerburton btylerburton added H2.0/Harvest-General General Harvesting 2.0 Issues H2.0/controller labels May 26, 2023
@btylerburton btylerburton changed the title WIP Setup Interface for S3 Setup Interface for S3 May 30, 2023
@rshewitt
Contributor

The PUT object utility is currently included in the work nearing completion in the extract ticket, #4257.

@hkdctol hkdctol moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Jul 20, 2023
@hkdctol hkdctol moved this from 🏗 In Progress [8] to 📟 Sprint Backlog [7] in data.gov team board Jul 20, 2023
@robert-bryson robert-bryson moved this from 📟 Sprint Backlog [7] to 🏗 In Progress [8] in data.gov team board Aug 11, 2023
@robert-bryson robert-bryson moved this from 🏗 In Progress [8] to New Dev in data.gov team board Aug 25, 2023
@robert-bryson
Contributor

Moving back to new dev as work superseded by Airflow work.

@robert-bryson robert-bryson removed their assignment Sep 5, 2023
@btylerburton btylerburton changed the title Setup Interface for S3 WIP Setup Interface for S3 Nov 27, 2023
@btylerburton btylerburton changed the title WIP Setup Interface for S3 WIP Use S3 to store XCom objects Nov 27, 2023
@btylerburton btylerburton assigned FuhuXia and unassigned FuhuXia Nov 27, 2023
@btylerburton btylerburton changed the title WIP Use S3 to store XCom objects Configure S3 to store XCom objects Dec 6, 2023
@btylerburton btylerburton removed the H2.0/Harvest-General General Harvesting 2.0 Issues label Dec 13, 2023
@btylerburton btylerburton moved this to 🧊 Icebox in data.gov team board Feb 16, 2024
@btylerburton btylerburton changed the title Configure S3 to store XCom objects Configure S3 to store Harvester Records Feb 16, 2024
@btylerburton btylerburton added H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0 and removed H2.0/Airflow labels Feb 16, 2024
Projects
Status: 🧊 Icebox
Development

No branches or pull requests

4 participants