Skip to content

Latest commit

 

History

History
97 lines (76 loc) · 5.52 KB

data.md

File metadata and controls

97 lines (76 loc) · 5.52 KB

Queue Data

Queue Data

Space Considerations

ZooKeeper was not designed to be a general database or large object store. Instead, it manages coordination data. This data can come in the form of configuration, status information, rendezvous, etc. A common property of the various forms of coordination data is that they are relatively small: measured in kilobytes. The ZooKeeper client and the server implementations have sanity checks to ensure that znodes have less than 1M of data, but the data should be much less than that on average. 1

Final vs Volatile data fields

  • As we write to zookeeper, should be distinguish our static fields (submitter, file name) from the volatile fields (status, space_needed, last update)?

Batches and Jobs

Zookeeper Node Path Node Data Type Fields Created By Modified By Comment
/batches/BID/lock none - Pending, Reporting - Ephemeral node to lock a batch, deleted by the thread that creates the node
/batches/BID/submission json profile_name
submitter
payload_filename
submissionDate

erc_what
erc_who
erc_when
erc_where
type
submission_mode
creation none
/batches/BID/status json status
last_modified
message
creation all jobs done message is optional
/batches/BID/status-report json failed_jobs failure failure last status report sent to user
/batches/BID/states/STATE/JID none - STATE = batch-processing / batch-failed / batch-completed / batch-deleted
Create watcher to watch for states/processing to be empty
/batches/BID/lock none - Several states - Ephemeral node to lock a job, deleted by the thread that creates the node
/batch-uuids/UUID string - Pending none batchID uuid is minted by ingest, used for lookup
/jobs/JID/bid string batch_id creation none
/jobs/JID/configuration json profile_name
submitter
submissionDate
payload_url
payload_type
response_type
local_id
creation none
/jobs/JID/status json status
last_successful_status
last_modification_date
retry_count
message
creation none message is optional
/jobs/JID/priority int - creation estimating
/jobs/JID/space_needed long - creation estimating
/jobs/JID/identifiers json primary_id
local_id: []
creation processing
/jobs/JID/metadata json erc_what
erc_who
erc_when
erc_where
creation ?
/jobs/JID/inventory json manifest_url
mode
ingest inventory mode will be used when we implement "fix" options
/jobs/states/STATE/PP-JID none - PP = priority
STATE = pending / held / estimating / provisioning / downloading / processing / recording / notify / failed / completed

Locks

Zookeeper Node Path Node Data Type Fields Created By Modified By Comment
/locks/queue/ingest none - Admin Admin Previously file-system based
/locks/queue/accessSmall none - Admin Admin
/locks/queue/accessLarge none - Admin Admin
/locks/storage/{ark} none - Ingest Ingest slashes are replaced with _
/locks/queue/localid/{localid}{owner} none - Ingest Ingest slashes are replaced with _
/locks/inventory/{ark} none - Inventory Inventory slashes are replaced with _
/locks/collections/{mnemonic} none - Admin Admin

Access Queue

Zookeeper Node Path Node Data Type Fields Created By Modified By Comment
/access/small/ID/token
/access/large/ID/token
json token
delivery-node
cloud-content-byte
status
url
anticipated-availability-time
Access Access
/access/small/ID/status
/access/large/ID/status
json status
last_modified
message
creation all jobs done message is optional

Batch and Job State Transition

  • Processing /jobs/states/StateX/PP-JID
  • Job finishes StateX
  • Update /jobs/JID/status data
    • last_successful_status = StateX
    • status = StateY
    • last_modification_date = now
  • Delete /jobs/states/StateX/PP-JID
  • Create /jobs/states/StateY/PP-JID
    • Note: The prior state might have altered the priority
  • If StateY == Completed
    • Delete /batches/BID/states/processing/JID
    • Create /batches/BID/states/completed/JID
  • If StateY == Failed
    • Delete /batches/BID/states/processing/JID
    • Create /batches/BID/states/failed/JID
  • If /batches/BID/states/processing is empty, watcher will trigger batch notification

Legacy Zookeeper Data Structure

Record Data

  • Ingest currently serializes java properties
  • Inventory currently serializes XML data

Record Keys

The ingest service currently packs a priority value into the path name for the zookeeper record.

  • /ingest/mrtQ-02100000000003
  • (document the component parts here)
  • Question: priority may become a more dynamic property in the future
    • We could have a baseline priority in the pathname (for sorting) and an actual priority in the payload
    • We could also explore renaming a path dynamically when a priority change is appropriate

Record Sorting

Current Implementation

In Merritt's current zookeeper implementation, record headers contain binary data.

  • Status: 1 byte status field with each byte representing a different queue state
  • Time: 8 byte long representing the number of seconds since 1970

Footnotes

  1. https://zookeeper.apache.org/doc/r3.3.3/zookeeperProgrammers.html#Data+Access