Generate bulk reports for FlyBase curators, users and collaborators.


FlyBase/harvdev-bulk-reports


harvdev-bulk-reports

Overview

This repo contains the scripts used to generate most FlyBase bulk reports, as well as reports for FlyBase curators and external collaborators. This new public repository replaces the private, retired harvdev-reports repository (now renamed harvdev-reports-old). It contains a mix of newer Python scripts and older Perl scripts (taken from the fb_cvs/FB/scripts/reports repo). Bulk reports for FlyBase users are shipped to IUDev and incorporated into the public FlyBase site, on the Downloads page and the FTP site. Other reports are posted to internal and external FTP sites.
This code is intended to run in Docker via the Bulk_Reports GoCD pipeline (Reporting_Build pipeline group).
The Bulk_Reports GoCD pipeline is triggered automatically when the related upstream Reporting_Build GoCD pipeline completes.

Bulk Reports SOP

Environment Variables

Variables are set at the start of the reporting release build, in the environment of the Reporting_Build pipeline group, which runs all of the pipelines.
The variables relevant to generating bulk reports are as follows:

Here is the list of plain text variables that change each release:
ANNOTATIONRELEASE - the Dmel annotation release version for the reporting build: check the FB public release notes and increment that version by one.
RELEASE - the current release under construction: e.g., 2024_01.
PREV_RELEASE - the release number for the previous reporting build: e.g., if RELEASE=2024_01, then PREV_RELEASE=2023_06.
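The PREV_RELEASE example implies that release numbers roll over from YYYY_06 to (YYYY+1)_01. A minimal Python sketch of that arithmetic follows, assuming six numbered releases per year; previous_release is a hypothetical helper for illustration, not part of this repo.

```python
def previous_release(release: str, releases_per_year: int = 6) -> str:
    """Return the release preceding `release` (format YYYY_NN).

    Assumption: there are `releases_per_year` numbered releases per year,
    so the release before YYYY_01 is (YYYY-1)_06.
    """
    year_str, num_str = release.split("_")
    year, num = int(year_str), int(num_str)
    if not 1 <= num <= releases_per_year:
        raise ValueError(f"unexpected release number in {release!r}")
    if num == 1:
        # Roll back to the last release of the previous year.
        return f"{year - 1}_{releases_per_year:02d}"
    return f"{year}_{num - 1:02d}"


if __name__ == "__main__":
    print(previous_release("2024_01"))  # 2023_06
    print(previous_release("2024_03"))  # 2024_02
```

If the number of releases per year ever changes, only the releases_per_year default needs adjusting.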

Here is the list of plain text variables that rarely change from release to release:
BUILD_DIR - the directory in which files for the db build are stored: i.e., /data/build-public-release.
USER - the Postgres user name, which does not change: i.e., go.
DOCKER_USER - the docker username for the FlyBase docker account: i.e., harvdevgocd.
REPORTING_SERVER - the server on which the reporting dbs are kept: i.e., flysql25.
REPORTING_DATABASE - the name of the reporting build that is in progress: e.g., fb_2024_01_reporting.
ASSEMBLY - the name of the current Dmel reference genome assembly as it is known at the Alliance: i.e., R6.
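Before running the pipeline, the variables can be sanity-checked with a short script. This is an illustrative sketch only: check_environment is a hypothetical helper, it treats every variable listed above as required, and it assumes that REPORTING_DATABASE always follows the fb_<RELEASE>_reporting pattern seen in the examples.

```python
import os
from typing import List, Mapping, Optional

# Variables listed in this section; all are assumed to be required.
REQUIRED_VARS = (
    "ANNOTATIONRELEASE", "RELEASE", "PREV_RELEASE",
    "BUILD_DIR", "USER", "DOCKER_USER",
    "REPORTING_SERVER", "REPORTING_DATABASE", "ASSEMBLY",
)


def check_environment(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    # Flag any variable that is unset or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    release = env.get("RELEASE")
    database = env.get("REPORTING_DATABASE")
    # Assumed naming convention, inferred from the examples: fb_<RELEASE>_reporting.
    if release and database and database != f"fb_{release}_reporting":
        problems.append(
            f"REPORTING_DATABASE is {database!r}, expected 'fb_{release}_reporting'"
        )
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Passing a plain dict instead of os.environ makes the check easy to try out with the example values from this section.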

Pipeline Summary

Download files are generated by the Bulk_Reports pipeline in the Reporting_Build pipeline group.
The pipeline automates these steps:

  1. Gets the HarvDev docker container and builds the appropriate docker image using this repo.
  2. Saves the output bulk files to the /data/build-reporting/fb_${RELEASE}_reporting/bulk_reports directory.
  3. Checks file sizes against a reference release (usually the previous release) to detect any missing files, or files smaller than 99% of their expected size.
    • For FB2024_01 and earlier, file sizes were checked manually.
    • See the Reporting Builds Google Drive directory for examples.
  4. Notifies HarvDev by email that the files have been generated.
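The size check can be approximated with a short script. This is a sketch, not the pipeline's actual implementation: compare_to_reference is a hypothetical helper, and it assumes that report files keep the same relative paths between releases apart from the release string (e.g. 2024_01) embedded in any file name. A file is flagged if it is absent from the current release or smaller than 99% of its reference size.

```python
from pathlib import Path
from typing import Dict, List


def file_sizes(directory: Path, release: str) -> Dict[str, int]:
    """Map each file's relative path (release string replaced by a placeholder) to its size in bytes."""
    sizes = {}
    for path in directory.rglob("*"):
        if path.is_file():
            # Normalize names so that e.g. *_fb_2024_01.tsv matches *_fb_2023_06.tsv.
            key = str(path.relative_to(directory)).replace(release, "{RELEASE}")
            sizes[key] = path.stat().st_size
    return sizes


def compare_to_reference(current_dir: str, current_release: str,
                         reference_dir: str, reference_release: str,
                         threshold: float = 0.99) -> List[str]:
    """List reference files that are missing now, or below `threshold` x their reference size."""
    current = file_sizes(Path(current_dir), current_release)
    reference = file_sizes(Path(reference_dir), reference_release)
    problems = []
    for name, ref_size in sorted(reference.items()):
        if name not in current:
            problems.append(f"MISSING: {name}")
        elif current[name] < threshold * ref_size:
            pct = 100 * current[name] / ref_size
            problems.append(f"SMALL: {name} ({pct:.1f}% of reference)")
    return problems
```

For example, compare_to_reference("/data/build-reporting/fb_2024_01_reporting/bulk_reports", "2024_01", reference_dir, "2023_06") returns one line per problem, and an empty list means nothing is missing or undersized. Where the reference release's bulk files live is not specified here, so reference_dir is a placeholder. Files that are new in the current release are not flagged by this sketch.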

Detailed SOP

  1. Review the environment variables for the Reporting_Build GoCD pipeline group.
    • They should have been updated in earlier release build steps, but double-check them.
  2. Run the pipeline.
  3. When you receive the email saying the files have been generated, check the file sizes.

Next Steps

  1. If both this Bulk_Reports GoCD pipeline and the upstream Reporting_Build GoCD pipeline appear to have completed without issue, and all file sizes are normal, manually start the Upload_Reporting_Build GoCD pipeline, which uploads all files related to the release build for the various users.

Troubleshooting

The Reporting Build SOP discusses various troubleshooting scenarios for dealing with failed scripts and GoCD pipelines.
