This repo contains scripts that are used to generate most FlyBase bulk reports, as well as reports for FlyBase curators and external collaborators. This is a new public repository that replaces the private, retired `harvdev-reports` repository (renamed to `harvdev-reports-old`). The repo contains a mix of newer Python scripts and older Perl scripts (taken from the `fb_cvs/FB/scripts/reports` repo). Bulk reports for FlyBase users are shipped off to IUDev and incorporated into the public FB site on the Downloads page and FTP site. Other reports are posted to internal and external FTP sites.
This code is intended to run in Docker using the `Bulk_Reports` GoCD pipeline (`Reporting_Build` pipeline group).
The `Bulk_Reports` GoCD pipeline is automatically triggered by the completion of the related, upstream `Reporting_Build` GoCD pipeline.
Variables are set at the outset of the entire reporting release build, in the environment of the `Reporting_Build` pipeline group, which is shared by all of its pipelines.
The variables relevant to the generation of bulk reports are as follows:
Here is the list of plain text variables that change each release:
- `ANNOTATIONRELEASE` - the Dmel annotation release version for the reporting build: check the annotation release given in the FB public release notes and increment it by one.
- `RELEASE` - the current release under construction: e.g., `2024_01`.
- `PREV_RELEASE` - the release number for the previous reporting build: e.g., if `RELEASE=2024_01`, then `PREV_RELEASE=2023_06`.
Here is the list of plain text variables that rarely change from release to release:
- `BUILD_DIR` - the directory in which files for the db build are stored: i.e., `/data/build-public-release`.
- `USER` - the Postgres user name, which does not change: i.e., `go`.
- `DOCKER_USER` - the Docker username for the FlyBase Docker account: i.e., `harvdevgocd`.
- `REPORTING_SERVER` - the server on which the reporting dbs are kept: i.e., `flysql25`.
- `REPORTING_DATABASE` - the name of the reporting build that is in progress: e.g., `fb_2024_01_reporting`.
- `ASSEMBLY` - the name of the current Dmel reference genome assembly as it is known at the Alliance: i.e., `R6`.
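As a rough illustration of how the variables above relate to one another, here is a minimal Python sketch of a pre-flight check. It is not part of this repo or of the GoCD configuration. The function name `check_environment`, the `YYYY_NN` release pattern, and the rule that `REPORTING_DATABASE` equals `fb_<RELEASE>_reporting` are assumptions inferred from the examples given above.

```python
import os
import re

# Variable names are taken from the lists above.
REQUIRED_VARS = [
    "ANNOTATIONRELEASE", "RELEASE", "PREV_RELEASE", "BUILD_DIR", "USER",
    "DOCKER_USER", "REPORTING_SERVER", "REPORTING_DATABASE", "ASSEMBLY",
]

# Assumption: releases look like 2024_01 (four-digit year, two-digit number).
RELEASE_PATTERN = re.compile(r"^\d{4}_\d{2}$")


def check_environment(env):
    """Return a list of human-readable problems found in a mapping of variables."""
    problems = []

    # Every variable must be present and non-empty.
    for name in REQUIRED_VARS:
        if not env.get(name):
            problems.append(f"{name} is not set")

    release = env.get("RELEASE", "")
    prev_release = env.get("PREV_RELEASE", "")
    release_ok = bool(RELEASE_PATTERN.match(release))
    prev_ok = bool(RELEASE_PATTERN.match(prev_release))

    if release and not release_ok:
        problems.append(f"RELEASE={release} does not look like YYYY_NN")
    if prev_release and not prev_ok:
        problems.append(f"PREV_RELEASE={prev_release} does not look like YYYY_NN")

    # Fixed-width YYYY_NN strings sort chronologically, so a plain
    # string comparison is enough to confirm the ordering.
    if release_ok and prev_ok and prev_release >= release:
        problems.append("PREV_RELEASE is not earlier than RELEASE")

    # Assumption: the database name follows the fb_<RELEASE>_reporting pattern.
    database = env.get("REPORTING_DATABASE", "")
    if release_ok and database and database != f"fb_{release}_reporting":
        problems.append(f"REPORTING_DATABASE={database} does not match RELEASE")

    return problems


if __name__ == "__main__":
    for problem in check_environment(os.environ):
        print(problem)
```

Run on its own, the script prints one line per problem and prints nothing when the variables look consistent.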
Download files are generated by the `Bulk_Reports` pipeline in the `Reporting_Build` pipeline group.
The pipeline automates these steps:
- Gets HarvDev docker container and builds the appropriate docker image using this repo.
- Saves output bulk files to the `/data/build-reporting/fb_${RELEASE}_reporting/bulk_reports` directory.
- Checks file sizes relative to a reference release (usually the previous release) to detect any missing files, or files that are <99% of the expected size.
  - For FB2024_01 and earlier, file sizes were manually checked.
  - See the Reporting Builds Google Drive directory for examples.
- Notifies HarvDev by email that the files have been generated.
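The file size check described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the pipeline's actual implementation. The function name `compare_sizes` is hypothetical, and the sketch assumes that a file has the same name in the current and reference directories. If file names embed the release, they would need to be normalized before matching.

```python
from pathlib import Path


def compare_sizes(current_dir, reference_dir, min_ratio=0.99):
    """Compare files in current_dir against same-named files in reference_dir.

    Returns (missing, too_small):
      missing   - names present in the reference but absent from current_dir
      too_small - (name, current_size, reference_size) tuples for files
                  smaller than min_ratio times the reference size
    """
    current_dir = Path(current_dir)
    reference_dir = Path(reference_dir)
    missing = []
    too_small = []

    for ref_file in sorted(reference_dir.iterdir()):
        if not ref_file.is_file():
            continue  # only top-level files are compared in this sketch
        cur_file = current_dir / ref_file.name
        if not cur_file.is_file():
            missing.append(ref_file.name)
            continue
        ref_size = ref_file.stat().st_size
        cur_size = cur_file.stat().st_size
        # Flag files below the threshold (99% of the reference by default).
        if cur_size < min_ratio * ref_size:
            too_small.append((ref_file.name, cur_size, ref_size))

    return missing, too_small
```

A caller would pass the current `bulk_reports` directory and the equivalent directory from the reference release, then report any entries in either returned list.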
- Review the `Environment variables` for the `Reporting_Build` GoCD pipeline group.
  - They should have been updated in earlier release build steps, but double-check them.
- Run the pipeline.
- Upon receiving the email that the files have been generated, check the file sizes.
- If both this `Bulk_Reports` GoCD pipeline and the upstream `Reporting_Build` GoCD pipeline seem to have completed without issue and all file sizes are normal, then manually start the `Upload_Reporting_Build` GoCD pipeline, which will upload all files related to the release build for various users.
The Reporting Build SOP discusses various troubleshooting scenarios for dealing with failed scripts and GoCD pipelines.