
Strategies for leveraging workflow systems to streamline large-scale biological analyses

dib-lab/2020-workflows-paper

Streamlining Data-Intensive Biology With Workflow Systems


Accepted Manuscript

DOI

bioRxiv preprint (initially preprinted 07/01/2020)

PDF: PDF Manuscript

HTML: HTML Manuscript

Code of Conduct

This project operates under a code of conduct. Participating in the project in any way (issues, pull requests, Gitter, or other media) indicates that you agree to follow the code of conduct. We take this very seriously. If you experience harassment or notice violations of the code of conduct, please raise the issue with one of the project organizers (@taylorreiter or @bluegenes).

Project Description

As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. Data-centric workflow systems can alleviate some of these challenges, but knowledge of and training in these techniques are still lacking. Our goal is to generate a helpful set of strategies for leveraging workflow systems to streamline large-scale biological analyses.

Our initial version has been much improved through iterative feedback, primarily from members and friends of the DIB-lab. While the practices are written with specific examples for high-throughput sequencing data, we hope that much of the perspective and guidance in this document applies more generally to all workflow-enabled biology.

This repository is a living document (written with Manubot) that aims to consolidate and integrate helpful information about workflow systems and their applications in data-intensive biology. We welcome constructive feedback from workflow-enabled biologists of all levels anywhere in the world.

Contributions

You'll need to make a free GitHub account.

Instructions and procedures for contributing are outlined in CONTRIBUTING.md.

We will follow the ICMJE Guidelines for determining authorship.

Pull Requests

If you are not familiar with git and GitHub, you can use these directions to start contributing.

Please feel free to ask questions by opening a "Request for Help" issue on GitHub.

This project is a collaborative effort that will benefit from the expertise of scientists across a wide range of workflow applications!

Manubot

Manubot is a system for writing scholarly manuscripts via GitHub. Manubot automates citations and references, versions manuscripts using git, and enables collaborative writing via GitHub. An overview manuscript presents the benefits of collaborative writing with Manubot and its unique features. The rootstock repository is a general-purpose template for creating new Manubot instances, as detailed in SETUP.md. See USAGE.md for documentation on how to write a manuscript.

Please open an issue for questions related to Manubot usage, bug reports, or general inquiries.

Repository directories & files

  • This file is called README.md. It is the central document for the repository and will help direct users to other relevant information.
  • CONTRIBUTING.md contains procedures and directions for contributing to this effort.
  • INSTRUCTIONS.md contains instructions for new GitHub users on how to navigate GitHub in the browser, along with an explanation of GitHub vocabulary. It also includes instructions for more experienced users about the procedures we recommend and how to run Manubot on the command line.
  • USAGE.md describes how to format text, cite references, add figures and tables, etc.
  • SETUP.md includes information about setting up Manubot.
  • LICENSE.md and LICENSE-CC0.md contain the licenses associated with manubot and with the content we are developing in this project. Please see the "License" section below.

The directories are as follows:

  • content contains the manuscript source, which includes markdown files as well as inputs for citations and references. These are the files that most contributors will be editing. See USAGE.md for more information.
  • output contains the outputs (generated files) from Manubot including the resulting manuscripts. You should not edit these files manually, because they will get overwritten.
  • webpage is a directory meant to be rendered as a static webpage for viewing the HTML manuscript.
  • build contains commands and tools for building the manuscript.
  • ci contains files necessary for deployment via continuous integration.

License


Except when noted otherwise, the entirety of this repository is licensed under a CC BY 4.0 License (LICENSE.md), which allows reuse with attribution. Please attribute by linking to https://github.com/dib-lab/2020-workflows-paper.

Since CC BY is not ideal for code and data, certain repository components are also released under the CC0 1.0 public domain dedication (LICENSE-CC0.md). All files matched by the following glob patterns are dual licensed under CC BY 4.0 and CC0 1.0:

  • *.sh
  • *.py
  • *.yml / *.yaml
  • *.json
  • *.bib
  • *.tsv
  • .gitignore

All other files are only available under CC BY 4.0, including:

  • *.md
  • *.html
  • *.pdf
  • *.docx
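The dual-licensing rule above can be checked mechanically. The Python sketch below is a hypothetical helper, not part of this repository: `licenses_for` and `CC0_DUAL_PATTERNS` are illustrative names, and the patterns are copied from the dual-license list above. It matches on a file's base name only, and it does not account for any file whose license is "noted otherwise".

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical constant: glob patterns copied from the README's dual-license list.
CC0_DUAL_PATTERNS = ["*.sh", "*.py", "*.yml", "*.yaml", "*.json", "*.bib", "*.tsv", ".gitignore"]


def licenses_for(path: str) -> list[str]:
    """Return the licenses that apply to a repository file (illustrative helper).

    Assumes the default rule only; per-file exceptions are not modeled.
    """
    # Match on the base name so patterns like ".gitignore" work in any directory.
    name = PurePosixPath(path).name
    # Everything is under CC BY 4.0 by default.
    licenses = ["CC BY 4.0"]
    # Case-sensitive matching, so results do not depend on the operating system.
    if any(fnmatchcase(name, pattern) for pattern in CC0_DUAL_PATTERNS):
        licenses.append("CC0 1.0")
    return licenses


if __name__ == "__main__":
    # Example paths are hypothetical.
    for example in ["build/example.sh", "content/example.md", "ci/.gitignore"]:
        print(example, licenses_for(example))
```

For example, a shell script under build/ is dual licensed, while a Markdown file under content/ is available only under CC BY 4.0.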

Please open an issue for any question related to licensing.

Attribution

Many of the documents (especially *.md documents) and issues presented in this repository were modified from another Manubot repository.
