Batch Connect - OSC Galaxy

An interactive app designed for OSC OnDemand that launches Galaxy within an Owens batch job.

Prerequisites

This Batch Connect app requires the following software be installed on the compute nodes that the batch job is intended to run on (NOT the OnDemand node):

  • Lmod 6.0.1+ or any other CLI that provides module restore and module load <modules>, used to load the appropriate environment within the batch job before launching Galaxy.

Install

The install process runs on the login node.

Use git to clone this app and checkout the desired branch/version you want to use:

git clone <repo>
cd <dir>
git checkout <tag/branch>

Install Galaxy and dependencies

sh install-galaxy.sh

You will not need to do anything beyond this, as all necessary assets are installed by the script. You also do not need to restart this app, as it isn't a Passenger app.

To update the app you would:

cd <dir>
git fetch
git checkout <tag/branch>

Again, you do not need to restart the app as it isn't a Passenger app.

Contributing

  1. Fork it ( https://github.com/OSC/bc_osc_galaxy/fork )
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Developer Notes

See the inline comments on PR #7 for more information. The main PR description is copied below.

Overview

The Galaxy interface app runs on Owens. Users can install, manage, and run tools and workflows.

Changes

  • Remove galaxy submodule
  • Add install-galaxy.sh to install galaxy 19.09
  • Galaxy config files are removed and will be generated by before.sh.erb instead
  • Update Readme
  • Job Configuration. Currently, the jobs submitted via Galaxy will run on the same node as Galaxy.

Known Issues:

  • Get Data from external data sources
  • Match workflow to destination
  • The links in the sidebar and the main section are the same; the sidebar links work, but the main-section links are broken. As the screenshot in PR #7 shows, admin/roles is not correctly appended to the URL.
  • pbs-python is installed at run time when launching:
    Collecting pbs_python (from -r /dev/stdin (line 1))
    Installing collected packages: pbs-python
    Successfully installed pbs-python-4.4.2.1

We may need to add something similar to the following to install-galaxy.sh:

galaxy_user@galaxy_server% git clone https://github.com/ehiggs/pbs-python
galaxy_user@galaxy_server% cd pbs-python
galaxy_user@galaxy_server% source /clusterfs/galaxy/galaxy-app/.venv/bin/activate
galaxy_user@galaxy_server% python setup.py install
  • When sharing the app, some files try to write to the Galaxy directory.

Experiment 1: Passenger App (failed)

The app failed to mount at /pun/dev/galaxy (404 Not Found).

Experiment 2: Interactive App (succeeded with known issues)

After cloning this repo, run sh install-galaxy.sh to clone Galaxy release_19.09 into the ./galaxy folder, install dependencies into the virtual environment under the ./galaxy.venv folder, and install _conda under the ./galaxy/database/dependencies folder. The script also builds the custom visualization plugins.

After sh install-galaxy.sh completes, Galaxy can be launched as an interactive app. before.sh.erb generates galaxy.yml (general configuration), job_resource_params_conf.xml (job resource options for users to select), and job_conf.xml (job runner configuration).

Galaxy is mounted on /node/${HOSTNAME}/8080 with the uwsgi mount galaxy.webapps.galaxy.buildapp:uwsgi_app().

Database:

Data files are stored in the user’s dataroot (defaults to ~/.galaxy/, configured in galaxy.yml):

azhu $ ls ~/.galaxy/
citations  compiled_templates  control.sqlite  files  jobs_directory  object_store_cache  pbs  tmp  universe.sqlite

Authentication:

galaxy.yml takes the user's email address as the user identity in single-user mode. User identification must be in email format, so ${USER}@osc.edu is passed to galaxy.yml as a temporary solution. Further authentication can be configured as described at https://galaxyproject.org/admin/config/external-user-auth/.
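The temporary single-user identity above can be sketched in a few lines. This is an illustrative helper, not part of the app (the name single_user_email and the validation regex are our assumptions); it only shows how a username becomes the email-format id Galaxy expects:

```python
import os
import re

def single_user_email(user=None):
    """Hypothetical helper: build the email-format user id Galaxy
    expects in single-user mode, mirroring the ${USER}@osc.edu
    scheme described above (the @osc.edu suffix is site-specific)."""
    user = user or os.environ.get("USER", "galaxy")
    email = "%s@osc.edu" % user
    # Galaxy requires user identification in email format
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        raise ValueError("cannot build a valid email for user %r" % user)
    return email
```

For example, single_user_email("azhu") yields azhu@osc.edu.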

Select Job Runner

Users select the tool runner before starting the app. The developer adds destinations to the job config file and sets the user-selected runner as the default.

<destinations default="<%= context.job_runner %>">
  <destination id="dynamic_cores_time" runner="dynamic">
    <param id="type">python</param>
    <param id="function">dynamic_cores_time</param>
  </destination>
  <destination id="pbs" runner="pbs">
    <param id="Resource_List">walltime=5:00:00,nodes=1:ppn=<%= ppn %></param>
  </destination>
  <destination id="local" runner="local">
    <param id="local_slots"><%= ppn %></param>
  </destination>
</destinations>
Job runner field (screenshot of the selection form omitted).

We consider three types of job runners:

1. Run tools locally

Pros:

  • tool jobs won't be queued and will run immediately

Cons:

  • The number of concurrent jobs is limited; the maximum is the number of cores.
  • When the session ends, the unfinished jobs will end too.

2. Submit tool jobs to the cluster

Pros:

  • When the session ends, the unfinished jobs will continue to run.
  • Unlimited number of concurrent jobs

Cons:

  • Galaxy can only submit jobs to the same cluster it is running on. For now, we run Galaxy on Owens, so it cannot submit jobs to quick; as a result, jobs may wait in the queue before running.

3. Users configure the runner before submitting each tool job.

Pros:

  • It's very configurable and flexible. We can configure different resources for different tools. The user can either keep the default runner or specify resources explicitly (screenshots of the "use default" and "select params" forms omitted).

Cons:

  • Because we can configure different resources for different tools, we have to specify the resources for each tool in the job conf file. If the user installs new tools, we need a way to add configuration for those tools to the job conf file as well.
  • Because the resource selection is part of the tool form, users can't specify resources for tools without tool forms, such as those under the GET DATA section.
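The first drawback above, keeping the job conf file in sync with newly installed tools, could in principle be scripted. A minimal sketch, assuming a job_conf.xml shaped like the fragments in this README; the helper name add_tool_mapping and its default arguments are assumptions for illustration:

```python
import xml.etree.ElementTree as ET

def add_tool_mapping(job_conf_xml, tool_id,
                     destination="dynamic_cores_time", resources="basic"):
    """Hypothetical helper: return job_conf XML with a <tool> entry
    appended under <tools>, assuming the layout shown in this README."""
    root = ET.fromstring(job_conf_xml)
    tools = root.find("tools")
    if tools is None:
        tools = ET.SubElement(root, "tools")
    # Skip tools that already have a mapping
    if not any(t.get("id") == tool_id for t in tools.findall("tool")):
        ET.SubElement(tools, "tool", id=tool_id,
                      destination=destination, resources=resources)
    return ET.tostring(root, encoding="unicode")
```

A wrapper could run something like this against job_conf.xml whenever the user installs a tool, then restart or reload Galaxy to pick up the change.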

Example: configure dynamic running tools with user-defined resources

As an example, I configured the BED-to-GFF tool to provide resource selection fields. Steps to configure a tool to use the dynamic runner based on the resource parameters selected by the user:

  1. Specify the parameters in the job resource configuration file (https://github.com/galaxyproject/galaxy/blob/dev/lib/galaxy/config/sample/job_resource_params_conf.xml.sample). The following example contains cores and walltime. The input field can be an input box or a dropdown with several options.
    # Generate Galaxy job resource parameter configuration file
    (
    umask 077
    cat > "${JOB_RESOURCE_PARAMS_CONF_FILE_PATH}" << EOL
    <parameters>
      <param label="Cores" name="cores" type="integer" min="1" max="28" value="1" help="Number of processing cores, 'ppn' value (1-28). Leave blank to use default value." />
      <param label="Walltime" name="time" type="integer" size="3" min="1" max="24" value="1" help="Maximum job time in hours, 'walltime' value (1-24). Leave blank to use default value." />
    </parameters>
    EOL
    )
  2. Add rules to https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/jobs/rules directory to match job resource parameters entered by the user to destinations. The following example matches the default runner to the default destination. If the user enters cores and walltime, we construct a resource list and run the tool with pbs runner.
    # Add job rules
    (
    umask 077
    cat > "./lib/galaxy/jobs/rules/destinations.py" << EOL
    import logging
    from galaxy.jobs.mapper import JobMappingException
    from galaxy.jobs import JobDestination

    log = logging.getLogger(__name__)

    FAILURE_MESSAGE = 'This tool could not be run because of a misconfiguration in the Galaxy job running system, please report this error'

    def dynamic_cores_time(app, tool, job, user_email, resource_params):
        # handle job resource parameters
        if not resource_params.get("cores") and not resource_params.get("time"):
            default_destination_id = app.job_config.get_destination(None)
            log.warning('(%s) has no input parameter cores or time. Run with default runner: %s' % (job.id, default_destination_id.runner))
            return default_destination_id
        try:
            cores = resource_params.get("cores")
            time = resource_params.get("time")
            resource_list = 'walltime=%s:00:00,nodes=1:ppn=%s' % (time, cores)
        except Exception:
            default_destination_id = app.job_config.get_destination(None)
            log.warning('(%s) failed to run with customized configuration. Run with default runner: %s' % (job.id, default_destination_id.runner))
            return default_destination_id
        log.info('returning pbs runner with configuration %s', resource_list)
        return JobDestination(runner="pbs", params={"Resource_List": resource_list})
    EOL
    )
  3. Add the dynamic job runner to <plugins> in the job config file. The rules_module field indicates the module containing the rules file created in step 2.
    <plugin id="dynamic" type="runner">
      <param id="rules_module">galaxy.jobs.rules</param>
    </plugin>
  4. Inside <resources> in the job config file, add a group containing the parameters defined in step 1 and give the group an id.
    <resources>
      <group id="basic">cores,time</group>
    </resources>
  5. Inside <tools> in the job config file, specify id="tool_id", destination="destination_id", and resources="resource_group_id".
    <tools>
      <tool id="bed2gff1" destination="dynamic_cores_time" resources="basic"/>
    </tools>

Tools are defined in XML files under https://github.com/galaxyproject/galaxy/tree/dev/tools. The tool id is defined in the <tool> tag, e.g. <tool id="createInterval" name="Create single interval" version="1.0.0">.
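Extracting that id mechanically is straightforward; here is a minimal sketch using the standard library (the tool_id helper is ours, for illustration only, e.g. to enumerate ids when filling in job_conf.xml):

```python
import xml.etree.ElementTree as ET

def tool_id(tool_xml):
    """Hypothetical helper: pull the id attribute out of a Galaxy
    tool definition string."""
    root = ET.fromstring(tool_xml)
    if root.tag != "tool":
        raise ValueError("not a Galaxy tool definition: <%s>" % root.tag)
    return root.get("id")

# The createInterval example from the Galaxy tools tree:
sample = '<tool id="createInterval" name="Create single interval" version="1.0.0"/>'
```

Running tool_id over each XML file in the tools tree would yield the ids needed for the <tool> entries above.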
