DataBiosphere · adamnovak · Sep 27, 2023 · Mar 16, 2023 · Mar 17, 2023 · Mar 24, 2023
diff --git a/docs/appendices/deploy.rst b/docs/appendices/deploy.rst
@@ -31,27 +31,27 @@ From here, you can install a project and its dependencies::
    $ tree
    .
    ├── util
-   │   ├── __init__.py
-   │   └── sort
-   │       ├── __init__.py
-   │       └── quick.py
+   │   ├── __init__.py
+   │   └── sort
+   │       ├── __init__.py
+   │       └── quick.py
    └── workflow
        ├── __init__.py
        └── main.py
 
    3 directories, 5 files
    $ pip install matplotlib
-   $ cp -R workflow util venv/lib/python2.7/site-packages
+   $ cp -R workflow util venv/lib/python3.9/site-packages
 
 Ideally, your project would have a ``setup.py`` file (see `setuptools`_) which streamlines the installation process::
 
    $ tree
    .
    ├── util
-   │   ├── __init__.py
-   │   └── sort
-   │       ├── __init__.py
-   │       └── quick.py
+   │   ├── __init__.py
+   │   └── sort
+   │       ├── __init__.py
+   │       └── quick.py
    ├── workflow
    │   ├── __init__.py
    │   └── main.py
@@ -70,7 +70,7 @@ both Python and Toil are assumed to be present on the leader and all worker node
 
 We can now run our workflow::
 
-   $ python main.py --batchSystem=mesos …
+   $ python main.py --batchSystem=kubernetes …
 
 .. important::
 
@@ -101,13 +101,13 @@ This scenario applies if the user script imports modules that are its siblings::
    $ cd my_project
    $ ls
    userScript.py utilities.py
-   $ ./userScript.py --batchSystem=mesos …
+   $ ./userScript.py --batchSystem=kubernetes …
 
 Here ``userScript.py`` imports additional functionality from ``utilities.py``.
 Toil detects that ``userScript.py`` has sibling modules and copies them to the
 workers, alongside the user script. Note that sibling modules will be
 auto-deployed regardless of whether they are actually imported by the user
-script–all .py files residing in the same directory as the user script will
+script: all .py files residing in the same directory as the user script will
 automatically be auto-deployed.
 
 Sibling modules are a suitable method of organizing the source code of
@@ -134,16 +134,16 @@ The following shell session illustrates this::
    $ tree
    .
    ├── utils
-   │   ├── __init__.py
-   │   └── sort
-   │       ├── __init__.py
-   │       └── quick.py
+   │   ├── __init__.py
+   │   └── sort
+   │       ├── __init__.py
+   │       └── quick.py
    └── workflow
        ├── __init__.py
        └── main.py
 
    3 directories, 5 files
-   $ python -m workflow.main --batchSystem=mesos …
+   $ python -m workflow.main --batchSystem=kubernetes …
 
 .. _package: https://docs.python.org/2/tutorial/modules.html#packages
 
@@ -168,7 +168,7 @@ could do this::
    $ cd my_project
    $ export PYTHONPATH="$PWD"
    $ cd /some/other/dir
-   $ python -m workflow.main --batchSystem=mesos …
+   $ python -m workflow.main --batchSystem=kubernetes …
 
 Also note that the root directory itself must not be package, i.e. must not
 contain an ``__init__.py``.
@@ -193,7 +193,7 @@ replicates ``PYTHONPATH`` from the leader to every worker.
 Toil Appliance
 --------------
 
-The term Toil Appliance refers to the Mesos Docker image that Toil uses to simulate the machines in the virtual mesos
+The term Toil Appliance refers to the Ubuntu-based Docker image that Toil uses to simulate the machines in the virtual
 cluster.  It's easily deployed, only needs Docker, and allows for workflows to be run in single-machine mode and for
 clusters of VMs to be provisioned.  To specify a different image, see the Toil :ref:`envars` section.  For more
 information on the Toil Appliance, see the :ref:`runningAWS` section.
diff --git a/docs/gettingStarted/quickStart.rst b/docs/gettingStarted/quickStart.rst
@@ -32,14 +32,14 @@ Toil uses batch systems to manage the jobs it creates.
 
 The ``singleMachine`` batch system is primarily used to prepare and debug workflows on a
 local machine. Once validated, try running them on a full-fledged batch system (see :ref:`batchsysteminterface`).
-Toil supports many different batch systems such as `Apache Mesos`_ and Grid Engine; its versatility makes it
+Toil supports many different batch systems such as `Kubernetes`_ and Grid Engine; its versatility makes it
 easy to run your workflow in all kinds of places.
 
 Toil is totally customizable! Run ``python helloWorld.py --help`` to see a complete list of available options.
 
 For something beyond a "Hello, world!" example, refer to :ref:`runningDetail`.
 
-.. _Apache Mesos: https://mesos.apache.org/getting-started/
+.. _Kubernetes: https://kubernetes.io/
 
 .. _cwlquickstart:
 
@@ -279,7 +279,7 @@ workflow there is always one leader process, and potentially many worker process
 
 When using the single-machine batch system (the default), the worker processes will be running
 on the same machine as the leader process. With full-fledged batch systems like
-Mesos the worker processes will typically be started on separate machines. The
+Kubernetes the worker processes will typically be started on separate machines. The
 boilerplate ensures that the pipeline is only started once---on the leader---but
 not when its job functions are imported and executed on the individual workers.
 
@@ -394,8 +394,10 @@ Also!  Remember to use the :ref:`destroyCluster` command when finished to destro
 #. Launch a cluster in AWS using the :ref:`launchCluster` command::
 
         (venv) $ toil launch-cluster <cluster-name> \
+                     --clusterType kubernetes \
                      --keyPairName <AWS-key-pair-name> \
                      --leaderNodeType t2.medium \
+                     --nodeTypes t2.medium -w 1 \
                      --zone us-west-2a
 
    The arguments ``keyPairName``, ``leaderNodeType``, and ``zone`` are required to launch a cluster.
@@ -448,8 +450,10 @@ Also!  Remember to use the :ref:`destroyCluster` command when finished to destro
 #. First launch a node in AWS using the :ref:`launchCluster` command::
 
       (venv) $ toil launch-cluster <cluster-name> \
+                   --clusterType kubernetes \
                    --keyPairName <AWS-key-pair-name> \
                    --leaderNodeType t2.medium \
+                   --nodeTypes t2.medium -w 1 \
                    --zone us-west-2a
 
 #. Copy ``example.cwl`` and ``example-job.yaml`` from the :ref:`CWL example <cwlquickstart>` to the node using
@@ -462,24 +466,25 @@ Also!  Remember to use the :ref:`destroyCluster` command when finished to destro
 
       (venv) $ toil ssh-cluster --zone us-west-2a <cluster-name>
 
-#. Once on the leader node, it's a good idea to update and install the following::
+#. Once on the leader node, command-line tools such as ``kubectl`` will be available to you. It's also a good idea to
+   update and install the following::
 
     sudo apt-get update
     sudo apt-get -y upgrade
     sudo apt-get -y dist-upgrade
     sudo apt-get -y install git
-    sudo pip install mesos.cli
 
 #. Now create a new ``virtualenv`` with the ``--system-site-packages`` option and activate::
 
     virtualenv --system-site-packages venv
     source venv/bin/activate
 
-#. Now run the CWL workflow::
+#. Now run the CWL workflow with the Kubernetes batch system::
 
       (venv) $ toil-cwl-runner \
                    --provisioner aws \
-                   --jobStore aws:us-west-2a:any-name \
+                   --batchSystem kubernetes \
+                   --jobStore aws:us-west-2:any-name \
                    /tmp/example.cwl /tmp/example-job.yaml
 
    ..  tip::
@@ -498,6 +503,8 @@ Also!  Remember to use the :ref:`destroyCluster` command when finished to destro
 Running a Workflow with Autoscaling - Cactus
 ---------------------------------------------------
 
+.. TODO: change to use a kubernetes cluster.
+
 `Cactus <https://github.com/ComparativeGenomicsToolkit/cactus>`__ is a reference-free, whole-genome multiple alignment
 program that can be run on any of the cloud platforms Toil supports.
 

diff --git a/docs/running/cloud/amazon.rst b/docs/running/cloud/amazon.rst
@@ -99,21 +99,27 @@ during the computation of a workflow, first set up and configure an account with
    the installed version that you are using if you're using a different version): ::
 
     $ TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:5.3.0 \
-          toil launch-cluster clustername \
+          toil launch-cluster <cluster-name> \
+          --clusterType kubernetes \
           --leaderNodeType t2.medium \
+          --nodeTypes t2.medium -w 1 \
           --zone us-west-1a \
           --keyPairName id_rsa
 
 To further break down each of these commands:
 
-    **TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:latest** --- This is optional.  It specifies a mesos docker image that we maintain with the latest version of toil installed on it.  If you want to use a different version of toil, please specify the image tag you need from https://quay.io/repository/ucsc_cgl/toil?tag=latest&tab=tags.
+    **TOIL_APPLIANCE_SELF=quay.io/ucsc_cgl/toil:latest** --- This is optional.  It specifies an Ubuntu-based Docker image that we maintain with the latest version of Toil installed on it.  If you want to use a different version of Toil, please specify the image tag you need from https://quay.io/repository/ucsc_cgl/toil?tag=latest&tab=tags.
 
     **toil launch-cluster** --- Base command in toil to launch a cluster.
 
-    **clustername** --- Just choose a name for your cluster.
+    **<cluster-name>** --- Just choose a name for your cluster.
+
+    **--clusterType kubernetes** --- Specify the type of cluster to coordinate and execute your workflow. Kubernetes is the recommended option.
 
     **--leaderNodeType t2.medium** --- Specify the leader node type.  Make a t2.medium (2CPU; 4Gb RAM; $0.0464/Hour).  List of available AWS instances: https://aws.amazon.com/ec2/pricing/on-demand/
 
+    **--nodeTypes t2.medium -w 1** --- Specify the worker node type and the number of worker nodes to launch. A Kubernetes cluster requires at least one worker node.
+
     **--zone us-west-1a** --- Specify the AWS zone you want to launch the instance in.  Must have the same prefix as the zone in your awscli credentials (which, in the example of this tutorial is: "us-west-1").
 
     **--keyPairName id_rsa** --- The name of your key pair, which should be "id_rsa" if you've followed this tutorial.
@@ -162,7 +168,7 @@ Getting started with the provisioner is simple:
    `here <http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#cli-config-files>`__.
 
 The Toil provisioner is built around the Toil Appliance, a Docker image that bundles
-Toil and all its requirements (e.g. Mesos). This makes deployment simple across
+Toil and all its requirements (e.g. Kubernetes). This makes deployment simple across
 platforms, and you can even simulate a cluster locally (see :ref:`appliance_dev` for details).
 
 .. admonition:: Choosing Toil Appliance Image
@@ -182,12 +188,14 @@ Details about Launching a Cluster in AWS
 ----------------------------------------
 
 Using the provisioner to launch a Toil leader instance is simple using the ``launch-cluster`` command. For example,
-to launch a cluster named "my-cluster" with a t2.medium leader in the us-west-2a zone, run ::
+to launch a Kubernetes cluster named "my-cluster" with a t2.medium leader in the us-west-2a zone, run ::
 
     (venv) $ toil launch-cluster my-cluster \
+                 --clusterType kubernetes \
                  --leaderNodeType t2.medium \
+                 --nodeTypes t2.medium -w 1 \
                  --zone us-west-2a \
-                 --keyPairName <your-AWS-key-pair-name>
+                 --keyPairName <AWS-key-pair-name>
 
 The cluster name is used to uniquely identify your cluster and will be used to
 populate the instance's ``Name`` tag. Also, the Toil provisioner will
@@ -234,9 +242,12 @@ change. This is in contrast with :ref:`Autoscaling`.
 To launch worker nodes alongside the leader we use the ``-w`` option::
 
     (venv) $ toil launch-cluster my-cluster \
+                 --clusterType kubernetes \
                  --leaderNodeType t2.small -z us-west-2a \
-                 --keyPairName your-AWS-key-pair-name \
-                 --nodeTypes m3.large,t2.micro -w 1,4
+                 --keyPairName <AWS-key-pair-name> \
+                 --nodeTypes m3.large,t2.micro -w 1,4 \
+                 --zone us-west-2a
+
 
 This will spin up a leader node of type t2.small with five additional workers --- one m3.large instance and four t2.micro.
 
@@ -264,6 +275,8 @@ look like ::
 Running a Workflow with Autoscaling
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
+.. TODO: change to use Kubernetes. Open question: does the Kubernetes batch system support autoscaling?
+
 Autoscaling is a feature of running Toil in a cloud whereby additional cloud instances are launched to run the workflow.
 Autoscaling leverages Mesos containers to provide an execution environment for these workflows.
 
@@ -276,6 +289,7 @@ Autoscaling leverages Mesos containers to provide an execution environment for t
 #. Launch the leader node in AWS using the :ref:`launchCluster` command: ::
 
     (venv) $ toil launch-cluster <cluster-name> \
+                 --clusterType mesos \
                  --keyPairName <AWS-key-pair-name> \
                  --leaderNodeType t2.medium \
                  --zone us-west-2a
@@ -382,7 +396,7 @@ For example, to launch a Toil cluster with a Kubernetes scheduler, run: ::
             --provisioner=aws \
             --clusterType kubernetes \
             --zone us-west-2a \
-            --keyPairName wlgao@ucsc.edu \
+            --keyPairName <AWS-key-pair-name> \
             --leaderNodeType t2.medium \
             --leaderStorage 50 \
             --nodeTypes t2.medium -w 1-4 \

diff --git a/docs/running/cloud/cloud.rst b/docs/running/cloud/cloud.rst
@@ -7,9 +7,11 @@ Running in the Cloud
 Toil supports Amazon Web Services (AWS) and Google Compute Engine (GCE) in the cloud and has autoscaling capabilities
 that can adapt to the size of your workflow, whether your workflow requires 10 instances or 20,000.
 
-Toil does this by creating a virtual cluster with `Apache Mesos`_.  `Apache Mesos`_ requires a leader node to coordinate
+Toil does this by creating a virtual cluster with `Kubernetes`_.  `Kubernetes`_ requires a leader node to coordinate
 the workflow, and worker nodes to execute the various tasks within the workflow.  As the workflow runs, Toil will
-"autoscale", creating and terminating workers as needed to meet the demands of the workflow.
+"autoscale", creating and terminating workers as needed to meet the demands of the workflow. Historically, Toil has
+spun up clusters with `Apache Mesos`_, but Mesos is no longer the recommended way to coordinate and execute tasks
+within the workflow.
 
 Once a user is familiar with the basics of running toil locally (specifying a :ref:`jobStore <jobStoreOverview>`, and
 how to write a toil script), they can move on to the guides below to learn how to translate these workflows into cloud
@@ -25,12 +27,13 @@ distributed over several nodes. The provisioner also has the ability to automati
 the cluster to handle dynamic changes in computational demand (autoscaling).  Currently we have working provisioners
 with AWS and GCE (Azure support has been deprecated).
 
-Toil uses `Apache Mesos`_ as the :ref:`batchSystemOverview`.
+Toil uses `Kubernetes`_ as the :ref:`batchSystemOverview`.
 
 See here for instructions for :ref:`runningAWS`.
 
 See here for instructions for :ref:`runningGCE`.
 
+.. _Kubernetes: https://kubernetes.io/
 .. _Apache Mesos: https://mesos.apache.org/gettingstarted/
 
 .. _cloudJobStore:

diff --git a/src/toil/utils/toilLaunchCluster.py b/src/toil/utils/toilLaunchCluster.py
@@ -39,7 +39,8 @@ def create_tags_dict(tags: List[str]) -> Dict[str, str]:
 def main() -> None:
     parser = parser_with_common_options(provisioner_options=True, jobstore_option=False)
     parser.add_argument("-T", "--clusterType", dest="clusterType",
-                        choices=['mesos', 'kubernetes'], default='mesos',
+                        choices=['mesos', 'kubernetes'],
+                        default=None,  # TODO: change default to "kubernetes" when we are ready.
                         help="Cluster scheduler to use.")
     parser.add_argument("--leaderNodeType", dest="leaderNodeType", required=True,
                         help="Non-preemptible node type to use for the cluster leader.")
@@ -160,6 +161,16 @@ def main() -> None:
         raise RuntimeError(f'Please provide a value for --zone or set a default in the '
                            f'TOIL_{options.provisioner.upper()}_ZONE environment variable.')
 
+    if options.clusterType == "mesos":
+        logger.warning('You are using a "mesos" cluster, which is no longer recommended as Toil is '
+                       'transitioning to using a kubernetes-based cluster. Consider switching to '
+                       '--clusterType=kubernetes.')
+
+    if options.clusterType is None:
+        logger.warning('Argument --clusterType is not set; using "mesos" as the cluster scheduler. '
+                       'Starting in the next version of Toil, the default cluster scheduler will be '
+                       'set to "kubernetes" if the cluster type is not specified.')
+        options.clusterType = "mesos"
 
     logger.info('Creating cluster %s...', options.clusterName)
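
The ``--clusterType`` handling in ``toilLaunchCluster.py`` above can be read as a small, separately testable rule: an explicit ``mesos`` gets a deprecation warning, an unset value gets a warning and falls back to ``mesos``, and ``kubernetes`` passes through silently. The following is a minimal standalone sketch of that rule, not Toil's actual code. The names ``build_parser`` and ``resolve_cluster_type`` are hypothetical and do not exist in Toil, and the parser is a bare ``argparse`` parser standing in for ``parser_with_common_options``.

```python
import argparse
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for Toil's parser; only --clusterType is modeled."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-T", "--clusterType", dest="clusterType",
                        choices=["mesos", "kubernetes"],
                        default=None,  # None means "the user did not choose"
                        help="Cluster scheduler to use.")
    return parser


def resolve_cluster_type(cluster_type: Optional[str]) -> str:
    """Hypothetical helper mirroring the warning-and-fallback logic in the diff."""
    if cluster_type == "mesos":
        # Explicit choice of the legacy scheduler: warn, but respect it.
        logger.warning('"mesos" clusters are no longer recommended; '
                       'consider --clusterType=kubernetes.')
        return cluster_type
    if cluster_type is None:
        # Unset: warn about the upcoming default change, then fall back.
        logger.warning('--clusterType is not set; using "mesos" for now.')
        return "mesos"
    return cluster_type


if __name__ == "__main__":
    options = build_parser().parse_args()
    print(resolve_cluster_type(options.clusterType))
```

Keeping ``default=None`` in the parser, rather than ``default='mesos'``, is what lets the code tell "the user asked for Mesos" apart from "the user did not say", so each case can get its own warning.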