diff --git a/docs/configuration_files.rst b/docs/configuration_files.rst new file mode 100644 index 00000000..7e3d7103 --- /dev/null +++ b/docs/configuration_files.rst @@ -0,0 +1,349 @@ +################### +Configuration Files +################### + +This page outlines how to construct configuration files to run your own routines +with `~skypy.pipeline.Pipeline`. + +`SkyPy` is an astrophysical simulation pipeline tool that allows to define any +arbitrary workflow and store data in table format. You may use `SkyPy` `~skypy.pipeline.Pipeline` +to call any function --your own implementation, from any compatible external software or from the `SkyPy library`. +Then `SkyPy` deals with the data dependencies and provides a library of functions to be used with it. + +These guidelines start with an example using one of the `SkyPy` functions, and it follows +the concrete YAML syntax necessary for you to write your own configuration files, beyond using `SkyPy` +functions. + +SkyPy example +------------- + +In this section, we exemplify how you can write a configuration file and use some of the `SkyPy` functions. +In this example, we sample redshifts and magnitudes from the SkyPy luminosity function, `~skypy.galaxies.schechter_lf`. + +- `Define variables`: + +From the documentation, the parameters for the `~skypy.galaxies.schechter_lf` function are: ``redshift``, the characteristic absolute magnitude ``M_star``, the amplitude ``phi_star``, faint-end slope parameter ``alpha``, +the magnitude limit ``magnitude_limit``, the fraction of sky ``sky_area``, ``cosmology`` and ``noise``. +If you are planning to reuse some of these parameters, you can define them at the top-level of your configuration file. +In our example, we are using ``Astropy`` linear and exponential models for the characteristic absolute magnitude and the amplitude, respectively. +Also, ``noise`` is an optional parameter and you could use its default value by omitting its definition. + + .. code:: yaml + + cosmology: !astropy.cosmology.default_cosmology.get [] + z_range: !numpy.linspace [0, 2, 21] + M_star: !astropy.modeling.models.Linear1D [-0.9, -20.4] + phi_star: !astropy.modeling.models.Exponential1D [3e-3, -9.7] + magnitude_limit: 23 + sky_area: 0.1 deg2 + +- `Tables and columns`: + +You can create a table ``blue_galaxies`` and for now add the columns for redshift and magnitude (note here the ``schechter_lf`` returns a 2D object) + + .. code:: yaml + + tables: + blue_galaxies: + redshift, magnitude: !skypy.galaxies.schechter_lf + redshift: $z_range + M_star: $M_star + phi_star: $phi_star + alpha: -1.3 + m_lim: $magnitude_limit + sky_area: $sky_area + +`Important:` if cosmology is detected as a parameter but is not set, it automatically uses the cosmology variable defined at the top-level of the file. + +This is how the entire configuration file looks like! + +.. literalinclude:: luminosity.yml + :language: yaml + +You may now save it as ``luminosity.yml`` and run it using the `SkyPy` `~skypy.pipeline.Pipeline`: + +.. plot:: + :include-source: true + :context: close-figs + + import matplotlib.pyplot as plt + from skypy.pipeline import Pipeline + + # Execute SkyPy luminosity pipeline + pipeline = Pipeline.read("luminosity.yml") + pipeline.execute() + + # Blue population + skypy_galaxies = pipeline['blue_galaxies'] + + # Plot histograms + fig, axs = plt.subplots(1, 2, figsize=(9, 3)) + + axs[0].hist(skypy_galaxies['redshift'], bins=50, histtype='step', color='purple') + axs[0].set_xlabel(r'$Redshift$') + axs[0].set_ylabel(r'$\mathrm{N}$') + axs[0].set_yscale('log') + + axs[1].hist(skypy_galaxies['magnitude'], bins=50, histtype='step', color='green') + axs[1].set_xlabel(r'$Magnitude$') + axs[1].set_yscale('log') + + plt.tight_layout() + plt.show() + +You can also run the pipeline directly from the command line and write the outputs to a fits file: + +.. code-block:: bash + + $ skypy luminosity.yml luminosity.fits + + + +Don’t forget to check out for more complete examples_! + +.. _examples: https://skypy.readthedocs.io/en/stable/examples/index.html + + +YAML syntax +----------- +YAML_ is a file format designed to be readable by both computers and humans. +Fundamentally, a file written in YAML consists of a set of key-value pairs. +Each pair is written as ``key: value``, where whitespace after the ``:`` is optional. +The hash character ``#`` denotes the start of a comment and all further text on that +line is ignored by the parser. + + +This guide introduces the main syntax of YAML relevant when writing +a configuration file to use with ``SkyPy``. Essentially, it begins with +definitions of individual variables at the top level, followed by the tables, +and, within the table entries, the features of objects to simulate are included. +Main keywords: ``parameters``, ``cosmology``, ``tables``. + + +Variables +^^^^^^^^^ +* `Variable definition`: a variable is defined as a key-value pair at the top of the file. + YAML is able to interpret any numeric data with the appropriate type: integer, float, boolean. + Similarly for lists and dictionaries. + In addition, SkyPy has added extra functionality to interpret and store Astropy Quantities_. + Everything else is stored as a string (with or without explicitly using quotes) + + .. code:: yaml + + # YAML interprets + counter: 100 # An integer + miles: 1000.0 # A floating point + name: "Joy" # A string + planet: Earth # Another string + mylist: [ 'abc', 789, 2.0e3 ] # A list + mydict: { 'fruit': 'orange', 'year': 2020 } # A dictionary + + # SkyPy extra functionality + angle: 10 deg + distance: 300 kpc + + +* `Import objects`: + the SkyPy configuration syntax allows objects to be imported directly from external + (sub)modules using the ``!`` tag and providing neither a list of arguments or a + dictionary of keywords. For example, this enables the import and usage of any Astropy cosmology: + + .. code:: yaml + + cosmology: !astropy.cosmology.Planck13 # import the Planck13 object and bind it to the variable named "cosmology" + + +Parameters +^^^^^^^^^^ + +* `Parameters definition`: parameters are variables that can be modified at execution. + + For example, + + .. code:: yaml + + parameters: + hubble_constant: 70 + omega_matter: 0.3 + + +Functions +^^^^^^^^^ +* `Function call`: functions are defined as tuples where the first entry is the fully qualified function name tagged with and exclamation mark ``!`` and the second entry is either a list of positional arguments or a dictionary of keyword arguments. + + For example, if you need to call the ``log10()`` and ``linspace()`` NumPy_ functions, then you define the following key-value pairs: + + .. code:: yaml + + log_of_2: !numpy.log10 [2] + myarray: !numpy.linspace [0, 2.5, 10] + + You can also define parameters of functions with a dictionary of keyword arguments. + Imagine you want to compute the comoving distance for a range of redshifts and an `Astropy` Planck 2015 cosmology. + To run it with the `SkyPy` pipeline, call the function and define the parameters as an indented dictionary. + + .. code:: yaml + + comoving_distance: !astropy.cosmology.Planck15.comoving_distance + z: !numpy.linspace [ 0, 1.3, 10 ] + + Similarly, you can specify the functions arguments as a dictionary: + + .. code:: yaml + + comoving_distance: !astropy.cosmology.Planck15.comoving_distance + z: !numpy.linspace {start: 0, stop: 1.3, num: 10} + + `N.B.` To call a function with no arguments, you should pass an empty list of + ``args`` or an empty dictionary of ``kwargs``. For example: + + .. code:: yaml + + cosmo: !astropy.cosmology.default_cosmology.get [] + + +* `Variable reference`: variables can be referenced by their full name tagged with a dollar sign ``$``. + In the previous example you could also define the variables at the top-level of the file and then reference them: + + .. code:: yaml + + redshift: !numpy.linspace [ 0, 1.3, 10 ] + comoving_distance: !astropy.cosmology.Planck15.comoving_distance + z: $redshift + +* The `cosmology` to be used by functions within the pipeline only needs to be set up at the top. If a function needs ``cosmology`` as an input, you need not define it again, it is automatically detected. + + For example, calculate the angular size of a galaxy with a given physical size, at a fixed redshift and for a given cosmology: + + .. code:: yaml + + cosmology: !astropy.cosmology.FlatLambdaCDM + H0: 70 + Om0: 0.3 + size: !skypy.galaxies.morphology.angular_size + physical_size: 10 kpc + redshift: 0.2 + +* `Job completion`: ``.depends`` can be used to force any function call to wait for completion + of any other job. + + A simple example where, for some reason, the comoving distance needs to be called after + completion of the angular size function: + + .. code:: yaml + + cosmology: !astropy.cosmology.Planck15 + size: !skypy.galaxies.morphology.angular_size + physical_size: 10 kpc + redshift: 0.2 + comoving_distance: !astropy.cosmology.Planck15.comoving_distance + z: !numpy.linspace [ 0, 1.3, 10 ] + .depends: size + + By doing so, you force the function call ``redshift`` to be completed before is used to compute the comoving distance. + + +Tables +^^^^^^ + +* `Table creation`: a dictionary of table names, each resolving to a dictionary of column names for that table. + + Let us create a table called ``telescope`` with a column to store the width of spectral lines that follow a normal distribution + + .. code:: yaml + + tables: + telescope: + spectral_lines: !scipy.stats.norm.rvs + loc: 550 + scale: 1.6 + size: 100 + +* `Column addition`: you can add as many columns to a table as you need. + Imagine you want to add a column for the telescope collecting surface + + .. code:: yaml + + tables: + telescope: + spectral_lines: !scipy.stats.norm.rvs + loc: 550 + scale: 1.6 + size: 100 + collecting_surface: !numpy.random.uniform + low: 6.9 + high: 7.1 + size: 100 + +* `Column reference`: columns in the pipeline can be referenced by their full name tagged with a dollar sign ``$``. + Example: the galaxy mass that follows a lognormal distribution. You can create a table ``galaxies`` + with a column ``mass`` where you sample 10000 object and a second column, ``radius`` which also follows a lognormal distribution + but the mean depends on how massive the galaxies are: + + .. code:: yaml + + tables: + galaxies: + mass: !numpy.random.lognormal + mean: 5. + size: 10000 + radius: !numpy.random.lognormal + mean: $galaxies.mass + + +* `Multi-column assignment`: multi-column assignment is performed with any 2d-array, where one of the dimensions is interpreted + as the rows of the table and the second dimension, as separate columns. Or you can do it from a function that returns a tuple. + + We use multi-column assignment in the following example where we sample a two-dimensional array of values from a lognormal distribution and then store them as three columns in a table: + + .. code:: yaml + + tables: + halos: + mass, radius, concentration: !numpy.random.lognormal + size: [10000, 3] + + +* `Table initialisation`: by default tables are initialised using ``astropy.table.Table()`` however this can be overridden using the ``.init`` keyword to initialise the table with any function call. + + For example, you can stack galaxy properties such as radii and mass: + + .. code:: yaml + + radii: !numpy.logspace [ 1, 2, 100 ] + mass: !numpy.logspace [ 9, 12, 100 ] + tables: + galaxies: + .init: !astropy.table.vstack [[ $radii, $mass ]] + + +* `Table reference`: when a function call depends on tables, you need to ensure the referenced table has the necessary content and is not empty. + You can do that with ``.complete``. + + Example: you want to perform a very simple abundance matching, i.e. painting galaxies within your halos. + You can create two tables ``halos`` and ``galaxies`` storing the halo mass and galaxy luminosities. + Then you can stack these two tables and store it in a third table called ``matching``. + + .. code:: yaml + + tables: + halos: + halo_mass: !numpy.random.uniform + low: 1.0e8 + high: 1.0e14 + size: 20 + galaxies: + luminosity: !numpy.random.uniform + low: 0.05 + high: 10.0 + size: 20 + matching: + .init: !astropy.table.hstack + tables: [ $halos, $galaxies ] + .depends: [ halos.complete, galaxies.complete ] + + +.. _YAML: https://yaml.org +.. _NumPy: https://numpy.org +.. _Quantities: https://docs.astropy.org/en/stable/units/ +.. _clone(): https://docs.astropy.org/en/stable/api/astropy.cosmology.FLRW.html?highlight=clone#astropy.cosmology.FLRW.clone diff --git a/docs/index.rst b/docs/index.rst index 3f1d2bb0..64604782 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -18,6 +18,7 @@ Getting Started install feature_list + configuration_files examples/index .. _user-docs: diff --git a/docs/luminosity.yml b/docs/luminosity.yml new file mode 100644 index 00000000..0ca5ed07 --- /dev/null +++ b/docs/luminosity.yml @@ -0,0 +1,15 @@ +cosmology: !astropy.cosmology.default_cosmology.get [] +z_range: !numpy.linspace [0, 2, 21] +M_star: !astropy.modeling.models.Linear1D [-0.9, -20.4] +phi_star: !astropy.modeling.models.Exponential1D [3e-3, -9.7] +magnitude_limit: 23 +sky_area: 0.1 deg2 +tables: + blue_galaxies: + redshift, magnitude: !skypy.galaxies.schechter_lf + redshift: $z_range + M_star: $M_star + phi_star: $phi_star + alpha: -1.3 + m_lim: $magnitude_limit + sky_area: $sky_area