#65 first pass at deliverable
arosenbe committed Aug 29, 2017
1 parent 7dfbad4 commit a56aad0
Showing 1 changed file with 98 additions and 0 deletions.
98 changes: 98 additions & 0 deletions issues/issue65.md
@@ -0,0 +1,98 @@
# Automatic dependency installation

This document outlines dependency handling for the tools used by the template.

## Summary

The global SCons download can be replaced by a distribution of scons-local at the top level of the template. Users would then no longer need to install SCons themselves.

Package dependencies for each language should be enumerated in config-global. We write a script in gslab_python that reads config-global and calls additional scripts in other languages to check for, install, and optionally update these dependencies. The Python script is executed at runtime before SCons runs any build steps, and it only runs a language-specific script if that language's builder is active. All dependencies are tracked in the same way and installed without user interaction.
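
As a rough illustration of the "only run the language-specific script if its builder is active" rule, here is a minimal Python 3 sketch. The layout of `CONFIG_GLOBAL`, the package lists, and the function name `plan_dependency_checks` are all hypothetical. This document does not specify the format of config-global or the gslab_python interface.

```python
# Hypothetical stand-in for the package section of config-global; the real
# file format is not specified in this document.
CONFIG_GLOBAL = {
    "python": ["PyYaml", "gslab_tools"],
    "r": ["ggplot2", "yaml"],
    "stata": ["estout"],
}


def plan_dependency_checks(config, active_builders):
    """Return (language, packages) pairs for builders that are active.

    Languages whose builder is inactive are skipped, so their
    language-specific install script would never be called.
    """
    plan = []
    for language in sorted(config):
        packages = config[language]
        if language in active_builders and packages:
            plan.append((language, list(packages)))
    return plan


if __name__ == "__main__":
    # Only the Python and R builders are active in this example run.
    for language, packages in plan_dependency_checks(CONFIG_GLOBAL, {"python", "r"}):
        print(language, packages)
```

In a real implementation, each entry of the returned plan would be handed to the installer script for that language.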

We could ask people to download a distribution of `anaconda` for its bundled Python and R binaries. This seems like more trouble than just asking people to install them according to the basic instructions.

There's no workaround for git, git-lfs, Stata, MATLAB, LyX, or LaTeX. These should (as necessary) be enumerated in the top-level `README.md` alongside any other dependencies.

## Scons-local

We should include a [scons-local](http://scons.org/faq.html#What.27s_the_difference_between_the_scons.2C_scons-local.2C_and_scons-src_packages.3F) distribution at the top level of the template.

> It's intended to be dropped in to and shipped with packages of other software that want to build with SCons, but which don't want to have to require that their users install SCons.

The actual use is similar to that of scons, except we'd run the build in a directory using

```bash
python ../scons/scons.py <optional args>
```

instead of

```bash
scons <optional args>
```

### Open questions

* Do we include the unpacked directory (2.3 MB, assumed above) or a compressed archive of scons-local (650 KB)?
    * Cloning will be slightly faster with the compressed archive, though the difference is negligible compared to the PDFs.
    * SCons seems to assume that only the compressed archive is included.
    * There's room for paths to break when unpacking the archive.
    * There's one more step for each fresh clone if we only include the compressed archive.
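
If we ship only the compressed archive, the extra step on a fresh clone could be automated. The following is a minimal Python 3 sketch, not a tested part of the template. The function name `ensure_scons_local` and both paths are hypothetical, and it assumes the archive is a tarball with `scons.py` at its top level.

```python
import os
import tarfile


def ensure_scons_local(archive_path, dest_dir):
    """Unpack archive_path into dest_dir unless scons.py is already there.

    Returns the path to scons.py. Assumes (hypothetically) that the archive
    is a tarball with scons.py at its top level.
    """
    scons_py = os.path.join(dest_dir, "scons.py")
    if not os.path.isfile(scons_py):
        os.makedirs(dest_dir, exist_ok=True)
        with tarfile.open(archive_path) as archive:
            archive.extractall(dest_dir)
    # Fail loudly if the archive layout is not what we assumed.
    if not os.path.isfile(scons_py):
        raise RuntimeError("scons.py not found after unpacking " + archive_path)
    return scons_py
```

The explicit check after unpacking is there to surface the "paths break when unpacking" failure early instead of at build time.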

## Packages

The proposed package management solution is parsimonious and standard, but it's off the beaten track as far as "real people" are concerned. Below I outline the way developers seem to manage their packages for Python and R. I also

### Python

Packages, versions, and remote locations can be listed in `requirements.txt`. It looks like this:

```txt
PyYaml
-e git+https://github.com/gslab-econ/gslab_python@4.1.0#egg=gslab-tools
```

These packages can be installed at the listed versions from the listed locations via

```bash
pip install -r requirements.txt
```

This will overwrite other packages of the same name in the default package storage location.

See [here](https://pip.pypa.io/en/stable/user_guide/) for more detail. Also note that we can give `requirements.txt` an arbitrary name.

We could use `requirements.txt` to track all Python package dependencies for each project, instead of tracking them in the `README.md`. This would cut down on clutter and speed up development.
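
For illustration, here is a minimal Python 3 sketch of how a script might read the package names out of such a file and then hand installation to pip. The helper names `requirement_names` and `install_requirements` are hypothetical, and the parser is deliberately simplistic: it covers plain names, common version specifiers, and `#egg=` fragments, not the full requirements file format.

```python
import subprocess
import sys


def requirement_names(text):
    """List the package names declared in requirements-style text."""
    names = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        if "#egg=" in line:
            # VCS/editable entries name the package in the egg fragment.
            names.append(line.split("#egg=", 1)[1].strip())
            continue
        if line.startswith("-"):
            continue  # other pip options, e.g. -r or --index-url
        # Drop version specifiers such as ==, >=, <=, ~=, !=.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<"):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names


def install_requirements(path="requirements.txt"):
    """Run pip against the requirements file with the current interpreter."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-r", path])
```

Calling pip through `sys.executable -m pip` installs into the same interpreter that runs the script, which matters when several Python installations are present.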

#### Open questions

##### Virtual environments

Most people seem to use a `requirements.txt` in conjunction with a project-specific virtual environment ("virtualenv"). The virtualenv keeps package installation local to the project, so `requirements.txt` can't overwrite packages the user has already installed.

Python virtualenvs are [platform dependent and cannot be moved to machines with different environments](https://virtualenv.pypa.io/en/stable/userguide/#making-environments-relocatable). Real people use virtualenvs for their own sanity, but they're not something we can easily force on other users. I'm in favor of using `requirements.txt` for global package installation.
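
If we go with global installation, the install script could at least report which case it is in before running pip. Below is a minimal Python sketch. The function name `in_virtualenv` is hypothetical, and the check relies on `sys.base_prefix` (set by the standard `venv` module) and `sys.real_prefix` (set by older `virtualenv` releases).

```python
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a virtual environment."""
    # The venv module sets sys.base_prefix; older virtualenv releases
    # set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("Packages will be installed into the virtualenv at", sys.prefix)
    else:
        print("Packages will be installed globally for", sys.executable)
```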

##### `conda`

We can supposedly use `conda` to create platform-independent virtualenvs bound to a particular version of Python. I spent a few hours playing with this and was finally able to load `PyYaml` in an environment, but I couldn't get `gslab_scons` to load. The `conda` method might be more robust, but it seems tough to work with. It also introduces a dependency on the `conda` distribution.

### R

We can use [`packrat`](http://rstudio.github.io/packrat/) to store a snapshot of all loaded R packages for a project in template/analysis. When a user (or scons) starts R in a `packrat` directory, those packages (and no others) are automatically loaded. If any required package, including `packrat` itself, isn't installed, then `packrat` will install that package as well.

This seems like a great tool. It's developed by the RStudio team, and I'm definitely in favor of making it standard. We can move R package dependencies from `README.md` into `packrat` to help cut down on clutter and speed up development.

#### Open questions

##### Masking packages

One issue is that `packrat` masks all packages not in a snapshot when you're working inside its directory, so you might need to re-install `ggplot2` locally even if you have it globally. I could see this upsetting some users if code breaks in unexpected places or they need to install packages they're used to having.

##### Subdirectories

By default, `packrat` only runs when R is started from within its directory, not from any subdirectory. This could slow down interactive coding if packages are stored in `packrat` that aren't available system-wide. It's not a huge deal though, as missing packages can always be installed a la carte, and `packrat` can be `init`ed from any directory.

This isn't a problem for `scons` because each script is executed from the same directory as the `SConstruct`, so we just need to put the `packrat` directory there. It's also not a problem if scripts in `source` load packages that aren't stored in `packrat`, as `packrat` will automatically attempt to install and save them at runtime.

## Other Notes

Maybe add a link to [chocolatey](https://chocolatey.org/) for Windows users.
