4.0.0 Release #52

Merged
merged 20 commits into from
Jul 27, 2017
2 changes: 2 additions & 0 deletions .gitignore
@@ -18,3 +18,5 @@ user-config.yaml
*.out
*.svn
.sconsign.dblite
*.lyx.emergency

152 changes: 101 additions & 51 deletions README.md
@@ -1,60 +1,110 @@
Using the repository template
=============================

#### Pre-requisites:

- Windows `cmd.exe`, Mac OS X `bash`, or Linux `bash`.
- [Python 2.X](https://www.python.org) (add to [PATH](https://en.wikipedia.org/wiki/PATH_(variable)))
- [Stata MP](http://www.stata.com/statamp/) (add to [PATH](https://en.wikipedia.org/wiki/PATH_(variable)))
- [R](https://www.r-project.org/) (add to [PATH](https://en.wikipedia.org/wiki/PATH_(variable)))
- [Lyx](https://www.lyx.org/) (add to [PATH](https://en.wikipedia.org/wiki/PATH_(variable)))
- [SCons](http://scons.org/) (Note that version 2.4.0 or later is best if using the [cache](http://scons.org/doc/2.0.1/HTML/scons-user/c4213.html)).
- More information about SCons can be found [here](https://github.com/gslab-econ/ra-manual/wiki/SCons).
- [git-lfs](https://git-lfs.github.com/)
- [gslab_tools](https://github.com/gslab-econ/gslab_python) version 3.0.3 or later
- [GSLab-modified Metropolis beamer theme](https://github.com/gslab-econ/mtheme)
- [YAML](http://yaml.org/)-related packages/modules:
- Stata ado file: [yaml](https://github.com/sergiocorreia/stata-misc/tree/75a8b251bec02ba590c862cc395c4b95077d8a95)
- Python module: [PyYAML](http://pyyaml.org/wiki/PyYAML)
- R package: [yaml](https://cran.r-project.org/web/packages/yaml/yaml.pdf)


The easiest way to install some of the applications above is to use [Homebrew](http://brew.sh/) on Mac OS and [Linuxbrew](http://linuxbrew.sh/) on Linux, as they will set up the `PATH` variable for you, e.g., `brew install scons`.

#### To run:
- The entire directory:
- In the root directory, type `scons` in the command line. This should run everything that is flagged as being modified or with dependencies that have been modified.
- A single directory of targets:
- `scons build/data` will re-build the `build/data` folder if it is out of sync, without rebuilding other files.
- A single target file:
- `scons build/paper/paper.pdf` will re-run only the code needed to update `build/paper/paper.pdf` without rebuilding other files.

See [here](https://github.com/gslab-econ/gslab_python/tree/master/gslab_scons) for directions on making a 'release'.

#### Copy the template:
In order to create a new repository using this template, either

- First, either:
- Fork this repository
- Create an empty repository in GitHub and clone it locally. Copy the contents of this template into the empty repository. Make sure to exclude the `.git` folder, but include the `.gitattributes` and `.gitignore` files. Re-run the entire directory using `Scons`. Commit and push to the new repository.
- Setup a `user-config.yaml` in the root of the directory (note that this file should not be versioned):
- MacOS or Linux minimal working example
```
stata_flavor: statamp
cache: /Users/leviboxell/Google Drive/cache/template
```
- Windows 10 minimal working example (note the quotation marks)
# GSLab Template

The GSLab Template is a minimal working demonstration of the tools and organization used by projects in the GSLab. We use SCons and a few custom builders to execute scripts and track dependencies in a portable and flexible manner.

## Prerequisites

You'll need the following to run the template. [Homebrew](https://brew.sh/) for Mac and [Linuxbrew](http://linuxbrew.sh/) for Linux make this easier.
* Windows `cmd.exe`, Mac OS X `bash`, or Linux `bash`.
* [Python 2.7.X](https://wiki.python.org/moin/BeginnersGuide/Download) for [Windows](https://docs.python.org/2/using/windows.html), [Mac](https://docs.python.org/2/using/mac.html) or [Linux](https://docs.python.org/2/using/unix.html).
* [gslab_python](https://github.com/gslab-econ/gslab_python) version 4.0.0.
* [PyYAML](http://pyyaml.org/wiki/PyYAML), a Python module for parsing YAML files.
* [SCons](http://scons.org/pages/download.html) version 2.4 or later.
* [git](https://git-scm.com/downloads) for version control.
* [git-lfs](https://git-lfs.github.com/) for versioning large files.
* You'll need both git and git-lfs to clone the repository.
* [LyX](https://www.lyx.org/Download) (with instructions for LaTeX)
* Add LyX to your PATH for [Windows](http://www.computerhope.com/issues/ch000549.htm), [Mac](http://hathaway.cc/post/69201163472/how-to-edit-your-path-environment-variables-on-mac), and [Linux](http://stackoverflow.com/questions/14637979/how-to-permanently-set-path-on-linux).
* The beamer theme [`metropolis`](https://github.com/matze/mtheme). This is part of MikTeX since Dec 2014.
* [Stata](http://www.stata.com/)
* Add Stata to your PATH for [Windows](http://www.computerhope.com/issues/ch000549.htm), [Mac](http://hathaway.cc/post/69201163472/how-to-edit-your-path-environment-variables-on-mac), and [Linux](http://stackoverflow.com/questions/14637979/how-to-permanently-set-path-on-linux).
* [yaml](https://github.com/gslab-econ/stata-misc), a Stata ado file for parsing YAML files.

## Getting started

1. Open a shell, clone the repository, and navigate to its root.
```
stata_flavor: "%STATAEXE%"
cache: C:\Users\Levi Boxell\Google Drive\cache\template
git clone https://github.com/gslab-econ/template.git
cd template
```


2. You're ready to go. We'll prompt you to enter any necessary information and store it in `user-config.yaml` as your scripts run.
* To build everything in the repository that has been modified or whose dependencies have been modified:
```
scons
```
* To build every modified target in a single directory, along with any of its dependencies that have been modified:
```
scons build/path/to/directory
```
* To build a single modified target, along with any of its dependencies that have been modified:
```
scons build/path/to/file.ext
```

## Copying the template

If you want to create a repository with the same structure as this template you can fork it. If you want a repository without any of our git history, follow these instructions.
* Create an empty repository in GitHub and clone it.
* Copy the contents of this template into the empty repository. Make sure to exclude the `.git` folder, but include the [`.gitattributes`](https://git-scm.com/docs/gitattributes) and [`.gitignore`](https://git-scm.com/docs/gitignore) files.
* Create a `user-config.yaml` file and run `scons`.
* Commit the changes and push to the new repository.

## FAQ

##### What is `user-config.yaml`?

Each user is allowed to have different local specifications: We don't put any restrictions on where you keep large files, what you call your executables, or how you manage shared directories. We do need to find these things, and that's what `user-config.yaml` is for. Each user maintains an unversioned [YAML file](http://yaml.org/) with these sorts of specifications. Each script uses its associated yaml-parsing module to read these specifications each time the script is run.

##### What do I put in `user-config.yaml`?

There's no "default" for `user-config.yaml` because it depends on system specifications and user preferences. Three things we do recommend keeping in `user-config.yaml` are the name of your Stata executable, the location of a [SCons cache directory](http://scons.org/doc/2.0.1/HTML/scons-user/c4213.html), and the location of a release directory. These fields don't have to be specified if you're not using them, and we'll prompt you for their values at runtime if you've forgotten to specify them and they're necessary. A Mac example where Example_User is running a factory-fresh StataMP and has local access to directories named cache/template and release on Dropbox would be

```YAML
stata_executable: statamp
cache_directory: /Users/Example_User/Dropbox/cache/template
release_directory: /Users/Example_User/Dropbox/release/
```
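
As a rough illustration of how a script might consume these fields, here is a minimal Python sketch. It is not the `gslab_scons` implementation: `parse_flat_yaml` and `get_value` are hypothetical helper names, and the hand-written parser only handles flat `key: value` lines so that the example has no dependencies. In a real project, PyYAML's `yaml.safe_load` would do the parsing.

```python
EXAMPLE_CONFIG = """\
stata_executable: statamp
cache_directory: /Users/Example_User/Dropbox/cache/template
release_directory: /Users/Example_User/Dropbox/release/
"""


def parse_flat_yaml(text):
    """Parse flat `key: value` lines (a stand-in for yaml.safe_load)."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not a key/value pair
        if not line or line.startswith('#') or ':' not in line:
            continue
        # Split on the first colon only, so values such as C:\... survive
        key, value = line.split(':', 1)
        config[key.strip()] = value.strip()
    return config


def get_value(config, key, ask=None):
    """Return config[key]; fall back to `ask(key)` when it is missing."""
    value = config.get(key)
    if not value:
        if ask is None:
            raise KeyError('%s is not set in user-config.yaml' % key)
        value = ask(key)      # e.g. prompt the user at runtime
        config[key] = value   # remember the answer for later lookups
    return value


config = parse_flat_yaml(EXAMPLE_CONFIG)
print(get_value(config, 'stata_executable'))
```

The `ask` callback mirrors the behaviour described above, where a missing field is requested at runtime only when it is actually needed.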

##### How do I handle data external to my repository?

We are agnostic about how you incorporate external data into the template. There's no custom builder for these assets, by design. Our suggestions:

* When a large dataset is stored locally, `user-config.yaml` can include an entry specifying the user-specific path to that dataset. The key of the entry should be constant across users and documented in the top-level readme of the repository.

* When a large dataset is stored externally, there are a few options.
* The top-level readme can specify manual download and storage instructions. This is simple, easy to customize, and unlikely to cause errors during a SCons build. It does, however, require each user to successfully download the same dataset, perhaps in an unstructured manner.
* The download can be incorporated into the SCons build. We either execute a program to transfer data (e.g., `rsync` or `rclone`) directly in a standard SCons command or from within a script executed by one of our custom builders. These methods have the benefits of automation and dependency tracking, but they can introduce idiosyncratic errors if the download steps are prone to failure.
* Regardless of the download method, the path to the dataset should be added to `constant.yaml` and `.gitignore` if it is stored within the repository and to `user-config.yaml` if it is stored elsewhere.
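
For the option of downloading from within a script, the transfer step could be wrapped along these lines. This is a sketch under assumptions: `rsync_command` and `download` are hypothetical helpers rather than part of `gslab_scons`, and the remote location and local folder in the comment are made-up placeholders.

```python
import subprocess


def rsync_command(remote, local_dir):
    """Build the rsync call: -a keeps attributes and recurses, -z compresses."""
    return ['rsync', '-az', remote, local_dir]


def download(remote, local_dir, run=subprocess.check_call):
    """Run the transfer; check_call raises if rsync exits with an error."""
    command = rsync_command(remote, local_dir)
    run(command)
    return command


# Hypothetical usage inside a script executed by a custom builder:
# download('example-host:/data/big_dataset/', 'source/raw/big_dataset/')
```

Passing `run` as a parameter keeps the transfer step easy to swap out, which helps when the download itself is the part most likely to fail.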

##### Can I use other software for data analysis?

Yes. We have custom builders for Python and R, and you can use them with the same syntax as the Stata builder. If you're using R, make sure it's been added to your PATH and that you have a YAML-parsing package, such as [yaml](https://cran.r-project.org/web/packages/yaml/yaml.pdf).

##### Can I pass "command line style" arguments to a script?

You bet. All of our custom builders accept "command line style" arguments with the same method. Enumerate the arguments in a list and pass them to the builder through the `CL_ARG` keyword argument, exactly the same way you specify sources and targets. We'll format this list, and `scons` will pass its contents to the script at runtime. You can reference these arguments when writing a script using the standard practice for its language.
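
On the script side, a Python script can pick these arguments up through `sys.argv`, which is the standard practice for that language. The sketch below assumes the arguments arrive as ordinary positional arguments; `read_cl_args` is a hypothetical helper, and the builder call in the comment uses made-up paths and values.

```python
import sys


def read_cl_args(argv):
    """Return the arguments that follow the script name."""
    return list(argv[1:])


# Hypothetical call in an SConscript (paths and values are placeholders):
# env.BuildPython(target = ['#build/analysis/output.txt'],
#                 source = ['#source/analysis/analysis.py'],
#                 CL_ARG = ['2017', 'robust'])

if __name__ == '__main__':
    args = read_cl_args(sys.argv)
    print('Received %d argument(s): %s' % (len(args), args))
```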

##### How is the build process logged?

Each of our custom builders produces a log of its process in the same directory as the first of its targets. Each log is named `sconscript.log` by default, and you can insert custom text between `sconscript` and `.log` by passing it as a string through the builder's `log_ext` keyword argument. It's similar to the way that you specify sources and targets, except that the `log_ext` argument must be a string. You should specify the `log_ext` argument for builders that produce logs in the same directory, otherwise the default `sconscript.log` will be overwritten by each builder.

After all the steps in the build are completed, we'll comb through the directory and look for any files named `sconscript*.log`. These logs will be concatenated, with the earliest completed ones first and all logs with errors on top. We'll store this concatenated log at the root of the repository in `sconstruct.log`.
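
The ordering described above can be sketched in a few lines of Python. This is an illustration, not the `gslab_scons` code: `collect_logs` and `concatenate_logs` are hypothetical names, file modification time stands in for completion time, and a log is treated as having an error when it contains the word "error" (the README does not say how errors are actually detected).

```python
import os


def collect_logs(root, error_marker='error'):
    """Find sconscript*.log files under root, errors first, then oldest first."""
    logs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith('sconscript') and name.endswith('.log'):
                path = os.path.join(dirpath, name)
                with open(path) as handle:
                    text = handle.read()
                has_error = error_marker in text.lower()
                # False sorts before True, so logs with errors come first
                logs.append((not has_error, os.path.getmtime(path), path, text))
    logs.sort(key=lambda item: item[:3])
    return logs


def concatenate_logs(root, output='sconstruct.log'):
    """Write the combined log at root and return the paths in output order."""
    logs = collect_logs(root)
    pieces = ['*** %s ***\n%s' % (path, text) for _, _, path, text in logs]
    with open(os.path.join(root, output), 'w') as handle:
        handle.write('\n'.join(pieces))
    return [path for _, _, path, _ in logs]
```

Because the output file is named `sconstruct.log`, it does not match the `sconscript*.log` pattern and is not picked up on a later run.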

##### Can I write my paper in LaTeX instead of LyX?

We don't have a custom builder for LaTeX. You can still write in it, but you will have to use [SCons's native builder](http://www.scons.org/doc/0.96.91/HTML/scons-user/a5334.html). You can still use our custom table builder to fill LaTeX tables.

##### Can I release my repository?

Yes, our [custom tool](https://github.com/gslab-econ/gslab_python/tree/master/gslab_scons) allows you to release to GitHub and a local destination specified in `user-config.yaml`. A new release can be transferred to a remote manually (e.g., using `rsync` or `rclone`) or automatically by specifying a local destination that's synced to a remote (e.g., a Dropbox directory).

Every file intended for release should be added to the `release` directory. Files not intended for release to GitHub should be added to `.gitignore`. Our tool will transfer everything in `release` to the local destination and create a [GitHub release](https://help.github.com/articles/creating-releases/) with all the versioned files—those not added to `.gitignore`—in `release`.

#### License

The MIT License (MIT)

Copyright (c) 2016 Matthew Gentzkow, Jesse Shapiro
Copyright (c) 2017 Matthew Gentzkow, Jesse Shapiro

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

56 changes: 32 additions & 24 deletions SConstruct
@@ -1,37 +1,43 @@
# Preliminaries
import os
import sys
sys.dont_write_bytecode = True # Don't write .pyc files

# Test for proper prerequisites and setup
from setup import setup_test
[user_configs, mode, sf, cache_dir] = setup_test(ARGUMENTS)
import gslab_scons
from configuration_test import configuration_test
[mode, stata_executable, cache_dir] = configuration_test(ARGUMENTS,
gslab_python_version = '4.0.0')
import gslab_scons as gs
import gslab_scons.log as log
import yaml
import atexit

# Start log
gslab_scons.start_log()
# Start log after getting mode and release version
mode = ARGUMENTS.get('mode', 'develop')
vers = ARGUMENTS.get('version', '')
log.start_log(mode, vers)

# Defines environment
# Define the SCons environment
env = Environment(ENV = {'PATH' : os.environ['PATH']},
IMPLICIT_COMMAND_DEPENDENCIES = 0,
BUILDERS = {'Tablefill' : Builder(action = gslab_scons.build_tables),
'BuildLyx' : Builder(action = gslab_scons.build_lyx),
'BuildR' : Builder(action = gslab_scons.build_r),
'BuildStata' : Builder(action = gslab_scons.build_stata),
'BuildPython' : Builder(action = gslab_scons.build_python)},
user_flavor = sf)

env.Decider('MD5-timestamp') # Only computes hash if time-stamp changed
env.EXTENSIONS = ['.eps', '.pdf', '.lyx'] # Extensions to be used when scanning for source files in BuildLyx.
SourceFileScanner.add_scanner('.lyx', Scanner(gslab_scons.misc.lyx_scan, recursive = True))

BUILDERS = {'Tablefill': Builder(action = gs.build_tables),
'BuildLyx': Builder(action = gs.build_lyx),
'BuildStata': Builder(action = gs.build_stata),
'BuildPython': Builder(action = gs.build_python)},
stata_executable = stata_executable)

# Only computes hash if time-stamp changed
env.Decider('MD5-timestamp')
# Extensions to be used when scanning for source files in BuildLyx.
env.EXTENSIONS = ['.eps', '.pdf', '.lyx']
SourceFileScanner.add_scanner('.lyx', Scanner(gs.misc.lyx_scan, recursive = True))
# Load paths
env['PATHS'] = yaml.load(open("constants.yaml", 'rU'))

# Export environment
Export('env')

# Run sub-trees
SConscript('source/data/SConscript')
SConscript('source/analysis/SConscript')
SConscript('source/tables/SConscript')
SConscript('source/paper/SConscript')
@@ -42,9 +48,11 @@ Default('./build', './release')
if mode == 'cache':
    CacheDir(cache_dir)

# Print the state of the repo at end of SCons run
finish_command = Command( 'state_of_repo.log', [], gslab_scons.misc.state_of_repo, MAXIT=10) # From http://stackoverflow.com/questions/8901296/how-do-i-run-some-code-after-every-build-in-scons
Depends(finish_command, BUILD_TARGETS)
env.AlwaysBuild(finish_command)
if 'state_of_repo.log' not in BUILD_TARGETS:
    BUILD_TARGETS.append('state_of_repo.log')
debrief_env = {'MAXIT' : 10,
# Folders to look in for large versioned files
'look_in' : 'release;source',
# Soft limits on file sizes
'file_MB_limit' : 2,
'total_MB_limit' : 500}
atexit.register(log.end_log)
atexit.register(gs.misc.scons_debrief, target = 'state_of_repo.log', env = debrief_env)
53 changes: 53 additions & 0 deletions configuration_test.py
@@ -0,0 +1,53 @@
#!/usr/bin/python
import sys
import os
import re
import subprocess
import warnings
from gslab_scons import _exception_classes
from gslab_scons import misc

def configuration_test(ARGUMENTS, gslab_python_version):
    # Determines whether to print traceback messages
    debug = ARGUMENTS.get('debug', False)
    if not debug:
        # Hide traceback for configuration test only
        # http://stackoverflow.com/questions/27674602/hide-traceback-unless-a-debug-flag-is-set
        sys.tracebacklimit = 0

    # Checks initial prerequisites
    try:
        from gslab_scons import configuration_tests as config
    except ImportError:
        message = 'Your gslab_tools Python modules installation is outdated'
        raise Exception(message)

    config.check_python(gslab_python_version = gslab_python_version,
                        packages = ["yaml", "gslab_scons", "gslab_fill"])
    config.check_lyx()
    config.check_lfs()
    stata_executable = config.check_stata(["yaml"])

    # Uncomment if using
    # config.check_r(packages = ["yaml"])

    # Loads arguments and configurations
    mode = ARGUMENTS.get('mode', 'develop') # Gets mode; defaults to 'develop'

    # Checks mode/version
    if not (mode in ['develop', 'cache']):
        message = "Error: %s is not a defined mode" % mode
        raise _exception_classes.PrerequisiteError(message)

    # Get return list
    if mode == 'cache':
        cache_dir = misc.load_yaml_value("user-config.yaml", "cache_directory")
        cache_dir = misc.check_and_expand_path(cache_dir)
        return_list = [mode, stata_executable, cache_dir]
    else:
        return_list = [mode, stata_executable, None]

    # Restore default tracebacklimit and return values
    sys.tracebacklimit = 1000

    return return_list
2 changes: 0 additions & 2 deletions constants.yaml
@@ -1,15 +1,13 @@
# Build Directories
build:
  analysis: build/analysis
  data: build/data
  paper: build/paper
  tables: build/tables
  talk: build/talk

# Source Directories
source:
  analysis: source/analysis
  data: source/data
  figures: source/figures
  paper: source/paper
  raw: source/raw
Binary file modified release/paper/ondeck.pdf
Binary file not shown.
Binary file modified release/paper/online_appendix.pdf
Binary file not shown.
Binary file modified release/paper/paper.pdf
Binary file not shown.
16 changes: 4 additions & 12 deletions release/paper/sconscript.log
@@ -1,17 +1,9 @@
Log created: 2017-02-06 17:13:52
Log completed: 2017-02-06 17:13:54

This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex)
*** Builder log created: {2017-07-27 13:02:13}
*** Builder log completed: {2017-07-27 13:02:14}
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./paper.tex
LaTeX2e <2016/03/31>
Babel <3.9r> and hyphenation patterns for 83 language(s) loaded.

This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./paper.tex
(./text.tex
LaTeX2e <2016/03/31>
Babel <3.9r> and hyphenation patterns for 83 language(s) loaded.

Binary file modified release/paper/text.pdf
Binary file not shown.
7 changes: 3 additions & 4 deletions release/talk/sconscript.log
@@ -1,7 +1,6 @@
Log created: 2017-02-06 17:13:54
Log completed: 2017-02-06 17:13:57

This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex)
*** Builder log created: {2017-07-27 13:02:14}
*** Builder log completed: {2017-07-27 13:02:18}
This is pdfTeX, Version 3.14159265-2.6-1.40.17 (TeX Live 2016) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(./slides.tex
Binary file modified release/talk/slides.pdf
Binary file not shown.