Pull request for #96: Complete MG points 1-5 (#98)
* #95 switch to scons 3

* #96 update to new builders and config and rerun

* #96 drop _ before anything builder

* #98 #96 rerun with new builders

* #98 #96 typo in README

* #98 #96 better advertise package dependencies in config scripts

* #98 #96 mention reset config_user in README FAQ
arosenbe authored Mar 2, 2018
1 parent d764443 commit 0162177
Showing 34 changed files with 612 additions and 791 deletions.
19 changes: 9 additions & 10 deletions README.md
@@ -5,8 +5,8 @@ The GSLab Template is a minimal working demonstration of the tools and organizat
## Prerequisites

You'll need the following to run the template. [Homebrew](https://brew.sh/) for Mac and [Linuxbrew](http://linuxbrew.sh/) for Linux make this easier.
* Windows `cmd.exe`, Mac OS X `bash`, or Linux `bash`.
* [Python 2.7.X](https://wiki.python.org/moin/BeginnersGuide/Download) for [Windows](https://docs.python.org/2/using/windows.html), [Mac](https://docs.python.org/2/using/mac.html) or [Linux](https://docs.python.org/2/using/unix.html).
* Windows `cmd.exe`, Mac OS X `bash`, or Linux `bash`.
* [Python 2.7.X](https://wiki.python.org/moin/BeginnersGuide/Download) and [pip](https://pip.pypa.io/en/stable/installing/) for [Windows](https://docs.python.org/2/using/windows.html), [Mac](https://docs.python.org/2/using/mac.html) or [Linux](https://docs.python.org/2/using/unix.html).
* [git](https://git-scm.com/downloads) for version control.
* [git-lfs](https://git-lfs.github.com/) for versioning large files.
* You'll need both git and git-lfs to clone the repository.
@@ -23,6 +23,7 @@ You'll need the following to run the template. [Homebrew](https://brew.sh/) for
```
2. Install Python dependencies.
```bash
# Store package names in this script.
python config/config_python.py
```
3. Unzip the scons package.
@@ -80,22 +81,20 @@ git push

#### What is `config_user.yaml`?

Each user is allowed to have different local specifications: We don't put any restrictions on where you keep large files, what you call your executables, or how you manage shared directories. We do need to find these things, and that's what `config_user.yaml` is for. Each user maintains an **unversioned** [YAML file](http://yaml.org/) with these sorts of specifications. Each script uses its associated YAML-parsing module to read these specifications each time the script is run.
Each user is allowed to have different local specifications: We don't put any restrictions on where you keep large files, what you call your executables, or how you manage shared directories. We do need to find these things, and that's what `config_user.yaml` is for. Each user maintains an **unversioned** [YAML file](http://yaml.org/) with these sorts of specifications. Each script uses its associated YAML-parsing module to read these specifications each time the script is run.

#### What do I put in `config_user.yaml`?
If you try to build a directory without `config_user.yaml`, we'll copy a template to your current working directory. You can always switch back to this template by deleting your current `config_user.yaml` and rerunning.
There's no "default" for `config_user.yaml` because it depends on system specifications and user preferences. Three things we do recommend keeping in `config_user.yaml` are the name of your Stata executable, the location of a [SCons cache directory](http://scons.org/doc/2.0.1/HTML/scons-user/c4213.html), and the location of a release directory. These fields don't have to be specified if you're not using them, and we'll prompt you for their values at runtime if you've forgotten to specify them and they're necessary. A Mac example where Example_User is running a factory-fresh StataMP and has local access to directories named cache/template and release on Dropbox would be
#### What do I put in `config_user.yaml`?
```YAML
stata_executable: stata-mp
cache_directory: /Users/Example_User/Dropbox/cache/template
release_directory: /Users/Example_User/Dropbox/release/
```
There's no "default" for `config_user.yaml` because it depends on system specifications and user preferences. Three things we do recommend keeping in `config_user.yaml` are the name of your Stata executable, the location of a [SCons cache directory](http://scons.org/doc/2.0.1/HTML/scons-user/c4213.html), and the location of a release directory. These fields don't have to be specified if you're not using them, and we'll prompt you for their values at runtime if you've forgotten to specify them and they're necessary.
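
As a rough illustration of the prompt-at-runtime behavior described above, here is a minimal sketch. It is not the template's actual code: `read_flat_yaml` and `get_setting` are hypothetical names, and the tiny `key: value` reader only stands in for a real YAML parser so the example runs without extra packages. It is written for Python 3; under Python 2.7 you would pass `raw_input` as `ask`.

```python
def read_flat_yaml(text):
    """Parse top-level `key: value` lines (a stand-in for a real YAML parser)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop comments and whitespace
        if not line or ':' not in line:
            continue
        key, value = line.split(':', 1)
        settings[key.strip()] = value.strip()
    return settings


def get_setting(settings, key, ask=input):
    """Return settings[key], asking the user when it is missing or empty."""
    value = settings.get(key)
    if not value:
        value = ask('Enter a value for %s: ' % key)
        settings[key] = value  # remember the answer for later lookups
    return value


# Hypothetical file contents: release_directory has been left out on purpose.
example = """
stata_executable: stata-mp
cache_directory: /Users/Example_User/Dropbox/cache/template
"""
config_user = read_flat_yaml(example)
```

With this sketch, `get_setting(config_user, 'stata_executable')` returns the stored value straight away, while `get_setting(config_user, 'release_directory')` falls back to asking because the field was never specified.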
#### What is `config_global.yaml`?
The `config_global.yaml` tracks paths, specifications, variables, and software checks that are constant across users. We treat this file in the same manner as `config_user.yaml`, except that we do version `config_global.yaml`.
One important function of `config_global.yaml` is that it tracks the version of [gslab-python](https://github.com/gslab-econ/gslab_python) you expect your users to have installed.
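
To make the version tracking concrete, here is a hedged sketch of how a required version string such as `4.1.0` could be compared with an installed one. The helper names `parse_version` and `meets_requirement` are hypothetical, and treating the recorded version as a minimum (rather than an exact match) is our assumption, not something the template documents.

```python
def parse_version(version):
    """Turn '4.1.0' into (4, 1, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split('.'))


def meets_requirement(installed, required):
    """True when the installed version is at least the required one (assumed rule)."""
    return parse_version(installed) >= parse_version(required)


required = '4.1.0'  # the gslab_version value recorded in config_global.yaml
print(meets_requirement('4.1.0', required))   # True
print(meets_requirement('4.0.9', required))   # False
print(meets_requirement('4.10.0', required))  # True
```

Comparing tuples of integers avoids the string-comparison trap where `'4.10.0'` would sort before `'4.9.0'`.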
#### How do I handle data external to my repository?
We are agnostic about how you incorporate external data into the template. There's no custom builder for these assets, by design. Our suggestions:
31 changes: 10 additions & 21 deletions analysis/SConstruct
@@ -3,12 +3,13 @@ import os
import sys
import atexit
import yaml

sys.path.append('../config')
sys.dont_write_bytecode = True # Don't write .pyc files

# Setup
from configuration import configuration
[mode, vers, cache_dir, PATHS, pythonpath] = configuration(ARGUMENTS)
[mode, cache_dir, CONFIG, executable_names, prereqs, pythonpath] = configuration(ARGUMENTS)
import gslab_scons as gs

# Define the SCons environment
@@ -20,16 +21,14 @@ env = Environment(ENV = {'PATH': os.environ['PATH'], 'PYTHONPATH': pythonpath},
# 'BuildR' : Builder(action = gs.build_r),
# 'BuildStata' : Builder(action = gs.build_stata),
})
# Store PATHS from configuration
env['PATHS'] = PATHS
# Load environment variables from configuration
env['CONFIG'] = CONFIG
env['executable_names'] = executable_names
# Only computes hash if time-stamp changed
env.Decider('MD5-timestamp')
# Extensions to be used when scanning for source files in BuildLyx.
env.EXTENSIONS = ['.eps', '.pdf', '.lyx']
SourceFileScanner.add_scanner('.lyx', Scanner(gs.misc.lyx_scan, recursive = True))
# Load Stata executable if Stata builder is defined
if 'BuildStata' in env['BUILDERS'].keys():
    env['stata_executable'] = gs.misc.load_yaml_value('config_user.yaml', 'stata_executable')
# Export environment
Export('env')
# Additional mode options
@@ -38,23 +37,13 @@ if mode == 'cache':

# Logging (except on dry run)
# Log build process
gs.log.start_log(mode, vers)
gs.log.start_log(mode, CONFIG['global']['gslab_version'])
atexit.register(gs.log.end_log)
# Log input directories
gs.log_paths_dict(PATHS)
# Log final state of repository (numerous settings stored as debrief_args)
require_lfs = gs.misc.load_yaml_value('config_global.yaml', 'prereq_git-lfs')
debrief_args = {
    'look_in' : 'release;source', # Folders to look in for large versioned files
    'file_MB_limit_lfs' : 2, # Soft limit on file size (w/ LFS)
    'total_MB_limit_lfs' : 500, # Soft limit on total size (w/ LFS)
    'file_MB_limit' : 0.5, # Soft limit on file size (w/o LFS)
    'total_MB_limit' : 125, # Soft limit on total size (w/o LFS)
    'lfs_required' : require_lfs, # Check if the repo requires LFS
    'git_attrib_path' : '../.gitattributes', # Location of .gitattributes
    'MAXIT' : 10, # Max number of files in directory to log
    'log' : 'state_of_repo.log' # Log name
}
gs.log_paths_dict(CONFIG)
# Log final state of repository
debrief_args = CONFIG['global']['scons_debrief_args']
debrief_args['lfs_required'] = bool('git_lfs' in prereqs)
atexit.register(gs.scons_debrief, args = debrief_args)

# Run sub-trees
38 changes: 36 additions & 2 deletions analysis/config_global.yaml
@@ -1,3 +1,6 @@
# GSLab Python Version
gslab_version: 4.1.0

# Build Directories
build:
    prepare_data: build/prepare_data
@@ -8,5 +11,36 @@ source:
    prepare_data: source/prepare_data
    descriptive: source/descriptive

# Switch to 'Yes' if it's required and 'No' if it's not.
prereq_git-lfs: Yes
# Input Directories (will record contents)
input: None

# Executable names
executable_names:
    python: None
    r: None
    stata: None
    matlab: None
    lyx: None
    latex: None

# If True, check that application is installed/up-to-date before SCons run
prereq_checks:
    git_lfs: Yes
    gslab_python: Yes
    python: Yes
    r: No
    stata: No
    matlab: No
    lyx: No
    latex: No

# Repository logging at end of SCons run
scons_debrief_args:
    look_in: release;source # Folders to look in for large versioned files. Assumes ; separator.
    file_MB_limit_lfs: 2 # Soft limit on file size (w/ LFS)
    total_MB_limit_lfs: 500 # Soft limit on total size (w/ LFS)
    file_MB_limit: 0.5 # Soft limit on file size (w/o LFS)
    total_MB_limit: 125 # Soft limit on total size (w/o LFS)
    git_attrib_path: ../.gitattributes # Location of .gitattributes
    MAXIT: 10 # Max number of files in directory to log
    log: state_of_repo.log # Log name
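
The following sketch shows one way the `prereq_checks` block could feed the `lfs_required` flag that `analysis/SConstruct` sets with `bool('git_lfs' in prereqs)`. How `configuration()` actually builds `prereqs` is not shown in this commit, so `enabled_prereqs` is a hypothetical helper and the "keep the names switched on" rule is an assumption. The dictionary uses Python booleans because PyYAML loads unquoted `Yes`/`No` as `True`/`False`.

```python
def enabled_prereqs(prereq_checks):
    """Return the sorted names whose check is switched on (assumed rule)."""
    return sorted(name for name, on in prereq_checks.items() if on)


# Mirrors the prereq_checks block, as a YAML 1.1 loader would read it.
prereq_checks = {
    'git_lfs': True, 'gslab_python': True, 'python': True,
    'r': False, 'stata': False, 'matlab': False, 'lyx': False, 'latex': False,
}
prereqs = enabled_prereqs(prereq_checks)

# A trimmed copy of scons_debrief_args, for illustration only.
debrief_args = {'look_in': 'release;source', 'MAXIT': 10}
debrief_args['lfs_required'] = 'git_lfs' in prereqs

print(prereqs)
print(debrief_args['lfs_required'])
```

Deriving the flag from the same block that drives the software checks keeps the LFS requirement in one place instead of a separate `prereq_git-lfs` key.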
52 changes: 52 additions & 0 deletions analysis/release/create_data.log
@@ -0,0 +1,52 @@

___ ____ ____ ____ ____ (R)
/__ / ____/ / ____/
___/ / /___/ / /___/ 15.1 Copyright 1985-2017 StataCorp LLC
Statistics/Data Analysis StataCorp
4905 Lakeway Drive
MP - Parallel Edition College Station, Texas 77845 USA
800-STATA-PC http://www.stata.com
979-696-4600 stata@stata.com
979-696-4601 (fax)

Unlimited-user 4-core Stata network license expires 21 Jul 2018:
Serial number: 501509201134
Licensed to: Economics
Stanford University

Notes:
1. Stata is running in batch mode.
2. Unicode is supported; see help unicode_advice.
3. More than 2 billion observations are allowed; see help obs_advice.
4. Maximum number of variables is set to 5000; see help set_maxvar.


running /Applications/Stata/profile.do ...

. do "source/prepare_data/create_data.do"

. version 14

. set more off

. preliminaries

.
. program main
1. yaml read YAML using config_global.yaml
2. yaml global build = YAML.build.prepare_data
3.
. set obs 300000
4. gen x = _n
5. export delimited "$build/data.txt", delimiter("|") replace
6. end

.
. * EXECUTE
. main
yaml_read error: level = 3 but previous level was 1 (prepare_data: build/prepare_data)
invalid syntax
r(100);

end of do-file
r(100);