
Version 0.1.1

@stefanvanwouw released this 17 Mar 22:59
Merge version 0.1.1 (#36)

* Refactor structure (#1)

* Restructure directories on high-level concepts.

* Fix cross references from docs.

* Copy bdr-data-science-stack contents into the data-science-box to start off with. (#2)

* Merge basics of CentOS setup and Anaconda (#3)

* Copy bdr-data-science-stack contents into the data-science-box to start off with.

* CentOS 7 with virtualbox shared folder

* Added starting point for JupyterHub in data science box

* Basic single user jupyter working (#4)

* Copy bdr-data-science-stack contents into the data-science-box to start off with.

* JupyterHub with sudospawner working

* Change the single-user server command to a standalone notebook, and run it as root on port 80 for simplicity (no separate user is needed, since the box is host-only anyway).

* Spark clients installation module (incl. java 8) (#6)

* Update README

* Spark kernels added + conda pre-installed environments. (#7)

* Quick fix: notebook extension updates not working on the initial vagrant up

* Add PYSPARK_PYTHON to kernel (#8)

* Add PYSPARK_PYTHON to kernel

* Overwrite kernel files with new values
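
Adding PYSPARK_PYTHON to a kernel comes down to an `env` entry in the kernel's `kernel.json`, which Jupyter applies to the kernel process at launch. The sketch below is a minimal illustration, not the playbook's actual task: the kernel directory, interpreter path and display name are hypothetical inputs, and writing the file unconditionally mirrors "overwrite kernel files with new values".

```python
import json
from pathlib import Path


def write_pyspark_kernel(kernel_dir, python_path, display_name):
    """Write (or overwrite) a kernel.json whose env sets PYSPARK_PYTHON.

    kernel_dir, python_path and display_name are hypothetical; the real
    values depend on the conda environments the box provisions.
    """
    spec = {
        # Standard Jupyter kernel spec fields.
        "argv": [python_path, "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
        # Spark uses PYSPARK_PYTHON to choose the Python interpreter for workers.
        "env": {"PYSPARK_PYTHON": python_path},
    }
    kernel_dir = Path(kernel_dir)
    kernel_dir.mkdir(parents=True, exist_ok=True)
    target = kernel_dir / "kernel.json"
    # write_text truncates an existing file, so re-provisioning replaces old values.
    target.write_text(json.dumps(spec, indent=2))
    return target
```

Because the whole file is regenerated on every run, a changed interpreter path never leaves stale values behind.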

* Mount bdr-infra-stack's parent dir as notebook root instead of data-science-box dir. (#9)

* Update README.md (#13)

Extremely useful tip included

* Disabled requiretty in sudoers to fix SudoSpawner when running as a service (#14)

* Extracted spark_client_kernel from spark_client (#16)

* Refactor to conform to variable conventions (#17)

* init data science hub (#18)

* Add basic Travis CI for box and hub (#19)

Travis now runs the entire box and hub playbook from scratch on every push. This takes approximately 9 minutes to complete. We can optimise this later, or trade off full integration testing against smaller role-specific tests.

* Add build status for develop

* Correct build status

* Elasticsearch Box (#23)

* refactor to match bdr-infra style

* Add search-box to TravisCI

* added single node data science cluster box with kafka, spark and zookeeper (#22)

* added single node data science cluster box with kafka, spark and zookeeper
* Merged the spark_client tasks from cluster into common components
* Added travis check for new data science cluster box
* added IPs to travis docker containers
* user-defined network test for travis
* added subnet for travis
* ignoring .pyc files
* removed python compiled file from git

* Ensure UTF-8 locale enabled (#24)

* Configure Elasticsearch to be accessible from outside the box (#25)

* Install octave and octave-kernel for jupyter (#26)

Looking great. Thanks for the contribution!

* Feature/travis integration (#27)

* Add slack notification

* Try disabling yum update to reduce build time

* Feature/cql box (#28)

* added cql-box

* fixed sudo rights in cql-box tasks

* updated cassandra version in cql-box

* Simplified setting up the cql box

* added cql-box to travis

* fixed csv, avro and xml support for pyspark

Moved the csv package import before the pyspark-shell argument; previously it was ignored. Added avro and xml support.
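
When PySpark launches its JVM it reads the PYSPARK_SUBMIT_ARGS environment variable, and spark-submit treats the trailing `pyspark-shell` token as the application, so options placed after it are passed to the application instead of Spark. A minimal sketch of building the variable in the right order, assuming the packages were supplied via `--packages`; the Maven coordinates below are placeholders, not the versions this project pinned.

```python
import os
import shlex


def build_pyspark_submit_args(packages, extra_args=()):
    """Return a PYSPARK_SUBMIT_ARGS value with all options before 'pyspark-shell'.

    Anything after 'pyspark-shell' would be treated as an application
    argument rather than a Spark option.
    """
    args = list(extra_args)
    if packages:
        # --packages takes one comma-separated list of Maven coordinates.
        args += ["--packages", ",".join(packages)]
    args.append("pyspark-shell")  # must be the last token
    return " ".join(shlex.quote(a) for a in args)


# Placeholder coordinates (hypothetical) for the CSV, Avro and XML sources.
PACKAGES = [
    "com.example:spark-csv:x.y.z",
    "com.example:spark-avro:x.y.z",
    "com.example:spark-xml:x.y.z",
]

# Must be set before the SparkContext / SparkSession is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = build_pyspark_submit_args(PACKAGES)
```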

* typo update

* Speed up travis build by using git diff to see which modules changed (#30)
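
The idea behind this change is to run CI only for the modules a push actually touched. A sketch of mapping `git diff --name-only` output to top-level module directories; the base ref, the assumption that each box lives in its own top-level directory, and the shared `common` directory are all hypothetical rather than taken from the repository.

```python
import subprocess
from pathlib import PurePosixPath


def changed_files(base="origin/develop"):
    """List files changed relative to `base` (the base ref is an assumption)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def modules_to_build(paths, modules, shared_dirs=("common",)):
    """Map changed paths to the sorted list of modules that need a CI run."""
    touched = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        if not parts:
            continue
        if parts[0] in shared_dirs:
            # A change to shared code can affect every module.
            return sorted(modules)
        if parts[0] in modules:
            touched.add(parts[0])
    return sorted(touched)
```

Treating shared directories as "rebuild everything" keeps the shortcut safe: it only skips modules whose own files and dependencies are unchanged.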

* WIP: Feature/embedded execution layer (#31)

* Docker Flow proxy for hosting multiple micro services under one http endpoint (#32)

* Base for gateway or docker flow proxy.

* Change the default overlay subnet so it does not conflict with the default AWS subnet

* Use rsync folder because of guest addition failures

* Add data science api deployment script
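
Whether an overlay subnet collides with the AWS default can be checked with Python's standard `ipaddress` module. The sketch assumes the default VPC range `172.31.0.0/16`; the candidate subnets are illustrative, not the values the project chose.

```python
import ipaddress

# Assumption: the CIDR of the default VPC that AWS creates in each region.
AWS_DEFAULT_VPC = ipaddress.ip_network("172.31.0.0/16")


def conflicts(candidate, reserved=(AWS_DEFAULT_VPC,)):
    """Return True if the candidate CIDR overlaps any reserved network."""
    net = ipaddress.ip_network(candidate)
    return any(net.overlaps(r) for r in reserved)


def first_free_subnet(candidates, reserved=(AWS_DEFAULT_VPC,)):
    """Return the first candidate CIDR that overlaps no reserved network."""
    for cidr in candidates:
        if not conflicts(cidr, reserved):
            return cidr
    raise ValueError("no non-conflicting subnet among candidates")
```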

* Quick-and-dirty local docker registry working (#33)

* Update README.md

* Added virtualbox folder syncing instead of default rsync (#34)

Now also works on Windows

* Packer build for data-science-box (#35)

* Packer build for data-science-box

* Global box

* Ensure jupyter is always started after a provision