Husky is a distributed computing system designed to handle mixed jobs of coarse-grained transformations, graph computing and machine learning. The core of Husky is written in C++ to leverage the performance of a native runtime. For machine learning, Husky supports relaxed consistency levels and asynchronous computing to achieve higher network/CPU throughput.
For more details about Husky, please check our Wiki.
For bugs in Husky, please file an issue on GitHub.
For further discussions, please send email to support@husky-project.com.
Husky has the following minimal dependencies:
- CMake (Version >= 3.0.2)
- ZeroMQ (including both libzmq and cppzmq)
- Boost (Version >= 1.58)
- A working C++ compiler (clang/gcc Version >= 4.9/icc/MSVC)
- TCMalloc (In gperftools)
- GLOG (Latest version, it will be included automatically)
Some optional dependencies:
- libhdfs3 C/C++ HDFS Client
- MongoDB C++ Driver (Version legacy 1.1.2)
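Before building, it can help to compare installed tool versions against the minimums listed above. The following is a minimal sketch, not part of Husky: `version_ge` is a hypothetical helper, and it assumes `sort -V` (GNU coreutils) is available.

```shell
#!/bin/sh
# version_ge A B: succeeds when version A >= version B.
# Hypothetical helper (not shipped with Husky); relies on `sort -V`.
version_ge() {
    # After a version sort, the smaller version comes first.
    # A >= B exactly when the first line equals B.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example: check CMake against the 3.0.2 minimum listed above.
if command -v cmake >/dev/null 2>&1; then
    # `cmake --version` prints "cmake version X.Y.Z" on its first line.
    cmake_version=$(cmake --version | head -n 1 | awk '{print $3}')
    if version_ge "$cmake_version" 3.0.2; then
        echo "CMake $cmake_version is new enough"
    else
        echo "CMake $cmake_version is older than 3.0.2"
    fi
else
    echo "cmake not found in PATH"
fi
```

The same helper can be reused for other versioned dependencies, for example `version_ge "$boost_version" 1.58`, once the installed version has been obtained by whatever means suits the platform.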
Download the latest source code of Husky:
$ git clone https://github.com/husky-team/husky.git
Or download the latest release from the Release Notes.
We assume the root directory of Husky is $HUSKY_ROOT. Go to $HUSKY_ROOT and do an out-of-source build using CMake:
$ cd $HUSKY_ROOT
$ mkdir release
$ cd release
$ cmake -DCMAKE_BUILD_TYPE=Release .. # CMAKE_BUILD_TYPE: Release, Debug, RelWithDebInfo
$ make help # List all build targets
$ make -j{N} Master # Build the Husky master
$ make $ApplicationName # Build the Husky application
Husky can also be compiled as a static or shared library for projects based on it.
$ make -j{N} husky # Build the static library (the default)
$ cmake .. -DBUILD_SHARED_LIBRARY=ON
$ make -j{N} husky-shared # Build shared library
Husky is intended to run on any platform. Configuration can be stored in a configuration file (INI format) or passed as command-line arguments when running Husky. An example configuration file looks like this:
# Required
master_host=xxx.xxx.xxx.xxx
master_port=yyyyy
comm_port=yyyyy
# Optional
log_dir=path/to/log
hdfs_namenode=xxx.xxx.xxx.xxx
hdfs_namenode_port=yyyyy
# For Master
serve=1
# Section for worker information
[worker]
info=master:3
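A quick pre-flight check can catch a configuration file that lacks one of the keys marked "Required" in the example above. This is a minimal sketch that runs outside Husky: `check_conf` is a hypothetical helper, and it only tests that each key has a non-empty value, not that the value is valid.

```shell
#!/bin/sh
# check_conf FILE: succeeds when master_host, master_port and comm_port
# are all set to a non-empty value in FILE.
# Hypothetical helper; Husky performs its own parsing independently.
check_conf() {
    conf_file=$1
    status=0
    for key in master_host master_port comm_port; do
        # Match "key=value", tolerating optional whitespace around "=".
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$conf_file"; then
            echo "missing required key: $key" >&2
            status=1
        fi
    done
    return $status
}
```

For example, `check_conf /path/to/your/conf && ./Master --conf /path/to/your/conf` would start the master only when the three required keys are present.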
For a single-machine environment, use the hostname of the machine as both the master and the (only) worker.
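The single-machine setup can be sketched as a small generator that fills the machine's hostname into both `master_host` and the `[worker]` entry. `write_single_conf` is a hypothetical helper, not part of Husky; the ports and the number after the colon in `info=` are passed in by the caller rather than assumed, and the layout simply mirrors the example file above.

```shell
#!/bin/sh
# write_single_conf OUT MASTER_PORT COMM_PORT N
# Writes a single-machine configuration to OUT, using this machine's
# hostname as both master and worker. N is the value that follows the
# colon in "info=" (3 in the example file). Hypothetical helper.
write_single_conf() {
    out=$1
    master_port=$2
    comm_port=$3
    info_n=$4
    # `uname -n` prints the node name, the same value as `hostname` on most systems.
    host=$(uname -n)
    cat > "$out" <<EOF
# Required
master_host=${host}
master_port=${master_port}
comm_port=${comm_port}

# For Master
serve=1

# Section for worker information
[worker]
info=${host}:${info_n}
EOF
}
```

A call such as `write_single_conf husky.conf <master_port> <comm_port> 3` then produces a file that can be passed to `--conf`; choose ports that are free on the machine.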
For a distributed environment, first copy $HUSKY_ROOT/scripts/exec.sh and modify it according to the actual configuration. scripts/exec.sh depends on pssh.
Run ./Master --help for help. Check the examples in the examples directory.
First make sure that the master is running. Use the following to start the master:
$ ./Master --conf /path/to/your/conf
In the single-machine environment, use the following:
$ ./<executable> --conf /path/to/your/conf
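On a single machine, the two steps above (start the master, then run the application) can be combined in one script. This is only a sketch under stated assumptions: `run_single` is a hypothetical helper, the fixed one-second delay is a crude stand-in for a real readiness check, and stopping the master once the application exits is a choice made here, not documented Husky behaviour.

```shell
#!/bin/sh
# run_single MASTER_BIN APP_BIN CONF
# Starts the master in the background, runs the application with the
# same configuration, then stops the master. Hypothetical helper.
run_single() {
    master_bin=$1
    app_bin=$2
    conf=$3

    "$master_bin" --conf "$conf" &
    master_pid=$!

    # Assumption: one second is enough for the master to start listening.
    sleep 1

    app_status=0
    "$app_bin" --conf "$conf" || app_status=$?

    # Assumption: the master is no longer needed after the application exits.
    kill "$master_pid" 2>/dev/null || true
    wait "$master_pid" 2>/dev/null || true

    return "$app_status"
}
```

Usage would look like `run_single ./Master ./<executable> /path/to/your/conf`; the function returns the application's exit status.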
In the distributed environment, use the following to execute workers on all machines:
$ cp $HUSKY_ROOT/scripts/exec.sh .
$ ./exec.sh <executable> --conf /path/to/your/conf
If MPI is installed in the distributed environment, you may use the following instead:
$ cp $HUSKY_ROOT/scripts/mpi-exec.sh .
$ ./mpi-exec.sh <executable> --conf /path/to/your/conf
Husky provides a set of unit tests (based on gtest 1.7.0) in core/. Run them with:
$ make HuskyUnitTest
$ ./HuskyUnitTest
Do the following to generate API documentation:
$ doxygen doxygen.config
Or use the provided script:
$ ./scripts/doxygen.py --gen
Then go to html/ for HTML documentation, and latex/ for LaTeX documentation.
Start an HTTP server to view the documentation in a browser:
$ ./scripts/doxygen.py --server
Copyright 2016-2017 Husky Team
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.