forked from discoproject/discoproject.github.io
-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
200 lines (183 loc) · 8.2 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<title>Disco MapReduce</title>
<link rel="stylesheet" href="stylesheets/styles.css">
<link rel="stylesheet" href="stylesheets/googlecode.css">
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no">
<!--[if lt IE 9]>
<script src="//html5shiv.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body>
<div class="wrapper">
<header>
<!-- <h1>Disco</h1> -->
<img src="images/logo_disco_2x.png" style="height: 48px; margin-left: -57px"></img>
<h2>massive data - minimal code</h2>
<!-- <img src="images/diagram.png"></img> -->
<ul>
<li><a href="http://disco.readthedocs.org/en/latest/start/download.html">
Get Disco
</a></li>
<li><a href="https://github.com/discoproject/disco/">
GitHub repository
</a></li>
<li><a href="https://groups.google.com/forum/#!forum/disco-dev">
Google Group
</a></li>
<li>
<a href="http://disco.readthedocs.org">Documentation</a>
<ul>
<li><a href="http://disco.readthedocs.org/en/latest/start/tutorial.html">
Tutorial
</a></li>
<li><a href="http://disco.readthedocs.org/en/latest/start/install.html">
Cluster setup
</a></li>
<li><a href="http://disco.readthedocs.org/en/latest/faq.html">
FAQ
</a></li>
<li><a href="http://disco.readthedocs.org/en/latest/lib/">
API Reference
</a></li>
<li><a href="http://disco.readthedocs.org/en/latest/screenshots.html">
Screenshots
</a></li>
<li><a href="http://disco.readthedocs.org/en/latest/releases.html">
Release Notes
</a></li>
</ul>
</li>
</ul>
</header>
<section>
<div class="blurb">
<p>
Disco is a lightweight, open-source framework for distributed computing based on the <a
href="http://en.wikipedia.org/wiki/MapReduce">MapReduce</a>
paradigm.
</p>
<p>
Disco is powerful and easy to use, thanks to Python.
Disco <a href="http://disco.readthedocs.org/en/latest/howto/ddfs.html">
distributes and replicates your data</a>, and <a
href="http://disco.readthedocs.org/en/latest/overview.html">
schedules your jobs</a>
efficiently. Disco even includes the tools you need to <a
href="http://discodb.readthedocs.org">
index billions
of data points and query them in real-time</a>.
</p>
<p>
Disco was born in <a href="http://research.nokia.com">Nokia Research Center</a>
in 2008 to solve real challenges in handling massive amounts of data. Disco
has been actively developed since then by Nokia and <a href="#users">
many other companies</a> who
use it for a variety of purposes, such as log analysis, probabilistic
modelling, data mining, and full-text indexing.
</p>
</div>
<a href="https://github.com/discoproject/disco"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://camo.githubusercontent.com/e7bbb0521b397edbd5fe43e7f760759336b5e05f/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f6769746875622f726962626f6e732f666f726b6d655f72696768745f677265656e5f3030373230302e706e67" alt="Fork me on GitHub" data-canonical-src="https://s3.amazonaws.com/github/ribbons/forkme_right_green_007200.png"></a>
<div>
<p style="text-align: center"><a class="demo" href="http://goo.gl/j5v3J5">Try Disco (alpha) »</a></p>
</div>
<div class="ebodu">
<h2 style="margin-bottom: 0">Disco in action</h2>
<pre><code class="python">
from disco.core import Job, result_iterator
def map(line, params):
for word in line.split():
yield word, 1
def reduce(iter, params):
from disco.util import kvgroup
for word, counts in kvgroup(sorted(iter)):
yield word, sum(counts)
if __name__ == '__main__':
input = ["http://discoproject.org/media/text/chekhov.txt"]
job = Job().run(input=input, map=map, reduce=reduce)
for word, count in result_iterator(job.wait()):
print word, count
</pre></code>
<p>
This is a fully working Disco script that computes word frequencies in
a text corpus. Disco distributes the script automatically to a cluster,
so it can utilize all available CPUs in parallel. For details, see
<a href="http://disco.readthedocs.org/en/latest/start/tutorial.html">
Disco tutorial</a>.
</p>
<h2>Highlights</h2>
<p>
<ul id="features">
<li>
<div class="ftext">
Easy to
<a href="http://disco.readthedocs.org/en/latest/start/install.html">
install</a> on Linux, Mac OS X, and FreeBSD.
</div>
</li>
<li>
<div class="ftext">
Efficient data-locality-preserving IO,
either over HTTP or the builtin petabyte-scale
<a href="http://disco.readthedocs.org/en/latest/howto/ddfs.html">
Disco Distributed Filesystem</a>.
</div>
</li>
<li>
<div class="ftext">
Supports <a href="http://disco.readthedocs.org/en/latest/faq.html#profiling">
profiling and debugging</a> of mapreduce jobs.
</div>
</li>
<li>
<div class="ftext">
Random access data and auxiliary results through
<a href="http://disco.readthedocs.org/en/latest/howto/oob.html">
out of band results</a>.
</div>
</li>
<li>
<div class="ftext">
Run jobs written in any language using the
<a href="http://disco.readthedocs.org/en/latest/howto/worker.html">
worker protocol</a>.
</div>
</li>
<li>
<div class="ftext">
Build and query indices with billions of keys and values,
using <a href="http://discodb.readthedocs.org">DiscoDB</a>
</div>
</li>
</ul>
...and more! See the <a href="http://disco.readthedocs.org">
documentation</a> for details.
</p>
<p>
Need help with Disco?
We can be reached on our IRC channel
<tt>#discoproject</tt> at <a href="http://freenode.net">Freenode</a> or on
the <a href="http://groups.google.com/group/disco-dev">Disco discussion group</a>,
or by opening an issue at <a href="http://github.com/discoproject/disco">Disco repository at GitHub</a>.
</p>
</section>
<footer></footer>
</div>
<script src="javascripts/scale.fix.js"></script>
<script src="javascripts/highlight.pack.js"></script>
<script>hljs.initHighlightingOnLoad();</script>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-2269620-4");
pageTracker._trackPageview();
} catch(err) {}
</script>
</body>
</html>