Bitcoin Spark Framework (BTCSpark)

What is BTCSpark?

BTCSpark is a layer for accessing the Bitcoin Blockchain from Apache Spark.

The goal of BTCSpark is to offer high quality, easy to use, performant, and free software to Bitcoin developers and analysts.

NOTE: BTCSpark is currently unmaintained. BlockSci is a similar project with better performance.

Benchmarks

The following benchmark computes the Transaction Output Amount Distribution (TOAD). On a six-node AWS m3.large cluster (one master, five slaves), with the blockchain stored in Hadoop on ephemeral storage, it takes 8.4 minutes to run using the nativ_lazy_blockchain implementation.

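    # sb: the BTCSpark handle (setup not shown); fetch_chain() returns an RDD of lazy blocks
    block_objs = sb.fetch_chain()
    # Lazy objects are zero-argument callables that parse on demand
    unlazy = lambda x: x()
    # Force each block, flatten to its (lazy) transactions, then force those too
    txns = block_objs.map(unlazy)\
                     .flatMap(lambda b: b.txns)\
                     .map(unlazy)
    # Round each output value down to a multiple of 2**14 and count outputs per bucket.
    # tx_outs holds lazy outputs (like txns), so force them with map(unlazy, ...).
    txns.flatMap(lambda txn:
                 map(lambda txo: ((txo.value >> 14) << 14, 1),
                     map(unlazy, txn.tx_outs)))\
        .reduceByKey(lambda x, y: x + y)\
        .saveAsTextFile("txouts_values")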
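
To check the pipeline's logic without a cluster, here is a minimal local sketch in plain Python. The `TxOut`, `Txn`, and `Block` tuples, the `lazy` helper, and the `bucket` and `toad` functions are hypothetical stand-ins, not BTCSpark's API; they only mimic the lazy-callable pattern the benchmark relies on, with a `Counter` playing the role of `reduceByKey`.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for BTCSpark's parsed objects; the real classes
# and field layout in BTCSpark may differ.
TxOut = namedtuple("TxOut", ["value"])
Txn = namedtuple("Txn", ["tx_outs"])   # tx_outs: list of lazy TxOut thunks
Block = namedtuple("Block", ["txns"])  # txns: list of lazy Txn thunks


def lazy(obj):
    """Wrap an already-built object in a zero-argument thunk."""
    return lambda: obj


def unlazy(thunk):
    """Force a lazy object, as the benchmark's `unlazy` does."""
    return thunk()


def bucket(value, bits=14):
    """Round value down to a multiple of 2**bits by clearing the low bits."""
    return (value >> bits) << bits


def toad(lazy_blocks, bits=14):
    """Count transaction outputs per value bucket (local, non-Spark)."""
    counts = Counter()
    for block in map(unlazy, lazy_blocks):            # .map(unlazy)
        for txn in map(unlazy, block.txns):           # .flatMap(...).map(unlazy)
            for txo in map(unlazy, txn.tx_outs):      # map(unlazy, txn.tx_outs)
                counts[bucket(txo.value, bits)] += 1  # .reduceByKey(x + y)
    return dict(counts)


# Made-up example data: two blocks with one and three outputs.
blocks = [
    lazy(Block(txns=[lazy(Txn(tx_outs=[lazy(TxOut(5000000000))]))])),
    lazy(Block(txns=[lazy(Txn(tx_outs=[lazy(TxOut(100)),
                                       lazy(TxOut(16383)),
                                       lazy(TxOut(16384))]))])),
]
print(toad(blocks))  # {4999987200: 1, 0: 2, 16384: 1}
```

Clearing the low 14 bits means 100 and 16383 both land in bucket 0, while 16384 starts the next bucket.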

Finding the BIP100 blocks (blocks whose coinbase signature script contains the string "BIP100") takes 5.0 minutes on the same cluster.

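    # Read the coinbase input's signature script (first input of the first
    # transaction) of each block and keep the scripts that mention "BIP100".
    # result_name is a helper from the benchmark script (not shown here).
    block_objs.map(unlazy)\
              .map(lambda b: b.txns[0]().tx_ins[0]().signature_script)\
              .filter(lambda f: "BIP100" in f)\
              .saveAsTextFile(result_name("BIP100_Blocks"))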
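
A minimal local sketch of the BIP100 filter in plain Python follows. It reuses the benchmark's access path `txns[0]().tx_ins[0]().signature_script` (the block's coinbase input) on hypothetical `TxIn`, `Txn`, and `Block` stand-ins, and the example scripts are made-up bytes. It assumes scripts are `bytes`, so it searches for `b"BIP100"`; if BTCSpark exposes scripts as `str`, the original `"BIP100" in f` test applies unchanged.

```python
from collections import namedtuple

# Hypothetical stand-ins; BTCSpark's real classes may differ.
TxIn = namedtuple("TxIn", ["signature_script"])
Txn = namedtuple("Txn", ["tx_ins"])    # tx_ins: list of lazy TxIn thunks
Block = namedtuple("Block", ["txns"])  # txns: list of lazy Txn thunks


def lazy(obj):
    """Wrap an already-built object in a zero-argument thunk."""
    return lambda: obj


def coinbase_script(block):
    """Signature script of the first input of the first transaction
    (the coinbase input), forcing lazy objects along the way."""
    return block.txns[0]().tx_ins[0]().signature_script


def bip100_scripts(lazy_blocks, tag=b"BIP100"):
    """Local equivalent of map(unlazy).map(coinbase_script).filter(tag in s)."""
    scripts = (coinbase_script(lb()) for lb in lazy_blocks)
    return [s for s in scripts if tag in s]


def block_with_coinbase(script):
    """Hypothetical helper: build a lazy one-transaction block."""
    return lazy(Block(txns=[lazy(Txn(tx_ins=[lazy(TxIn(script))]))]))


# Made-up coinbase scripts; only the first carries the tag.
blocks = [
    block_with_coinbase(b"\x03\x01\x02/BIP100/"),
    block_with_coinbase(b"\x03\x01\x02/other/"),
]
print(bip100_scripts(blocks))
```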

Note: Caching is not recommended unless you have a lot of memory or have substantially reduced the working set, because the overhead of re-parsing is modest.

License

BTCSpark is released under the terms of the AGPL license. See COPYING for more information. A non-free license may also be purchased from Jeremy Rubin by organizations that are unable to use AGPL-licensed software.
