-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster way to get all address balances #264
Comments
I managed to come up with a faster solution using the mapreduce framework to iterate over all the blocks and assign import blocksci
from collections import Counter
def calculateBlockBals(block):
balances = Counter()
addresses = list(block.outputs.unspent.address)
values = list(block.outputs.unspent.value)
for idx in range(len(addresses)):
addr_type = int(addresses[idx].type)
if addr_type in [0,6]:
continue
elif addr_type == 5:
addr = str(addresses[idx].addresses)
else:
addr = addresses[idx].address_string
if addr in balances:
balances[addr] = balances[addr] + (values[idx] / 1e8)
else:
balances[addr] = values[idx] / 1e8
return balances
def calculateAllBals(chain):
def mapFunc(block):
return [calculateBlockBals(block)]
def reduceFunc(x, y):
return x + y
parts = chain.mapreduce_block_ranges(mapFunc, reduceFunc, cpu_count=14)
return parts
%time data = calculateAllBals(chain)
# CPU times: user 1min 39s, sys: 9.78 s, total: 1min 49s
# Wall time: 3min 32s This seems to be pretty performant. If this isn't the correct way or if there is a better way i'd still be interested to hear about it. Thanks! |
Looks legit. :) BlockSci team can you integrate this implementation into the API? Further, will you support more pubkey related API methods in the future? Thanks |
@simonohanlon101 in v0.5 I think your multithreaded approach is probably the most performant. In v0.6 the
That might be interesting to compare performance-wise with just running
I'll see if I have the time to do some evaluations of these. @jpzk: what do you mean by supporting more API methods? Is there something missing wrt to accessing address data? |
Somewhat related to #192 |
@jpzk The next version of BlockSci (v0.6) will have a generic address_balances = chain.blocks.outputs.where(lambda o: ~o.is_spent).group_by( \
lambda output: output.address, \
lambda outputs: outputs.value.sum \
) And because the aggregation is happening in C++ (and not in Python as in the map_reduce example), it's also pretty fast (some benchmarks). |
Wow, this is great, thank you. I'll try to also run a v0.6 as well. Thank you. |
Awesome, I've added this issue to the FAQ. Closing for now, feel free to reopen if you have follow-up questions. |
What's the fastest way of getting historical balances? |
@joequant |
Is it possible to pass a list of block heights and compute the balances faster than computing individually? @maltemoeser |
@jiagengliu nope, though you could probably write such a function easily based on the existing one |
@maltemoeser I run the script provided by @simonohanlon101 using the same AWS instance type with 1TB HDD. It took 3min CPU time but 40min wall time. I am wondering what a reason could be for having way longer wall time? |
@shun61 most likely the HDD. I'd recommend to use an SSD. |
System Information (if applicable)
AWS: r5.2xlarge
BlockSci version: 0.5
Using AMI: no
Total memory: 64GB
My use-case is that I would like to calculate the current balance of all addresses on the blockchain. I may want to run this query quite a lot, and possibly for historical block heights. My current way to do this would be:
Is there a more efficient (faster) way to do this. I understand blocksci is optimised for iterating over txes rather than addresses. Is there therefore a better way to achieve the desired result?
Many thanks,
Simon
The text was updated successfully, but these errors were encountered: