Rename coinbase param and addresses (#82)
* Rename coinbase param and addresses

* Add forgotten files
LadyChristina authored Jul 21, 2023
1 parent d6b1452 commit 47b2aa9
Showing 33 changed files with 216 additions and 263 deletions.
20 changes: 10 additions & 10 deletions docs/data.md
@@ -15,7 +15,7 @@ Sample raw Bitcoin data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT block_number as number, block_timestamp as timestamp, coinbase_param, `bigquery-public-data.crypto_bitcoin.transactions`.outputs
+SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs
FROM `bigquery-public-data.crypto_bitcoin.transactions`
JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number
WHERE is_coinbase is TRUE
@@ -30,7 +30,7 @@ Sample raw Bitcoin Cash data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT block_number as number, block_timestamp as timestamp, coinbase_param, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs
+SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs
FROM `bigquery-public-data.crypto_bitcoin_cash.transactions`
JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number
WHERE is_coinbase is TRUE
@@ -45,8 +45,8 @@ Sample raw Cardano data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as coinbase_param, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp, `iog-data-analytics.cardano_mainnet.block`.pool_hash
-FROM `iog-data-analytics.cardano_mainnet.block`
+SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses
+FROM `iog-data-analytics.cardano_mainnet.block`
LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash
WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2020-12-31'
```
@@ -59,7 +59,7 @@ Sample raw Dash data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT block_number as number, block_timestamp as timestamp, coinbase_param, `bigquery-public-data.crypto_dash.transactions`.outputs
+SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dash.transactions`.outputs
FROM `bigquery-public-data.crypto_dash.transactions`
JOIN `bigquery-public-data.crypto_dash.blocks` ON `bigquery-public-data.crypto_dash.transactions`.block_number = `bigquery-public-data.crypto_dash.blocks`.number
WHERE is_coinbase is TRUE
@@ -74,7 +74,7 @@ Sample raw Dogecoin data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT block_number as number, block_timestamp as timestamp, coinbase_param, `bigquery-public-data.crypto_dogecoin.transactions`.outputs
+SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs
FROM `bigquery-public-data.crypto_dogecoin.transactions`
JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number
WHERE is_coinbase is TRUE
@@ -89,7 +89,7 @@ Sample raw Ethereum data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT number, timestamp, miner as coinbase_addresses, extra_data as coinbase_param
+SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers
FROM `bigquery-public-data.crypto_ethereum.blocks`
WHERE timestamp > '2018-12-31'
```
@@ -102,7 +102,7 @@ Sample raw Litecoin data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT block_number as number, block_timestamp as timestamp, coinbase_param, `bigquery-public-data.crypto_litecoin.transactions`.outputs
+SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs
FROM `bigquery-public-data.crypto_litecoin.transactions`
JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number
WHERE is_coinbase is TRUE
@@ -117,7 +117,7 @@ Sample raw Tezos data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT level as number, timestamp, baker as coinbase_addresses
+SELECT level as number, timestamp, baker as reward_addresses
FROM `public-data-finance.crypto_tezos.blocks`
WHERE timestamp > '2020-12-31'
```
@@ -130,7 +130,7 @@ Sample raw Zcash data are available
They can be retrieved using [Google BigQuery](https://console.cloud.google.com/bigquery) with the following query:

```
-SELECT block_number as number, block_timestamp as timestamp, coinbase_param, `bigquery-public-data.crypto_zcash.transactions`.outputs
+SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs
FROM `bigquery-public-data.crypto_zcash.transactions`
JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number
WHERE is_coinbase is TRUE
18 changes: 9 additions & 9 deletions docs/mappings.md
@@ -12,18 +12,18 @@ The name of the `csv` file is the timeframe, over which the mapping was executed
project's output directory (`output/<project_name>/`).

The logic of the mapping depends on the type of clustering we want to achieve. So, different mappings will output
-different results, even if applied on the same data. An exception to this is the "no-cluster" mapping, which maps blocks to
-coinbase addresses, so it doesn't perform any extra processing on the raw data.
+different results, even if applied on the same data. An exception to this is the "no-cluster" mapping (DummyMapping
+in the code), which maps blocks to reward addresses, so it doesn't perform any extra processing on the raw data.

## Pool Information

-To assist the mapping process, the directory `helpers/pool_information/` contains
+To assist the mapping process, the directory `helpers/pool_information/` contains
pool information about the supported projects.

There exist three subdirectories. In each subdirectory there exists a file for
the corresponding ledger data, if such data exists.

-`coinbase_tags` defines information about block creators. Each key
+`identifiers` defines information about block creators. Each key
corresponds to a tag or ticker, by which the pool is identifiable in its
produced blocks. The value for each key is a dictionary of pool-related
information, specifically its name, a url to its homepage, etc. Each file's
@@ -89,19 +89,19 @@ The values for each entry are the same as `clusters` in the above pool informati

In our implementation, the mapping of a block uses the auxiliary information as follows.

-First, it iterates over all known tags and compares each one with the block's coinbase parameter. If the tag is a
+First, it iterates over all known tags and compares each one with the block's identifiers. If the tag is a
substring of the parameter, then we have a match.

-Second, if the first step fails, we compare the block's coinbase addresses with known pool addresses and again look for
+Second, if the first step fails, we compare the block's reward addresses with known pool addresses and again look for
a match.

In both cases, if there is a match, then: (i) we map the block to the matched pool; (ii) we associate all of the block's
-coinbase addresses (that is, the addresses that receive fees from the block) with the matched pool.
+reward addresses (that is, the addresses that receive fees from the block) with the matched pool.

-In essence, the coinbase parameter is the principal element for mapping a block to an entity and the known addresses is
+In essence, the identifiers are the principal element for mapping a block to an entity and the known addresses are
the fallback mechanism.

If there is a match, we also parse the auxiliary information, such as pool ownership or clusters, in order to assign the
block to the top level entity, e.g., the pool's parent company or cluster.

-If both mechanisms fail, then no match is found. In this case, we assign the coinbase addresses as the block's entity.
+If both mechanisms fail, then no match is found. In this case, we assign the reward addresses as the block's entity.
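
Reviewer note: the matching order described in this hunk (identifiers first, known addresses as fallback, reward addresses as the entity of last resort) can be sketched roughly as below. This is an illustrative sketch only, not the repository's implementation. The function name `map_block`, the shapes of `tag_info`, `known_addresses` and `parents`, and the assumption that `identifiers` is already decoded to plain text are all hypothetical.

```python
def map_block(block, tag_info, known_addresses, parents=None):
    """Return the entity a parsed block is attributed to (illustrative sketch).

    block           -- dict with "identifiers" and "reward_addresses" strings
    tag_info        -- hypothetical dict: tag -> {"name": <pool name>, ...}
    known_addresses -- hypothetical dict: address -> pool name (updated in place)
    parents         -- hypothetical dict: pool name -> top-level entity
    """
    parents = parents or {}
    identifiers = block.get("identifiers") or ""
    raw_addresses = block.get("reward_addresses") or ""
    addresses = [a for a in raw_addresses.split(",") if a]

    pool = None

    # Step 1: a known tag that is a substring of the identifiers is a match.
    for tag, info in tag_info.items():
        if tag and tag in identifiers:
            pool = info["name"]
            break

    # Step 2: otherwise, fall back to the known pool addresses.
    if pool is None:
        for address in addresses:
            if address in known_addresses:
                pool = known_addresses[address]
                break

    # No match: the reward addresses themselves become the block's entity.
    if pool is None:
        return ",".join(addresses)

    # On a match, associate all of the block's reward addresses with the pool.
    for address in addresses:
        known_addresses[address] = pool

    # Resolve the top-level entity (e.g., parent company or cluster), if any.
    return parents.get(pool, pool)
```

In this sketch a block matched by its tag also teaches the address table, so a later block that carries no tag but pays to the same address is still attributed to the same pool.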
10 changes: 5 additions & 5 deletions docs/parsers.md
@@ -12,23 +12,23 @@ The output file is stored under `output/<project_name>/parsed_data.json` and is
{
"number": "<block's number>",
"timestamp": "<block's timestamp of the form: yyyy-mm-dd hh:mm:ss UTC>",
-"coinbase_addresses": "<address1>,<address2>"
-"coinbase_param": "<coinbase parameter>"
+"reward_addresses": "<address1>,<address2>"
+"identifiers": "<identifiers>"
}
]
```

`number` and `timestamp` are consistent among different blockchains.
-`coinbase_addresses` and `coinbase_param` vary, depending on each ledger.
+`reward_addresses` and `identifiers` vary, depending on each ledger.

-Specifically, `coinbase_addresses` corresponds to:
+Specifically, `reward_addresses` corresponds to:

- `Bitcoin`, `Bitcoin Cash`, `Dogecoin`, `Litecoin`, `Zcash`, `Dash`: a string of comma-separated addresses which appear in the block's coinbase transaction with non-negative value (i.e., which are given part of the block's fees)
- `Ethereum`: the block's `miner` field
- `Cardano`: the hash of the pool that created the data, if defined, otherwise the empty string
- `Tezos`: the block's `baker` field

-The field `coinbase_param` corresponds to:
+The field `identifiers` corresponds to:

- `Bitcoin`, `Bitcoin Cash`, `Dogecoin`, `Litecoin`, `Zcash`, `Dash`: the field `coinbase_param` of the block's coinbase transaction
- `Ethereum`: the block's `extra_data` field