How to extract raw hex data from substrate chains

Example transaction

For this tutorial we will decode a swap transaction on Moonbeam using Moonscan as a guide. Here is a screenshot of the relevant event log:

The Name section tells you a lot of important information such as the transaction type and what each field represents:

Swap (index_topic_1 address sender, index_topic_2 address recipient, int256 amount0, int256 amount1, uint160 price, uint128 liquidity, int24 tick)

In this case the transaction type is a swap in the pool DOT / USDt, which is represented by topic0, which is 0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67. The additional fields are as follows:

sender address is topic1
recipient address is topic2
amount0 is data0
amount1 is data1
price is data2
liquidity is data3
tick is data4

In the dotlake database, all of the topic and data fields are in combined into one JSON field called data emitted by the evm.Log event. Here is what is looks like for this example:

[{"address": "0x5daf7f80cc550ee6249a4635c3bb0678e94d3867", "topics": ["0xc42079f94a6350d7e6235f29174924f928cc2ac818eb64fed8004e115fbcca67", "0x000000000000000000000000e6d0ed3759709b743707dcfecae39bc180c981fe", "0x000000000000000000000000d27cece5ce8b4f6fa47a160105d842f97aee227d"], "data": "0xffffffff...

Decoding the topics

The address above is the contract address for the DOT / USDt pool which you can verify on this Stellaswap page.

After the address, there is a topics array with three values. You can get the first one like this:

JSON_EXTRACT(ev.data, '$[0].topics[0]') AS topic0

The last two topics are the sender and recipient addresses, so for those we want to omit all the leading zeros. All hex values are stored as 64-bit numbers with a “0x” prepended, so we need to skip the first 27 characters and then prepend “0x”, as follows:

CONCAT('0x', SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].topics[1]'), 27)) AS sender
CONCAT('0x', SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].topics[2]'), 27)) AS recipient

Decoding the data

Next we have the data JSON field. In the example above it is truncated due to the length, but here is the full string:

0xffffffffffffffffffffffffffffffffffffffffffffffffffffff2a01a27aa0000000000000000000000000000000000000000000000000000000001dcd6500000000000000000000000000000000000000000005f6a629d9c8e352f3cafbb80000000000000000000000000000000000000000000000000000467509cd0192fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffeda45

In decoded form you see this:

field	value
amount0	-919095575904
amount1	500000000
price	1845605485577630350788197304
liquidity	77468489548178
tick	-75195

So how do we decode that long hex string? I will give you a hint by reformatting the above string:

0x
ffffffffffffffffffffffffffffffffffffffffffffffffffffff2a01a27aa0
000000000000000000000000000000000000000000000000000000001dcd6500
000000000000000000000000000000000000000005f6a629d9c8e352f3cafbb8
0000000000000000000000000000000000000000000000000000467509cd0192
fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffeda45

Now we see that it is just 5 separate 64-bit hex strings concatenated together with a “0x” prepended to denote that it is in hex format.

So data0 is ffffffffffffffffffffffffffffffffffffffffffffffffffffff2a01a27aa0.

data1 is 000000000000000000000000000000000000000000000000000000001dcd6500. And so on.

As a quick check you can Google for hex to decimal converter and put one of those string in see what it converts to in decimal representation. For the middle three string this will work well, for the first and the last there is an additional complication (see below).

Convert hex to decimal representation

For small numbers we can convert from hex to decimal within Big Query by casting the hex number as an integer:

SELECT SAFE_CAST('0x000000000000000000000000000000000000000000000000000000001dcd6500' as INT64)

This yields a value of 500000000 which matches the decoded amount1 from Moonscan.

However, due to the number of decimal places, most blockchain-related numbers are more than 64 bits, so we are better off using a user defined function (UDF) to do the conversion. Here is an example:

CREATE TEMP FUNCTION hexToDec(hex_string STRING) 
RETURNS STRING
LANGUAGE js AS """
  return BigInt("0x" + hex_string).toString();
""";

The above UDF will work for most of the hex string, but notice that some of the hex string above start with an ‘f’. That means that it is a negative number. Converting a hex string into decimals is a little more complicated for negative numbers, you have to use something called the two’s complement (don’t ask me, I used chatGPT). To do this you need to convert the hex string into binary, flip every bit and then add 1, and finally convert the resulting binary number into decimal. Here is am improved version of the above UDF that works for both positive and negative hex strings:

CREATE FUNCTION hexToDec(hex_string STRING) 
RETURNS STRING
LANGUAGE js AS """
  if (hex_string[0] === 'f') {
    // Convert hex to binary
    let binaryString = BigInt(`0x${hex_string}`).toString(2).padStart(hex_string.length * 4, '0');
    // If the number is negative, calculate the twos complement
    let invertedBinary = Array.from(binaryString).map(bit => bit === '0' ? '1' : '0').join('');
    let twoComplement = (BigInt(`0b${invertedBinary}`) + BigInt(1)).toString(2);
    return (-BigInt(`0b${twoComplement}`)).toString();
  }
  return BigInt("0x" + hex_string).toString();
""";

Final query

Here is the final query that replicates all the data shown by Moonscan: sender, recipient, amount0, amount1, price, liquidity, and tick.

SELECT TIMESTAMP_MILLIS(timestamp) as time, chain, number as block_number
, JSON_EXTRACT_SCALAR(ev.data, '$[0].address') AS address
, JSON_EXTRACT(ev.data, '$[0].topics[0]') AS topic0
, CONCAT('0x', SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].topics[1]'), 27)) AS sender
, CONCAT('0x', SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].topics[2]'), 27)) AS recipient
, metrics_substrate_etl.hexToDec(SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].data'), 3, 64)) AS amount0
, metrics_substrate_etl.hexToDec(SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].data'), 67, 64)) AS amount1
, metrics_substrate_etl.hexToDec(SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].data'), 131, 64)) AS price
, metrics_substrate_etl.hexToDec(SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].data'), 195, 64)) AS liquidity
, metrics_substrate_etl.hexToDec(SUBSTR(JSON_EXTRACT_SCALAR(ev.data, '$[0].data'), 259, 64)) AS tick
FROM `parity-data-infra-evaluation.dotlake_partition.blocks_polkadot_moonbeam` a,
  UNNEST(extrinsics) AS ex, UNNEST(ex.events) as ev
WHERE block_date='2023-07-21' and number = '4037676'

Different events will have different numbers of topics and data, but using this technique will allow you to decode than all and understand exactly where the data is coming from.