py-substrate-interface

We will use py-substrate-interface to make RPC calls from Python. This tutorial shows how to extract smart contract code from a Substrate-based blockchain and determine which language it was written in (ink!, ask!, or Solidity). Some parachains use the Contracts pallet while others use a custom-developed EVM pallet, so the query parameters have to change accordingly.

Shiden & Rococo

As an example, we will pull a contract from the Shiden network using the Contracts module and the CodeStorage storage function.


from substrateinterface import SubstrateInterface
from datetime import datetime
import pandas as pd
import pandas_gbq
pd.set_option('display.max_columns', None)
pd.options.display.max_colwidth = 100

import sys
sys.path.insert(0, '/srv/dev/')
from pyCrypto.substrate.contracts import *

url = 'wss://shiden.api.onfinality.io/public-ws'
chain = 'shiden'
module = 'Contracts'
storage = 'CodeStorage'
substrate = SubstrateInterface(url)
result = substrate.query_map(module, storage, block_hash=None, max_results=1)
rec = result.records[0]
rec[0]
## <scale_info::11(value=0x027d3df505101ca79b42b623f87b30163f773c84b08655bcda2b078ff08cdac7)>
rec[1].value.keys()
## dict_keys(['instruction_weights_version', 'initial', 'maximum', 'code', 'determinism'])

The wasm blob is stored under the code key, so we can read its hex string representation from there. Note that we need the .value attribute to get the code as a Python string rather than as a SCALE type object.


code = rec[1]['code'].value

Here is a sample of the code:

code = '0x0061736d01000000017d1460037f7f7f017f60027f7f0060027f7f017f60037f7f7f0060047f7f7f7f0060017f0…'

The hex string starts with a 0x prefix, which we need to remove before converting the string to bytes with Python's bytes.fromhex() method.


code = code.replace("0x","")
hexcode = bytes.fromhex(code)

Here is a sample of hexcode, which now holds the raw bytes:

b'\x00asm\x01\x00\x00\x00\x01}\x14`\x03\x7f\x7f\x7f\x01\x7f`\x02\x7f\x7f\x00`\x02\x7f\x7f\x01\x7f`\x03\x7f\x7f\x7f\x00`\x04\x7f\x7f\x7f\x7f\x00`\x01\x7f…'
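As a quick sanity check before running any language heuristic, the first eight bytes of a WASM binary can be inspected directly: the format begins with the four magic bytes \0asm followed by a little-endian 32-bit version number. The sketch below is only illustrative. The helper names hex_to_bytes and wasm_header are hypothetical (they are not part of py-substrate-interface), and the sample value is just the first 16 hex digits of the blob shown above.

```python
import struct

WASM_MAGIC = b"\x00asm"  # every WASM binary starts with these four bytes


def hex_to_bytes(hex_string):
    """Decode a hex string to bytes, dropping a leading '0x' if present."""
    if hex_string.startswith(("0x", "0X")):
        hex_string = hex_string[2:]
    return bytes.fromhex(hex_string)


def wasm_header(blob):
    """Return the WASM version number, or None if the magic bytes are missing."""
    if len(blob) < 8 or blob[:4] != WASM_MAGIC:
        return None
    (version,) = struct.unpack("<I", blob[4:8])  # little-endian u32
    return version


# First 16 hex digits of the Shiden sample (hypothetical variable name).
sample = hex_to_bytes("0x0061736d01000000")
print(wasm_header(sample))
## 1
```

Slicing off the prefix only when it is present keeps the helper usable for strings that arrive without 0x.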

We use an open-source GitHub gist by Marc to heuristically determine which language the smart contract was written in: ink!, ask!, or Solidity. Our slightly modified version is on GitHub.


getLanguage(hexcode)
## 'Ink!'

As you can see, the heuristic identifies this as an ink! contract.
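The gist itself is not reproduced here, but the general shape of a marker-based heuristic can be sketched as follows: look for byte strings that one toolchain tends to emit and return the first language whose marker is found. Everything in this sketch is an assumption. guess_language is a hypothetical stand-in for getLanguage, it scans raw bytes rather than parsing the module with ppci as the version used in this tutorial does, and the marker values are placeholders, not the markers the gist actually uses.

```python
def guess_language(blob, markers):
    """Return the first language whose marker bytes occur in blob, else 'unknown'.

    markers maps a language name to a list of byte strings to search for.
    """
    for language, needles in markers.items():
        if any(needle in blob for needle in needles):
            return language
    return "unknown"


# Placeholder markers for illustration only; real ones must come from
# inspecting contracts compiled with each toolchain.
PLACEHOLDER_MARKERS = {
    "Ink!": [b"placeholder-ink-marker"],
    "Ask!": [b"placeholder-ask-marker"],
    "Solidity": [b"placeholder-solidity-marker"],
}

print(guess_language(b"\x00asm...placeholder-ask-marker...", PLACEHOLDER_MARKERS))
print(guess_language(b"\x00asm...nothing recognisable...", PLACEHOLDER_MARKERS))
```

Passing the markers in as a parameter keeps the matching logic separate from the marker list, which is the part most likely to need revision.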

Karura

Karura does not use the Contracts pallet, so the process above will not work as-is. However, we can still get the code from its EVM pallet using the Codes storage function.


url = 'wss://karura-rpc-1.aca-api.network'
chain = 'karura'
module = 'EVM'
storage = 'Codes'
substrate = SubstrateInterface(url)
result = substrate.query_map(module, storage, block_hash=None, max_results=1)
rec = result.records[0]
rec[0].value
## '0x027b94197c73af060a8e90935f9bb19ba34e2c417c578d4103975573a64cb315'
code = rec[1].value
code = code.replace("0x","")
hexcode = bytes.fromhex(code)

Here is a sample of the Karura code:

code = '0x608060405234801561001057600080fd5b50600436106100885760003560e01c806394d1dd931161005b5780639…'

and once it is converted to bytes:

b'`\x80`@R4\x80\x15a\x00\x10W`\x00\x80\xfd[P`\x046\x10a\x00\x88W`\x005`\xe0\x1c\x80c\x94\xd1\xdd\x93\x11a\x00[W\x80c…'

Since this code is EVM bytecode rather than a WASM blob, trying to determine the language raises ValueError: Magic wasm marker is invalid.


try:
    getLanguage(hexcode)
except ValueError as e:
    print(e)
## Magic wasm marker is invalid

Specifically, the error is raised by wasm.Module from the Python ppci package when it tries to parse the bytes.


from ppci import wasm
try:
    m = wasm.Module(hexcode)
except ValueError as e:
    print(e)
## Magic wasm marker is invalid
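One way to avoid the exception is to check the leading bytes before handing a blob to the WASM parser. The sketch below rests on two assumptions that go beyond this tutorial: that WASM binaries always start with \0asm (which is part of the WASM binary format), and that EVM bytecode produced by the Solidity compiler commonly starts with 0x6080604052, as the Karura sample above does. The second check is only a heuristic, and bytecode_kind is a hypothetical helper name.

```python
WASM_MAGIC = b"\x00asm"
# Common opening of solc-generated EVM bytecode (PUSH1 0x80, PUSH1 0x40, MSTORE).
# Heuristic only: other compilers or settings may emit a different prefix.
SOLC_PREAMBLE = bytes.fromhex("6080604052")


def bytecode_kind(blob):
    """Classify a blob as 'wasm', 'evm' (solc-style preamble), or 'unknown'."""
    if blob.startswith(WASM_MAGIC):
        return "wasm"
    if blob.startswith(SOLC_PREAMBLE):
        return "evm"
    return "unknown"


# Leading bytes of the two samples shown in this tutorial.
shiden_prefix = bytes.fromhex("0061736d01000000")
karura_prefix = bytes.fromhex("608060405234801561001057600080fd")
print(bytecode_kind(shiden_prefix))
## wasm
print(bytecode_kind(karura_prefix))
## evm
```

With such a guard, getLanguage would only be called when the result is 'wasm', and other blobs could be recorded separately instead of raising.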

TODO: The heuristic could be improved, as a small number of contracts do not match any of the markers used to determine the source language.
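To gauge how many contracts fall through the heuristic, one option is to classify many records and tally the outcomes. The sketch below is an outline under stated assumptions: tally_languages is a hypothetical helper, records is any iterable of (key, blob) pairs (for example, built from a query_map result with a larger max_results), and classify is any function called the same way as getLanguage. It also assumes that an unmatched contract yields a falsy result such as None, which may not be how getLanguage behaves. Blobs that the classifier rejects with ValueError are counted separately.

```python
from collections import Counter


def tally_languages(records, classify):
    """Count classifier results over (key, blob) pairs.

    Blobs for which classify raises ValueError are counted as 'not wasm';
    falsy results (assumed to mean no marker matched) as 'unmatched'.
    """
    counts = Counter()
    for _key, blob in records:
        try:
            label = classify(blob)
        except ValueError:
            label = "not wasm"
        counts[label or "unmatched"] += 1
    return counts


# Stand-in classifier and data so the sketch runs without a node connection.
def fake_classify(blob):
    if not blob.startswith(b"\x00asm"):
        raise ValueError("Magic wasm marker is invalid")
    return "Ink!" if b"ink" in blob else None


records = [("0x01", b"\x00asm ink"), ("0x02", b"\x00asm ???"), ("0x03", b"\x60\x80")]
print(dict(tally_languages(records, fake_classify)))
```

The keys of the unmatched records could then be kept aside and inspected by hand to find new markers.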