Using the Metabase API in Python

Metabase API

The Metabase API can preform any task that can be performed in the GUI, but the documentation is not that great and they don’t provide much in the way of examples. After providing some very brief examples in Python, R, and Node.js they tell you that the best way to learn about the API is to open Chrome developer tools and look at the request and response fields from the Network tab.

The first step is to get the session id from Metabase using the username and password shared with you in 1Password. Replace the xxx and yyy with the username and password. I have this section commented out since I already have a session id. Metabase recommends storing the session id and using it for the full 14 day period it is valid rather than getting a new one each time to prevent rate limiting from too many logins.

import requests
import pandas as pd

# The next step is to get a session token to authenticate all future requests. 
# (The example uses my credentials—you can replace them with your username and password, 
# but be sure that you don’t commit those values to a version control repository.) 
# To get the token out of the result, we convert the latter to JSON and look up the ID. 
# We store this in a dictionary with the right key for future use:
baseurl = 'https://test.metabase.data.paritytech.io/'
# response = requests.post(f'{baseurl}api/session',
#                          json={'username': xxx,
#                                'password': yyy})
                         
session_id = '3ae7b1e2-fc51-44a9-8b4c-348e4e3e292c' #response.json()['id']
headers = {'X-Metabase-Session': session_id}

Looking at a question using the API

Jumping right in, let’s look at a question that already exists before we try to create a new one. All interactions with the API are made using post, get, and put commands. In the api questions are called cards, so here is how we look at question #2:

id = 2
response = requests.get(f'{baseurl}api/card/{id}', headers=headers).json()

Really long JSON strings are hard for me to read, but the API docs do provide the parameters you need to create a question, so lets use those parameters to read the response from question #2:

params = ['visualization_settings','parameters','description','collection_position','result_metadata','collection_id','name','cache_ttl','dataset_query','parameter_mappings','display']
for p in params:
  print("")
  print(f"__{p}__: {response[p]}")

## 
## __visualization_settings__: {'graph.show_values': True, 'graph.y_axis.title_text': 'Transfer Total Amount KSM', 'graph.x_axis.axis_enabled': False, 'graph.label_value_formatting': 'compact', 'graph.dimensions': ['month', 'chain'], 'column_settings': {'["name","sum"]': {'number_style': 'currency'}}, 'graph.metrics': ['sum']}
## 
## __parameters__: []
## 
## __description__: API test
## 
## __collection_position__: None
## 
## __result_metadata__: [{'description': None, 'semantic_type': None, 'coercion_strategy': None, 'unit': 'day', 'name': 'month', 'settings': None, 'field_ref': ['field', 148, {'temporal-unit': 'day'}], 'effective_type': 'type/Date', 'id': 148, 'visibility_type': 'normal', 'display_name': 'Month', 'fingerprint': {'global': {'distinct-count': 17, 'nil%': 0.0}, 'type': {'type/DateTime': {'earliest': '2021-12-31T00:00:00Z', 'latest': '2023-04-30T00:00:00Z'}}}, 'base_type': 'type/Date'}, {'description': None, 'semantic_type': None, 'coercion_strategy': None, 'name': 'chain', 'settings': None, 'field_ref': ['field', 144, None], 'effective_type': 'type/Text', 'id': 144, 'visibility_type': 'normal', 'display_name': 'Chain', 'fingerprint': {'global': {'distinct-count': 1, 'nil%': 0.0}, 'type': {'type/Text': {'percent-json': 0.0, 'percent-url': 0.0, 'percent-email': 0.0, 'percent-state': 0.0, 'average-length': 5.0}}}, 'base_type': 'type/Text'}, {'display_name': 'Sum of Transfer Total Amount Usd', 'semantic_type': None, 'settings': None, 'field_ref': ['aggregation', 0], 'name': 'sum', 'base_type': 'type/Float', 'effective_type': 'type/Float', 'fingerprint': {'global': {'distinct-count': 17, 'nil%': 0.0}, 'type': {'type/Number': {'min': 397379.29242924, 'q1': 147617725.47246593, 'q3': 850366889.0496614, 'max': 3788999852.92558, 'sd': 1045910391.165927, 'avg': 736720053.6049192}}}}]
## 
## __collection_id__: None
## 
## __name__: API question
## 
## __cache_ttl__: None
## 
## __dataset_query__: {'database': 2, 'query': {'source-table': 18, 'aggregation': [['sum', ['field', 145, None]]], 'breakout': [['field', 148, {'temporal-unit': 'day'}], ['field', 144, None]], 'filter': ['=', ['field', 144, None], 'astar']}, 'type': 'query'}
## 
## __parameter_mappings__: []
## 
## __display__: bar

Here we see that the JSON keys we are most interested in are:

visualization_settings
result_metadata
dataset_query

All the other keys are simple text blocks like name and description, or can be left blank.

Create a new question

Lets use the same data from question #2 to make a new question with a different name and description:

data = {'visualization_settings': response['visualization_settings'],
        'parameters': None,
        'description': 'API test to archive',
        'collection_position': None,
        'result_metadata': response['result_metadata'],
        'collection_id': None,
        'name': 'API question to archive',
        'cache_ttl': None,
        'dataset_query': response['dataset_query'],
        'parameter_mappings': None,
        'display': 'bar'}
test = requests.post(f'{baseurl}api/card', json = data, headers=headers)
test

## <Response [200]>

The response code of 200 means that the new question was created and you should see it when you refresh Metabase (you are playing along, right?).

Modify a question

Next I wanted to learn how to modify a question and since I was going to be creating a bunch of questions for testing purposes, I also wanted to know how to delete them. Turns out these two tasks are accomplished very similarly. In the example below we use most of the settings for visualization_settings from question #2 above and only change what we need to change (in this case the graph.y_axis.title_text which I change to KSM instead of the correct value of USD. In the data argument to request, I only pass the changed information. I also pass the id. I am not sure if this is necessary since the id is also referenced in the URL. Here is the example:

id = 2
visualization_settings = {'graph.show_values': True, 'graph.y_axis.title_text': 'Transfer Total Amount KSM', 'graph.x_axis.axis_enabled': False, 'graph.label_value_formatting': 'compact', 'graph.dimensions': ['month', 'chain'], 'column_settings': {'["name","sum"]': {'number_style': 'currency'}}, 'graph.metrics': ['sum']}
data = {'visualization_settings': visualization_settings,
        'id': id}
test = requests.put(f'{baseurl}api/card/{id}', json = data, headers=headers)
test

## <Response [200]>

Delete a question

The API has an endpoint to delete a card, but it is deprecated. The docs say to archive the card instead, but it doesn’t give any example. I looked around and couldn’t find an example. As a lark I tried updating archived field to true and it worked. Here is the example:

id = 8
data = {'archived': True,
        'id': id}
test = requests.put(f'{baseurl}api/card/{id}', json = data, headers=headers)
test

## <Response [200]>

The way forward

Looking at the JSON returned from the first card, we see that the database, tables, fields, and questions are all referenced using id numbers, so the main task is to find all the appropriate id numbers we need to build new questions. To that end, I have written some helper functions that return the id numbers for each target. Here are the functions:

def metabase_questions(id=None):
  if id == None:
    response = requests.get(f'{baseurl}api/card', headers=headers).json()
    df = pd.DataFrame(response)
    print(df[['id', 'name']])
    return df
  else:
    print(requests.get(f'{baseurl}api/card/{id}', headers=headers).json())


def metabase_databases(id=None):
  if id == None:
    response = requests.get(f'{baseurl}api/database/', headers=headers).json()
    df = pd.DataFrame(response['data'])
    print(df[['id', 'name']])
    return df
  else:
    print(requests.get(f'{baseurl}api/database/{id}', headers=headers).json())


def metabase_fields(id):
  response = requests.get(f'{baseurl}api/database/{id}/fields', headers=headers).json()
  df = pd.DataFrame(response)
  print(df[['id', 'table_name', 'name', 'display_name']])
  return df


def metabase_tables():
  response = requests.get(f'{baseurl}api/table', headers=headers).json()
  df = pd.DataFrame(response)
  print(df[['id', 'name', 'display_name']])
  return df


def metabase_dashboards(id=None):
  if id == None:
    response = requests.get(f'{baseurl}api/dashboard', headers=headers).json()
    df = pd.DataFrame(response)
    print(df[['id', 'name']])
    return df
  else:
    print(requests.get(f'{baseurl}api/dashboard/{id}', headers=headers).json())

Some of the functions accept an id as a parameter and some do not. If you call any of these helper functions without the id parameter you will get a DataFrame with all the ids listed. If you supply the id number the function will print out the JSON object. Some examples follow.

Dashboards

This function will list all the dashboards. Currently there is only one so it is a short list.

tmp = metabase_dashboards()

##    id            name
## 0   2  Test Dashboard

Databases

This function will list all the datasets available. Currently we have two.

tmp = metabase_databases()

##    id                          name
## 0   2           Analytics PGSQL POC
## 1   3  parity-data-infra-evaluation

Questions

This function lists all the questions.

tmp = metabase_questions()

##    id                     name
## 0   2             API question
## 1  10  API question to archive
## 2   9  API question to archive
## 3  11  API question to archive
## 4  12  API question to archive
## 5  16  API question to archive
## 6  13  API question to archive
## 7  14  API question to archive
## 8  15  API question to archive
## 9   1            Test question

Tables

This function will list all the tables available in Metabase.

tmp = metabase_tables()

##         id                   name           display_name
## 0      835                  acala                  Acala
## 1      617                  acala                  Acala
## 2     2039                  acala                  Acala
## 3     1204           acala_blocks           Acala Blocks
## 4     1033     acala_blocks_debug     Acala Blocks Debug
## ...    ...                    ...                    ...
## 2211   711  xcm_moonbeam_accounts  Xcm Moonbeam Accounts
## 2212  1391  xcm_polkadot_accounts  Xcm Polkadot Accounts
## 2213   181   xcm_sent_to_moonbeam   Xcm Sent To Moonbeam
## 2214  1662             xcm_weekly             Xcm Weekly
## 2215   938        xcm_weekly_view        Xcm Weekly View
## 
## [2216 rows x 3 columns]

Fields

If you already know the table number and just want to associated fields, use this function and pass in the table id.

tmp = metabase_fields(2)

##         id  ...            display_name
## 0       72  ...                   Chain
## 1       73  ...                      ID
## 2       74  ...                    Date
## 3       75  ...         NumTransactions
## 4       76  ...                 Extract
## ..     ...  ...                     ...
## 243  51250  ...          Fee Usd Signed
## 244  51251  ...  Numtransactions Signed
## 245  51252  ...                      ID
## 246  51253  ...                 Extract
## 247  51254  ...                   Chain
## 
## [248 rows x 4 columns]

This is as far as I have gotten on this journey, but now that we know how to get all the table ids and field ids we need, we can start making some questions programmatically and see how they work. Then we can make some by hand and then read the JSON to learn more about how to make new questions. Next time to need to change the label on a dashboard with 20 questions, maybe we can write code to do it for us.

TODO: Learn how to use Chrome developer tools to get more insights into the API commands (if needed).