
Serving execution data

Flow chain data comprises two parts:

  1. Protocol state data - This refers to the blocks, collections, and transactions that are continuously added to the chain.
  2. Execution state data - This refers to the data that makes up the execution state, including transaction events and account balances.

By default, the Access node syncs the protocol state data, and it has now been updated to also sync the execution state data. This guide provides an overview of how to use the execution data sync feature of the Access node.

Set up the node’s directory

The access node typically has the following directory structure:


$ tree flow_access
flow_access/
├── bootstrap
│   ├── private-root-information (with corresponding AN data)
│   ├── execution-state
│   └── public-root-information
│       ├── node-id
│       ├── node-info.pub.NODE_ID.json
│       └── root-protocol-state-snapshot.json (the genesis data)
└── data (directory used by the node to store block data)
    ├── execution-data
    └── execution-state
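
As a minimal sketch, the directories in the tree above could be created ahead of time with a small shell function. The function name and the FLOW_ACCESS_ROOT variable are hypothetical, and the files themselves (node info, snapshot, checkpoint) still have to be supplied separately.

```shell
# Create the directory skeleton shown in the tree above under a chosen root.
# FLOW_ACCESS_ROOT is a hypothetical variable name; adjust it to your deployment.
create_access_node_dirs() {
  local root="$1"
  mkdir -p \
    "$root/bootstrap/private-root-information" \
    "$root/bootstrap/public-root-information" \
    "$root/bootstrap/execution-state" \
    "$root/data/execution-data" \
    "$root/data/execution-state"
}

create_access_node_dirs "${FLOW_ACCESS_ROOT:-./flow_access}"
```

Because mkdir -p is idempotent, running this against an existing node directory leaves its contents untouched.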

Set up execution data indexing

First, your node needs to download and index the execution data. There are 3 steps:

  1. Enable Execution Data Sync
  2. Download the root checkpoint file
  3. Configure the node to run the indexer

As of mainnet24 / devnet49, Access nodes can be configured to index execution data to support local script execution and to serve all of the Access API endpoints using local data. The setup procedure differs depending on whether you are enabling indexing immediately after a network upgrade or at some point between upgrades.

Enable Execution Data Sync

This is enabled by default, so as long as you didn’t explicitly disable it, the data should already be available.

  1. Make sure that --execution-data-sync-enabled is either not set or set to true.
  2. Make sure that you have a path configured for --execution-data-dir, otherwise the data will be written to the running user’s home directory, which is most likely inside the container’s volume. For example, you can create a folder within the node’s data directory /data/execution-data/.

There are some additional flags available, but you most likely do not need to change them.
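
The two checks above can be sketched as a small pre-flight function that inspects the flags you plan to pass to the node. This is only an illustration: the function name is hypothetical, and it assumes every flag is written in the --flag=value form.

```shell
# Inspect a planned node command line for the two conditions listed above.
# ASSUMPTION: flags are passed as --flag=value (not "--flag value").
check_sync_flags() {
  local args=" $* " status=0
  case "$args" in
    *" --execution-data-sync-enabled=false "*)
      echo "error: execution data sync is explicitly disabled"
      status=1
      ;;
  esac
  case "$args" in
    *" --execution-data-dir="*) ;;
    *)
      echo "warning: --execution-data-dir is not set"
      status=1
      ;;
  esac
  return "$status"
}

# Example: prints nothing and returns 0 when both conditions are met.
check_sync_flags --execution-data-sync-enabled=true --execution-data-dir=/data/execution-data
```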

Option 1: Enabling Indexing at the Beginning of a Spork

Download the root protocol state snapshot

The root-protocol-state-snapshot.json is generated for each spork and contains the genesis data for that spork. It is published and made available after each spork. The download location is specified here under rootProtocolStateSnapshot.

Store the root-protocol-state-snapshot.json into the /bootstrap/public-root-information/ folder.

Download the root checkpoint

The root checkpoint for the network is used by Execution nodes and Access nodes to bootstrap their local execution state database with a known trusted snapshot. The checkpoint contains 18 files that make up the merkle trie used to store the blockchain’s state.

The root checkpoint for each spork is hosted in GCP. You can find the link for the specific network in the sporks.json file. Here’s the URL for mainnet24:

https://github.com/onflow/flow/blob/52ee94b830c2d413f0e86c1e346154f84c2643a4/sporks.json#L15

The URL in that file will point to a file named root.checkpoint. This is the base file and is fairly small. There are 17 additional files that make up the actual data, named root.checkpoint.000, root.checkpoint.001, …, root.checkpoint.016. If you have gsutil installed, you can download them all easily with the following command.


gsutil -m cp "gs://flow-genesis-bootstrap/[network]-execution/public-root-information/root.checkpoint*" .

Here, [network] is the network you are downloading for, for example mainnet-24 or testnet-49.

Once the files are downloaded, you can either move them to /bootstrap/execution-state/ within the node’s bootstrap directory or put them in any mounted directory and reference the location with this CLI flag: --execution-state-checkpoint=/path/to/root.checkpoint. The files should be named root.checkpoint*.
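
Since the node will crash on a missing or invalid checkpoint, it can help to confirm that all 18 files arrived before starting it. The following is a sketch under the file naming described above (root.checkpoint plus root.checkpoint.000 through root.checkpoint.016); the function name is hypothetical, and it only checks that the files exist, not that their contents are valid.

```shell
# Verify that a directory holds the base checkpoint file and its 17 parts.
check_root_checkpoint() {
  local dir="$1" expected=18 missing=0 f i
  if [ ! -f "$dir/root.checkpoint" ]; then
    echo "missing: root.checkpoint"
    missing=$((missing + 1))
  fi
  for i in $(seq 0 16); do
    f="$(printf 'root.checkpoint.%03d' "$i")"
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "ok: all $expected checkpoint files present"
  else
    echo "incomplete: $missing of $expected files missing"
    return 1
  fi
}

# Usage (path is an example): check_root_checkpoint /bootstrap/execution-state
```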

Option 2: Enabling Indexing Mid-Spork

Identify the root checkpoint

The root checkpoint for the network is used by Execution and Access nodes to bootstrap their local execution state database with a known trusted snapshot. The checkpoint contains 18 files that make up the merkle trie used to store the blockchain’s state.

Root checkpoints are periodically generated on Flow Foundation execution nodes and uploaded to a GCP bucket. You can see a list of available checkpoints here, or list them using the gsutil command:


gsutil ls "gs://flow-genesis-bootstrap/checkpoints/"

The checkpoint paths are in the format flow-genesis-bootstrap/checkpoints/[network]/[epoch number]-[block height]/, where:

  • [network] is the network the checkpoint is from. For example, mainnet or testnet.
  • [epoch number] is the epoch number when the checkpoint was taken. You can find the current epoch number on the flowdiver home page.
  • [block height] is the block height at which the checkpoint was taken. Make sure that the checkpoint you select is from an epoch when your node was part of the network.
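
To make the path format concrete, here is a small sketch that assembles the bucket path from its three components. The function name is hypothetical, and the epoch and height in the example call are placeholders rather than a real checkpoint.

```shell
# Build the bucket path for a checkpoint from network, epoch number, and block height.
checkpoint_gcs_path() {
  local network="$1" epoch="$2" height="$3"
  printf 'gs://flow-genesis-bootstrap/checkpoints/%s/%s-%s/\n' "$network" "$epoch" "$height"
}

# Placeholder values for illustration only.
checkpoint_gcs_path mainnet 100 12345678
# prints: gs://flow-genesis-bootstrap/checkpoints/mainnet/100-12345678/
```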

Download the root checkpoint

Once you have selected the checkpoint to download, you can download the files. If you have gsutil installed, you can download them all easily with the following command.


gsutil -m cp "gs://flow-genesis-bootstrap/checkpoints/[network]/[epoch number]-[block height]/root.checkpoint*" .

Once the files are downloaded, you can either move them to /bootstrap/execution-state/ within the node’s bootstrap directory or put them in any mounted directory and reference the location with this CLI flag: --execution-state-checkpoint=/path/to/root.checkpoint. The files should be named root.checkpoint*.

Download the root protocol state snapshot

Access nodes require that the data in the root checkpoint corresponds to the root block in the root-protocol-state-snapshot.json file. It's important to download the snapshot for the correct height, otherwise bootstrapping will fail with an error described in the Troubleshooting section.

You can download the root-protocol-state-snapshot.json file generated by the Execution node from the same GCP bucket.


gsutil cp "gs://flow-genesis-bootstrap/checkpoints/[network]/[epoch number]-[block height]/root-protocol-state-snapshot.json" .

Alternatively, you can download it directly from a trusted Access node using the GetProtocolStateSnapshotByHeight gRPC endpoint with the corresponding height. You will get a base64-encoded snapshot which decodes into a JSON object. At this time, this endpoint is only supported via the gRPC API.
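
If you go the gRPC route, the base64 payload still has to be decoded before the node can use it. The sketch below covers only that decoding step. It assumes you have already saved the base64 string from the response to a file (snapshot.b64 is a hypothetical name); how you call the endpoint depends on your gRPC client and is not shown.

```shell
# Decode a base64-encoded snapshot that was saved to a file into a JSON file.
decode_snapshot() {
  local encoded_file="$1" out_file="$2"
  # Remove line breaks and spaces first so wrapped base64 input decodes cleanly.
  tr -d '[:space:]' < "$encoded_file" | base64 --decode > "$out_file"
}

# Usage (file names are examples):
# decode_snapshot snapshot.b64 root-protocol-state-snapshot.json
```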

Store the root-protocol-state-snapshot.json into the /bootstrap/public-root-information/ folder.

Configure the node to run the indexer

Now that you have execution data sync set up and the root checkpoint in place, it’s time to configure the node to index all of the data so it can be used for script execution.

There are 2 CLI flags that you will need to add:

  • --execution-data-indexing-enabled=true: This enables the indexer.
  • --execution-state-dir: This defines the path where the registers db will be stored. A good default is on the same drive as the protocol db, e.g. /data/execution-state.

Start your node

Now that all of the settings to enable indexing are in place, you can start your node.

At a minimum, you will need the following flags:


--execution-data-indexing-enabled=true
--execution-state-dir=/data/execution-state
--execution-data-sync-enabled=true
--execution-data-dir=/data/execution-data

For better visibility of the process, you can also add:

  • -p 8080:8080 - publishes port 8080 from your Docker container so that you can inspect the metrics.
  • --loglevel=info - sets the log level so that you can follow progress in the logs.

Notes on what to expect:

  • On startup, the node will load the checkpoint into the execution-state db. For devnet48, this takes 20-30 min depending on the node’s specs. For mainnet24, it takes >45 min. The loading time will increase over time. You can follow along with the process by grepping your logs for register_bootstrap.
  • After the checkpoint is loaded, the indexer will begin ingesting the downloaded execution data. This will take several hours to days, depending on whether the data was already downloaded and on the hardware specs of the node.
  • If your node already had all the data, it will index all of it as quickly as possible. This will likely cause the node to run with high CPU usage.
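
As a rough sketch of following the checkpoint load, the function below prints the most recent log lines that mention register_bootstrap. It assumes the node's logs have been written or redirected to a file; the function name and the log path are hypothetical, and the exact content of those log lines is not specified here.

```shell
# Show the latest checkpoint-loading log lines from a node log file.
bootstrap_progress() {
  local log_file="$1" count="${2:-5}"
  grep 'register_bootstrap' "$log_file" | tail -n "$count"
}

# Usage (path is an example): bootstrap_progress /var/log/flow/access.log 10
```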

When you restart the node for the first time with syncing enabled, it will sync execution data for all blocks from the network.

Troubleshooting

  • If the root checkpoint file is missing or invalid, the node will crash. It must be taken from the same block as the root-protocol-state-snapshot.json used to start your node.
  • If you don’t set one of the --execution-data-dir and --execution-state-dir flags, the data will be written to the home directory inside the container (likely /root). This may cause your container to run out of disk space and crash, or lose all data each time the container is restarted.
  • If your node crashes or restarts before the checkpoint finishes loading, you will need to stop the node, delete the execution-state directory, and start it again. Resuming is currently not supported.
  • If you see the following message, your checkpoint and root-protocol-state-snapshot are not for the same height:

{
  "level":"error",
  ...
  "module":"execution_indexer",
  "sub_module":"job_queue",
  "error":"could not query processable jobs: could not read job at index 75792641, failed to get execution data for height 75792641: blob QmSZRu2SHN32d9SCkz9KXEtX3M3PozhzksMuYgNdMgmBwH not found",
  "message":"failed to check processables"
}

  • You can check if the execution sync and index heights are increasing by querying the metrics endpoint:

    curl -s localhost:8080/metrics | grep highest_download_height
    curl -s localhost:8080/metrics | grep highest_indexed_height
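
To turn the metrics check into a pass/fail test, you could sample a height twice and compare the values. The helpers below are a sketch: the function names are hypothetical, the metrics are matched by substring because their full names are not listed here, and the output is assumed to be Prometheus-style text with the value in the last column.

```shell
# Read metrics text on stdin and print the value of the first metric whose
# name contains the given substring. Comment lines (starting with #) are skipped.
metric_value() {
  local needle="$1"
  grep -v '^#' | grep "$needle" | head -n 1 | awk '{print $NF}'
}

# Succeed only if the second sample is strictly greater than the first.
# awk is used so that values in scientific notation compare numerically.
height_increased() {
  local before="$1" after="$2"
  awk -v a="$before" -v b="$after" 'BEGIN { exit !(b > a) }'
}

# Usage against a running node (the 60 second interval is arbitrary):
# before="$(curl -s localhost:8080/metrics | metric_value highest_indexed_height)"
# sleep 60
# after="$(curl -s localhost:8080/metrics | metric_value highest_indexed_height)"
# height_increased "$before" "$after" && echo "indexer is advancing"
```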

Execution Data Sync

The Execution Sync protocol is enabled by default on Access nodes, and uses the Bitswap protocol developed by Protocol Labs to share data trustlessly over a peer-to-peer network. When enabled, nodes download execution data for each block as it is sealed and contribute to sharing the data with their peers. The data is also made available to systems within the node, such as the ExecutionDataAPI.

Below is a list of the available CLI flags to control the behavior of the Execution Sync requester engine.

| Flag | Type | Description |
| --- | --- | --- |
| execution-data-sync-enabled | bool | Whether to enable the execution data sync protocol. Default is true. |
| execution-data-dir | string | Directory to use for the Execution Data database. Default is in the user’s home directory. |
| execution-data-start-height | uint64 | Height of the first block to sync execution data from when starting with an empty Execution Data database. Default is the node’s root block. |
| execution-data-max-search-ahead | uint64 | Max number of heights to search ahead of the lowest outstanding execution data height. This limits the number of non-consecutive objects that will be downloaded if an earlier block is unavailable. Default is 5000. |
| execution-data-fetch-timeout | duration | Initial timeout to use when fetching execution data from the network. The timeout increases using an incremental backoff until execution-data-max-fetch-timeout. Default is 10s. |
| execution-data-max-fetch-timeout | duration | Maximum timeout to use when fetching execution data from the network. Default is 10m. |
| execution-data-retry-delay | duration | Initial delay for exponential backoff when fetching execution data fails. Default is 1s. |
| execution-data-max-retry-delay | duration | Maximum delay for exponential backoff when fetching execution data fails. Default is 5m. |
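
To illustrate how the two retry-delay flags interact, the sketch below prints a backoff schedule from the default initial delay (1s) up to the default maximum (5m, or 300s). The doubling factor is an assumption made for illustration, since the multiplier the node uses is not stated here, and the function name is hypothetical.

```shell
# Print retry delays in seconds for an exponential backoff that starts at
# the initial delay and is capped at the maximum delay.
# ASSUMPTION: the delay doubles after each failure; the real factor may differ.
retry_delays() {
  local delay="$1" max="$2"
  while [ "$delay" -lt "$max" ]; do
    echo "$delay"
    delay=$((delay * 2))
  done
  echo "$max"
}

# Defaults from the table: 1s initial delay, 5m (300s) maximum.
retry_delays 1 300
```

Under that doubling assumption, the delay reaches the 5m cap on the tenth attempt and stays there.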

Execution Data Indexer

Below is a list of the available CLI flags to control the behavior of Execution Data Indexer.

| Flag | Type | Description |
| --- | --- | --- |
| execution-data-indexing-enabled | bool | Whether to enable the execution data indexing. Default is false. |
| execution-state-dir | string | Directory to use for the execution-state database. Default is in the user’s home directory. |
| execution-state-checkpoint | string | Location of execution-state checkpoint (root.checkpoint.*) files. |
| event-query-mode | string | Mode to use when querying events. One of [local-only, execution-nodes-only (default), failover]. |
| tx-result-query-mode | string | Mode to use when querying transaction results. One of [local-only, execution-nodes-only (default), failover]. |

Below is a list of the available CLI flags to control the behavior of Script Execution.

| Flag | Type | Description |
| --- | --- | --- |
| script-execution-mode | string | Mode to use when executing scripts. One of [local-only, execution-nodes-only, failover, compare]. |
| script-execution-computation-limit | uint64 | Maximum number of computation units a locally executed script can use. Default: 100000 |
| script-execution-max-error-length | int | Maximum number of characters to include in error message strings. Additional characters are truncated. Default: 1000 |
| script-execution-log-time-threshold | duration | Emit a log for any scripts that take over this threshold. Default: 1s |
| script-execution-timeout | duration | The timeout value for locally executed scripts. Default: 10s |
| script-execution-min-height | uint64 | Lowest block height to allow for script execution. Default: no limit |
| script-execution-max-height | uint64 | Highest block height to allow for script execution. Default: no limit |
| register-cache-type | string | Type of backend cache to use for registers. One of [lru, arc, 2q]. |
| register-cache-size | uint | Number of registers to cache for script execution. Default: 0 (no cache) |
| program-cache-size | uint | [experimental] Number of blocks to cache for Cadence programs. Use 0 to disable the cache. Default: 0. Note: this is an experimental feature and may cause nodes to become unstable under certain workloads. Use with caution. |
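
Once the index is populated, the three mode flags from the tables above decide whether scripts, events, and transaction results are served from local data. The sketch below prints a matching set of flags for one mode. The function name is hypothetical, and the comment about what failover does is an assumption, since the tables list only the mode names.

```shell
# Print the three query-mode flags set to the same mode.
# Only the values shared by all three flags are accepted; "compare" is
# listed for script-execution-mode only, so it is rejected here.
# ASSUMPTION: "failover" means try local data first, then Execution nodes.
query_mode_flags() {
  local mode="$1"
  case "$mode" in
    local-only|execution-nodes-only|failover) ;;
    *)
      echo "unsupported mode: $mode" >&2
      return 1
      ;;
  esac
  printf '%s\n' \
    "--script-execution-mode=$mode" \
    "--event-query-mode=$mode" \
    "--tx-result-query-mode=$mode"
}

query_mode_flags failover
```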

Resources

FLIP: https://github.com/onflow/flips/blob/main/protocol/20230309-accessnode-event-streaming-api.md

Protobuf: https://github.com/onflow/flow/blob/master/protobuf/flow/executiondata/executiondata.proto