Designing a secure blockchain from scratch is not trivial: a blockchain combines distributed computing with cryptographic building blocks, and both are hard to get right. Substrate is a framework that enables developers to write their own blockchains using the memory-safe Rust programming language. By providing a library that takes care of some of the challenging aspects of writing a blockchain, Substrate alleviates this problem. Its design primitives and implementations have been heavily scrutinized and reviewed, which gives a high degree of confidence in their security.
Nevertheless, security vulnerabilities remain even when building on Substrate. We audited a number of Substrate-based blockchains over the past two years and share our common observations here.
Our review experience shows that most security bugs fall into six categories. Most bugs are introduced when implementing extrinsics, and five of the six categories are indeed examples of mistakes in extrinsics. (Only category #2, runtime configuration issues, is unrelated to extrinsics.) Extrinsics are state transition functions that blockchain users can call via transactions. The blockchain developers specify exactly how the state of the blockchain should change when an extrinsic is called.
Description: Logic bugs are the most widespread security bug category our team has identified. Finding a logic bug is a complex process that requires the audit team to have detailed knowledge of the involved protocols and business logic. Logic bugs come in very different forms and are often specific to the problem a given blockchain tries to solve. The distributed and often hostile operating environment of blockchains adds another layer of complexity when trying to prevent logic bugs.
Example: Consider a module that allows users to solve riddles, where the first user to correctly solve a riddle gets a prize. This could be implemented via two extrinsics: one to pose a riddle and store the SHA-3 hash of the solution on-chain, and one to submit the preimage of that hash. If the preimage is correct, the riddle is considered solved and the submitting user gets a reward. However, the second extrinsic has a problem: once Alice broadcasts an extrinsic containing a preimage to the network, other nodes can read this extrinsic before it ever makes it onto the chain. The nodes can then front-run Alice’s transaction and claim the reward for themselves. Accounting for these blockchain-specific threats is vital to keep the blockchain secure.
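The front-running problem can be made concrete with a small plain-Rust model of the second extrinsic. This is only a sketch under stated assumptions: `Riddle`, `pose`, `solve` and `digest` are hypothetical names, and `digest` uses the standard library's `DefaultHasher` as a stand-in for SHA-3 so that the example has no external dependencies.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub type AccountId = u64;

/// Stand-in for SHA-3 (assumption: any hash works for the illustration).
pub fn digest(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical on-chain state of a single riddle.
pub struct Riddle {
    pub solution_hash: u64,
    pub winner: Option<AccountId>,
}

impl Riddle {
    /// First extrinsic: store only the hash of the solution.
    pub fn pose(solution: &[u8]) -> Self {
        Riddle { solution_hash: digest(solution), winner: None }
    }

    /// Second extrinsic: whoever gets a correct preimage included first wins.
    /// The preimage is not bound to the sender, so anyone who observes it
    /// in the transaction pool can resubmit it under their own account.
    pub fn solve(&mut self, who: AccountId, preimage: &[u8]) -> Result<(), &'static str> {
        if self.winner.is_some() {
            return Err("riddle already solved");
        }
        if digest(preimage) != self.solution_hash {
            return Err("wrong solution");
        }
        self.winner = Some(who);
        Ok(())
    }
}
```

If a second account copies Alice's preimage from the pool and its transaction is ordered first, `solve` succeeds for that account and Alice's own call fails with "riddle already solved".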
How to identify: Since business logic differs from project to project, finding bugs of this category cannot be automated. Logic bugs are therefore very hard to find: it takes significantly more time to understand all the pieces involved in the business logic. We tackle this challenge by creating an individual threat model before diving into the code itself. The threat model then guides our team of auditors during the manual code audit.
Mitigation: Ensure that the code implements protective measures against all threats identified beforehand. Extensive documentation and an implementation guideline that the code is tied to help protect against logic errors in the codebase.
Description: Every Substrate runtime implementation has a set of configuration items that can be set individually according to the business requirements of a project. Some configuration items, if wrongly set, can impact the security of the blockchain.
Example: We have noticed that many projects allow accounts in their blockchain storage with a zero balance:
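As a rough illustration of the effect of such a setting, the following plain-Rust model mimics an account ledger with a configurable existential deposit. It is a sketch, not Substrate code: `BalancesConfig`, `Ledger` and their methods are hypothetical names. In a real runtime the corresponding configuration item is the `ExistentialDeposit` of the balances pallet.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the balances configuration of a runtime.
pub struct BalancesConfig {
    /// Minimum balance an account needs in order to stay in storage.
    pub existential_deposit: u128,
}

/// Toy account storage: account id -> free balance.
pub struct Ledger {
    pub config: BalancesConfig,
    pub accounts: BTreeMap<u64, u128>,
}

impl Ledger {
    pub fn new(existential_deposit: u128) -> Self {
        Ledger {
            config: BalancesConfig { existential_deposit },
            accounts: BTreeMap::new(),
        }
    }

    /// Credits `amount` to `who`, creating the account if necessary.
    pub fn deposit(&mut self, who: u64, amount: u128) {
        let balance = self.accounts.entry(who).or_insert(0);
        *balance = balance.saturating_add(amount);
        self.reap_if_dust(who);
    }

    /// Moves `amount` from `from` to `to`; fails on insufficient funds.
    pub fn transfer(&mut self, from: u64, to: u64, amount: u128) -> Result<(), &'static str> {
        let from_balance = self.accounts.get(&from).copied().unwrap_or(0);
        let remaining = from_balance.checked_sub(amount).ok_or("insufficient balance")?;
        self.accounts.insert(from, remaining);
        self.reap_if_dust(from);
        self.deposit(to, amount);
        Ok(())
    }

    /// Removes an account whose balance is below the existential deposit.
    /// With `existential_deposit == 0` no account is ever removed.
    fn reap_if_dust(&mut self, who: u64) {
        if let Some(&balance) = self.accounts.get(&who) {
            if balance < self.config.existential_deposit {
                self.accounts.remove(&who);
            }
        }
    }
}
```

Passing the same 100 units through 50 accounts leaves 50 storage entries when the existential deposit is zero, but only one entry when it is set to, say, 10.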
This allows for so-called dust accounts: accounts that use up storage although their balance is zero. Attackers can easily abuse this to clutter the blockchain storage by creating a large number of accounts and transferring money from account to account. To sync with the blockchain, a node needs to keep the whole storage on disk. Once the storage exceeds a certain size, running a node becomes infeasible, which damages the reputation of the whole project.
How to identify: Runtime configuration issues can easily be identified by going through each runtime configuration item one-by-one and verifying that the configuration values protect against threats laid out in the threat model.
Mitigation: Configure your runtime in a way that protects against threats which you identified in a prior review stage. Often it helps to take inspiration from other extensively audited and established Substrate-based blockchains.
Description: Resources available to blockchains are limited. These resources include memory, storage I/O, computation, transaction and block size, and state database size. Substrate uses a mechanism called weights to manage the time it takes to validate (mine) a block. One unit of weight is one picosecond of execution time, that is, 10**12 weight = 1 second. Weights are also a main component of most transaction fee systems, which limit spam and keep the chain economically sustainable. If block execution takes too long, the validator nodes (the nodes that produce blocks, somewhat equivalent to miners) will miss their chance to generate new blocks, and block production halts. Therefore, it is crucial to set proper weights for all extrinsics that will be bundled into a block. To facilitate this process, Substrate has built-in methods to benchmark an extrinsic, that is, to measure its execution time on a reference machine and how this execution time scales with input size. In our assessments, we often see extrinsics that are assigned a static weight, or benchmarking code that does not account for the worst-case complexity of an extrinsic.
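The relation between weight and time can be spelled out in a few lines of plain Rust. This is a sketch, assuming weight is a single `u64` time value; `weight_to_duration` and `fits_in_block` are hypothetical helper names, not Substrate APIs.

```rust
use std::time::Duration;

/// One unit of weight is one picosecond, so 10^12 weight is one second.
pub const WEIGHT_PER_SECOND: u64 = 1_000_000_000_000;

/// Converts a weight into the execution time it represents.
/// `Duration` has nanosecond resolution, so sub-nanosecond weight is dropped.
pub fn weight_to_duration(weight: u64) -> Duration {
    Duration::from_nanos(weight / 1_000)
}

/// Checks whether an extrinsic still fits into the block's weight budget.
/// An overflowing sum is treated as "does not fit".
pub fn fits_in_block(used: u64, extrinsic: u64, max_block: u64) -> bool {
    match used.checked_add(extrinsic) {
        Some(total) => total <= max_block,
        None => false,
    }
}
```

A block producer that relies on a check like `fits_in_block` is only as safe as the declared weights: if an extrinsic declares less weight than it actually consumes, the block takes longer to execute than budgeted.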
Example: Consider an extrinsic, which loops through an array of integers:
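A plain-Rust sketch of such an extrinsic, written without FRAME macros, might look as follows. The constant `CALCULATE_SUM_WEIGHT` and its value are assumptions made for illustration; in a real pallet the static weight would sit in the weight annotation of the call.

```rust
/// Hypothetical static weight assigned to the extrinsic.
pub const CALCULATE_SUM_WEIGHT: u64 = 10_000;

/// Weight declared for a call: the same value regardless of the input.
pub fn calculate_sum_static_weight(_numbers: &[u32]) -> u64 {
    CALCULATE_SUM_WEIGHT
}

/// Body of the extrinsic: one loop iteration per element of `numbers`.
pub fn calculate_sum(numbers: Vec<u32>) -> Result<u32, &'static str> {
    let mut sum: u32 = 0;
    for n in numbers.iter() {
        // Checked addition so that the sum itself cannot wrap around.
        sum = sum.checked_add(*n).ok_or("sum overflow")?;
    }
    Ok(sum)
}
```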
The runtime of this extrinsic clearly scales with the length of the `numbers` vector, but the assigned static weight does not account for that. The weight of the `calculate_sum` extrinsic thus does not reflect its actual runtime.
How to identify: Static weights are easy to spot when reading code. If benchmarking code is available, it should benchmark the worst-case complexity of an extrinsic. To identify incorrect benchmarking, determine which inputs trigger the worst-case behaviour of an extrinsic and ensure that the respective benchmarking code triggers this worst-case scenario.
Mitigation: Substrate recommends benchmarking on reference hardware to find appropriate weight values. We frequently find non-benchmarked weights during our audits. Consequently, we strongly support Substrate’s recommendation to benchmark every extrinsic before launching the mainnet.
Description: Arithmetic calculations are known to be error-prone across programming languages. Not using or misusing secure arithmetic functions is a security bug we identified many times during our codebase audits. The most common consequence of arithmetic issues such as overflows is a wraparound, which can lead to unintended behavior (e.g. underweighted extrinsics).
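The outcome of benchmarking is typically a weight function that depends on the input size. A minimal sketch for an extrinsic that iterates over a list, assuming a linear cost model: the constants `BASE_WEIGHT` and `WEIGHT_PER_ITEM` are made-up placeholders for values that benchmarking would produce.

```rust
/// Hypothetical fixed overhead of the call (placeholder value).
pub const BASE_WEIGHT: u64 = 10_000;
/// Hypothetical cost per list element (placeholder value).
pub const WEIGHT_PER_ITEM: u64 = 500;

/// Weight that grows linearly with the number of items processed.
/// Saturating arithmetic keeps the result from wrapping around.
pub fn scaled_weight(item_count: u32) -> u64 {
    BASE_WEIGHT.saturating_add(WEIGHT_PER_ITEM.saturating_mul(u64::from(item_count)))
}
```

With a weight like this, a caller who submits a long list pays for, and is budgeted for, the additional loop iterations.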
Example: Consider an extrinsic `call_as` which gives users the possibility to execute calls as different users. This extrinsic takes as a parameter the maximum weight of the call that should be dispatched:
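The weight calculation of such an extrinsic could be modelled in plain Rust as shown below. This is a sketch: `CALL_AS_BASE_WEIGHT` and `call_as_weight` are hypothetical names, and `wrapping_add` is used to reproduce deterministically what the expression `100_000 + max_weight` does in a build without overflow checks.

```rust
/// Hypothetical fixed overhead of dispatching the inner call.
pub const CALL_AS_BASE_WEIGHT: u64 = 100_000;

/// Vulnerable weight calculation for `call_as`.
/// Models `100_000 + max_weight` when overflow checks are disabled:
/// the sum silently wraps around for very large `max_weight` values.
pub fn call_as_weight(max_weight: u64) -> u64 {
    CALL_AS_BASE_WEIGHT.wrapping_add(max_weight)
}
```

For `max_weight = u64::MAX` the declared weight becomes 99_999, which is even less than the fixed overhead alone.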
Note that the calculation can overflow, leading to a severe underestimate of the weight if 100_000 + max_weight wraps around. An attacker can abuse this in two ways: either to pay less for the extrinsic execution than she should, or to trick the validator into executing an extrinsic with a high computation time. Because of the overflow and the resulting underestimate of the weight, the validator will try to include the extrinsic in a block although there is not enough time (= weight) left. To include an extrinsic in a block, the validator needs to execute it. This execution takes so long that the validator misses its production slot, which causes the validator to get slashed and reduces the throughput of the chain.
How to identify: We found that fuzz testing of the Substrate runtime interface is an efficient technique that can be employed to catch arithmetic issues in the runtime early in development or during reviews.
Mitigation: Rust provides functions for safe arithmetic, for example `saturating_add` or `checked_add`. These functions can be used to prevent integer overflows.
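Applied to a weight calculation of the form 100_000 + max_weight, the two standard-library methods behave as follows. The function names are hypothetical; `saturating_add` and `checked_add` are regular methods on Rust's integer types.

```rust
/// Clamps at `u64::MAX` instead of wrapping around.
pub fn weight_saturating(max_weight: u64) -> u64 {
    100_000u64.saturating_add(max_weight)
}

/// Returns `None` on overflow so that the caller can reject the call.
pub fn weight_checked(max_weight: u64) -> Option<u64> {
    100_000u64.checked_add(max_weight)
}
```

Saturating is a reasonable choice for weights, because an over-large weight simply makes the extrinsic too expensive to include. Checked arithmetic fits better where an overflow should be reported as an error.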
Description: Just as the weight system is used to calculate appropriate fees, it is a security best practice to take a deposit from users for any storage items they place on chain. As of now, all the mechanics of managing these storage deposits are custom and manual. As a result, project teams forget to charge a deposit, an issue we often see during our Substrate code audits. The problem with not charging a deposit is the same as with the dust accounts described earlier: attackers can clutter the storage, and once the storage exceeds a certain size, it becomes infeasible to keep up with the storage requirements of running a node.
Example: Consider a pallet that allows users to store arbitrary data associated with their accounts:
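A plain-Rust model of such a pallet, without FRAME macros, could look like the sketch below. `AccountInfoPallet` and its storage map are hypothetical stand-ins for the pallet and its on-chain storage.

```rust
use std::collections::BTreeMap;

pub type AccountId = u64;

/// Hypothetical pallet state: one blob of data per account.
#[derive(Default)]
pub struct AccountInfoPallet {
    pub info: BTreeMap<AccountId, Vec<u8>>,
}

impl AccountInfoPallet {
    /// Stores `data` for the calling account.
    /// Nothing is reserved from the caller, so the write costs only
    /// the transaction fee, however long the data stays in storage.
    pub fn store_account_info(&mut self, who: AccountId, data: Vec<u8>) -> Result<(), &'static str> {
        self.info.insert(who, data);
        Ok(())
    }
}
```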
The `store_account_info` extrinsic does not charge a deposit for the stored data. An attacker could thus use this extrinsic to fill up the blockchain storage very cheaply.
How to identify: These issues can be identified in a manual code audit. When reviewing code, identify extrinsics through which users can cause persistent storage writes. Users should then be incentivized to clean up storage by a sufficiently high storage deposit. If such a deposit is missing, this is most likely a security issue.
Mitigation: To mitigate this issue, ensure that users provide a storage deposit. The deposit should scale with the size of the storage items: the more bytes are stored persistently, the higher the deposit. Substrate’s own codebase offers some inspiration on how to implement this.
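A size-dependent deposit can be sketched in plain Rust as follows. All names and constants (`DepositPallet`, `DEPOSIT_BASE`, `DEPOSIT_PER_BYTE`, `required_deposit`) are assumptions for illustration. A real pallet would reserve or hold the funds through the currency traits that FRAME provides rather than through hand-written maps.

```rust
use std::collections::BTreeMap;

pub type AccountId = u64;

/// Hypothetical deposit parameters (placeholder values).
pub const DEPOSIT_BASE: u128 = 1_000;
pub const DEPOSIT_PER_BYTE: u128 = 10;

/// Deposit that grows with the number of bytes stored.
pub fn required_deposit(len: usize) -> u128 {
    DEPOSIT_BASE.saturating_add(DEPOSIT_PER_BYTE.saturating_mul(len as u128))
}

#[derive(Default)]
pub struct DepositPallet {
    pub free: BTreeMap<AccountId, u128>,
    pub reserved: BTreeMap<AccountId, u128>,
    pub info: BTreeMap<AccountId, Vec<u8>>,
}

impl DepositPallet {
    /// Stores `data` only if the caller can cover the deposit.
    pub fn store_account_info(&mut self, who: AccountId, data: Vec<u8>) -> Result<(), &'static str> {
        if self.info.contains_key(&who) {
            return Err("info already stored");
        }
        let deposit = required_deposit(data.len());
        let free = self.free.get(&who).copied().unwrap_or(0);
        let remaining = free.checked_sub(deposit).ok_or("insufficient balance for deposit")?;
        self.free.insert(who, remaining);
        self.reserved.insert(who, deposit);
        self.info.insert(who, data);
        Ok(())
    }

    /// Removes the stored data and returns the deposit to the caller.
    pub fn clear_account_info(&mut self, who: AccountId) -> Result<(), &'static str> {
        self.info.remove(&who).ok_or("nothing stored")?;
        let deposit = self.reserved.remove(&who).unwrap_or(0);
        let free = self.free.entry(who).or_insert(0);
        *free = free.saturating_add(deposit);
        Ok(())
    }
}
```

Because the deposit is returned when the data is removed, honest users are incentivized to clean up, while an attacker has to lock up funds in proportion to the storage consumed.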
Description: Avoiding panics (a state that the program cannot handle) in the runtime is of particular importance for the security of the blockchain system. Panics can cause a denial-of-service condition for the validator at best and will cause the validator to create invalid blocks at worst.
Example: Consider an extrinsic that handles a payload containing a raw report that needs to be split using a delimiter character.
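The parsing step of such an extrinsic might be sketched in plain Rust like this. `parse_report` and the header/body split are hypothetical; the delimiter byte 0xAB is the one used in this example.

```rust
/// Delimiter byte that separates the two parts of the raw report.
pub const DELIMITER: u8 = 0xAB;

/// Splits a raw report into the part before and after the delimiter.
/// Vulnerable: panics if the payload contains no delimiter.
pub fn parse_report(payload: &[u8]) -> (Vec<u8>, Vec<u8>) {
    let mut iter = payload.split(|byte| *byte == DELIMITER);
    // `split` always yields at least one item, so this unwrap cannot fail.
    let header = iter.next().unwrap();
    // Without a delimiter there is no second item: `next()` returns `None`
    // and the unwrap panics.
    let body = iter.next().unwrap();
    (header.to_vec(), body.to_vec())
}
```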
The return value of `iter.next()` is not checked, which causes a panic when an attacker crafts a payload that does not follow the expected input format. If the input does not contain the delimiter 0xAB, the iteration finishes early and `iter.next()` returns `None`, which subsequently causes a panic when `None` is unwrapped.
How to identify: Most panic conditions in code based on Substrate are detectable with testing tools (unit, functional and fuzz testing), but the coverage of those tools is always limited. Our audit team often detects violations of the must-not-panic condition in immature code that has not been sufficiently reviewed and tested by the project developers.
Mitigation: The runtime should be free of panics that can be triggered by an attacker. In the spirit of defensive programming, it is better to return an `Err` than to panic when encountering an unexpected state.
Blockchain systems might at first appear overwhelming in complexity, both to their creators and to security researchers. While most blockchain bugs are unique and only apply to a single chain, they tend to cluster in six categories.
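For a payload that must be split at a delimiter byte, returning an error instead of unwrapping could look like the following sketch. `try_parse_report` and `ReportError` are hypothetical names; `splitn` and `ok_or` are standard-library methods.

```rust
/// Delimiter byte that separates the two parts of the raw report.
pub const DELIMITER: u8 = 0xAB;

#[derive(Debug, PartialEq)]
pub enum ReportError {
    /// The payload did not contain the delimiter byte.
    MissingDelimiter,
}

/// Splits a raw report at the first delimiter and reports malformed
/// input as an error instead of panicking.
pub fn try_parse_report(payload: &[u8]) -> Result<(Vec<u8>, Vec<u8>), ReportError> {
    // `splitn(2, ..)` yields at most two parts: before and after the
    // first delimiter.
    let mut iter = payload.splitn(2, |byte| *byte == DELIMITER);
    let header = iter.next().ok_or(ReportError::MissingDelimiter)?;
    let body = iter.next().ok_or(ReportError::MissingDelimiter)?;
    Ok((header.to_vec(), body.to_vec()))
}
```

A malformed payload now leads to a rejected extrinsic rather than a panicking runtime.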
Being familiar with these categories helps blockchain designers, developers, and auditors reduce security risks.