Storage in Substrate - Part 1
Amit Nadiger
Polyglot (Rust, C++ 11/14/17/20, C, Kotlin, Java), Android TV, CAS, Blockchain, Polkadot, UTXO, Substrate, Wasm, Proxy-Wasm, DVB, STB, Linux, Engineering management.
I am writing a three-part series on Substrate storage. Part 1 focuses on best practices for deciding what to put into (and keep out of) blockchain storage, and it also explores transactional storage. Part 2 covers the storage data structures: StorageValue, StorageMap, StorageDoubleMap, and StorageNMap. Part 3 examines storage hashers in depth. It covers cryptographic options such as the Blake2 family, non-cryptographic alternatives, and when to use which.
When developing the runtime (pallets or modules) that runs within the blockchain, one crucial aspect is deciding what information to store and how to do it efficiently.
Storage operations (reading and writing data) are resource-intensive and can slow down the blockchain. So it's essential to be careful about what you store and how much of it.
Using hashes in blockchain governance helps save storage space and keeps the blockchain efficient. It's like storing a digital fingerprint of data rather than the entire data itself. This approach is particularly useful when dealing with large pieces of code or files that should only be brought onto the blockchain when necessary.
In blockchain development, it's important to be efficient with storage space. You should avoid storing temporary data that won't be needed if an operation fails. Instead, only store it when it's certain it will be used. Additionally, creating bounds or limits on the amount of data that can be stored for certain actions helps control and optimize the use of storage space on the blockchain.
Deciding What to Store on the Blockchain:
The key idea here is to be selective and efficient in what we choose to store in the blockchain's runtime. Focus on critical information that's essential for the blockchain's operation and consensus, and avoid storing data that's temporary, large, or unnecessary for maintaining the blockchain's integrity. This helps keep the blockchain lean, fast, and cost-effective.
Using Hashed Data to Reduce Storage:
When you're dealing with a blockchain system, it's crucial to be efficient with the storage of data. Storing data on a blockchain can be costly and slow down the network. One way to be efficient is by using a technique called hashing.
What is Hashing?
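Hashing is like a digital fingerprint for data. It takes any amount of data and turns it into a fixed-size value, the hash, which is often displayed as a string of hex characters. With a cryptographic hash function, it is computationally infeasible to find two different inputs that produce the same hash. In practice, therefore, each distinct piece of data has its own hash. Importantly, the size of the hash is always the same, regardless of how much data you put in.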
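To make the fixed-size property concrete, here is a minimal sketch that uses Rust's standard-library `DefaultHasher`. This is a stand-in chosen only so that the example needs nothing outside `std`. `DefaultHasher` is a non-cryptographic 64-bit hasher, whereas Substrate relies on cryptographic hashes such as Blake2-256 wherever collision resistance matters. The helper name `fingerprint` is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Returns a fixed-size (8-byte) fingerprint of `data`.
/// NOTE: DefaultHasher is NOT cryptographic; it stands in for a real
/// cryptographic hash such as Blake2-256 purely for illustration.
pub fn fingerprint(data: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish().to_le_bytes()
}
```

Whether the input is a short string or a one-megabyte buffer, `fingerprint` returns exactly 8 bytes, and the same input always produces the same output.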
Example in Governance:
In blockchain governance, as in Substrate's Democracy pallet, network participants often need to vote on proposals. Instead of voting on the entire proposal or decision (which could be very long), they vote on the hash of that proposal.
Why Hash the Proposal?
Hashing the proposal makes sense because hashes are always a fixed size, no matter how big the proposal is. So, it's more efficient to store and manage on the blockchain.
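The sketch below illustrates voting by hash. It assumes a toy in-memory "state" in which each tally is keyed by an 8-byte proposal hash, so the proposal text itself is never stored. The names `Referenda`, `start`, `vote` and `tally` are hypothetical and are not the Democracy pallet's API. `DefaultHasher` again stands in for a cryptographic hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Illustrative 8-byte digest (non-cryptographic stand-in for Blake2-256).
pub type ProposalHash = [u8; 8];

pub fn hash_of(data: &[u8]) -> ProposalHash {
    let mut h = DefaultHasher::new();
    data.hash(&mut h);
    h.finish().to_le_bytes()
}

/// Hypothetical, simplified "on-chain" state: only the proposal hash and
/// its running tally are kept, never the proposal body.
#[derive(Default)]
pub struct Referenda {
    tallies: HashMap<ProposalHash, (u32, u32)>, // hash -> (ayes, nays)
}

impl Referenda {
    /// Registers a proposal by its hash and returns the key voters use.
    pub fn start(&mut self, proposal: &[u8]) -> ProposalHash {
        let key = hash_of(proposal);
        self.tallies.entry(key).or_insert((0, 0));
        key
    }

    /// Records one vote against a known proposal hash.
    pub fn vote(&mut self, key: &ProposalHash, aye: bool) -> Result<(), &'static str> {
        let tally = self.tallies.get_mut(key).ok_or("unknown proposal")?;
        if aye {
            tally.0 += 1;
        } else {
            tally.1 += 1;
        }
        Ok(())
    }

    pub fn tally(&self, key: &ProposalHash) -> Option<(u32, u32)> {
        self.tallies.get(key).copied()
    }

    /// Bytes used per stored key, independent of the proposal's size.
    pub fn key_size() -> usize {
        std::mem::size_of::<ProposalHash>()
    }
}
```

A 50 KB proposal and a 50-byte one cost the same 8-byte key in this model. The full text only needs to be available off-chain, or supplied when the proposal is executed.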
Runtime Upgrades Example:
Consider a scenario where a proposal involves a large piece of code (the new runtime's Wasm blob) that would be installed by a runtime upgrade. Storing this entire code on the blockchain while the proposal is pending would be impractical and costly. Instead, the proposal can be tied to the hash of the code.
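The hash-first flow for a runtime upgrade can be sketched in two steps, assuming a toy state object. First, only the hash of the new code is recorded. Later, the full blob is accepted only if it matches that hash. `UpgradeState`, `authorize`, `enact` and `code_hash` are hypothetical names, and `DefaultHasher` stands in for a real cryptographic code hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

pub type CodeHash = [u8; 8];

/// Non-cryptographic stand-in for a real code hash such as Blake2-256.
pub fn code_hash(code: &[u8]) -> CodeHash {
    let mut h = DefaultHasher::new();
    code.hash(&mut h);
    h.finish().to_le_bytes()
}

/// Hypothetical upgrade state: while an upgrade is pending, only the
/// 8-byte hash of the new Wasm blob is kept.
#[derive(Default)]
pub struct UpgradeState {
    authorized: Option<CodeHash>,
    pub current_code: Vec<u8>,
}

impl UpgradeState {
    /// Step 1 (e.g. after a successful vote): store only the hash.
    pub fn authorize(&mut self, hash: CodeHash) {
        self.authorized = Some(hash);
    }

    /// Step 2: the full blob is supplied only at enactment time and is
    /// accepted only if it matches the previously authorized hash.
    pub fn enact(&mut self, code: Vec<u8>) -> Result<(), &'static str> {
        let expected = self.authorized.ok_or("no upgrade authorized")?;
        if code_hash(&code) != expected {
            return Err("code does not match authorized hash");
        }
        self.current_code = code;
        self.authorized = None; // the authorization is single-use
        Ok(())
    }
}
```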
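The Benefit:
Only the fixed-size hash is kept on-chain while the proposal is discussed and voted on. The large Wasm blob needs to be brought on-chain only if and when the upgrade is actually enacted, and at that point it can be checked against the approved hash.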
Minimizing On-Chain Data Using IPFS:
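Another way to use hashes is with IPFS (InterPlanetary File System). Instead of storing large files directly on the blockchain, you store only the file's IPFS content identifier (CID). IPFS is content-addressed, so the CID is derived from a hash of the file's contents and is also what you use to locate the file. This small value is all the chain needs to keep.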
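Here is a minimal sketch of keeping only the CID in chain state. It assumes a toy registry keyed by a numeric document id. `FileRegistry`, `FileRecord` and the 64-byte `MAX_CID_LEN` bound are hypothetical choices made for illustration, and the code does not validate that the string is a well-formed CID.

```rust
use std::collections::HashMap;

/// Hypothetical upper bound on a stored CID, in bytes (an assumption).
pub const MAX_CID_LEN: usize = 64;

/// Hypothetical on-chain record: the file itself lives on IPFS; only its
/// content identifier (CID) and declared size are kept in chain state.
#[derive(Debug, Clone, PartialEq)]
pub struct FileRecord {
    pub cid: String,
    pub size_bytes: u64,
}

#[derive(Default)]
pub struct FileRegistry {
    files: HashMap<u32, FileRecord>, // document id -> record
}

impl FileRegistry {
    /// Stores a small, bounded record no matter how large the file is.
    pub fn register(&mut self, id: u32, cid: &str, size_bytes: u64) -> Result<(), &'static str> {
        if cid.is_empty() || cid.len() > MAX_CID_LEN {
            return Err("invalid CID length");
        }
        if self.files.contains_key(&id) {
            return Err("id already registered");
        }
        self.files.insert(id, FileRecord { cid: cid.to_string(), size_bytes });
        Ok(())
    }

    pub fn get(&self, id: u32) -> Option<&FileRecord> {
        self.files.get(&id)
    }
}
```

A multi-gigabyte file costs the chain only a few dozen bytes here. Anyone holding the CID can fetch the file from IPFS and check it against that identifier.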
Avoid Storing Transient Data:
Multi-Signature Example: Consider a multi-signature feature where several parties must approve a call before it executes. Using runtime storage to track who has signed so far is legitimate, even though the call may never collect enough signatures, because each signature is one atomic step in a longer operation. What you should avoid is writing intermediate data within a single step. Store the record of an individual signature only after every precondition for that signature has been checked, for example that the signer is an allowed signatory and has not already signed. Until then, the data is transient and does not belong in storage.
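The "check everything, then write once" rule for a single signature can be sketched as follows. This is a toy model; `MultisigOp` and `approve` are hypothetical names, not Substrate's Multisig pallet API, and accounts are plain `u32` ids for simplicity.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified multisig operation kept in "storage".
pub struct MultisigOp {
    pub signatories: BTreeSet<u32>, // accounts allowed to approve
    pub threshold: usize,
    pub approvals: Vec<u32>, // persisted approvals so far
}

impl MultisigOp {
    /// Records one approval. Every precondition is checked first; the
    /// single storage write (`push`) happens only if all checks pass,
    /// so a failed call leaves no partial data behind.
    pub fn approve(&mut self, who: u32) -> Result<bool, &'static str> {
        if !self.signatories.contains(&who) {
            return Err("not a signatory");
        }
        if self.approvals.contains(&who) {
            return Err("already approved");
        }
        self.approvals.push(who);
        // true once enough approvals exist to execute the call
        Ok(self.approvals.len() >= self.threshold)
    }
}
```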
Create Bounds for storage:
Creating bounds means setting limits on how much storage space can be used for specific data. It's like saying, "You can store this much data, but not more." This is a powerful way to control the use of storage space in the blockchain, ensuring it doesn't get overloaded.
Example: Let's go back to the multi-signature feature. If an unlimited number of signatories could be tracked, a single operation could consume an excessive amount of storage. To prevent this, the pallet sets a bound on how many signatories an operation may have. In Substrate's Multisig pallet, this is the MaxSignatories configuration constant. A call that supplies more signatories than the bound is rejected before anything is stored.
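A bounded collection makes the limit part of the stored type itself. FRAME provides `BoundedVec` for this purpose. The sketch below is a simplified, hypothetical stand-in, a `Vec` that refuses to grow past `MAX` elements, paired with an assumed limit of 3 signatories. It is not FRAME's implementation.

```rust
/// Hypothetical stand-in for a bounded collection such as FRAME's
/// `BoundedVec`: a Vec that refuses to grow beyond `MAX` elements.
#[derive(Debug, Default, PartialEq)]
pub struct Bounded<T, const MAX: usize> {
    items: Vec<T>,
}

impl<T, const MAX: usize> Bounded<T, MAX> {
    /// Accepts the whole list only if it fits; otherwise hands it back.
    pub fn try_from_vec(items: Vec<T>) -> Result<Self, Vec<T>> {
        if items.len() > MAX {
            Err(items)
        } else {
            Ok(Self { items })
        }
    }

    /// Appends one item, or returns it if the bound is already reached.
    pub fn try_push(&mut self, item: T) -> Result<(), T> {
        if self.items.len() >= MAX {
            return Err(item);
        }
        self.items.push(item);
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

/// Hypothetical runtime-level limit on signatories per operation.
pub const MAX_SIGNATORIES: usize = 3;
pub type Signatories = Bounded<u32, MAX_SIGNATORIES>;
```

Because the bound is checked before any item is added, the worst-case storage size of one operation is known in advance.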
Using Hashed Data in Transactional Storage:
In blockchain systems, storing data efficiently is crucial to keep the network fast and cost-effective. One way to do this is by using a technique called hashing, which creates a unique fingerprint for data.
Transactional Storage Basics:
Substrate's storage architecture is designed with efficiency, data integrity, and security in mind. The runtime storage layer is where application-specific data resides, while the overlay change set manages changes before they are committed. The Merkle Trie provides an efficient structure for data organization and validation, and the key-value database offers permanent storage for blockchain data. Together, these layers enable Substrate to efficiently manage and secure data on the blockchain.
Imagine Substrate's storage layers like a stack, with each layer built on top of the previous one:
.------------------------------.
| Runtime Storage              |
| (Application Data)           |
| Features:                    |
|  - Utilizes SP-IO            |
|  - Easy APIs for data        |
|    management                |
`------------------------------'
.------------------------------.
| Overlay Change Set           |
| (Temporary Workspace)        |
| Features:                    |
|  - Stages changes            |
|  - Submitted once per block  |
|  - Manages two types of      |
|    changes:                  |
|      - Prospective           |
|      - Committed             |
`------------------------------'
.------------------------------.
| Merkle Trie (Patricia)       |
| (Efficient Data Store)       |
| Features:                    |
|  - Efficient data            |
|    organization              |
|  - Used for validating       |
|    transactions and blocks   |
`------------------------------'
.------------------------------.
| Key-Value Database           |
| (On-disk Storage)            |
| Features:                    |
|  - Stores data on disk       |
|  - Permanent storage of      |
|    blockchain data           |
`------------------------------'
1. Runtime Storage:
This is the top layer of storage in Substrate, where application-specific data is stored. It's where the blockchain's pallets (runtime modules) and smart contracts save their information.
Utilizes sp-io: sp-io is Substrate's primitives I/O crate. It exposes host functions, for example sp_io::storage::get and sp_io::storage::set, through which the runtime reads from and writes to this storage layer.
Easy APIs for Data Management: Substrate provides straightforward APIs for managing data, making it easier for developers to interact with and manipulate storage.
2. Overlay Change Set:
The Overlay Change Set is an intermediate layer that acts as a temporary workspace for tracking changes before they are finalized and committed to the Merkle Trie and, ultimately, the database. It plays a crucial role in managing transactions and ensuring data consistency.
Stages Changes: Writes made while a block is being executed are collected here instead of being written to the trie immediately. Until then, they can still be either committed or discarded.
Submitted Once per Block: The accumulated changes are applied to the Merkle trie in a single batch when the block has been executed, rather than after every individual write.
Two Types of Changes:
Prospective Changes: These are potential changes that are proposed but not yet committed. They are tentative and can be discarded if needed.
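Committed Changes: These are changes that have been accepted during the current block's execution, for example once an extrinsic completes successfully. They can no longer be discarded by rolling back that extrinsic, and they are written to the Merkle trie when the block's changes are flushed.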
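The two kinds of changes can be modeled with a toy overlay sitting on top of a read-only backend. Reads check the newest layer first. A failed extrinsic discards its prospective changes, a successful one commits them, and committed changes reach the backend in one batch at the end of the block. This is a simplified illustration, not Substrate's `OverlayedChanges` type, and all method names are hypothetical.

```rust
use std::collections::BTreeMap;

type Key = Vec<u8>;
type Value = Vec<u8>;

/// Toy overlay: two change maps staged on top of a backend.
/// A `None` value records a deletion.
#[derive(Default)]
pub struct Overlay {
    backend: BTreeMap<Key, Value>,
    committed: BTreeMap<Key, Option<Value>>,
    prospective: BTreeMap<Key, Option<Value>>,
}

impl Overlay {
    pub fn with_backend(backend: BTreeMap<Key, Value>) -> Self {
        Self { backend, ..Default::default() }
    }

    /// Reads check the newest layer first: prospective, committed, backend.
    pub fn get(&self, key: &[u8]) -> Option<Value> {
        if let Some(v) = self.prospective.get(key) {
            return v.clone();
        }
        if let Some(v) = self.committed.get(key) {
            return v.clone();
        }
        self.backend.get(key).cloned()
    }

    /// Every write lands in the prospective layer first.
    pub fn set(&mut self, key: &[u8], value: Option<Value>) {
        self.prospective.insert(key.to_vec(), value);
    }

    /// E.g. after an extrinsic succeeds: keep its changes for this block.
    pub fn commit_prospective(&mut self) {
        let staged = std::mem::take(&mut self.prospective);
        self.committed.extend(staged);
    }

    /// E.g. after an extrinsic fails: drop its changes.
    pub fn discard_prospective(&mut self) {
        self.prospective.clear();
    }

    /// End of block: apply committed changes to the backend in one batch.
    pub fn flush(&mut self) {
        for (k, v) in std::mem::take(&mut self.committed) {
            match v {
                Some(v) => {
                    self.backend.insert(k, v);
                }
                None => {
                    self.backend.remove(&k);
                }
            }
        }
    }

    pub fn backend_get(&self, key: &[u8]) -> Option<&Value> {
        self.backend.get(key)
    }
}
```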
3. Merkle Trie (Patricia):
Substrate organizes its state in a base-16 Patricia Merkle trie. This is a Patricia trie (a compressed prefix tree) in which every node is identified by the hash of its contents. It is a critical component for validating transactions and blocks.
Efficient Data Organization: The trie's root hash commits to the entire state. A single value can therefore be proven to be part of the state using only the nodes on the path from the root to that value (a Merkle proof), without access to the rest of the data.
Used for Validating Transactions and Blocks: Each block header records the resulting state root. A node that executes the block's transactions can recompute that root and check that it matches, and light clients can verify individual values against it. This works the same way whichever consensus mechanism is used, for example PoA or PoS.
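Here is a minimal sketch of the proof idea. It uses a plain binary Merkle tree rather than Substrate's base-16 Patricia trie, and the non-cryptographic `DefaultHasher` instead of a real hash. Both are simplifying assumptions, and the sketch is not secure. The functions `root`, `prove` and `verify` are hypothetical names. The sketch shows that changing any value changes the root, and that one value can be checked against the root with just a handful of sibling hashes.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Non-cryptographic stand-in for a real node hash (illustration only).
fn h(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

/// Hash of an inner node built from its two children.
fn parent(l: u64, r: u64) -> u64 {
    let mut buf = l.to_le_bytes().to_vec();
    buf.extend_from_slice(&r.to_le_bytes());
    h(&buf)
}

/// All levels of a binary Merkle tree, leaves first; the last level holds
/// the root. An unpaired node at the end of a level is paired with itself.
fn levels(leaves: &[Vec<u8>]) -> Vec<Vec<u64>> {
    assert!(!leaves.is_empty(), "need at least one leaf");
    let mut lvls = vec![leaves.iter().map(|l| h(l)).collect::<Vec<u64>>()];
    while lvls.last().unwrap().len() > 1 {
        let cur = lvls.last().unwrap();
        let next: Vec<u64> = cur
            .chunks(2)
            .map(|p| parent(p[0], *p.get(1).unwrap_or(&p[0])))
            .collect();
        lvls.push(next);
    }
    lvls
}

pub fn root(leaves: &[Vec<u8>]) -> u64 {
    levels(leaves).last().unwrap()[0]
}

/// Sibling hashes from leaf to root; the bool says "sibling is on the right".
/// `index` must be smaller than `leaves.len()`.
pub fn prove(leaves: &[Vec<u8>], mut index: usize) -> Vec<(u64, bool)> {
    let lvls = levels(leaves);
    let mut proof = Vec::new();
    for lvl in &lvls[..lvls.len() - 1] {
        let sib = if index % 2 == 0 { index + 1 } else { index - 1 };
        let sib_hash = *lvl.get(sib).unwrap_or(&lvl[index]);
        proof.push((sib_hash, index % 2 == 0));
        index /= 2;
    }
    proof
}

/// Recomputes the root from a single leaf and its proof.
pub fn verify(root: u64, leaf: &[u8], proof: &[(u64, bool)]) -> bool {
    let mut acc = h(leaf);
    for &(sib, sib_on_right) in proof {
        acc = if sib_on_right { parent(acc, sib) } else { parent(sib, acc) };
    }
    acc == root
}
```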
4. Key-Value Database (RocksDB or ParityDB):
The Key-Value Database is the bottom layer of Substrate's storage architecture. It's where the actual data is stored on disk, providing permanent storage for blockchain data.
Stores Data on Disk: This layer is responsible for persistently storing data on physical storage devices, ensuring that data remains accessible even when the system restarts.
Permanent Storage of Blockchain Data: It serves as the long-term storage for blockchain data, including historical states (how much history is kept depends on the node's pruning settings), making it available for retrieval and validation.
Error Handling in Transactional Storage:
If a dispatched call returns an error, the changes it made in its transactional storage layer are discarded. The underlying storage remains unchanged, which keeps the state consistent.
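This discard-on-error behavior, including layers nested inside one another, can be sketched with a toy store. The store snapshots its state when a layer opens and restores the snapshot if the code inside the layer returns an error. `TxStorage` and `with_layer` are hypothetical names. Snapshotting the whole state is a simplification, since a real implementation would track only the changed keys.

```rust
use std::collections::BTreeMap;

/// Toy storage with nested transactional layers. Each layer keeps a full
/// snapshot of the state taken when the layer was opened (simple, but it
/// costs O(state) per layer).
#[derive(Default)]
pub struct TxStorage {
    state: BTreeMap<String, u64>,
    snapshots: Vec<BTreeMap<String, u64>>,
}

impl TxStorage {
    pub fn get(&self, key: &str) -> Option<u64> {
        self.state.get(key).copied()
    }

    pub fn set(&mut self, key: &str, value: u64) {
        self.state.insert(key.to_string(), value);
    }

    /// Runs `f` inside a new layer: its writes are kept if it returns Ok
    /// and rolled back if it returns Err. Layers can be nested.
    pub fn with_layer<R, E>(&mut self, f: impl FnOnce(&mut Self) -> Result<R, E>) -> Result<R, E> {
        self.snapshots.push(self.state.clone());
        let result = f(self);
        let snapshot = self.snapshots.pop().expect("layer was pushed above");
        if result.is_err() {
            self.state = snapshot; // discard every change made in this layer
        }
        result
    }
}
```

A failing inner layer undoes only its own writes. The outer layer can still succeed and keep the changes it made.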
Extending Transactional Storage:
Nesting Transactional Layers:
Dispatching a Transactional Storage Layer Call:
How to commit changes without a transactional storage layer:
In a blockchain system, it's common to use a transactional storage layer to temporarily store changes before they are committed to the main storage. This approach helps ensure data consistency and prevents errors from affecting the main database. However, there might be cases where you want to commit changes directly to the main storage without using this temporary layer.
The #[without_transactional] macro is a tool that allows blockchain developers to bypass the usual transactional storage layer and commit changes directly to the main storage overlay. However, it should be used with caution, as it can lead to data consistency issues if errors occur after storage modifications. Developers should carefully assess whether a function is safe for this approach based on the specific requirements of their blockchain application. Below are details:
Using the #[without_transactional] Macro:
Example Function:
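/// This function is safe to execute without an additional transactional storage layer.
#[without_transactional]
fn set_value(x: u32) -> DispatchResult {
    // Run every fallible check first; if this returns an error,
    // no storage has been modified yet.
    Self::check_value(x)?;
    // The only storage write happens after all checks have passed.
    MyStorage::set(x);
    Ok(())
}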
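In this function, the #[without_transactional] attribute indicates that no extra transactional storage layer is needed. That is safe here because the only fallible step, check_value, runs before any storage is written. If it returns an error, nothing has been modified yet, so there is nothing to roll back.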
Caution with #[without_transactional]:
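To see why this caution matters, consider a toy balances map with no transactional layer at all, where every write takes effect immediately. A function that writes before all of its checks are done leaves a partial change behind when a later check fails. The "verify first, write last" ordering avoids this. `Balances`, `transfer_write_first` and `transfer_checked` are hypothetical names used only for this illustration.

```rust
use std::collections::BTreeMap;

/// Toy balances map with NO transactional layer: every write is immediate.
#[derive(Default)]
pub struct Balances {
    pub free: BTreeMap<&'static str, u64>,
}

impl Balances {
    /// Writes before all checks are done: if the later check fails, the
    /// debit has already been applied and nothing rolls it back.
    pub fn transfer_write_first(&mut self, from: &'static str, to: &'static str, amount: u64) -> Result<(), &'static str> {
        let balance = *self.free.get(&from).unwrap_or(&0);
        self.free.insert(from, balance.saturating_sub(amount)); // write happens here
        if balance < amount {
            return Err("insufficient balance"); // ...and this error leaves it in place
        }
        *self.free.entry(to).or_insert(0) += amount;
        Ok(())
    }

    /// Verify first, write last: safe even without a transactional layer.
    pub fn transfer_checked(&mut self, from: &'static str, to: &'static str, amount: u64) -> Result<(), &'static str> {
        let balance = *self.free.get(&from).unwrap_or(&0);
        if balance < amount {
            return Err("insufficient balance");
        }
        self.free.insert(from, balance - amount);
        *self.free.entry(to).or_insert(0) += amount;
        Ok(())
    }
}
```

With a transactional layer, the first version's partial debit would be rolled back along with the error. Without one, only the check-first ordering keeps the state consistent.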
How to Access Runtime Storage in Substrate:
In Substrate, the blockchain state is stored using a key-value database. Developers interact with this storage using storage abstractions provided by Substrate. These abstractions simplify the process of reading and writing data in the underlying database.
Substrate provides a structured and organized way to work with blockchain state data through FRAME's storage macros (#[pallet::storage]). We can choose from various storage structures to store and manage data efficiently, tailoring the choice to the specific requirements of our blockchain application. These storage items become part of the blockchain's state and play a crucial role in the functionality and integrity of the blockchain.
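The sketch below shows how such abstractions hide the raw key-value layer, using typed wrappers over an untyped byte map. It loosely mirrors the way FRAME derives a storage item's raw key from hashes of the pallet name and the item name, with map entries adding a hash of the encoded map key. It deliberately simplifies the details. `DefaultHasher` replaces FRAME's real hashers, plain little-endian bytes replace SCALE encoding, and `ValueU64` and `MapU32U64` are hypothetical types, not FRAME's StorageValue or StorageMap.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// The raw layer: an untyped byte-key -> byte-value store.
pub type RawDb = BTreeMap<Vec<u8>, Vec<u8>>;

/// 8-byte non-cryptographic stand-in for FRAME's real key hashers.
fn h8(data: &[u8]) -> [u8; 8] {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish().to_le_bytes()
}

/// Decodes an 8-byte little-endian value; None if the length is wrong.
fn decode_u64(bytes: &[u8]) -> Option<u64> {
    if bytes.len() != 8 {
        return None;
    }
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    Some(u64::from_le_bytes(buf))
}

/// Typed single-value wrapper (in the spirit of a StorageValue).
/// Raw key = hash(pallet name) ++ hash(item name).
pub struct ValueU64 {
    pub pallet: &'static str,
    pub item: &'static str,
}

impl ValueU64 {
    fn key(&self) -> Vec<u8> {
        [h8(self.pallet.as_bytes()), h8(self.item.as_bytes())].concat()
    }
    pub fn get(&self, db: &RawDb) -> Option<u64> {
        db.get(&self.key()).and_then(|b| decode_u64(b))
    }
    pub fn put(&self, db: &mut RawDb, value: u64) {
        db.insert(self.key(), value.to_le_bytes().to_vec());
    }
}

/// Typed map wrapper (in the spirit of a StorageMap).
/// Raw key = hash(pallet name) ++ hash(item name) ++ hash(encoded map key).
pub struct MapU32U64 {
    pub pallet: &'static str,
    pub item: &'static str,
}

impl MapU32U64 {
    fn key(&self, k: u32) -> Vec<u8> {
        [h8(self.pallet.as_bytes()), h8(self.item.as_bytes()), h8(&k.to_le_bytes())].concat()
    }
    pub fn get(&self, db: &RawDb, k: u32) -> Option<u64> {
        db.get(&self.key(k)).and_then(|b| decode_u64(b))
    }
    pub fn insert(&self, db: &mut RawDb, k: u32, value: u64) {
        db.insert(self.key(k), value.to_le_bytes().to_vec());
    }
}
```

Callers work with typed `get`, `put` and `insert` calls and never touch raw byte keys. That is the kind of convenience the storage abstractions provide.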
FRAME Storage Module:
Types of Storage Structures:
Introducing New Storage Items:
Choosing the Right Storage Structure:
We will discuss these storage structures, and how to store data with them, in detail in the next article (Part 2).
Reference: Runtime storage structures | Substrate Docs