Blockchain ecosystem for file storage
Navdeep Garg
Chief Executive Officer @revinfotech | Business Consulting, Strategic Consulting
Abstract
Our chain is a platform for decentralized storage. It enables the formation of storage contracts between peers. Contracts are agreements between a storage provider and their client, defining what data will be stored and at what price. They require the storage provider to prove, at regular intervals, that they are still storing their client’s data. Contracts are stored in a block chain, making them publicly auditable.
Our chain is a decentralized cloud storage platform that intends to compete with existing storage solutions, at both the P2P and enterprise level. Instead of renting storage from a centralized provider, peers on our chain rent storage from each other. The chain itself stores only the storage contracts formed between parties, defining the terms of their arrangement. By forming a contract, a storage provider (also known as a host) agrees to store a client’s data, and to periodically submit proof of their continued storage until the contract expires. The host is compensated for every proof they submit, and penalized for missing a proof. These proofs are publicly verifiable and publicly available in the block chain.
General Structure
Our chain’s primary departure from Bitcoin lies in its transactions. Bitcoin uses a scripting system to enable a range of transaction types, such as pay-to-public-key-hash and pay-to-script-hash. Our chain opts instead to use an M-of-N multi-signature scheme for all transactions, eschewing the scripting system entirely. This reduces complexity and attack surface. Our chain also extends transactions to enable the creation and enforcement of storage contracts. Three extensions are used to accomplish this: contracts, proofs, and contract updates. Contracts declare the intention of a host to store a file with a certain size and hash. They define the regularity with which a host must submit storage proofs. Once established, contracts can be modified later via contract updates.
Transactions
A transaction contains the following fields:
Inputs and Outputs
An output comprises a volume of tokens. Each output has an associated identifier, which is derived from the transaction that the output appeared in. The ID of output i in transaction t is defined as:
H (t||“output”||i)
Where H is a cryptographic hashing function, and “output” is a string literal. The block reward and miner fees have special output IDs, given by:
H (H (Block Header)||“block reward”)
Every input must come from a prior output, so an input is simply an output ID. Inputs and outputs are also paired with a set of spend conditions. Inputs contain the spend conditions themselves, while outputs contain their Merkle root hash.
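The two ID formulas above can be sketched in a few lines of Python. This is only an illustration: the text does not name the hash function or the byte encoding, so SHA-256, the 8-byte big-endian index, and the function names `output_id` and `block_reward_id` are all assumptions.

```python
import hashlib


def H(data: bytes) -> bytes:
    """Stand-in for the hash function H (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def output_id(transaction: bytes, index: int) -> bytes:
    """ID of output `index` in a serialized transaction: H(t || "output" || i)."""
    # Encoding the index as 8 big-endian bytes is an illustrative choice.
    return H(transaction + b"output" + index.to_bytes(8, "big"))


def block_reward_id(block_header: bytes) -> bytes:
    """Special output ID for the block reward: H(H(header) || "block reward")."""
    return H(H(block_header) + b"block reward")


tx = b"example serialized transaction"
first = output_id(tx, 0)
second = output_id(tx, 1)
reward = block_reward_id(b"example block header")
```

Because the ID depends on both the transaction and the output index, two outputs of the same transaction get different IDs, and an input can refer to exactly one prior output.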
Spend Conditions
Spend conditions are properties that must be met before tokens are “unlocked” and can be spent. The spend conditions include a time lock, a set of public keys, and the number of signatures required. An output cannot be spent until the time lock has expired and enough of the specified keys have added their signature. The spend conditions are hashed into a Merkle tree, using the time lock, the number of signatures required, and the public keys as leaves. The root hash of this tree is used as the address to which tokens are sent. In order to spend the tokens, the spend conditions corresponding to the address hash must be provided. The use of a Merkle tree allows parties to selectively reveal information in the spend conditions. For example, the time lock can be revealed without revealing the number of public keys or the number of signatures required. It should be noted that the time lock and number of signatures have low entropy, making their hashes vulnerable to brute-forcing. This could be resolved by adding a random nonce to these fields, increasing their entropy at the cost of space efficiency.
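To make the selective-reveal idea concrete, here is a minimal sketch of hashing spend conditions into a Merkle tree and proving a single leaf (the time lock) against the resulting address. SHA-256, the leaf and node prefix bytes, the byte encodings, and the rule that an unpaired node is promoted unchanged are assumptions made for the example, not details given in the text.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(leaf: bytes) -> bytes:
    return H(b"\x00" + leaf)  # prefix separates leaves from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return H(b"\x01" + left + right)


def merkle_root_and_proof(leaves, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [leaf_hash(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(node_hash(level[i], level[i + 1]))
            else:
                nxt.append(level[i])  # unpaired node is promoted unchanged
        index //= 2
        level = nxt
    return level[0], proof


def verify_leaf(leaf: bytes, proof, root: bytes) -> bool:
    h = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        h = node_hash(sibling, h) if sibling_is_left else node_hash(h, sibling)
    return h == root


# Leaves: time lock, signatures required, then the public keys (2-of-3 here).
time_lock = (500_000).to_bytes(8, "big")
spend_conditions = [time_lock, (2).to_bytes(8, "big"), b"pubkey-A", b"pubkey-B", b"pubkey-C"]

address, time_lock_proof = merkle_root_and_proof(spend_conditions, 0)
time_lock_is_valid = verify_leaf(time_lock, time_lock_proof, address)
```

The proof for the time lock contains only sibling hashes, so a verifier learns the time lock without seeing the keys or the signature threshold. As the text notes, a low-entropy leaf such as the threshold could still be guessed from its hash unless a nonce is mixed in.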
Signatures
Each input in a transaction must be signed. The cryptographic signature itself is paired with an input ID, a time lock, and a set of flags indicating which parts of the transaction have been signed. The input ID indicates which input the signature is being applied to. The time lock specifies when the signature becomes valid. Any subset of fields in the transaction can be signed, with the exception of the signature itself (as this would be impossible). There is also a flag to indicate that the whole transaction should be signed, except for the signatures. This allows for more nuanced transaction schemes. The actual data being signed, then, is a concatenation of the time lock, input ID, flags, and every flagged field. Every such signature in the transaction must be valid for the transaction to be accepted.
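The signed data described above can be sketched as follows. The field names, the `WHOLE_TRANSACTION` flag name, and the byte encodings are hypothetical; the text only states that the payload is a concatenation of the time lock, input ID, flags, and every flagged field, and that signatures are never covered.

```python
WHOLE_TRANSACTION = "whole_transaction"  # hypothetical flag name


def fields_to_sign(tx: dict, flags: list) -> list:
    """Resolve the flags to an ordered list of field names to cover."""
    if "signatures" in flags:
        raise ValueError("a signature cannot cover the signatures themselves")
    if WHOLE_TRANSACTION in flags:
        # Every field except the signatures, in a fixed (sorted) order.
        return [name for name in sorted(tx) if name != "signatures"]
    return list(flags)


def signing_payload(tx: dict, input_id: bytes, time_lock: int, flags: list) -> bytes:
    """Concatenate time lock, input ID, flags, and every flagged field."""
    parts = [time_lock.to_bytes(8, "big"), input_id, ",".join(flags).encode()]
    parts += [tx[name] for name in fields_to_sign(tx, flags)]
    # A real encoding would length-prefix each part to avoid ambiguity.
    return b"".join(parts)


tx = {
    "inputs": b"in-0",
    "outputs": b"out-0",
    "arbitrary_data": b"hello",
    "signatures": b"",
}
partial = signing_payload(tx, b"in-0", 100, ["outputs"])
whole = signing_payload(tx, b"in-0", 100, [WHOLE_TRANSACTION])
```

A signature over `partial` stays valid if another party later changes the arbitrary data, while a signature over `whole` does not. That difference is what allows the more nuanced transaction schemes mentioned above.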
File Contracts
A file contract is an agreement between a storage provider and their client. At the core of a file contract
Formed. The outcome is a string literal: either “valid proof” or “missed proof”, corresponding to the validity of the proof. The output ID of a contract termination is defined as
H(contract ID||outcome)
where outcome has the potential values “successful termination” and “unsuccessful termination”, corresponding to the termination status of the contract. File contracts are also created with a list of “edit conditions,” analogous to the spend conditions of a transaction. If the edit conditions are fulfilled, the contract may be modified. Any of the values can be modified, including the contract funds, file hash, and output addresses. As these modifications can affect the validity of subsequent storage proofs, contract edits must specify a future challenge window at which they will become effective. Theoretically, peers could create “micro-edit channels” to facilitate frequent edits; see discussion of micropayment channels, section.
Proof of Storage
Storage proof transactions are periodically submitted in order to fulfill file contracts. Each storage proof targets a specific file contract. A storage proof does not need to have any inputs or outputs; only a contract ID and the proof data are required.
Algorithm
Hosts prove their storage by providing a segment of the original file and a list of hashes from the file’s Merkle tree. This information is sufficient to prove that the segment came from the original file. Because proofs are submitted to the block chain, anyone can verify their validity or invalidity. Each storage proof uses a randomly selected segment. The random seed for challenge window W_i is given by:
H(contract ID || H(B_{i-1}))
where B_{i-1} is the block immediately prior to the beginning of W_i.
If the host is consistently able to demonstrate possession of a random segment, then they are very likely storing the whole file. A host storing only 50% of the file will be unable to complete approximately 50% of the proofs.
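The proof procedure described above can be sketched end to end: derive the challenge from the contract ID and the block before the window, then prove and verify one segment against the file’s Merkle root. The 64-byte segment size, SHA-256, the prefix bytes, the promotion of unpaired nodes, and all function names are assumptions made for the illustration.

```python
import hashlib

SEGMENT_SIZE = 64  # bytes; illustrative value


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def segments(data: bytes) -> list:
    return [data[i:i + SEGMENT_SIZE] for i in range(0, len(data), SEGMENT_SIZE)]


def challenge_index(contract_id: bytes, prev_block: bytes, num_segments: int) -> int:
    """Seed = H(contract ID || H(previous block)), reduced to a segment index."""
    seed = H(contract_id + H(prev_block))
    return int.from_bytes(seed, "big") % num_segments


def root_and_path(leaf_hashes: list, index: int):
    """Merkle root plus the sibling hashes needed to prove leaf `index`."""
    level, path = list(leaf_hashes), []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            path.append(level[sibling])
        level = [
            H(b"\x01" + level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
            for i in range(0, len(level), 2)
        ]
        index //= 2
    return level[0], path


def verify(root: bytes, index: int, num_segments: int, segment: bytes, path: list) -> bool:
    h, remaining, n = H(b"\x00" + segment), list(path), num_segments
    while n > 1:
        sibling = index ^ 1
        if sibling < n:
            if not remaining:
                return False
            s = remaining.pop(0)
            h = H(b"\x01" + s + h) if sibling < index else H(b"\x01" + h + s)
        index //= 2
        n = (n + 1) // 2
    return h == root and not remaining


file_data = bytes(range(256)) * 4          # 1024 bytes, 16 segments
parts = segments(file_data)
leaves = [H(b"\x00" + p) for p in parts]
file_root, _ = root_and_path(leaves, 0)    # root recorded in the contract

idx = challenge_index(b"contract-1", b"header of previous block", len(parts))
_, path = root_and_path(leaves, idx)       # host builds the proof
proof_ok = verify(file_root, idx, len(parts), parts[idx], path)
```

The verifier needs only the root, the segment count, the challenged segment, and the path, so anyone reading the block chain can check the proof without holding the file.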
Block Withholding Attacks
The random number generator is subject to manipulation via block withholding attacks, in which the attacker withholds blocks until they find one that will produce a favorable random number. However, the attacker has only one chance to manipulate the random number for a particular challenge. Furthermore, withholding a block to manipulate the random number will cost the attacker the block reward. If an attacker is able to mine 50% of the blocks, then 50% of the challenges can be manipulated. Nevertheless, the remaining 50% are still random, so the attacker will still fail some storage proofs. Specifically, they will fail half as many as they would without the withholding attack. To protect against such attacks, clients can specify a high challenge frequency and large penalties for missing proofs. These precautions should be sufficient to deter any financially-motivated attacker that controls less than 50% of the network’s hashing power. Regardless, clients are advised to plan around potential Byzantine attacks, which may not be financially motivated.
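The arithmetic in this argument can be written out as a small function. It uses the same simplification as the text: the attacker always gets a favorable challenge in windows where they control the preceding block, and faces a uniformly random challenge otherwise. The function and parameter names are illustrative.

```python
def expected_miss_rate(stored_fraction: float, attacker_block_share: float) -> float:
    """Fraction of storage proofs a host is expected to miss.

    stored_fraction: share of the file the host actually keeps.
    attacker_block_share: share of blocks the host can mine and withhold.
    """
    honest_miss = 1.0 - stored_fraction        # miss rate on a random challenge
    random_share = 1.0 - attacker_block_share  # challenges the host cannot steer
    return random_share * honest_miss


without_attack = expected_miss_rate(0.5, 0.0)  # 0.5
with_attack = expected_miss_rate(0.5, 0.5)     # 0.25
```

With half the file and half the hashing power, the miss rate drops from 0.5 to 0.25, which is the "half as many" figure above. Multiplying the miss rate by the number of challenge windows and the per-miss penalty gives the expected cost a client can impose through high challenge frequency and large penalties.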
Closed Window Attacks
Hosts can only complete a storage proof if their proof transaction makes it into the block chain. Miners could maliciously exclude storage proofs from blocks, depriving themselves of transaction fees but forcing a penalty on hosts. Hosts can reasonably assume that some percentage of miners will include their proofs in return for a transaction fee. Because hosts consent to all file contracts, they are free to reject any contract that they feel leaves them vulnerable to closed window attacks.
Arbitrary Transaction Data
Each transaction has an arbitrary data field which can be used for any type of information. Nodes will be required to store the arbitrary data if it is signed by any signature in the transaction. Nodes will initially accept up to 64 KB of arbitrary data per block. This arbitrary data provides hosts and clients with a decentralized way to organize themselves. It can be used to advertise available space or files seeking a host, or to create a decentralized file tracker. Arbitrary data could also be used to implement other types of soft forks. This would be done by creating an “anyone-can-spend” output but with restrictions specified in the arbitrary data. Miners that understand the restrictions can block any transaction that spends the output without satisfying the necessary stipulations. Naive nodes will stay synchronized without needing to be able to parse the arbitrary data.
Storage Ecosystem
Our chain relies on an ecosystem that facilitates decentralized storage. Storage providers can use the arbitrary data field to announce themselves to the network. This can be done using a standardized template that clients will be able to read. Clients can use these announcements to create a database of potential hosts, and form contracts with only those they trust.
Host Protections
A contract requires consent from both the storage provider and their client, allowing the provider to reject unfavorable terms or unwanted (e.g. illegal) files. The provider may also refuse to sign a contract until the entire file has been uploaded to them. Contract terms give storage providers some flexibility. They can advertise themselves as minimally reliable, offering a low price and agreeing to minimal penalties for losing files; or they can advertise themselves as highly reliable, offering a higher price and agreeing to harsher penalties for losing files. An efficient market will optimize storage strategies. Hosts are vulnerable to denial of service attacks, which could prevent them from submitting storage proofs or transferring files. It is the responsibility of the host to protect themselves from such attacks.
Availability can be further improved by rehosting file pieces whose hosts have gone offline. Other metrics benefit from this strategy as well; the client can reduce latency by downloading from the closest 10 hosts, or increase download speed by downloading from the 10 fastest hosts. These downloads can be run in parallel to maximize available bandwidth.
Uptime Incentives
The storage proofs contain no mechanism to enforce constant uptime. There are also no provisions that require hosts to transfer files to clients upon request. One might expect, then, to see hosts holding their clients’ files hostage and demanding exorbitant fees to download them.
Clients may request a file at any time, which incentivizes hosts to maximize uptime in order to collect as many rewards as possible. Clients can also incentivize greater throughput and lower latency via proportionally larger rewards. Clients could even perform random “checkups” that reward hosts simply for being online, even if they do not wish to download anything.
Basic Reputation System
Clients need a reliable method for picking quality hosts. Analyzing a host’s history is insufficient, because the history could be spoofed. A host could repeatedly form contracts with itself, agreeing to store large “fake” files, such as a file containing only zeros. It would be trivial to perform storage proofs on such data without actually storing anything. To mitigate this Sybil attack, clients can require that hosts that announce themselves in the arbitrary data section also include a large volume of time-locked tokens. By favoring hosts that have created valuable locks, clients can mitigate the risk of Sybil attacks, as valuable locks are not trivial to create. Each client can choose their own equation for picking hosts, and can use a large number of factors, including price, lock value, volume of storage being offered, and the penalties hosts are willing to pay for losing files. More complex systems, such as those that use human review or other metrics, could be implemented out-of-band in a more centralized setting.
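As one possible example of such an equation, the sketch below filters out hosts whose lock value is below a client-chosen minimum and ranks the rest by a weighted score. The `HostAnnouncement` fields, the weights, and the linear scoring formula are all hypothetical; each client would pick their own.

```python
from dataclasses import dataclass


@dataclass
class HostAnnouncement:
    name: str
    price: float            # cost per unit of storage per period
    lock_value: float       # value of the host's time-locked tokens
    storage_offered: float  # volume of storage advertised
    penalty: float          # penalty the host accepts for losing a file


# Illustrative weights; a cheaper price raises the score, so it is subtracted.
WEIGHTS = {"price": 2.0, "lock_value": 1.0, "storage_offered": 0.1, "penalty": 1.5}


def score(host: HostAnnouncement) -> float:
    return (
        WEIGHTS["lock_value"] * host.lock_value
        + WEIGHTS["storage_offered"] * host.storage_offered
        + WEIGHTS["penalty"] * host.penalty
        - WEIGHTS["price"] * host.price
    )


def pick_hosts(hosts: list, min_lock_value: float, count: int) -> list:
    """Drop hosts without a valuable lock, then return the top `count` by score."""
    eligible = [h for h in hosts if h.lock_value >= min_lock_value]
    return sorted(eligible, key=score, reverse=True)[:count]


hosts = [
    HostAnnouncement("cheap-but-unlocked", price=1.0, lock_value=0.0, storage_offered=500.0, penalty=0.0),
    HostAnnouncement("balanced", price=3.0, lock_value=140.0, storage_offered=100.0, penalty=20.0),
    HostAnnouncement("premium", price=6.0, lock_value=400.0, storage_offered=200.0, penalty=50.0),
]
chosen = pick_hosts(hosts, min_lock_value=100.0, count=2)
chosen_names = [h.name for h in chosen]
```

The hard minimum on lock value is what addresses the Sybil concern: a host that fabricates a long contract history with itself still cannot pass the filter without committing real value.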