In our recent research, we investigated decentralized-storage designs that combine erasure coding with Proof-of-Replication.
The rapid growth of sensitive data in the digital age calls for storage solutions that overcome the limitations of traditional cloud services, such as single points of failure, high bandwidth costs, and a lack of transparency in data handling and security. Decentralized storage systems address these limitations by leveraging peer-to-peer (P2P) networks and blockchain technology to eliminate central control and improve data availability and fault tolerance. These systems achieve robustness by fragmenting data, which enables reconstruction even when some fragments are lost [1]. Redundancy mechanisms are critical in decentralized networks because nodes may go offline frequently. We use Reed-Solomon encoding as the erasure coding scheme for redundancy because of its computational efficiency. The scheme is characterized by two parameters, k and n: n is the total number of fragments stored for a file, and k (the number of data fragments) is the minimum number of fragments required to reconstruct the original file. The system can tolerate the loss of up to m fragments, where m = n - k is the number of parity fragments [2].
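The (k, n) property described above can be illustrated with a minimal Reed-Solomon-style sketch. It is not the implementation used in this work: it assumes a small prime field GF(257) and plain Lagrange interpolation for readability, whereas production libraries typically work over GF(2^8) with optimized arithmetic. The function names `encode` and `decode` are hypothetical, and fragments are kept as lists of integers because a parity symbol can take the value 256.

```python
P = 257  # assumption: a prime field just large enough to hold byte values 0..255


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P  # modular inverse, Python 3.8+
    return total


def encode(data: bytes, k: int, n: int):
    """Split data into k data fragments and append n - k parity fragments."""
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    fragments = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        stripe = padded[start:start + k]
        points = list(enumerate(stripe))  # data symbol j sits at x = j
        for x in range(n):
            # Systematic code: the first k fragments carry the data itself.
            fragments[x].append(stripe[x] if x < k else _interpolate(points, x))
    return fragments


def decode(available: dict, k: int, length: int) -> bytes:
    """Rebuild the original bytes from any k fragments, given as {index: fragment}."""
    chosen = sorted(available.items())[:k]
    if len(chosen) < k:
        raise ValueError("need at least k fragments to reconstruct")
    out = bytearray()
    for s in range(len(chosen[0][1])):
        points = [(x, frag[s]) for x, frag in chosen]
        out.extend(_interpolate(points, j) for j in range(k))
    return bytes(out[:length])


data = b"decentralized storage with erasure coding"
fragments = encode(data, k=10, n=14)

# Drop m = 4 fragments (two data, two parity) and reconstruct from the rest.
survivors = {i: f for i, f in enumerate(fragments) if i not in (0, 3, 11, 13)}
restored = decode(survivors, k=10, length=len(data))
assert restored == data
```

Losing a fifth fragment would leave only nine survivors, and `decode` would raise an error, which matches the tolerance limit m = n - k.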
Cryptographic proofs such as Proof-of-SpaceTime (PoSt) [3], Proof-of-Retrievability (PoR) [4], and Proof-of-Replication (PoRep) [5] ensure data integrity and availability. Existing decentralized storage systems such as Sia [6] and Storj [2] offer weaker protection against deduplication and Sybil attacks than PoRep provides. Our work addresses this research gap by integrating PoRep's robust cryptographic storage guarantees with erasure coding, combining strong security with reduced storage overhead. Erasure coding splits data into smaller fragments and distributes them across nodes, which significantly reduces storage overhead compared with full replication. PoRep uses cryptographic protocols to ensure that a storage provider dedicates unique storage to data D, which prevents deduplication. The provider stores D in a sector, where a computationally intensive sealing process transforms D into a uniquely encoded replica, R.
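The idea that sealing binds a replica to one provider and one sector can be sketched in a few lines. This is a toy illustration only, not the sealing procedure from [5]: it assumes a simple SHA-256 hash chain in place of the layered graph labeling, so it is fast and provides none of the real security guarantees. The names `replica_id`, `seal`, and `unseal`, and the 32-byte node size, are hypothetical.

```python
import hashlib

NODE_SIZE = 32  # assumption: one node per SHA-256 digest length


def replica_id(provider_id: bytes, sector_id: int) -> bytes:
    """Derive an identifier that ties a replica to one provider and one sector."""
    return hashlib.sha256(provider_id + sector_id.to_bytes(8, "big")).digest()


def _labels(rid: bytes, count: int):
    """Sequential hash chain: each label depends on the previous one."""
    label = rid
    for i in range(count):
        label = hashlib.sha256(rid + i.to_bytes(8, "big") + label).digest()
        yield label


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(data: bytes, rid: bytes) -> bytes:
    """Encode each data node with a key derived from the label chain."""
    padded = data + bytes(-len(data) % NODE_SIZE)
    nodes = [padded[i:i + NODE_SIZE] for i in range(0, len(padded), NODE_SIZE)]
    return b"".join(_xor(node, key) for node, key in zip(nodes, _labels(rid, len(nodes))))


def unseal(replica: bytes, rid: bytes, length: int) -> bytes:
    """XOR encoding is its own inverse, so sealing again recovers the data."""
    return seal(replica, rid)[:length]


D = b"the same file stored by two different providers"
rid_a = replica_id(b"provider-A", sector_id=1)
rid_b = replica_id(b"provider-B", sector_id=1)
R_a, R_b = seal(D, rid_a), seal(D, rid_b)

assert R_a != R_b                       # each provider holds a distinct replica
assert unseal(R_a, rid_a, len(D)) == D  # the original data stays recoverable
```

Because the two replicas differ byte for byte, two providers (or one provider posing as two) cannot share a single stored copy, which is the deduplication resistance described above.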
The Proof-of-Replication process has three main phases:
- Encoding – Data is split into nodes and structured in a layered Stacked-DRG graph. Each node is sequentially labeled and encoded using a key derived from graph labels, producing a unique replica.
- Replication – A unique replica is generated using a ReplicaID tied to the provider and sector, ensuring tamper-proof, verifiable storage.
- Merkle Tree & Proof Generation – A Merkle Tree is built over the encoded data to create a commitment. The final PoRep proof includes data, metadata, and provider identity, and is compressed before submission to the blockchain.
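The commitment step in the last phase can be sketched with a small Merkle tree over replica nodes. This is a simplified illustration under stated assumptions, not the proof format of the system: it uses SHA-256, duplicates the last node on odd-sized levels, and the helper names `build_tree`, `prove`, and `verify` are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """Return all tree levels, from hashed leaves up to the single root."""
    level = [_h(b"\x00" + leaf) for leaf in leaves]  # prefix separates leaves from inner nodes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # odd level: pair the last node with itself
        level = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # node was paired with itself
        path.append(level[sibling])
        index //= 2
    return path


def verify(root, leaf, index, path):
    """Recompute the root from one leaf and its sibling path."""
    node = _h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = _h(b"\x01" + node + sibling)
        else:
            node = _h(b"\x01" + sibling + node)
        index //= 2
    return node == root


replica_nodes = [bytes([i]) * 32 for i in range(5)]  # stand-in for encoded replica nodes
levels = build_tree(replica_nodes)
root = levels[-1][0]  # the commitment that would be published

path = prove(levels, 4)
assert verify(root, replica_nodes[4], 4, path)
assert not verify(root, b"tampered node", 4, path)
```

The verifier only needs the root, one node, and a sibling path whose length grows with the logarithm of the number of nodes, which is why checking a challenged node is cheap compared with sealing.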
We implemented a Python-based system combining erasure coding and PoRep to enable secure and efficient decentralized storage. Using a (k=10, n=14) erasure coding scheme, we observed that encoding time scales linearly with file size, which indicates good scalability. We also compared the storage efficiency of this scheme with the 5x replication method commonly used in systems like Filecoin [3], across various file sizes. Our erasure coding approach required only 40% additional storage (a total of 1.4x the original data size), whereas 5x replication results in 400% additional storage. Despite this significant reduction in storage overhead, our scheme can still tolerate the loss of 4 of the 14 fragments (about 28.6%), demonstrating strong fault tolerance, as shown in Figure 1.
We also evaluated PoRep's sealing and verification performance, as shown in Figure 2. Sealing time increases significantly with sector size, which highlights the computationally intensive nature of the sealing process. This supports the security property of PoRep: the provider must have already performed the intensive sealing operation and stored the unique sealed replica of the data. Generating this sealed replica on demand would be computationally infeasible, so a valid proof demonstrates the provider's prior commitment to storage. In contrast, verification time remains low and scales linearly, benefiting from the logarithmic structure of Merkle trees.
In conclusion, integrating erasure coding with PoRep offers a secure, fault-tolerant, and storage-efficient solution that is well suited to applications such as archival and decentralized backup systems.
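The storage figures quoted for the (10, 14) scheme and for 5x replication follow directly from the parameters. The short calculation below reproduces them; the function names are hypothetical and the code is only a worked check of the arithmetic, not part of the evaluated system.

```python
def erasure_stats(k: int, n: int) -> dict:
    """Storage expansion and loss tolerance of a (k, n) erasure code."""
    m = n - k  # parity fragments
    return {
        "expansion": n / k,                    # stored bytes per original byte
        "extra_storage_pct": 100 * m / k,      # overhead relative to the original
        "max_fragment_loss_pct": 100 * m / n,  # share of fragments that may be lost
    }


def replication_stats(copies: int) -> dict:
    """Storage expansion of keeping `copies` full replicas."""
    return {
        "expansion": float(copies),
        "extra_storage_pct": 100.0 * (copies - 1),
    }


ec = erasure_stats(k=10, n=14)
rep = replication_stats(copies=5)

print(ec["expansion"], ec["extra_storage_pct"])    # 1.4 40.0
print(round(ec["max_fragment_loss_pct"], 1))       # 28.6
print(rep["expansion"], rep["extra_storage_pct"])  # 5.0 400.0
```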
The person
Shrigouri Navaratna is from India and came to Germany in 2021 to pursue a master's degree at Mittweida University of Applied Sciences. She studied Applied Mathematics for Networks and Data Science and wrote her master's thesis at the Blockchain Competence Centre Mittweida (BCCM) on the topic of decentralized storage systems, supervised by Professor Andreas Ittner and Mario Oettler. Her thesis focused on researching and analyzing erasure coding, encryption, and storage proofs to enhance the reliability and verifiability of decentralized storage systems. After completing her master's degree in 2024, she started working as a research assistant at BCCM, where she continues her research and development on the Decentralized Storage System project. Outside of her work and studies, she enjoys playing badminton and table tennis.
References
[1] N, Racin (2023): Improving Data Availability in Decentralized Storage Systems, University of Stavanger, Norway. ISBN: 978-82-8439-158-8.
[2] Storj Labs (2016): "Storj: A Peer-to-Peer Cloud Storage Network," [Online] github.com/storj/whitepaper. [Accessed: 18.01.2025]
[3] Protocol Labs (2017): “Filecoin: A Decentralized Storage Network”, [Online] filecoin.io/filecoin.pdf. [Accessed: 16.01.2025]
[4] Shacham, H., Waters, B. (2013): Compact Proofs of Retrievability. J Cryptol 26, pp. 442–483.
[5] Benet, J., Dalrymple, D., Greco, N. (2017): "Proof of Replication", [Online] filecoin.io/proof-of-replication.pdf. [Accessed: 06.08.2024]
[6] D. Vorick, L. Champine: “Sia: Simple Decentralized Storage”, [Online] sia.tech/sia.pdf. [Accessed 30.03.2025]