Some 18+ months back I was writing about distributed networking and distributed P2P storage for GT. Then as now, I advocate redundant storage of content that is often addressed by unique hashes, along with suitable distributed directory structures in some novel namespace. It’s desirable for the network to be able to self-organize for efficient storage and access, replicating and moving content closer to where searches originate, weighted by how frequently it is requested.
A possible route to analysis:
Say that a blob is an (H, K, V) triple, where V is the fixed content blob, K is a header that includes directory features and the cryptographic hash of V, and H is the cryptographic hash of K, possibly along with some other info about the replication status of (K, V). Tree-based hierarchies are to self-organize around K.
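To make the triple concrete, here is a minimal Python sketch. SHA-256, the JSON header encoding, the `path` field standing in for "directory features", and the names `Blob`, `make_blob`, and `verify` are all my assumptions for illustration; the replication-status info is left out.

```python
import hashlib
import json
from typing import NamedTuple


class Blob(NamedTuple):
    h: str    # H: hash of the header K
    k: bytes  # K: serialized header (directory features + hash of V)
    v: bytes  # V: the fixed content


def make_blob(path: str, content: bytes) -> Blob:
    # Assumption: "directory features" is reduced to a single path string.
    header = {
        "path": path,
        "content_hash": hashlib.sha256(content).hexdigest(),
    }
    # sort_keys gives a deterministic encoding, so equal headers hash equally.
    k = json.dumps(header, sort_keys=True).encode("utf-8")
    h = hashlib.sha256(k).hexdigest()
    return Blob(h, k, content)


def verify(blob: Blob) -> bool:
    # Check H against K, then the hash stored in K against V.
    if hashlib.sha256(blob.k).hexdigest() != blob.h:
        return False
    header = json.loads(blob.k)
    return header["content_hash"] == hashlib.sha256(blob.v).hexdigest()
```

Any node holding a copy can run `verify` without trusting whoever supplied it, which is what makes free replication of (K, V) safe in this picture.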
Consider the probabilistic analysis of splay trees or similar, which reorganize to minimize avg search depth.
Consider a variation for networking where storage and search within a node are cheap relative to querying other nodes. Redundancy is permitted as an optimization (in our reality, some minimal amount may be required!).
Consider patterns of access from multiple roots, with some distribution over where queries originate.
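As a toy version of the last two points, the sketch below scores a replica placement by expected access cost: each root queries with some probability, pays a small local cost, and pays a much larger cost per hop to the nearest replica. The cost constants, the hop-count distance table, and the greedy placement rule are illustrative assumptions of mine, not a worked-out analysis.

```python
def expected_cost(dist, query_prob, replicas, local_cost=1.0, hop_cost=10.0):
    """Expected cost of one access, given where replicas are stored.

    dist[a][b] is the hop count between nodes a and b;
    query_prob[root] is the probability that a query starts at root.
    """
    total = 0.0
    for root, p in query_prob.items():
        hops = min(dist[root][r] for r in replicas)
        total += p * (local_cost + hop_cost * hops)
    return total


def greedy_place(dist, query_prob, start, extra):
    """Add up to `extra` replicas, each time picking the node that lowers
    the expected cost the most (ties broken by node id)."""
    replicas = {start}
    for _ in range(extra):
        candidates = [n for n in dist if n not in replicas]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda n: (expected_cost(dist, query_prob, replicas | {n}), n),
        )
        replicas.add(best)
    return replicas


# Four nodes on a line, 0-1-2-3; most queries originate at node 3.
dist = {i: {j: abs(i - j) for j in range(4)} for i in range(4)}
query_prob = {0: 0.1, 3: 0.9}
placement = greedy_place(dist, query_prob, start=0, extra=1)
```

With the content initially only at node 0, the single extra replica lands on node 3, where most searches originate. A self-organizing network would have to reach a similar placement from local observations of access frequency rather than from a global table like `dist`.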