Recently, I and Gleb Naumenko, along with former colleague Greg Maxwell, revealed Minisketch, a software library for reducing bandwidth requirements when syncing data between nodes on distributed systems.
Minisketch was originally developed as a component of a project researching the use of set reconciliation for the sharing of transaction data between nodes on Bitcoin, “Set Reconciliation Relay” or SRR. The goal of SRR is to significantly reduce the bandwidth associated with running a Bitcoin full node.
The team took the decision to release Minisketch separately from SRR because it is expected to be useful for many other applications both inside and outside of the Bitcoin space.
Why Minisketch?
All distributed systems have traditionally had difficulty syncing data between nodes - it’s a lot easier in a centralized system to tell nodes what data they should and shouldn’t have.
For instance, one approach for syncing data between nodes on distributed networks is Invertible Bloom Lookup Tables (IBLT). While IBLT has relatively low CPU demands, this is achieved at the expense of relatively high bandwidth requirements, particularly when the number of differences is small. Minisketch uses a more bandwidth efficient algorithm known as PinSketch.
Compared to other bandwidth-efficient set reconciliation algorithms such as CPISync and Pinsketch’s original implementation, Minisketch uses much less computational power. It is 20 to 100 times faster than PinSketch, and sometimes over 1,000 times faster than CPISync.
How are the improvements achieved?
Set reconciliation, as implemented by Minisketch, is more bandwidth efficient than simply sending the entire list of data, as it allows nodes to produce a mathematical “sketch” of their list instead. The node then sends this to other nodes for comparison with their own lists. The size of the sketch only depends on the expected number of differences between the nodes, and not on the total size of the set. Despite that, it still allows nodes to distinguish with certainty which data they require from other nodes.
If we simplify it to just a single difference, it’s easy to see how this may work: Say I have the set {3,5,7,11}, and you have the set {3,5,7,9,11}, so the difference is {9}. We both compute the sum of our elements, so I obtain 3+5+7+11=26, and you obtain 3+5+7+9+11=35. I send my sum 26 to you, and you subtract it from your sum; the difference is 9. This works, but it is restricted to finding a single difference. Minisketch generalizes this by sending various types of ‘sums’ of the data. The result is that with N different sums, you can find N differences … As long as the number of differences between the sets is not more than the number of ‘sums’ sent, Minisketch will always succeed in finding all the differences.
Minisketch on Bitcoin
The robustness of the Bitcoin network is dependent on ensuring there are enough connections between full nodes to thwart Sybil and partitioning attacks.
Unfortunately, the majority of the data usage of a Bitcoin node (typically between 40% and 70%) is not even spent on transaction data, but just on announcing new transactions to each other to find out which ones to relay. Right now, increasing the number of connections between nodes proportionally increases bandwidth overheads. This limits the number of connections each node can support.
By using set reconciliation it is possible to efficiently find out what transactions haven’t been relayed yet, without needing to announce every single transaction to each and every peer. Bandwidth overheads for finding transactions to relay become independent of the number of connections, possibly increasing the number of connections supported by each node.
The beauty of the solution is that it does not require any changes to Bitcoin’s network consensus rules. SRR will be enabled whenever both sides’ software supports the SRR protocol, and have no negative impact on node operators who don’t.
The SRR protocol is still in the early stages of research and it may be a long time before it sees adoption in the Bitcoin network, but advances like Minisketch represent a very important development in improving Bitcoin full node adoption and accessibility (as well as the optimisation of other distributed networks). Watch this space for updates on the project’s progress!
If you’d like to learn more about Minisketch, check out the Minisketch GitHub repository.