RFC-000-008: Veedo randomness for shard selection

loong · October 30, 2020, 7:07am

RFC-000-008

Name: Veedo randomness for shard selection
Category: Protocol
Status: Draft
Overview: Periodically use randomness from Veedo for shard selection, instead of only RenVM’s randomness, allowing corrupted shards to eventually become uncorrupted.

Overview

In the next few months, the introduction of shards to RenVM will result in RenVM using its MPC algorithm for randomness as well as signature generation. Our MPC wiki gives a good overview of how randomness in RenVM works. Before diving into the details of the RFC, it is instructive to give a bit of background on how shard selection will work in RenVM assuming this RFC is not implemented.

Definitions

First, a quick recap of some definitions. The RenVM wiki is out of date on the precise mechanics of shards (mostly a minor change in nomenclature, and some missing details).

The core is a blockchain maintained by RenVM nodes that is responsible for shard selection and map/reducing transactions (and other work) to the shards.
The shards are responsible for execution using the RZL MPC algorithm for key and signature generation.

Essentially, the core tells the shards what to do, and the shards do it. In the context of this RFC, one bit of functionality delegated to the core is of particular interest: shard selection.

Shard Selection

Once per day, an epoch change is triggered and the core undergoes membership selection for itself and all shards for the next epoch. The members of the core in the current epoch engage in an MPC random number generation to create a random seed. MPC random number generation has a safety threshold of 1/3rd; it is secure when less than 1/3rd of the core is adversarial.

The seed is appended to the cryptographic identity of every node and then hashed to produce a sorting identity. Nodes are then sorted from smallest to largest sorting identity and then selected one by one for each shard. The first N nodes will be the members of the core in the next epoch, the next N nodes will be the members of the first shard in the next epoch, and so on.
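The following is a minimal Python sketch of the selection rule described above. It assumes SHA-256 as the hash, node identities as raw bytes, and a single group size `n` shared by the core and every shard. `sorting_identity` and `select_members` are hypothetical names. The text does not say what happens to leftover nodes (fewer than `n`), so this sketch simply leaves them unassigned.

```python
import hashlib


def sorting_identity(seed: bytes, node_id: bytes) -> bytes:
    # Append the epoch seed to the node's identity and hash it (SHA-256 assumed).
    return hashlib.sha256(node_id + seed).digest()


def select_members(seed: bytes, node_ids: list[bytes], n: int) -> list[list[bytes]]:
    """Return [core, shard_1, shard_2, ...] for the next epoch, each with n members."""
    # Sort from smallest to largest sorting identity.
    ordered = sorted(node_ids, key=lambda nid: sorting_identity(seed, nid))
    # Chunk into consecutive groups of n: the first group is the core, the rest are shards.
    # Assumption: leftover nodes that cannot fill a whole group are not assigned.
    return [ordered[i:i + n] for i in range(0, len(ordered) - n + 1, n)]
```

For example, `groups = select_members(seed, nodes, n)` gives `core, shards = groups[0], groups[1:]`. Changing the seed reshuffles every group at once, which is why whoever controls the seed controls the whole selection.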

This works only if the core is not corrupted (that is, if fewer than 1/3rd of its members are adversarial). If the core is corrupted, it becomes possible for it to manipulate shard selection. This is still non-trivial, because the random number must be chosen such that the sorting algorithm gives the desired results, but we should consider any form of manipulation unacceptable.

In the worst case, the corrupted core can find a random number such that the core is still corrupted in the next epoch. Since it can repeat this at every epoch change, the core would stay corrupted forever. The goal of this RFC is to bring in external randomness so that the corruption of the core does not necessarily imply its permanent corruption.
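To make the grinding concern concrete, here is a hypothetical sketch of a corrupted core searching for a seed that keeps the next core corrupted. It applies the selection rule described above (assuming SHA-256 over identity plus seed, with the first `n` sorted nodes forming the core) and assumes a core counts as corrupted once at least 1/3 of its members are adversarial. Each attempt costs only a hash per node and a sort, which is why a seed the core itself produces offers little protection once the core is corrupted.

```python
import hashlib
from typing import Optional


def next_core(seed: bytes, node_ids: list[bytes], n: int) -> list[bytes]:
    # Assumed rule: sort by SHA-256(id || seed) and take the first n nodes as the core.
    return sorted(node_ids, key=lambda nid: hashlib.sha256(nid + seed).digest())[:n]


def is_corrupted(core: list[bytes], adversarial: set[bytes]) -> bool:
    # Assumed threshold: at least 1/3 of the core's members are adversarial.
    bad = sum(1 for nid in core if nid in adversarial)
    return 3 * bad >= len(core)


def grind_seed(node_ids: list[bytes], adversarial: set[bytes], n: int,
               max_tries: int = 100_000) -> Optional[bytes]:
    """Try candidate seeds until the next core is also corrupted; None if none found."""
    for counter in range(max_tries):
        candidate = counter.to_bytes(8, "big")
        if is_corrupted(next_core(candidate, node_ids, n), adversarial):
            return candidate
    return None
```

With 20% of nodes adversarial and a core of 6, roughly a third of random seeds already yield a corrupted core, so a grinding core typically succeeds within a handful of attempts.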

Veedo

Veedo is a new verifiable delay function built by StarkWare, but it can also be used as a random number generator. The general idea is that one in every M epochs would use the Veedo random number generator instead of the MPC algorithm. This means that a corrupted core would, at most, stay corrupted for M epochs.
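The following sketch shows the schedule. It assumes Veedo seeds every epoch whose index is a multiple of `M` and that the core's MPC seeds all other epochs. `seed_source` and `epochs_until_veedo` are hypothetical helpers. The second helper counts the worst case for a core corrupted in a given epoch, assuming it can bias every MPC-seeded selection until the next Veedo-seeded one. A Veedo-seeded selection only removes the ability to bias. The newly selected core could still be corrupted by chance.

```python
def seed_source(epoch: int, m: int) -> str:
    """Hypothetical schedule: Veedo seeds every m-th epoch, the core's MPC seeds the rest."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return "veedo" if epoch % m == 0 else "mpc"


def epochs_until_veedo(corrupted_epoch: int, m: int) -> int:
    """Worst-case number of epochs a core corrupted in `corrupted_epoch` stays corrupted.

    Assumes the corrupted core biases every MPC-seeded selection, so the first
    chance to lose control is the next epoch whose membership Veedo seeds.
    """
    k = corrupted_epoch + 1
    while seed_source(k, m) != "veedo":
        k += 1
    return k - corrupted_epoch
```

Over any range of epochs, the maximum of `epochs_until_veedo(e, m)` is exactly `m`, which matches the "at most M epochs" bound in the text.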

How It Works

Every M epochs, the Ethereum block hash of the block prior to the one that triggered the new epoch would be passed to Veedo to begin a ~10-minute verifiable delay. The result is a random number that no one can know for ~10 minutes. This delay is important because it prevents mining attacks against the Ethereum blockchain that attempt to grind for a specific random number: after ~10 minutes the Ethereum blockchain has moved on, so reorganizing it is very difficult and grinding becomes ineffective.
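The delay can be illustrated with a stand-in: iterated SHA-256, which must be computed sequentially. This is not Veedo's construction, and it is not verifiable. Checking the result means redoing every iteration, whereas a VDF produces a proof that is cheap to check. `delayed_seed` and the iteration count are assumptions for illustration only.

```python
import hashlib


def delayed_seed(prev_block_hash: bytes, iterations: int) -> bytes:
    """Stand-in for a delay function (NOT a VDF and NOT Veedo's construction).

    Each step depends on the previous one, so the work cannot be parallelised:
    nobody learns the output before performing all `iterations` hashes.
    """
    h = prev_block_hash
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h
```

In practice the iteration count would be calibrated so that even the fastest hardware needs roughly the target delay. That calibration, and the proof that makes the output verifiable, are outside this sketch.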

Cost

There is, of course, a cost to using Veedo. This cost would be paid for by the rewards RenVM generates, meaning that the cost is essentially being paid for by node operators. I think the cost is on the order of $X00 per invocation, but this can be verified with the StarkWare team. The larger M is, the lower the cost of implementing this RFC, but the longer a corrupted core could stay corrupted.

RenVM is currently making ~$5K per day, and once shards are enabled, we expect epochs to be approximately one day in length.
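The following is a quick way to reason about the trade-off between M and cost, using the ~$5K/day revenue and one-day epochs from the text. The per-invocation cost is given only as $X00, so it is left as a parameter. The costs in the example loop are hypothetical, and `veedo_cost_share` is an illustrative name.

```python
def veedo_cost_share(cost_per_invocation: float, daily_revenue: float,
                     m: int, epoch_days: float = 1.0) -> float:
    """Fraction of revenue spent on Veedo when it seeds one epoch in every m."""
    if m < 1:
        raise ValueError("m must be at least 1")
    # Revenue earned between two consecutive Veedo invocations.
    revenue_between_invocations = daily_revenue * epoch_days * m
    return cost_per_invocation / revenue_between_invocations


if __name__ == "__main__":
    # Hypothetical per-invocation costs; the RFC only says "$X00".
    for cost in (100, 500, 900):
        for m in (1, 7, 28):
            share = veedo_cost_share(cost, 5_000, m)
            print(f"cost=${cost} M={m}: {share:.2%} of revenue")
```

The share scales as 1/M, so going from daily invocations to weekly ones cuts the cost seven-fold while stretching the worst-case corruption window from one epoch to seven.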

Alternative

One alternative to using Veedo is to engage all shards, not just the core, in random number generation. I am not entirely sure what this model looks like, but involving all shards certainly makes random number generation much harder to corrupt (corrupting it would be harder than stealing all TVL in RenVM, which is obviously a worse failure mode). This would incur none of the costs of Veedo and would avoid any dependency on it, but it would increase the workload RenVM itself must handle during epoch changes (using all shards to generate one random number is non-trivial).

preston · October 30, 2020, 8:58am

@loongy, I have come to understand that randomness is a complex issue when it comes to blockchains. I believe Chainlink, which is already working with us on Proof of Reserve, also offers Verifiable Randomness for blockchains. Cost concerns come up, but frankly I think DN operators must prioritize security. We can raise fees to cover these costs, and customers will gladly pay a price for a product that champions security.

This is an older article that goes into some detail about Veedo’s randomness. What was a bit concerning is that it mentions Veedo is still a PoC:

“An important disclaimer: as a PoC, there are no guarantees on uptime, nor on the service’s longevity. This is an exploratory move, intended to solicit comments and ideas from the community.”

DeFiFrog · October 30, 2020, 3:24pm

Let me start off by saying security is obviously the utmost priority for our system. However, this security comes in two forms: technical and economic. With this in mind, and for the sake of debate, I’d like to assume your RFC is implemented in a more decentralized RenVM state.

If epochs are 1 day in length and shards are selected each day, for Veedo to provide uniform, additional protection, it would perhaps require invocation within that same day so we have effectively 2 shard selections per 24 hours.

If RenVM makes $5k per day and Veedo costs $100 or $900 (to do a rough price sensitivity analysis), that results in an impact on profit of between 2% and 18%.

Unless there is a substantial amount of additional security, this seems like a large burden on economic security. Lacking technical expertise myself, I hesitate to weigh in with an opinion, but I hope this helps illustrate the quantitative impact for those who do have that expertise.

Perhaps an alternative path forward would be to prepare the implementation but refrain from launching/invoking it until revenue reaches a certain threshold? We may be able to afford something like this after we’ve increased TVB through increased revenue (whether by volume from integrations or by value from fee increases).

——

In Scenario B, where we implement this while epochs are 28 days long and shuffle within those 28 days, the impact on the bottom line becomes much smaller.

I’m curious what your thoughts are to the practical implementation of Veedo (as in when you’d want to implement and at what frequency). This would help provide a better idea of how to measure impact to economics.

—-

As for your alternative route, I like the idea of maintaining security functions internally; however, we would need to estimate the security risks of doing so and the technical/economic impact of the increased workload. It is tough to have an opinion without the estimated impact.

Hope my input contributes to new perspectives.

Big · October 30, 2020, 9:55pm

My initial take is that I like the alternative more. I’m not against using Veedo; I just think there’s a lot of benefit in not relying on it.
I don’t see any issue with using Veedo now or with taking more time to explore both options. I think Phase 0 is important, but not in a less-than-ready state, given security overall.
I wonder if the RNG could be sold as a side product as well.

YaelDo · November 4, 2020, 8:22am

Hi guys, I’m Yael, Veedo’s product manager at StarkWare. I would like to respond to the questions raised here:

VeeDo’s price is 0.35 ETH per invocation, which corresponds to a 7-minute delay. For more details, please read VeeDo’s documentation (prices will be available there in a few days).
Regarding Chainlink’s VRF, I would like to emphasize that a key security component of VeeDo is its delay. Consider an attacker attempting to preselect randomness that would benefit it. For concreteness, suppose that randomness ending with 10 zeros is what the attacker Alice wants. Had this randomness arrived from an ideal randomness beacon, Alice would have had a 1/1024 chance of winning. If the randomness is provided by VeeDo, Alice would need to run approximately 1024 invocations of VeeDo, each with a different seed, and each requiring a 7-minute delay to find the output of the delay function. By the time she learns which seed is good for her, it would be very hard to reorg the blocks and change the block hash that was chosen 7 minutes earlier to serve as the seed. However, with Chainlink’s VRF, Alice may collude with an operator (or compromise its server), use it to calculate the randomness for 1024 different seeds (this part is done in nearly no time using a VRF), and then choose the seed that would benefit her. In summary, the delay function of VeeDo means stronger security and better collusion resistance.
VeeDo is currently in beta. In the unlikely event that the delay computation fails for some reason, a possible fallback is to use the block hash directly or to revert to the solution Ren uses today (we expect this fallback to be needed only rarely).
Regarding StarkWare’s long term vision, we plan to operate VeeDo for as far as we can see. In addition, we plan to expand it to be a decentralized service in the sense that many different operators could serve it and thus remove the dependency on a single operator. So, bottom line, we think the threat of StarkWare dropping support for VeeDo is quite small.
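The arithmetic in the Chainlink comparison above can be checked with a short sketch (the helper names are hypothetical). A uniformly random output ends in 10 zero bits with probability 2^-10, so Alice expects to need about 1024 candidate seeds. With a delay function she cannot learn any single result sooner than one full delay, even if she evaluates seeds in parallel, whereas a VRF evaluation returns almost immediately.

```python
def ends_with_zero_bits(value: bytes, bits: int) -> bool:
    """True if the big-endian integer encoded by `value` ends in `bits` zero bits."""
    return int.from_bytes(value, "big") % (1 << bits) == 0


def expected_attempts(bits: int) -> int:
    # Each uniform output succeeds with probability 2**-bits, so the mean number
    # of independent tries until the first success is 2**bits.
    return 2 ** bits


def sequential_grinding_minutes(bits: int, delay_minutes: float) -> float:
    # Expected wall-clock time if candidate seeds are evaluated one after another.
    return expected_attempts(bits) * delay_minutes
```

For the example in the post, `expected_attempts(10)` is 1024 and `sequential_grinding_minutes(10, 7)` is 7168 minutes (about five days) if done one seed at a time. Even with full parallelism, every result arrives at least 7 minutes after the seed block, which is the point of the delay.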

preston · November 4, 2020, 4:53pm

Do you know how your product’s pricing compares with Chainlink’s VRF?


Thank you for this comparison; it is interesting to see how both products address security.

How long do you expect to remain in beta? When in 2021 do you expect to have a fully vetted, audited product?

Glad to hear that, looks like a great product you all are building, excited to hear more!

Thomm · November 6, 2020, 5:16pm

If the core gets corrupted once, isn’t that it for the system? Has the damage not already been done? I understand it would be beneficial to de-corrupt the core automatically instead of it remaining corrupted, but if some actor was able to corrupt the core we should assume that they can and will do it again.

Which is not to say we shouldn’t be developing a cure, but prevention seems to be where our effort is best spent.

I do have an idea to contribute regarding how to choose M, that is, when to use the Veedo randomization.

We can decide that X% of the darknode income will be spent on this. Once we have collected enough to pay for the cost, it will be used at the next epoch change.

This ensures that we always forego the same % of income to this security feature.

Under the assumption that a period of high volume is more profitable to attack, and vice versa for lower volume, we also adjust our additional security interval (and expenses) accordingly.
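The proposal can be sketched as a simple budget accumulator. A fixed share X of each epoch's darknode income is set aside, and the next epoch change uses Veedo whenever the balance covers one invocation. `VeedoBudget` is hypothetical, and the sketch assumes at most one invocation per epoch change, with any surplus carried forward.

```python
from dataclasses import dataclass


@dataclass
class VeedoBudget:
    share: float         # X: fraction of each epoch's darknode income set aside
    cost: float          # cost of one Veedo invocation, in the same currency as income
    balance: float = 0.0

    def on_epoch_change(self, epoch_income: float) -> bool:
        """Accrue this epoch's share; return True if Veedo should seed the next selection."""
        self.balance += self.share * epoch_income
        if self.balance >= self.cost:
            # At most one invocation per epoch change; any surplus carries over.
            self.balance -= self.cost
            return True
        return False
```

Because the set-aside amount tracks income, busy periods trigger Veedo more often and quiet periods less often, which matches the assumption that high-volume periods are more attractive to attack.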

loong · November 16, 2020, 12:12am

Indeed, a corrupted core does imply that a bad actor would be able to corrupt it again some time in the future. However, there are a few practical things to consider. A corrupted core does not immediately imply stolen funds; the core can only suggest execution to the shards (which independently verify the legitimacy of what they’re told to do). The main mechanism through which a corrupted core steals funds is by selecting a biased random number that selects shards in such a way that the adversary in question now owns one or more shards. The main point to note: corrupting N out of M shards only allows you to steal N/M of the assets in the system. If the adversary corrupts all M shards, then this adversary already controls 1/3rd+ of the entire network, so the corrupted core is not the main concern. For some N < M, we are able to mitigate further loss by de-corrupting the core again (and quickly adjusting fees to ensure that further attacks are unprofitable, if they were not already). This would make future corruption less appealing, while mitigating the initial effect.
