Formal Verification of a Staking Oracle — Certora Prover Continued
--
From trusted price sources to economic security, buckets, rewards, and slashing
From Whitelist Oracle to Staking Oracle
In the previous oracle verification post, I started with the simplest design in the Speedrun Ethereum Oracles challenge: a WhitelistOracle. That contract was mostly about structural correctness. Could the owner add and remove oracle contracts? Did the oracle list stay consistent after swap-and-pop? Did stale prices get filtered before calculating the median?
The next checkpoint is a much richer target: StakingOracle.
Here, correctness is no longer just “does the array contain the right oracle addresses?” The protocol now has economic state:
- oracle nodes must stake ORA tokens before reporting
- reports are grouped into block buckets
- each node may report at most once per bucket
- buckets must be finalized by recording a median
- reporters can claim rewards
- outliers can be slashed
- inactive nodes lose effective stake over time
- nodes can exit after a waiting period
That means the verification target moved from a mostly trust-based oracle to an incentive-based oracle. The core question became:
Can the staking, reporting, reward, exit, and slashing lifecycle preserve the accounting and state-machine promises the protocol relies on?
This post walks through the Certora suite I built for StakingOracle, what it proves, what it does not prove, and how mutation testing shaped the final result.
The Protocol: StakingOracle
At a high level, the oracle works like this:
struct OracleNode {
uint256 stakedAmount;
uint256 lastReportedBucket;
uint256 reportCount;
uint256 claimedReportCount;
uint256 firstBucket;
bool active;
}
struct BlockBucket {
mapping(address => bool) slashedOffenses;
address[] reporters;
uint256[] prices;
uint256 medianPrice;
}
Nodes register by staking ORA:
function registerNode(uint256 amount) public;
They report prices into the current block bucket:
function reportPrice(uint256 price) public;
Someone later finalizes a past bucket by recording the median:
function recordBucketMedian(uint256 bucketNumber) public;
If a node's price is more than 10% away from that median, it can be slashed:
function slashNode(
address nodeToSlash,
uint256 bucketNumber,
uint256 reportIndex,
uint256 nodeAddressesIndex
) public;
Nodes can also add stake, claim rewards, and exit:
function addStake(uint256 amount) public;
function claimReward() public;
function exitNode(uint256 index) public;
On paper, this is still a teaching challenge. But the design already has the ingredients that make real DeFi verification interesting: dynamic arrays, mappings, token escrow, delayed finalization, replay prevention, slashable offenses, and view functions whose answers depend on historical state.
What Could Go Wrong?
Before writing CVL, I wrote the failure modes in plain English. This step matters because Certora will happily prove the wrong thing if I ask the wrong question precisely.
For this oracle, the main questions were:
- Can an inactive or unregistered node report a price?
- Can a node report twice in the same bucket?
- Can a node skip finalizing an earlier report bucket and keep reporting forever?
- Can the node address array drift away from nodes[address].active?
- Can bucket reporter and price arrays become misaligned?
- Does registerNode escrow exactly the amount it records as stake?
- Does addStake increase stored stake by exactly the transferred amount?
- Does claimReward mint exactly one ORA per unclaimed report?
- Does exitNode pay at most the effective stake and deactivate the node?
- Does slashNode reduce stake by the expected penalty and pay the slasher exactly 10% of that penalty?
- Does getEffectiveStake floor at zero after inactivity penalties?
- Does the outlier view exclude honest reporters and already-slashed reporters?
- Is exactly 10% deviation safe, while greater than 10% is slashable?
Those questions became five focused spec families:
- Sanity.spec
- ValidState.spec
- StateMachine.spec
- Accounting.spec
- Views.spec
I grouped these properties into their own StakingOracle suite so the assumptions, artifacts, mutation history, and modeling debt were easy to read as one campaign.
Why This Needed a Richer Model
The WhitelistOracle suite already needed structural reasoning over a dynamic array. StakingOracle adds more moving parts:
- nodeAddresses is a dynamic registry
- nodes[address] is the authoritative per-node mapping
- each bucket has parallel dynamic arrays: reporters and prices
- each bucket also has a mapping of already-slashed offenses
- ORA token transfers affect stake accounting
- ORA minting affects rewards
- block number controls the current bucket and inactivity penalties
This is what makes the model richer than the WhitelistOracle model. The hard part is not contract size. The hard part is that correctness depends on relationships between multiple storage objects and multi-step lifecycle flows.
The suite uses a verification harness for read-only observability and a Certora-friendly ORA model so token flows can be checked directly instead of hidden behind broad summaries.
The important harness getters expose proof-only state such as:
- nodeAddresses[index] and nodeAddresses.length
- bucket reporters[index] and prices[index]
- bucket median
- bucket slashed flags
- individual fields from nodes[address]
This does not change production behavior. It gives the prover enough visibility to talk about the protocol’s internal state.
Part 1: Sanity
I started with reachability.
The Sanity family does not try to prove the deepest correctness properties. It asks whether important flows can succeed under reasonable setup:
- a funded account can register
- a registered account can report
- a reporter can claim a reward
This is a small but useful first gate. If these rules fail, the issue is usually setup, linking, token modeling, or an impossible precondition. There is no point debugging a beautiful accounting rule if the prover cannot even reach registerNode.
The final sanity run was green:
Sanity v2: verified
I did not run mutation testing for Sanity. That was a deliberate choice. Sanity is mostly a reachability family; mutation testing is more useful on semantic families like Accounting, StateMachine, ValidState, and Views.
Part 2: Valid State
The ValidState family checks the shape of the protocol’s storage.
The obvious invariant I wanted was:
Active nodes in nodeAddresses should agree with nodes[address].active.
But this is where formal verification becomes humbling. A property can be true in the reachable states we care about and still be awkward or non-inductive as a global invariant. The initial approach tried to force too much as standalone invariants. The final version uses a bounded preservation rule instead:
rule successfulCallsPreserveBoundedNodeAddressShape(
env e,
method f,
calldataarg args,
address trackedNode
)
That rule checks the node-list shape across successful calls within the configured loop bound. It covers the key structural expectations without pretending we have an unbounded proof over every possible array length.
The suite also keeps the bucket arrays aligned:
invariant bucketReportersAndPricesStayAligned(uint256 bucketNumber)
That invariant matters because bucket reports are represented as parallel arrays:
address[] reporters;
uint256[] prices;
If those arrays ever diverge in length, prices[i] no longer necessarily belongs to reporters[i]. That would be a serious slashing and outlier-detection problem.
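With length getters exposed by the harness, the alignment invariant is short. This is a sketch: the getter names below are my assumptions about the harness interface, and they would be declared envfree in the spec's methods block.

```cvl
// Assumed envfree harness getters:
//   getBucketReportersLength(uint256), getBucketPricesLength(uint256)
invariant bucketReportersAndPricesStayAligned(uint256 bucketNumber)
    getBucketReportersLength(bucketNumber) == getBucketPricesLength(bucketNumber);
```

Because reportPrice is the only function that appends to either array, the prover only has to show that every append path pushes to both arrays together.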
The final ValidState run was green:
ValidState v7: verified
Mutation testing found an important lesson here.
The first ValidState mutation run looked disappointing: 10 of 10 mutants survived. If I only looked at that number, I might conclude that the ValidState rules were weak.
But after triage, the result meant something more specific. The random mutants did not mostly mutate the things ValidState was designed to protect. They landed on behavior that belonged to other families, such as reward math, reporting transitions, or token-flow assumptions. A node-list structural rule is not supposed to kill a reward-accounting mutant. So the result was low-signal rather than a clean indictment of the ValidState family.
That changed how I used mutation testing. Instead of asking “did this family kill random mutants anywhere in the contract?”, I asked a sharper question:
If I mutate the storage-shape promises that ValidState is responsible for, does the ValidState family catch it?
For example, one focused structural mutant removed the registration append:
nodes[msg.sender] = OracleNode({
stakedAmount: amount,
lastReportedBucket: 0,
reportCount: 0,
claimedReportCount: 0,
firstBucket: getCurrentBucketNumber(),
active: true
});
// Mutant: nodeAddresses.push(msg.sender) is missing
That mutant leaves the mapping saying "this node is active," but the public node registry does not include the node. Accounting might still look fine, and a reward rule might not notice it. But ValidState should notice because its job is to keep the mapping and registry consistent. In the final suite, the structural rules kill this kind of mutant.
The improvement was adding a small structural layer to the spec:
definition listedInBoundedPrefix(address node) returns bool = ...;
definition boundedListedEntriesActiveAndNonzero() returns bool = ...;
definition boundedListedEntriesUnique() returns bool = ...;
definition boundedActiveNodeRepresented(address node) returns bool = ...;
rule successfulCallsPreserveBoundedNodeAddressShape(...)
That rule says: before a state-changing call, assume the bounded node list is well shaped; after any successful state-changing call, the bounded node list must still be well shaped. This is a preservation rule, not a free pass. It does not say "pretend the contract is correct." It says "from a good structural state, no successful function call may break the structure."
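Filled in, the rule body looks roughly like the following sketch. The definition bodies are elided here, and getNodeActive is an assumed harness getter name:

```cvl
rule successfulCallsPreserveBoundedNodeAddressShape(
    env e,
    method f,
    calldataarg args,
    address trackedNode
) {
    // Assume a well-shaped bounded node list before the call
    require boundedListedEntriesActiveAndNonzero();
    require boundedListedEntriesUnique();
    require getNodeActive(trackedNode) => boundedActiveNodeRepresented(trackedNode);

    f(e, args); // any successful state-changing call

    // Every successful call must preserve the bounded shape
    assert boundedListedEntriesActiveAndNonzero();
    assert boundedListedEntriesUnique();
    assert getNodeActive(trackedNode) => boundedActiveNodeRepresented(trackedNode);
}
```

The quantified address is tracked through a single parameter rather than a universal claim over all addresses, which keeps the rule within the bounded-prefix definitions.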
The suite also has a direct registration rule:
rule registerCreatesActiveListedNode(env e, uint256 amount)
That rule checks the base transition: after registerNode(amount) succeeds, the new node must be active and must appear at the newly appended registry index. So if the production contract forgot nodeAddresses.push(msg.sender), registration would not pass this rule.
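A minimal sketch of that base-case rule, again using assumed harness getter names (getNodeAddressesLength, getNodeAddressAt, getNodeActive):

```cvl
rule registerCreatesActiveListedNode(env e, uint256 amount) {
    uint256 lengthBefore = getNodeAddressesLength();

    registerNode(e, amount);

    // The caller must now be active and listed at the appended index
    assert getNodeActive(e.msg.sender);
    assert getNodeAddressAt(lengthBefore) == e.msg.sender;
}
```

Checking the appended index directly, rather than just "the node appears somewhere," is what makes the missing-push mutant unsurvivable.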
In plain English, the combined ValidState layer checks:
- every listed node should be nonzero
- every listed node should be active
- listed nodes should not be duplicated
- a tracked active node should appear in the bounded list
- registration creates an active listed node
So when a mutant removes nodeAddresses.push(msg.sender), the active node can no longer be found in the bounded registry and the rule fails. The same structural layer also catches removal bugs, such as forgetting to clear active or forgetting to pop the removed node from the array.
After creating focused structural mutants, the suite killed 4 of 4:
ValidState focused structural mutation: killed 4 / 4
Those focused mutants touched exactly the kind of behavior ValidState is supposed to protect: node-list append, node removal, active flags, and bucket reporter/price alignment.
Part 3: State Machine
The StateMachine family is about lifecycle rules.
The main properties are:
- inactive nodes cannot report
- inactive nodes cannot add stake
- registered nodes report at most once per bucket
- successful reporting updates the bucket and report count
- reporting in a later bucket requires the previous reported bucket median to be recorded
- current and future buckets cannot be finalized
- bucket median recording stores the sorted median
- slashing marks the offense and reduces stake
- exiting deactivates the node
Two rules capture the heart of bucket progression:
rule registeredNodeReportsAtMostOncePerBucket(...)
rule reportingInLaterBucketRequiresPriorMedian(...)
The second rule is especially important. Without it, a node could keep reporting into later buckets while earlier reported buckets were never finalized. That would make the reward, inactivity, and slashing model much harder to reason about because reports would pile up without the protocol committing to their medians.
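The gate can be expressed as a revert rule. This sketch assumes harness getter names, and it treats a zero stored median as "not yet recorded," which is a modeling assumption about how the contract represents unfinalized buckets:

```cvl
rule reportingInLaterBucketRequiresPriorMedian(env e, uint256 price) {
    uint256 lastBucket = getNodeLastReportedBucket(e.msg.sender);

    // An active node that has reported before, in an earlier bucket...
    require getNodeActive(e.msg.sender);
    require getNodeReportCount(e.msg.sender) > 0;
    require lastBucket < getCurrentBucketNumber(e);

    // ...whose last reported bucket was never finalized
    require getBucketMedian(lastBucket) == 0;

    reportPrice@withrevert(e, price);

    assert lastReverted;
}
```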
The final StateMachine run was green:
StateMachine v5: verified
Mutation testing again changed the suite. The first targeted mutation run killed only 3 of 10 mutants.
The killed mutants showed that some important rules were already working:
- changing the prior-bucket median guard was killed by reportingInLaterBucketRequiresPriorMedian
- deleting node.reportCount++ was killed by successfulReportUpdatesBucketAndCount
- deleting _removeNode during exit was killed by exitDeactivatesNode
But five survivors were real StateMachine responsibilities. They were not random “maybe interesting” mutants anymore; they pointed at specific lifecycle promises that needed sharper rules:
- recordBucketMedian always reverting for current/future bucket checks
- deleting priceArray.sort() before median calculation
- making a valid slashNode path always revert on the report-index guard
- making a valid slashNode path always revert on the already-slashed guard
- changing slash stake reduction to subtract zero
The workflow was:
- Run mutation on the StateMachine family.
- Classify each survivor by responsibility.
- Ignore the survivors that belonged to Accounting or token-modeling.
- Copy the exact StateMachine-owned survivors into a manual mutation set.
- Add rules that express the missing lifecycle promises.
- Rerun only those exact survivors.
For example, the deleted-sort mutant was:
uint256[] memory priceArray = bucket.prices;
// Mutant: priceArray.sort() is missing
uint256 medianPrice = priceArray.getMedian();
bucket.medianPrice = medianPrice;
That survivor told me the suite needed to prove not just "recording a median can succeed," but "recording a median stores the sorted median." So I added a rule that seeds an intentionally unsorted three-price bucket:
highPrice, lowPrice, medianPrice
Then it calls recordBucketMedian(bucketNumber) and asserts:
assert getBucketMedian(bucketNumber) == medianPrice;
If sorting is removed, that rule fails because getMedian() is now applied to an unsorted array.
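Put together, the rule looks something like this. stageThreePriceBucket is a hypothetical harness helper name for seeding the bucket out of order; the real suite's staging entrypoint may differ:

```cvl
rule recordedMedianIsSortedMedian(
    env e,
    uint256 bucketNumber,
    uint256 lowPrice,
    uint256 medianPrice,
    uint256 highPrice
) {
    require lowPrice < medianPrice && medianPrice < highPrice;

    // Hypothetical harness helper: seed the bucket intentionally unsorted
    stageThreePriceBucket(e, bucketNumber, highPrice, lowPrice, medianPrice);

    recordBucketMedian(e, bucketNumber);

    // Only a sorted median calculation can satisfy this
    assert getBucketMedian(bucketNumber) == medianPrice;
}
```

Seeding the prices out of order is the whole trick: an unsorted getMedian() would return the middle array slot, which is lowPrice, not medianPrice.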
The slash survivors were handled the same way. I added a staged valid slash path and asserted that a valid slash:
- does not revert
- marks the offense as slashed
- reduces stake by MISREPORT_PENALTY
- does not deactivate a node when the remaining stake is still positive
That is what slashMarksOffenseAndReducesStake checks.
After those v5 rules were added, the exact manual survivor rerun killed all five:
StateMachine manual survivor mutation: killed 5 / 5
The remaining initial survivors were not really StateMachine gaps. They were Accounting-owned reward math or honest-token transfer-failure modeling boundaries.
This is one of the practical lessons from the campaign: mutation numbers are not meaningful until you classify the mutants by family and by threat model.
Part 4: Accounting
The Accounting family is the most important part of this suite.
For a staking oracle, accounting is where a harmless-looking bug can become a real exploit:
- recording more stake than was transferred
- allowing rewards to be claimed twice
- paying more on exit than the node’s effective stake
- reducing a slashed node’s stake without paying the expected slasher reward
- paying the slasher from the wrong source or by the wrong percentage
The final Accounting rules cover the core token-flow promises:
rule registerEscrowsExactStake(...)
rule addStakeIncreasesStoredStakeAndEscrow(...)
rule claimRewardMintsUnclaimedReports(...)
rule exitPaysAtMostEffectiveStake(...)
rule exitBeforeWaitingPeriodReverts(...)
rule slashNodeReducesStakeAndPaysReward(...)
The shape of these rules is intentionally direct. For example:
after registerNode(amount) succeeds, stored stake increases by exactly amount, and the oracle's modeled ORA balance increases by exactly amount.
For rewards:
claimReward mints exactly unclaimedReportCount * REWARD_PER_REPORT.
For slashing:
slashNode reduces stake by min(stakedAmount, MISREPORT_PENALTY) and pays the slasher exactly 10% of the penalty.
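As an exact-delta sketch, the slash accounting rule has roughly this shape; getNodeStakedAmount, oraBalanceOf, and the constant getter MISREPORT_PENALTY() are assumed harness and token-model names:

```cvl
rule slashNodeReducesStakeAndPaysReward(
    env e,
    address nodeToSlash,
    uint256 bucketNumber,
    uint256 reportIndex,
    uint256 nodeAddressesIndex
) {
    mathint stakeBefore   = getNodeStakedAmount(nodeToSlash);
    mathint slasherBefore = oraBalanceOf(e.msg.sender);

    slashNode(e, nodeToSlash, bucketNumber, reportIndex, nodeAddressesIndex);

    // Penalty is capped at the remaining stake
    mathint penalty = stakeBefore >= MISREPORT_PENALTY()
        ? MISREPORT_PENALTY()
        : stakeBefore;

    assert getNodeStakedAmount(nodeToSlash) == stakeBefore - penalty;
    assert oraBalanceOf(e.msg.sender) == slasherBefore + penalty / 10;
}
```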
The slash rule is verified in a harness-staged finalized bucket. That is an honest modeling boundary: the suite proves the slash accounting once the slashable setup exists, but it does not yet prove an entire production-only multi-call sequence from registration to report to median recording to slashing.
The final Accounting run was green:
Accounting v7: verified
Mutation testing was strong here. A first entrypoint-focused mutation run killed 6 of 8 mutants. The two survivors both mutated the exitNode waiting-period guard. That led to a new rule:
rule exitBeforeWaitingPeriodReverts(...)
After adding that rule, the rerun killed all targeted Accounting mutants:
Accounting targeted mutation v4: killed 8 / 8
This is exactly what I want from mutation testing. It did not just produce a score. It pointed to a missing property, the suite got stronger, and the new rule killed the survivor.
Part 5: Views
View functions are easy to underestimate. They do not mutate state, so they can feel less dangerous than slashNode or exitNode. But in an oracle, views are part of the product's truth surface.
The Views family covers:
- getCurrentBucketNumber
- getEffectiveStake
- getPastPrice
- getLatestPrice
- _checkPriceDeviated behavior through exposed harness helpers
- getOutlierNodes
The effective stake formula is especially important:
effectiveStake = stakedAmount - missedBuckets * INACTIVITY_PENALTY
with a floor at zero.
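The formula check can be sketched as a single rule. Here missedBucketsOf is a hypothetical harness helper that computes the expected-minus-reported bucket count; the other getter names are assumptions as well:

```cvl
rule activeEffectiveStakeMatchesFormula(env e, address node) {
    require getNodeActive(node);

    mathint staked  = getNodeStakedAmount(node);
    mathint penalty = missedBucketsOf(e, node) * INACTIVITY_PENALTY();

    // The penalty must floor at zero instead of underflowing
    assert getEffectiveStake(e, node) ==
        (penalty >= staked ? 0 : staked - penalty);
}
```

Writing the floor explicitly in mathint arithmetic is what catches an underflow or a missing zero-clamp in the production formula.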
The suite verifies that inactive nodes have zero effective stake, active nodes match the missed-bucket formula, and the edge case where reported buckets exceed the expected bucket count returns the stored stake instead of underflowing.
The price-deviation edge is also explicit:
- exactly 10% deviation is not slashable
- greater than 10% deviation is slashable
That matters because the production constant is:
uint256 public constant MAX_DEVIATION_BPS = 1000;
or 10% in basis points. Getting the boundary wrong would either slash honest reporters at the limit or let slightly-too-far reporters escape.
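A boundary check for the deviation edge can be sketched like this; checkPriceDeviatedExposed stands in for whatever harness helper exposes _checkPriceDeviated, and the divisibility constraint keeps the 10% point exact in integer arithmetic:

```cvl
rule tenPercentDeviationBoundary(env e, uint256 median) {
    // Keep the arithmetic clean: nonzero, divisible by 10, no overflow
    require median > 0 && median % 10 == 0 && median < max_uint128;

    uint256 atBoundary = require_uint256(median + median / 10); // exactly +10%
    uint256 wellOver   = require_uint256(median * 2);           // +100%

    assert !checkPriceDeviatedExposed(e, atBoundary, median); // boundary is safe
    assert checkPriceDeviatedExposed(e, wellOver, median);    // clearly slashable
}
```

Using a far-past-boundary price for the slashable side avoids integer-division edge cases that a "boundary plus one wei" price would hit for large medians.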
The final Views run was green:
Views v8: verified
The Views mutation story was mixed and useful.
The suite eventually killed the prior effective-stake survivor by adding:
rule effectiveStakeReportsBeyondExpectedReturnsStake(...)
For getOutlierNodes, focused mutation killed the validCount mutants, but two loop-index mutants remained under optimistic_loop. I tried strict-loop baselines at higher bounds, but the original rule itself failed on loop unwinding. That means those survivors are not clean evidence of a production bug. They are documented loop-modeling debt:
Views outlier focused mutation: killed 2 / 4
Residual: two loop-index mutants under optimistic_loop
This is an important distinction. A survivor can mean "the spec is weak," "the mutant is equivalent," "the model boundary hides the behavior," or "the original cannot be cleanly checked under that loop strategy." In this case, the debt is real, but it is about dynamic-array loop proof engineering, not a confirmed StakingOracle defect.
What The Suite Proves
Here is the honest coverage statement I would use for this campaign:
The StakingOracle Certora suite verifies the core node lifecycle, bounded registry structure, bucket reporter/price alignment, report-state gates, ORA stake escrow, reward minting, exit payout bounds, staged slash accounting, effective-stake formula, recorded-median views, and main outlier filtering semantics under an honest ORA model and bounded loop assumptions.
More concretely:
- Node registration: successful registration creates an active listed node with exact stake escrow.
- Duplicate registration: duplicate node registration reverts.
- Node list shape: successful calls preserve bounded node-address shape.
- Bucket structure: reporter and price arrays stay aligned.
- Reporting: inactive nodes cannot report, and active nodes cannot report twice in one bucket.
- Bucket progression: later reporting requires the previous reported bucket median to be recorded.
- Median recording: current and future buckets cannot be finalized, and past buckets record the sorted median.
- Rewards: reward minting matches unclaimed reports times REWARD_PER_REPORT.
- Exit: early exit reverts; successful exit deactivates the node and pays at most effective stake.
- Slashing: staged slash accounting reduces stake by the expected penalty and pays the slasher 10%.
- Views: bucket number, effective stake, latest/past prices, and deviation edges are covered.
- Outliers: non-deviated and already-slashed reporters are excluded; mixed reporter buckets are checked.
What The Suite Does Not Prove
The final artifacts record four main boundaries.
1. Honest ORA token model
The suite models ORA as an honest ERC20-like token. That is appropriate for this challenge because OracleToken.sol is provided and simple.
It does not model:
- fee-on-transfer behavior
- rebasing behavior
- ERC777-style callbacks
- malicious return values
- fake token contracts
If this were a production protocol accepting arbitrary ERC20 assets, this would be a much larger soundness issue. Here it is documented modeling debt.
2. Bounded loop depth
Structural properties use configured loop bounds. This is normal in many Certora campaigns over dynamic arrays, but it means those checks are bounded evidence, not mathematical proofs over arbitrary array length.
The current suite uses:
loop_iter = 3 for the relevant bounded structural reasoning.
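For reference, those prover settings live in a run configuration along these lines; the file and spec paths here are illustrative, not the exact repository layout:

```json
{
    "files": ["certora/harness/StakingOracleHarness.sol"],
    "verify": "StakingOracleHarness:certora/specs/ValidState.spec",
    "loop_iter": "3",
    "optimistic_loop": true,
    "rule_sanity": "basic"
}
```

Raising loop_iter strengthens the bounded evidence at the cost of solver time, which is why the residual loop-index debt in Views is recorded rather than brute-forced.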
3. Staged slash setup
The exact slash accounting rule uses a harness-staged finalized outlier bucket. That proves the arithmetic and state transition once the slashable condition exists.
It does not yet prove the full production-only sequence:
register -> report -> advance bucket -> record median -> slash
I considered adding that flow, but for the current blog and challenge context I would keep the suite as-is. The staged slash rule already covers the core accounting risk, and the missing production-only flow is clearly documented as future strengthening work.
So the honest claim is not “the entire slashing lifecycle is fully proven end to end.” The honest claim is that the core slashing accounting and state transition are verified for a finalized outlier bucket, while a production-only multi-call slash flow remains future strengthening work.
4. Outlier loop-index mutation
The suite verifies the semantic behavior of getOutlierNodes for focused cases, including a mixed three-reporter bucket. It also kills the validCount mutants.
Two loop-index mutants remain under optimistic_loop, and strict-loop baselines fail on the original due to unwinding. That is residual proof-engineering debt, not a confirmed production bug.
What I Learned
The WhitelistOracle suite was mostly about list integrity and median behavior. The StakingOracle suite was about lifecycle integrity.
A few lessons stood out:
1. Staking oracle correctness is mostly state-machine correctness
The median calculation matters, but the bigger risk is lifecycle drift:
- a node that should be inactive can still report
- a node reports twice in one bucket
- a bucket cannot be finalized cleanly
- rewards can be claimed against the wrong count
- stake and token balance drift apart
The suite had to model the oracle as a protocol, not just as a price function.
2. Accounting rules should be exact whenever possible
For stake, rewards, exit, and slashing, vague properties are not enough. “Balance does not decrease too much” is weaker than:
the stored stake delta equals the token-transfer delta.
or:
the slasher receives exactly 10% of the penalty.
Exact-delta rules are harder to write, but much more valuable.
3. Non-inductive invariants should become preservation rules
Some structural truths are difficult to maintain as global invariants because of how dynamic arrays, removals, and bounded loops interact with the prover.
Instead of forcing a brittle invariant, I moved the node-address shape property into a successful-call preservation rule. That gave useful evidence without pretending the proof was stronger than it was.
4. Mutation testing needs targeting
The first ValidState mutation run survived everything because the random sample did not target ValidState’s actual responsibilities. The focused structural run killed 4 of 4.
That taught me not to read mutation results mechanically. A mutation run is only useful if the mutant family matches the spec family.
5. Modeling debt is part of the result
A good formal verification report is not just a list of green checks. It is also a map of the assumptions:
- this token model is honest
- this loop proof is bounded
- this slash setup is staged
- this view still has loop-index mutation debt
Those boundaries make the positive claims more credible, not less.
Closing Thought
The StakingOracle is the point where this oracle challenge stops being just about “what is the median price?” and starts being about incentives:
- who is allowed to report
- what they put at risk
- when the protocol accepts their report
- how it rewards honest participation
- how it punishes outliers
Certora was a good fit because those are not single-test questions. They are protocol-shape questions.
The current suite does not prove everything forever and under every possible model. It proves the core lifecycle and accounting properties under explicit assumptions, and it leaves a clear path for strengthening: production-only slash flow, deeper loop reasoning, and a more adversarial token model if the challenge ever moves beyond the provided ORA token.
That is a much stronger position than “the tests pass.”
It is also a natural bridge to the next oracle design: the OptimisticOracle, where correctness moves again, this time from staking economics to assertions, disputes, bonds, and settlement.