Funding Is Not Identity: A Conservative Approach to Onchain Entity Linking
Joseph Wang
Why Identifier Counts Are Not Entity Counts
Onchain systems love to count identifiers.
Wallet addresses. DIDs (decentralized identifiers). Governance voters. Delegates. Safes.
Those counts are useful, but they are not the same thing as entity counts. A single operator can control many wallets. A DAO team can manage multiple Safes. A bridge, exchange, or relayer can make unrelated addresses look connected.
`unmasking-did` is not a deanonymization tool and it is not a Sybil detector. It is an evidence-preserving linker: every cluster must explain which signals connected the identifiers, and what those signals are allowed to mean. The point is to measure coordination without collapsing it into identity.
Evidence Before Identity
`unmasking-did` separates primary evidence kinds from derived coordination signals.
Primary evidence kinds are stored as typed rows and used directly by link policy; in this run they include Safe ownership and `funded_by` links.
Derived coordination signals, such as short-burst funding patterns, top-funder share, and common-sink overlap, are computed at cluster or feature level and are not stored as evidence rows.
It is easy for pipelines to blur these categories. A shared funder becomes evidence of common control; a DID becomes proof of identity. `unmasking-did` keeps the semantics separate: evidence type defines what a link means, while policy thresholds define how cautiously it can be used.
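A minimal sketch of that separation, with hypothetical names (this is not the tool's actual schema): each evidence row carries a type, and the type alone bounds what a link may mean, before any threshold is applied.

```python
from dataclasses import dataclass

# Hypothetical sketch: evidence rows are typed, and the type bounds what a
# link is allowed to mean. Names here are illustrative, not the real schema.
@dataclass(frozen=True)
class EvidenceRow:
    kind: str   # e.g. "safe_owner", "funded_by"
    src: str    # identifier the evidence attaches to
    dst: str    # identifier it links toward
    key: str    # the shared artifact (owner address, funder address)

# The strongest claim each evidence kind may support, independent of thresholds.
ALLOWED_MEANING = {
    "safe_owner": "control",            # direct control-style evidence
    "funded_by": "related_candidate",   # coordination hint, never identity alone
}

def meaning_of(row: EvidenceRow) -> str:
    """Return the strongest claim this row can support by itself."""
    return ALLOWED_MEANING.get(row.kind, "unknown")

row = EvidenceRow("funded_by", "0xaaa", "0xbbb", key="0xfunder")
assert meaning_of(row) == "related_candidate"
```

Policy thresholds then decide how cautiously a row of a given kind is used; the kind itself never gets promoted to a stronger claim.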
The Arbitrum Governance/Control Dataset
Before Arbitrum, I tested the pipeline on a small Scroll DAO Safe set: five governance Safes clustered through repeated Safe-owner evidence, while the Admin Multisig stayed separate. That was enough to validate control-style signals, but not enough to test funding-based coordination at scale.
The failure mode I wanted to test next was simple: would funding evidence accidentally turn a governance dataset into one giant entity?
The full run specification can be found here.
The Naive Failure: `funded_by` Creates a 425-Address Cluster
The original linker allowed `funded_by` to behave too much like merge evidence. That produced a giant mixed cluster:
A 425-address component is visually persuasive. But if the glue is common infrastructure — service-like funders, routers, high-fanout sources — the component is not an interpretable entity cluster. It is infrastructure wearing the shape of identity.
Shared funding is not the same as shared control.
A single shared funder can support a “related candidate” label. It should not automatically merge addresses into a cluster.
Conservative Policy: Repeated + Short-Burst + Non-Service
The fix was not to remove `funded_by`. Funding still matters — it just needs to be demoted from naive identity evidence to conservative coordination evidence.
{
"funded_by_policy": {
"enabled": true,
"min_shared_keys": 2,
"min_short_burst_hits": 2,
"service_fan_out_cap": 50,
"short_burst_block_delta": 5000
},
"fan_out_cap": 50,
"min_evidence": 1
}

In plain English:
- one shared funder is not enough — require at least two shared `funded_by` keys
- those links need at least two short-burst hits (block delta ≤ 5,000)
- service-like keys are suppressed — routers, bridges, zero/system addresses do not merge clusters
- high-fanout keys above the cap are treated as infrastructure
Timing is the key addition. Two addresses funded by the same source months apart are not the same signal as two addresses funded repeatedly in a tight block window.
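The thresholds above can be read as a single predicate. The sketch below is an illustrative re-implementation of that policy, not the tool's actual code; function and field names are assumptions.

```python
# Illustrative re-implementation of the conservative funded_by policy from
# the config above. Function and argument names are assumptions.
POLICY = {
    "min_shared_keys": 2,
    "min_short_burst_hits": 2,
    "service_fan_out_cap": 50,
    "short_burst_block_delta": 5000,
}

def funded_by_merge_ok(a_fundings, b_fundings, funder_fan_out, policy=POLICY):
    """a_fundings/b_fundings: {funder_key: funding_block} for two addresses.
    funder_fan_out: {funder_key: distinct addresses that funder touched}."""
    shared, burst_hits = 0, 0
    for key in set(a_fundings) & set(b_fundings):
        # Suppress service-like, high-fanout funders: they are infrastructure.
        if funder_fan_out.get(key, 0) > policy["service_fan_out_cap"]:
            continue
        shared += 1
        # Timing matters: only tight block windows count as short-burst hits.
        if abs(a_fundings[key] - b_fundings[key]) <= policy["short_burst_block_delta"]:
            burst_hits += 1
    return (shared >= policy["min_shared_keys"]
            and burst_hits >= policy["min_short_burst_hits"])

# One shared funder, even in a tight window, is not enough to merge.
assert not funded_by_merge_ok({"f1": 100}, {"f1": 200}, {"f1": 3})
# Two non-service funders, both within 5,000 blocks: merge allowed.
assert funded_by_merge_ok({"f1": 100, "f2": 900},
                          {"f1": 200, "f2": 1200},
                          {"f1": 3, "f2": 4})
```

Note that a pair failing this predicate can still carry a "related candidate" label; the predicate only gates cluster merges.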
Result: Top Cluster 425 → 5
The top cluster went from 425 to 5. In this run, we observed no governance/control mixed clusters among the top clusters shown after conservative filtering. Treat this as a consistency check, not proof of absence: it reflects what survives the policy — including service and fan-out suppression — not everything that ever co-occurred in raw flows.
Two concentration metrics are worth calling out, both computed over inferred cluster sizes. The Gini coefficient is 0.024, and the Nakamoto coefficient (the smallest number of clusters needed to account for 50% of all clustered addresses) is 477. Both confirm the output is not secretly dominated by a few large components.
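Both metrics are standard and cheap to recompute from cluster sizes alone. A sketch, using the same definition of the Nakamoto coefficient as above:

```python
def gini(sizes):
    """Gini coefficient over cluster sizes (0 = all clusters equal-sized)."""
    s = sorted(sizes)
    n, total = len(s), sum(s)
    # Sorted-rank form of the Gini coefficient.
    cum = sum((i + 1) * x for i, x in enumerate(s))
    return (2 * cum) / (n * total) - (n + 1) / n

def nakamoto(sizes, share=0.5):
    """Smallest number of clusters covering `share` of all clustered addresses."""
    total, acc = sum(sizes), 0
    for k, x in enumerate(sorted(sizes, reverse=True), start=1):
        acc += x
        if acc >= share * total:
            return k
    return len(sizes)

sizes = [5, 3, 3, 2, 2, 1, 1, 1]
assert nakamoto(sizes) == 3          # 5 + 3 + 3 = 11 covers half of 18
assert abs(gini([4, 4, 4])) < 1e-9   # equal sizes -> no concentration
```

A low Gini plus a high Nakamoto coefficient is exactly the signature of many small, similarly sized clusters rather than a few dominant components.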
On the ingestion side, 1,988 Alchemy calls produced 14,101 transfer rows and a 10 MB SQLite database, with no `db_size_stop` trigger. The canonical summary, report, and bounded graph artifacts are linked here.
Why the Control Cohort Matters
Without a control group, every common funder can look suspicious. With one, the analysis can ask comparative questions under the same policy and window.
For the count rows below — short-burst clusters, high top-funder share, common-sink clusters, and `candidate_*` clusters — counts are computed over pure-by-cohort multi-address clusters only. Singleton clusters are included in the median cluster size and multi-address cluster-rate summaries.
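The counting convention can be sketched as a pair of filters. The cluster record shape below is a hypothetical stand-in for the real artifacts:

```python
import statistics

# Hypothetical cluster records: the {"addresses": ..., "cohorts": ...} shape
# is an assumption for illustration, not the pipeline's actual format.
clusters = [
    {"addresses": ["a", "b"], "cohorts": {"governance"}},             # pure, multi-address
    {"addresses": ["c", "d"], "cohorts": {"governance", "control"}},  # mixed-cohort: excluded
    {"addresses": ["e"], "cohorts": {"control"}},                     # singleton
]

# Count rows use pure-by-cohort, multi-address clusters only.
countable = [c for c in clusters
             if len(c["addresses"]) > 1 and len(c["cohorts"]) == 1]

# Size summaries (e.g. median cluster size) keep singletons in the pool.
median_size = statistics.median(len(c["addresses"]) for c in clusters)

assert len(countable) == 1
assert median_size == 2
```

Keeping mixed-cohort clusters out of the count rows is what makes the governance-vs-control comparison clean.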
In this run, control is equal or higher across these coordination-oriented indicators. That does not prove governance is clean; it means governance does not appear elevated relative to this activity-matched baseline under the current conservative policy.
One caveat: the control cohort is ARB token transfer participants in the same block window — an activity-matched comparison, not a random sample of the full Arbitrum population. It controls for on-chain activity level, not for all behavioral differences. The baseline is useful; it is not a population ground truth.
Any claim that governance addresses behave unusually would need to clear this baseline first. The control cohort sets the floor, not the verdict.
The review-priority buckets only rank which clusters are worth reviewing first. They are not risk labels, misconduct labels, or claims emitted by the core linker.
The Viewer
For this project, the viewer is mostly an audit surface: it shows the evidence rows behind a cluster so I can tell whether a merge came from Safe ownership, shared funding, or a policy artifact. Cluster counts alone do not tell you that.
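The same audit can be done directly against the evidence rows. The query below is a sketch against a hypothetical schema, not the project's actual tables:

```python
import sqlite3

# Hypothetical audit query in the spirit of the viewer: pull the evidence
# rows behind one cluster so a merge can be traced back to its signals.
# Table and column names are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evidence (cluster_id TEXT, kind TEXT, key TEXT)")
conn.executemany("INSERT INTO evidence VALUES (?, ?, ?)", [
    ("c1", "safe_owner", "0xowner1"),
    ("c1", "funded_by", "0xfunder1"),
    ("c2", "funded_by", "0xfunder2"),
])

rows = conn.execute(
    "SELECT kind, key FROM evidence WHERE cluster_id = ? ORDER BY kind", ("c1",)
).fetchall()
# The trail shows whether the merge came from ownership or funding evidence.
assert rows == [("funded_by", "0xfunder1"), ("safe_owner", "0xowner1")]
```

A cluster whose trail is all `funded_by` rows warrants more skepticism than one anchored by Safe-owner evidence, and the trail makes that visible without rerunning the pipeline.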
What This Is Not
This is not a Sybil detector. A real attack requires a target objective — airdrop farming, vote manipulation, reputation abuse — and this system does not observe objectives. What it produces is a cluster with a named evidence policy and an inspectable trail. Whether that cluster is interesting depends on context the pipeline does not have.
Next: From Snapshots to Monitoring
A single snapshot is easy to overfit.
For monitoring, the useful questions are whether clusters grow, split, disappear, or change funding patterns across windows.
The pipeline is designed for that shift. Each run has a fixed input snapshot, a frozen block window, and a policy profile that defines how evidence is interpreted. That makes runs comparable, not just reproducible.
The next useful version is boring on purpose: rerun the same policy every month, diff the clusters, and investigate only the changes that survive the same evidence rules.
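A month-over-month diff under a fixed policy can be sketched as follows. This assumes stable cluster identifiers across runs, which is itself a simplification; in practice clusters would likely need to be matched by address overlap:

```python
# Sketch of a cluster diff between two snapshot runs produced under the
# same policy profile. Run contents and IDs here are illustrative.
def diff_runs(prev, curr):
    """prev/curr: dicts {cluster_id: set(addresses)} from two runs."""
    changes = {}
    for cid in prev.keys() | curr.keys():
        before, after = prev.get(cid, set()), curr.get(cid, set())
        if not before:
            changes[cid] = ("new", after)
        elif not after:
            changes[cid] = ("gone", before)
        elif before != after:
            # Report the symmetric difference: addresses that joined or left.
            changes[cid] = ("changed", (after - before) | (before - after))
    return changes

prev = {"c1": {"a", "b"}, "c2": {"c"}}
curr = {"c1": {"a", "b", "d"}, "c3": {"e"}}
changes = diff_runs(prev, curr)
assert changes["c1"] == ("changed", {"d"})
assert changes["c2"][0] == "gone"
assert changes["c3"][0] == "new"
```

Only changes that survive the same evidence rules across windows are worth investigating; everything stable is, by design, uninteresting.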
This post originally appeared on my blog.