Over the last decade, sponsors have invested heavily in tokenization and data linkage, and for good reason. The ability to connect de-identified claims, EHR, pharmacy, and lab data at population scale transformed what teams could credibly study. But there's a hidden cliff: data built for population-scale questions doesn't always deliver the long-term evidence regulators, payers, and sponsor teams need after a trial closes.
In our recent Endpoints News webinar, "Connection ≠ Continuity: Rethinking Long-Term Evidence in the Age of Tokenization," panelists examined where that gap shows up in real-world data infrastructure, why it stays hidden until it's too late to address, and what it would take to close it.
The discussion stayed grounded in real programs and real failure modes, and in the gap between today's real-world data infrastructure and the long-term, decision-grade evidence generation that regulators and payers require. Panelists included:
- Ryan Kilpatrick, VP & Head of Global Epidemiology, AbbVie
- Jen Webster, VP of RWD Strategy and Activation, Pfizer
- Donna Rivera, EVP of Clinical Evidence Modernization, Canal Row Advisors; former founding director of FDA's Oncology Real-World Evidence Program
- Dan Drozd, Chief Medical Officer, PicnicResearch
Where tokenization delivers
The panelists were clear that tokenization solves real problems. Linking de-identified claims, EHR, pharmacy, and lab data lets sponsors run population-scale studies without asking patients to re-consent or stay actively engaged. That efficiency has expanded what teams can credibly ask: long-term safety signals, comparative effectiveness across millions of patients, treatment patterns across care settings, and natural history work that would otherwise require enrolling cohorts from scratch.
But when tokenization is applied to long-term, patient-level follow-up, the gap between record connection and patient continuity starts to show.
Where tokenization breaks down for long-term evidence
The field has been focused on answering one question: can we connect records? The question that actually matters is different: can we keep following this patient as their care evolves over time? Panelists named specific places where the assumption that tokenization preserves long-term optionality starts to fray:
- Match rate is not completeness. Drozd reframed what a clean linkage report actually tells you: "A match rate isn't the same as completeness. A patient can have a match and still have data that's too thin or fragmented or ambiguous to support whatever evidence generation needs might be at hand."
- Cohort loss isn't random. Webster named three compounding failure modes that are easy to overlook and tend to affect more data than teams expect. Cohorts that include women of marrying age erode as names change; over a ten-year study, that loss is not random. Rare disease patients can have records spread across dozens of portals (Webster cited one patient with 37), well past what any vendor will realistically link. And as obesity care migrates to telehealth and direct-to-consumer GLP-1 channels, a meaningful share of treatment never enters a traditional health record at all.
- Selection bias is real. Kilpatrick likened working with tokenized data to looking through binoculars: "You can see what comes into view... What you don't know is what's happening outside of that field of view. You don't know which care you're missing." The harder problem, he noted, is that the patients you can see may differ in unknown ways from the ones you can't, and that selection bias is structurally difficult to identify and harder still to correct.
In each case, the missingness is non-random and it compounds, which is exactly what makes these gaps so dangerous in a long-term cohort.
The decisions that matter are made at enrollment
Across the full conversation, one message kept surfacing: the evidence questions that matter most emerge after a trial closes, but the infrastructure decisions that determine whether you can answer them are made before enrollment begins — and once the cohort starts to erode, it's too late to design around it. Rivera, drawing on her experience at FDA, was direct: "The best way to deal with any of these types of issues is always in design. It's always early."
Patient-anchored, consent-based follow-up
One solution to these challenges is patient-anchored consent: following the actual patient rather than anchoring to a single care setting or linked record. PicnicResearch built ThumbPrint to operationalize this — establishing a direct patient relationship and authorization during the trial that travels with the participant regardless of where they seek care afterward.
Where tokenization delivers connection, ThumbPrint delivers continuity, preserving visibility across geographic moves, insurance changes, and the fragmented care that quietly erodes record-linked datasets. It can complement tokenization in an evidence strategy or replace it outright, but the benefit only shows if that infrastructure is in place from the start.
Hear the rest of the conversation
Explore what patient-anchored evidence looks like in practice, including:
- How to justify evidence-infrastructure investment across a portfolio of assets
- What the FDA looks for when evaluating registry feasibility, and the pre-specification language that holds up under scrutiny
- A practical breakdown of when patient-anchored, consent-based post-trial follow-up is the right tool versus when tokenization still wins on efficiency
- Audience Q&A on whether informed consent at enrollment actually moves recruitment numbers, and what data linkage realistically looks like outside the U.S.
Watch the full webinar on tokenization and long-term evidence →