In non-interventional research, the choice of data source is a core study design decision that determines which patients are observable, which outcomes can be captured, and which biases are introduced before the first record is abstracted. Most studies anchor that decision to a site or network and inherit its structural limitations. Direct-to-patient research starts from a different place.
Rather than routing data collection through an institution, direct-to-patient research has patients participate remotely and directly, authorizing release of their full medical history under federal right-of-access law, across every provider they have seen, regardless of EHR vendor, payer network, or geography. That distinction has specific scientific consequences. It reduces the structural biases that undermine evidence generated by conventional non-interventional studies, enables a data collection approach that site-anchored research cannot support, and produces a cohort that compounds in scientific value over time rather than eroding.
The bias case against site-anchored retrieval
Non-interventional research is vulnerable to four categories of bias with direct implications for regulatory and health technology assessment defensibility: selection bias, information bias, attrition bias, and confounding. As explored in depth in PicnicResearch's whitepaper on reducing bias in non-interventional research, each connects directly to how records are retrieved.
Site-based and network-anchored approaches introduce all four:
- Selection bias arises because enrollment is bounded by which institutions contribute data, systematically overrepresenting patients who have access to specialty care or receive care in an academic setting.
- Information bias occurs when clinical events, exposures, or outcomes occurring outside the network simply don't appear in the record.
- Attrition bias arises when patients transition between healthcare systems or are lost to follow-up, disproportionately affecting those whose clinical circumstances are most likely to change and whose outcomes are therefore most scientifically relevant.
- Residual confounding persists when covariate histories documented outside the study network are unreachable, leaving propensity models incomplete.
In each case the evidence is bounded by the site.
How direct-to-patient research reduces bias
Direct-to-patient research reduces all four biases at their source by anchoring retrieval to the patient rather than the site. Change the architecture and you change the bias profile.
Because the patient is engaged directly, the observable population is no longer bounded by geography or network membership. Selection bias is structurally reduced: patients participate from anywhere, so the cohort can reflect the actual distribution of the treated population (including those managed in community settings or across multiple unaffiliated systems) rather than only those who attend contributing institutions.
Data completeness follows the same logic, and information bias falls with it. Retrieval runs across every provider a patient has seen, not just those within a contributing network. PicnicResearch's visit capture validation demonstrated at least 87% completeness for specialist visits, confirmed against patient-reported care experience. An ISPOR 2025 comparison of single-network versus multi-site data collection put a concrete number on the gap: in a paroxysmal nocturnal hemoglobinuria cohort, patient-mediated multi-site retrieval captured an average of 207 total visits per patient versus 54 from an alternative advanced retrieval method. For emergency room visits, the counts were 8 versus 1. In a condition where acute events are the primary endpoint, that is not a marginal difference.
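To make the size of that gap concrete, the short Python sketch below turns the figures quoted above into a capture ratio and an implied share of visits absent from the comparator. It assumes, purely for illustration, that the patient-mediated count can serve as the reference denominator; the source does not make that claim, and the function and variable names are hypothetical.

```python
def capture_gap(reference_visits: float, comparator_visits: float) -> tuple[float, float]:
    """Return (capture ratio, share of reference visits absent from the comparator).

    Illustrative assumption: the reference count is treated as the full set of
    visits, which is a simplification and not something the source asserts.
    """
    if reference_visits <= 0 or comparator_visits <= 0:
        raise ValueError("visit counts must be positive")
    ratio = reference_visits / comparator_visits
    missed_share = 1 - comparator_visits / reference_visits
    return ratio, missed_share


# Figures quoted in the text for the paroxysmal nocturnal hemoglobinuria cohort.
total_ratio, total_missed = capture_gap(207, 54)
er_ratio, er_missed = capture_gap(8, 1)

print(f"All visits: {total_ratio:.1f}x captured, {total_missed:.1%} absent from comparator")
print(f"Emergency visits: {er_ratio:.1f}x captured, {er_missed:.1%} absent from comparator")
```

Under that assumption, the comparator method would be missing roughly three quarters of all visits and seven of every eight emergency visits.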
Attrition bias is addressed because participation is anchored to the patient rather than the site. When someone changes providers or relocates, the retrieval authorization follows them. Patients who would be censored in a site-based registry or study remain observable.
Because right-of-access retrieval can reach records predating enrollment by five or more years across providers, the covariate history available for adjustment is substantially more complete. Propensity models have more to work with, and residual confounding is reduced accordingly.
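The effect of an unreachable covariate can be illustrated with a small simulation. The sketch below is a toy example, not PicnicResearch's method: stratification on a single binary covariate stands in for a full propensity model, every probability is invented, and the true treatment effect is deliberately set to zero so that any apparent effect is pure confounding.

```python
import random


def simulate(n: int = 200_000, seed: int = 0) -> dict:
    """Simulate patients; returns counts[history][treated] = [patients, events]."""
    rng = random.Random(seed)
    counts = {c: {t: [0, 0] for t in (0, 1)} for c in (0, 1)}
    for _ in range(n):
        # Hypothetical prior history, documented only outside the study network.
        history = 1 if rng.random() < 0.5 else 0
        # Treatment is more likely with that history (invented probabilities).
        treated = 1 if rng.random() < (0.8 if history else 0.2) else 0
        # Outcome depends on history only: the true treatment effect is zero.
        event = 1 if rng.random() < (0.5 if history else 0.1) else 0
        counts[history][treated][0] += 1
        counts[history][treated][1] += event
    return counts


def risk(cells: list) -> float:
    """Pooled event risk across a list of [patients, events] cells."""
    return sum(e for _, e in cells) / sum(p for p, _ in cells)


def crude_difference(counts: dict) -> float:
    """Risk difference when the history covariate is unavailable."""
    treated = [counts[c][1] for c in (0, 1)]
    untreated = [counts[c][0] for c in (0, 1)]
    return risk(treated) - risk(untreated)


def adjusted_difference(counts: dict) -> float:
    """Risk difference stratified on history, weighted by stratum size."""
    total = sum(counts[c][t][0] for c in (0, 1) for t in (0, 1))
    diff = 0.0
    for c in (0, 1):
        stratum_n = counts[c][0][0] + counts[c][1][0]
        diff += (risk([counts[c][1]]) - risk([counts[c][0]])) * stratum_n / total
    return diff


counts = simulate()
crude = crude_difference(counts)
adjusted = adjusted_difference(counts)
print(f"Without history covariate: {crude:+.3f}")
print(f"Adjusted for history:      {adjusted:+.3f}")
```

With the history covariate missing, the analysis reports a large spurious risk difference; once it is available for adjustment, the estimate returns to approximately zero.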
The ambispective cohort: what direct-to-patient makes possible
Multi-site retrieval does more than reduce bias. It makes a data collection approach scientifically viable that site-anchored research structurally cannot support: the ambispective cohort, combining retrospective record retrieval with prospective follow-up anchored to the same patients.
A prospective registry starts data collection at enrollment, missing everything before it. A purely retrospective study cannot collect what was never documented within the accessible dataset. The ambispective approach addresses both limits simultaneously. Because the right-of-access authorization covers historical records across providers, not just records generated after enrollment, the study can retrieve years of longitudinal data across all prior sites of care, abstracted against a pre-specified protocol. Those same patients are then followed prospectively, with patient-reported outcomes (PROs) and functional measures collected remotely at defined intervals, producing a unified longitudinal dataset that neither approach alone can generate.
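As a rough sketch of what a unified longitudinal dataset means structurally, the Python example below merges abstracted record events and remotely collected PROs for one patient into a single timeline keyed to the enrollment date. The `Observation` type, the `build_timeline` function, and all sample data are hypothetical and chosen only for illustration; they do not describe any actual study schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    patient_id: str
    observed_on: date
    source: str  # "record" (abstracted medical record) or "pro" (patient-reported)
    detail: str


def build_timeline(
    enrolled_on: date,
    record_events: list[Observation],
    pro_events: list[Observation],
) -> list[tuple[str, Observation]]:
    """Merge both streams chronologically and tag each item by study phase.

    The phase is decided by date, not by source, because record retrieval can
    also continue after enrollment.
    """
    merged = sorted(record_events + pro_events, key=lambda o: o.observed_on)
    return [
        ("pre-enrollment" if o.observed_on < enrolled_on else "post-enrollment", o)
        for o in merged
    ]


# Invented sample data for a single hypothetical patient.
enrolled_on = date(2024, 3, 1)
records = [
    Observation("P001", date(2019, 6, 12), "record", "specialist visit"),
    Observation("P001", date(2022, 11, 3), "record", "emergency visit"),
    Observation("P001", date(2024, 9, 20), "record", "laboratory panel"),
]
pros = [
    Observation("P001", date(2024, 3, 1), "pro", "baseline questionnaire"),
    Observation("P001", date(2024, 9, 1), "pro", "6-month questionnaire"),
]

timeline = build_timeline(enrolled_on, records, pros)
for phase, obs in timeline:
    print(f"{obs.observed_on}  {phase:<15}  {obs.source:<6}  {obs.detail}")
```

Keeping both streams in one ordered structure per patient is what lets retrospective history and prospective follow-up be analyzed on a single continuous time axis.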
The published evidence reflects this. In long-chain fatty acid oxidation disorders, where no validated composite surrogate existed, the Odyssey study integrated major clinical events, laboratory trajectories, and linked PROs to support comparative effectiveness conclusions from real-world records. In hemophilia A, combining medical records with patient-reported treatment experience produced a characterization of disease burden that clinical records or PROs alone could not support.
For natural history studies, this means describing disease progression across a real continuous timeline rather than approximating it from disconnected populations at different disease stages. This is the kind of evidence that is increasingly load-bearing for surrogate endpoint validation under FDA's accelerated approval pathway. It is also the architecture that satisfies ICH M14's requirement that study design, data provenance, and analytical approach be pre-specified before data collection begins.
The cohort itself is a scientific asset
A direct-to-patient cohort gains scientific value over time instead of losing it. In site-based research, loss to follow-up is non-random: the patients most likely to leave are those whose circumstances are changing, which is precisely the population whose long-term outcomes a longitudinal evidence program needs. The result is a cohort that becomes less representative over time exactly as the evidence requirements grow.
Because participation in direct-to-patient research is anchored to the person, the retrieval authorization follows them when their care moves. PicnicResearch maintains 98% annual patient retention because continued participation doesn't require returning anywhere. As regulatory requirements sharpen, as post-marketing commitments come due, as new research questions emerge, the same cohort can be returned to and extended. The evidence base compounds rather than erodes: it becomes a living data asset.
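The arithmetic behind annual retention is worth spelling out, because small yearly differences compound. The sketch below assumes a constant retention rate applied independently each year, which is a simplification. The 98% figure is the one stated above; the 85% comparator is an invented value used only to show the shape of the curve and is not a reported figure for any study design.

```python
def remaining_share(annual_retention: float, years: int) -> float:
    """Share of the original cohort still observable after `years` years,
    assuming the same retention rate applies independently each year."""
    if not 0.0 <= annual_retention <= 1.0:
        raise ValueError("annual_retention must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return annual_retention ** years


REPORTED_RETENTION = 0.98       # annual retention figure stated in the text
HYPOTHETICAL_COMPARATOR = 0.85  # invented for illustration only

for year in range(1, 6):
    reported = remaining_share(REPORTED_RETENTION, year)
    comparator = remaining_share(HYPOTHETICAL_COMPARATOR, year)
    print(f"Year {year}: {reported:.1%} vs {comparator:.1%} (hypothetical comparator)")
```

Under these assumptions, about 90% of the original cohort is still observable after five years at 98% annual retention, while the hypothetical 85% comparator has lost more than half.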
That is what the retrieval architecture ultimately determines: not just which biases are introduced at the start of a study, but whether the evidence it generates can be trusted and built on over time.
This piece draws on PicnicResearch's whitepaper: Reducing bias in non-interventional research with greater visit and data density. To discuss your evidence program, speak with our scientific team.