Regulatory networks like EPA AQS collect certified PM2.5 measurements continuously. But the gap between raw sensor readings and reliable conclusions about what caused an air quality event is vast and typically filled by expert judgment, heuristics, or silence.
Existing tools aggregate, visualize, and alert. None produce structured, evidence-bounded interpretations that could withstand scrutiny from a regulator, a public health team, or a community advocacy group asking: what actually happened, and how confident are you?
Current environmental monitoring platforms report what sensors measured. They do not interpret why events occurred, how confidently, or what confounders were present. That interpretive gap is where policy decisions, accountability claims, and health assessments get made, with no systematic framework behind them.
Confident interpretation without site-level corroboration: one sensor elevated, conclusion drawn as if all agreed
No explicit recording of what evidence was missing when a conclusion was absent
Meteorological confounders not integrated: dust storms and wildfires treated identically to local source emissions
No audit trail: the basis for an interpretation cannot be reviewed, challenged, or reproduced
ECF is a deterministic reasoning engine that weighs multiple evidence streams simultaneously and only produces an interpretation when those streams converge. It does not generalize. It does not extrapolate. It works within the evidence that was actually collected.
The core insight is that a PM2.5 elevation event is only trustworthy as a regional transport episode if multiple sites agree on timing and magnitude, meteorological conditions are consistent with transport, and no local confounders are present to explain away the concordance. ECF operationalizes that logic as a deterministic score.
For each PM2.5 episode, ECF produces an interpretation status (regional transport, inter-site divergence, uncertain, or insufficient evidence), a reliability score R(E) from 0 to 1, and a complete evidence record stored in the Trustworthy Event Registry (TER). Every field in the TER is traceable to a specific computation.
Structured interpretations anchored to specific evidence: each conclusion cites which signals supported it
Principled abstention: when evidence is insufficient, the system records that explicitly instead of guessing
Meteorological integration: ERA5 reanalysis data incorporated to contextualize transport vs. local source episodes
Full reproducibility: the same input data always produces the same output. No stochastic elements.
Four stages transform certified sensor readings into structured, auditable interpretations stored permanently in the Trustworthy Event Registry.
EPA AQS certified PM2.5 data across all available years is ingested and segmented into episodes. Each elevation event is identified by duration, magnitude, and site coverage. The evidence base grows as new AQS data becomes available.
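The segmentation step can be illustrated with a minimal sketch. It assumes a single site's hourly series, a fixed elevation threshold, and a minimum duration; the names `Episode` and `segment_episodes`, the 35.0 µg/m³ threshold, and the 3-hour minimum are illustrative assumptions, not ECF's actual parameters.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Episode:
    start: int    # index of the first elevated hour
    end: int      # index of the last elevated hour (inclusive)
    peak: float   # maximum concentration within the episode (µg/m³)

    @property
    def duration_hours(self) -> int:
        return self.end - self.start + 1


def segment_episodes(hourly_pm25: List[Optional[float]],
                     threshold: float = 35.0,   # assumed value, not ECF's
                     min_hours: int = 3) -> List[Episode]:
    """Group consecutive elevated hours into episodes.

    A missing reading (None) ends the current run; that is a
    simplifying assumption made for this sketch.
    """
    episodes: List[Episode] = []
    start: Optional[int] = None
    # A trailing None sentinel closes a run that reaches the end of the series.
    for i, value in enumerate(list(hourly_pm25) + [None]):
        elevated = value is not None and value >= threshold
        if elevated and start is None:
            start = i
        elif not elevated and start is not None:
            end = i - 1
            if end - start + 1 >= min_hours:
                peak = max(v for v in hourly_pm25[start:end + 1] if v is not None)
                episodes.append(Episode(start, end, peak))
            start = None
    return episodes
```

Site coverage would then be derived by comparing the episode windows found at each monitoring site; that step is omitted here.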
For each episode, ECF computes a reliability score R(E) from three factors: signal concordance C(E) across monitoring sites, inter-site agreement D(N), and a confounder penalty L(E). When R(E) exceeds the calibrated threshold, an interpretation is registered.
A real-time tier uses AirNow near-real-time data to surface current air quality conditions for the study region. This tier does not produce TER entries: it is observational context, not certified interpretation.
ERA5 reanalysis meteorological data provides wind direction, boundary layer height, and temperature inversion context. This allows the engine to distinguish regional transport events from local source contributions without relying on probabilistic models.
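One deterministic way to use wind direction is to check whether the wind blows from the general direction of a suspected upwind source. The sketch below is an assumption about how such a check might look, not ECF's documented rule; `is_transport_consistent`, the source bearing input, and the 45-degree tolerance are hypothetical.

```python
def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute angle between two compass directions, in degrees."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def is_transport_consistent(wind_from_deg: float,
                            source_bearing_deg: float,
                            tolerance_deg: float = 45.0) -> bool:
    """True when the wind comes from roughly the direction of the source.

    wind_from_deg uses the meteorological convention (direction the wind
    blows FROM). source_bearing_deg is the compass bearing from the monitor
    to the suspected source. The tolerance is an illustrative assumption.
    """
    return angular_difference(wind_from_deg, source_bearing_deg) <= tolerance_deg
```

Taking the difference modulo 360 matters near north: a wind from 350 degrees and a source bearing of 10 degrees differ by 20 degrees, not 340.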
ECF integrates only certified or reanalysis data sources. No nowcast estimates or modeled concentrations are used in interpretation. Every evidence record in the TER is traceable to a specific source dataset and API call.
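As an illustration of what "traceable to a specific API call" could mean in practice, the sketch below builds a request against the public EPA AQS Data API and keeps a credential-free copy of the query as a provenance record. The helper names and the record layout are hypothetical; the endpoint and parameter names follow the AQS API's sample-data-by-county service as commonly documented, and should be checked against the current API reference before use.

```python
from urllib.parse import urlencode

# Assumed endpoint: AQS Data API sample data by county.
AQS_SAMPLE_BY_COUNTY = "https://aqs.epa.gov/data/api/sampleData/byCounty"


def aqs_query_params(fips: str, param: str, bdate: str, edate: str) -> dict:
    """Split a 5-digit county FIPS code into the state and county fields."""
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError("expected a 5-digit county FIPS code")
    return {"param": param, "bdate": bdate, "edate": edate,
            "state": fips[:2], "county": fips[2:]}


def aqs_request_url(query: dict, email: str, key: str) -> str:
    """Full request URL, including the caller's credentials."""
    return AQS_SAMPLE_BY_COUNTY + "?" + urlencode({"email": email, "key": key, **query})


def provenance_record(dataset: str, query: dict) -> dict:
    """Hypothetical provenance entry: records the query, never the credentials."""
    return {"dataset": dataset, "endpoint": AQS_SAMPLE_BY_COUNTY, "query": dict(query)}
```

For example, `aqs_query_params("42045", "88101", "20230101", "20231231")` yields state `42` and county `045`, and the same dictionary can be stored alongside the episode so the request can be replayed later.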
Federal Reference Method PM2.5 measurements from two Delaware County, PA monitoring sites (FIPS 42045), reported hourly under AQS parameter code 88101. Submitted to EPA, quality-assured, and certified. Data become available in AQS 6-12 months after collection.
Regulatory Certified
Open-Meteo ERA5 hourly reanalysis data: 10 m wind speed and direction, boundary layer height, surface pressure, and temperature. Available with a 5-day archive lag. Used to classify transport conditions and detect meteorological confounders.
ECMWF Reanalysis
EPA AirNow API provides current and recent PM2.5 concentrations for the study area. Used exclusively for the real-time monitoring tier, not for TER interpretation. AirNow data is preliminary and not suitable for certified episode attribution.
Monitoring Only
The ECF reliability score R(E) is computed for every episode using a deterministic formula with three interpretable factors. There are no learned weights, no probability distributions, and no opaque features. Every component of R(E) is directly auditable.
The formula encodes a single principle: a PM2.5 episode deserves an interpretation only when the signal is strong (C), multiple sites agree (D), and no confounders undermine the conclusion (L). When that standard is not met, R(E) falls below threshold and the engine abstains.
22% of 116 episodes produced no interpretation in ECF v0.1. These episodes are not failures: they are cases where the evidence was genuinely insufficient. Recording that explicitly is more scientifically defensible than inferring an answer the data cannot support.
C(E)
Signal Concordance: normalized agreement in peak timing and magnitude across monitoring sites in the episode window
D(N)
Inter-Site Agreement: discount factor for episodes where fewer than N=2 independent sites corroborate the elevation simultaneously
L(E)
Confounder Load: penalty for presence of confounding meteorological or source conditions that could explain the elevation without transport
R(E)
Reliability Score: 0 to 1. Interpretations are registered only when R(E) exceeds the calibrated episode threshold.
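The exact formula is not reproduced here, so the following is only a sketch of how three such factors could combine deterministically. It assumes a multiplicative form, R = C × D × (1 − L); the function names, the 0.5 single-site discount, and the 0.6 threshold are illustrative assumptions, not ECF's calibrated values.

```python
def site_agreement_discount(n_sites: int, required: int = 2,
                            discount: float = 0.5) -> float:
    """D(N): 1.0 when at least `required` sites corroborate, else a discount.

    The discount value is an assumption made for this sketch.
    """
    return 1.0 if n_sites >= required else discount


def reliability_score(concordance: float, n_sites: int,
                      confounder_load: float) -> float:
    """R(E) under an ASSUMED multiplicative form: C * D * (1 - L)."""
    for name, value in (("concordance", concordance),
                        ("confounder_load", confounder_load)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    return concordance * site_agreement_discount(n_sites) * (1.0 - confounder_load)


def should_register(score: float, threshold: float = 0.6) -> bool:
    """Register an interpretation only when the score exceeds the threshold."""
    return score > threshold
```

Under these assumptions, an episode with C = 0.9, two corroborating sites, and L = 0.1 scores 0.81 and is registered, while the same signal seen at a single site scores 0.405 and the engine abstains. A product form has the property that any single weak factor pulls the whole score down, which matches the stated principle that all three conditions must hold.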
The TER is the output of ECF: a structured dataset where each row is a PM2.5 episode and each column is either a measured quantity or a derived interpretation field. 34 columns per episode. All fields are traceable to source data.
Four interpretation statuses. Each episode receives exactly one. When the evidence supports a specific conclusion, one of the first two is assigned. When it does not, the system assigns one of the latter two and records which evidence was missing.
Regional transport
Multi-site concordant elevation consistent with upwind source transport. Wind direction and timing align. High R(E). Interpretation: confirmed transport episode.
Inter-site divergence
Sites disagree on magnitude or timing. The spatially heterogeneous signal suggests differences in proximity to a local source rather than regional transport.
Uncertain
Concordance metrics are borderline. Evidence is present but below the confidence threshold needed for a reliable conclusion. Engine abstains.
Insufficient evidence
Fewer than two sites active during the episode window, or meteorological data unavailable. No interpretation is possible from available evidence.
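The four statuses can be read as a small decision procedure. The sketch below is one possible ordering of the checks, written under assumptions: the function name, the inputs, and both cut points (`divergence_below` and `threshold`) are hypothetical, and the source does not specify how divergence is separated from the borderline case.

```python
from enum import Enum


class Status(Enum):
    REGIONAL_TRANSPORT = "regional transport"
    INTER_SITE_DIVERGENCE = "inter-site divergence"
    UNCERTAIN = "uncertain"
    INSUFFICIENT_EVIDENCE = "insufficient evidence"


def assign_status(active_sites: int, met_available: bool,
                  concordance: float, wind_aligned: bool,
                  reliability: float,
                  threshold: float = 0.6,          # assumed value
                  divergence_below: float = 0.3    # assumed value
                  ) -> Status:
    """Assign exactly one status per episode (illustrative ordering)."""
    # Without two active sites or meteorology, nothing can be interpreted.
    if active_sites < 2 or not met_available:
        return Status.INSUFFICIENT_EVIDENCE
    # Clear disagreement between sites is itself a conclusion.
    if concordance < divergence_below:
        return Status.INTER_SITE_DIVERGENCE
    # Concordant sites, aligned wind, and a score above threshold.
    if wind_aligned and reliability > threshold:
        return Status.REGIONAL_TRANSPORT
    # Evidence present but borderline: abstain.
    return Status.UNCERTAIN
```

Checking for missing evidence first keeps the abstention cases explicit: an episode is never labeled divergent or uncertain merely because a site was offline.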
ECF v0.1 · Delaware County PA · 2022-2025
The TEI platform gives direct access to the Trustworthy Event Registry: browse all 116 episodes, inspect per-episode evidence records, and review the ECF interpretation rationale for each event. The platform runs on your local network and is available to authorized research partners.
The TEI platform file is included in this site. Full live data requires the ECF API server running at localhost:8000. Contact us to arrange authorized access for your team.
ECF is designed to be geography-extensible. The framework applies to any multi-site EPA AQS network. Delaware County PA is the first deployment. Partner with us to extend ECF to your region or use case.
Access the TER dataset and ECF methodology for peer-reviewed research on PM2.5 episode classification, principled abstention, and evidence-bounded environmental interpretation. Co-authorship welcome.
Discuss Research Access
Extend the ECF pipeline to a new EPA AQS monitoring network. The framework is fully parameterized for any FIPS county code and AQS parameter. Partnership includes methodology transfer and calibration support.
Extend ECF to Your Region
TEI is designed for regulatory defensibility. Every interpretation record includes its full evidence chain. Contact us to arrange a structured review of the ECF methodology for regulatory, compliance, or environmental advocacy contexts.
Request Methodology Review