Overview
r4subcore is the foundation of the R4SUB (R for
Regulatory Submission) ecosystem. It defines three responsibilities that
every other R4SUB pillar package depends on:
- Run context — a lightweight metadata envelope that ties every piece of evidence back to a specific study and execution event.
- Evidence schema — a fixed 17-column contract that standardises how findings from heterogeneous sources (validators, parsers, manual checks) are stored and exchanged.
- Validation helpers — functions that enforce the schema at every ingestion point so downstream scoring and profiling can always trust the data.
Creating a Run Context
A run context captures who ran the assessment, on which
study, in which environment, and at what time. Every evidence row you
create is stamped with the run_id and study_id
from its context, ensuring full traceability.
ctx <- r4sub_run_context(study_id = "STUDY001", environment = "DEV")
#> ℹ Run context created: "R4S-20260316112310-wl4dieex"
print(ctx)
#> <r4sub_run_context>
#> run_id: R4S-20260316112310-wl4dieex
#> study_id: STUDY001
#> environment: DEV
#> user: runner
#> created_at: 2026-03-16 11:23:10run_id is generated automatically from the timestamp,
but you can supply your own if you need reproducibility in tests or
pipelines.
ctx_custom <- r4sub_run_context(
study_id = "STUDY001",
environment = "PROD",
run_id = "RUN-2024-001"
)
#> ℹ Run context created: "RUN-2024-001"
ctx_custom$run_id
#> [1] "RUN-2024-001"Understanding the Evidence Schema
The evidence schema defines the 17 columns that every evidence table
must contain. Call evidence_schema() to inspect the
contract at any time:
schema <- evidence_schema()
# Column names in canonical order
names(schema)
#> [1] "run_id" "study_id" "asset_type" "asset_id"
#> [5] "source_name" "source_version" "indicator_id" "indicator_name"
#> [9] "indicator_domain" "severity" "result" "metric_value"
#> [13] "metric_unit" "message" "location" "evidence_payload"
#> [17] "created_at"| Column | Type | Required | Notes |
|---|---|---|---|
run_id |
character | yes | Set from run context |
study_id |
character | yes | Set from run context |
asset_type |
character | yes | One of: dataset, define, program, validation, spec, other |
asset_id |
character | yes | e.g. “ADSL”, “define.xml” |
source_name |
character | yes | Tool or package that produced the finding |
source_version |
character | nullable | Version of the source tool |
indicator_id |
character | yes | e.g. “P21-001”, “U-001” |
indicator_name |
character | yes | Human-readable indicator name |
indicator_domain |
character | yes | One of: quality, trace, risk, usability |
severity |
character | yes | One of: info, low, medium, high, critical |
result |
character | yes | One of: pass, fail, warn, na |
metric_value |
double | nullable | Numeric score (0–1 scale typical) |
metric_unit |
character | nullable | e.g. “score”, “proportion”, “count” |
message |
character | nullable | Human-readable finding description |
location |
character | nullable | e.g. “ADSL.USUBJID” |
evidence_payload |
character | nullable | JSON string for extended detail |
created_at |
POSIXct | yes | Set automatically if omitted |
Controlled vocabulary helpers
canon_severity() and canon_result()
normalise common aliases to the canonical values accepted by the
schema:
canon_severity(c("ERROR", "warning", "Minor", "CRITICAL"))
#> [1] "high" "medium" "low" "critical"
canon_result(c("PASS", "Failed", "Warning", "N/A"))
#> [1] "pass" "fail" "warn" "na"Building Evidence with as_evidence()
as_evidence() is the main ingestion function. You supply
a data frame that contains at minimum the required columns, pass a run
context, and the function:
- fills
run_idandstudy_idfrom the context, - fills nullable columns with appropriately-typed
NA, - sets
created_atto the current time if absent, - validates the result before returning it.
raw <- data.frame(
asset_type = "validation",
asset_id = "ADSL",
source_name = "pinnacle21",
indicator_id = "P21-SD0001",
indicator_name = "Missing variable label",
indicator_domain = "quality",
severity = "high",
result = "fail",
message = "Variable AGEU is missing a label",
location = "ADSL.AGEU",
metric_value = 0,
metric_unit = "score",
stringsAsFactors = FALSE
)
ev <- as_evidence(raw, ctx = ctx)
#> ✔ Evidence table created: 1 rowYou can inspect the resulting evidence table:
Validating Evidence
validate_evidence() runs the same checks that
as_evidence() calls internally. Use it when you receive
evidence produced externally and want to confirm it meets the contract
before processing:
validate_evidence(ev) # returns TRUE invisibly if everything is validBinding Multiple Evidence Tables
When combining evidence from different sources or indicators, use
bind_evidence(). It validates each table individually
before combining, preventing schema violations from silently
propagating:
# A second finding — a passed check on the same dataset
raw2 <- data.frame(
asset_type = "dataset",
asset_id = "ADSL",
source_name = "r4subcore",
indicator_id = "Q-NROW-001",
indicator_name = "Dataset row count",
indicator_domain = "quality",
severity = "info",
result = "pass",
message = "ADSL has 254 subjects",
metric_value = 254,
metric_unit = "count",
stringsAsFactors = FALSE
)
ev2 <- as_evidence(raw2, ctx = ctx)
#> ✔ Evidence table created: 1 row
combined <- bind_evidence(ev, ev2)
#> ✔ Bound 2 evidence tables: 2 total rows
nrow(combined)
#> [1] 2Quick Overview with evidence_summary()
evidence_summary() aggregates an evidence table by
domain, severity, result, and source, giving a one-page digest of the
findings:
evidence_summary(combined)
#> indicator_domain severity result source_name n
#> 1 quality high fail pinnacle21 1
#> 2 quality info pass r4subcore 1Exporting and Importing Evidence
Evidence tables can be persisted and reloaded in CSV, RDS, or JSON
format. The exported file retains the full schema so
import_evidence() can re-validate it on the way back
in.
# Export to CSV
tmp <- tempfile(fileext = ".csv")
export_evidence(combined, file = tmp, format = "csv")
# Import and re-validate
ev_reloaded <- import_evidence(tmp, format = "csv")
nrow(ev_reloaded)RDS is the most faithful format because it preserves POSIXct without any string-conversion round-trip. JSON is useful when evidence needs to be consumed by non-R tooling.
What’s Next
Once you have an evidence table, the other R4SUB pillar packages consume it directly:
- r4subusability — four usability indicators (label quality, Define-XML completeness, annotation coverage, reviewer guide presence).
- r4subscore — weighted scoring across all domains into a single submission readiness index.
- r4subprofile — regulatory authority profile templates (FDA, EMA, PMDA, ANVISA, Health Canada, NMPA) that map scores to submission requirements.