Want to share your content on R-bloggers? Click here if you have a blog, or here if you don't.
Official economic statistics like GDP, inflation, and employment figures are published as preliminary estimates that undergo subsequent revisions as agencies incorporate new source data, update seasonal adjustments, or apply benchmarking changes. These revisions can reshape our understanding of economic turning points, policy effectiveness, and forecast accuracy.
reviser, developed by Marc Burri and Philipp Wegmüller, provides R users with tools for analyzing vintage datasets that capture multiple published versions of the same time series. The package enables systematic comparison of what was known at each release date against later reported values through a unified workflow that can:
- convert release vintages between wide and tidy formats
- calculate revisions relative to earlier or final releases
- quantify bias, dispersion, and serial dependence in revision patterns
- determine when a release becomes statistically equivalent to the eventual benchmark
- forecast future revisions using state-space models
The package targets analysts already working with real-time macroeconomic data who need capabilities beyond manual revision triangle plotting, with a design philosophy centered on keeping the entire workflow within R.
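The core input is deliberately simple: a long table with one row per observed value, keyed by series id, reference period (time), and publication date (pub_date). As a toy illustration of that structure (the column names id, time, and pub_date follow the code shown later in this post; "value" is assumed for the measurement column):

```r
# Toy long-format vintage: two reference quarters, two publication dates.
# Column names id, time, pub_date match the code later in this post;
# "value" is assumed for the measurement column.
vintages <- data.frame(
  id       = "US",
  time     = rep(as.Date(c("2024-01-01", "2024-04-01")), times = 2),
  pub_date = rep(as.Date(c("2024-05-30", "2024-08-29")), each = 2),
  value    = c(0.4, NA,    # first publication: Q2 not yet released
               0.5, 0.6)   # second publication: Q1 revised, Q2 added
)
vintages
# vintages_wide(vintages) would pivot this to one column per publication date.
```

Each publication date is a snapshot of what the statistical agency had published at that moment, so the same reference quarter can appear many times with different values.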
The package underwent rOpenSci statistical software peer review, benefiting from feedback by reviewers Alex Gibberd and Tanguy Barthelemy, along with editor Rebecca Killick.
Understanding revision patterns matters
Revisions aren't simply measurement error. They reveal how information flows into the data production pipeline:
- Some revisions incorporate genuinely new information unavailable at first release
- Others represent noise that could have been filtered earlier
- Still others stem from methodological updates or benchmark adjustments
Distinguishing between these sources matters when evaluating early releases, constructing nowcasts, or testing whether initial publications efficiently summarize available information.
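The distinction can be made operational with two correlations: under the noise hypothesis, the revision is predictable from the initial release; under the news hypothesis, the revision is orthogonal to the initial release and correlated with the final value. A base-R sketch on simulated data illustrates the idea (this is a didactic illustration of the polar cases, not reviser's implementation):

```r
# Illustrative base-R sketch (not reviser's API): simulate the two polar cases.
set.seed(1)
n <- 500

# Noise world: the first release adds measurement error to the true value,
# so the revision (final - first) is correlated with the first release.
final_a <- rnorm(n)
first_a <- final_a + rnorm(n, sd = 0.5)
rev_a   <- final_a - first_a

# News world: the first release is an efficient forecast and the revision
# is genuinely new information, orthogonal to the first release.
first_b <- rnorm(n)
final_b <- first_b + rnorm(n, sd = 0.5)
rev_b   <- final_b - first_b

round(c(noise_vs_first = cor(rev_a, first_a),
        noise_vs_final = cor(rev_a, final_a)), 2)
round(c(news_vs_first  = cor(rev_b, first_b),
        news_vs_final  = cor(rev_b, final_b)), 2)
```

In the noise world the revision co-moves with the first release; in the news world it co-moves with the final value instead. Real series typically mix both, which is why formal tests of the kind reviser provides are useful.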
The reviser vignettes organize the analysis workflow into three stages:
- Structure vintage data consistently
- Measure and test revision properties
- Model the revision process for predicting future changes
Working with GDP vintages
Analysis begins by reshaping data into tidy vintage format, where each row represents an observed value, its reference date, and publication date. The package includes a GDP example dataset in long vintage format. To examine U.S. GDP growth, visualize estimate evolution during the 2008-09 financial crisis, and test for systematic bias in early releases:
library(reviser)
library(dplyr)
library(tsbox)

gdp_us <- gdp |>
  filter(id == "US") |>
  tsbox::ts_pc() |>
  tsbox::ts_span(start = "1980-01-01")

gdp_wide <- vintages_wide(gdp_us)
gdp_long <- vintages_long(gdp_wide, keep_na = FALSE)
With vintages in tidy form, plotting how published paths changed over time becomes straightforward. The y-axis shows quarter-on-quarter GDP growth rates:
plot_vintages(
gdp_long |>
filter(
pub_date >= as.Date("2009-01-01"),
pub_date < as.Date("2010-01-01"),
time > as.Date("2008-01-01"),
time < as.Date("2010-01-01")
),
type = "line",
title = "Revisions of GDP during the 2008-09 global financial crisis",
ylab = "Quarter-on-quarter GDP growth rate"
)
GDP growth vintages for the United States during the 2008-09 global financial crisis.
During volatile periods, vintage paths can diverge substantially, meaning the narrative from the first release may differ markedly from the story told a year later.
With tidy vintage data, comparing early releases to a later benchmark becomes direct:
final_release <- get_nth_release(gdp_long, n = 10)
early_releases <- get_nth_release(gdp_long, n = 0:6)
summary_tbl <- get_revision_analysis(early_releases, final_release)

Warning: Both 'release' and 'pub_date' columns are present in 'df'. The
'release' column will be used for grouping.

summary_tbl |>
  select(id, release, `Bias (mean)`, `Bias (robust p-value)`, `Noise/Signal`)

=== Revision Analysis Summary ===

# A tibble: 7 × 5
  id    release   `Bias (mean)` `Bias (robust p-value)` `Noise/Signal`
  <chr> <chr>             <dbl>                   <dbl>          <dbl>
1 US    release_0        -0.014                   0.52           0.22
2 US    release_1        -0.015                   0.425          0.202
3 US    release_2        -0.013                   0.507          0.205
4 US    release_3        -0.003                   0.851          0.194
5 US    release_4        -0.014                   0.326          0.157
6 US    release_5        -0.021                   0.181          0.152
7 US    release_6        -0.018                   0.202          0.13

=== Interpretation ===

id=US, release=release_0:
  • No significant bias detected (p = 0.52)
  • Moderate revision volatility (Noise/Signal = 0.22)
id=US, release=release_1:
  • No significant bias detected (p = 0.425)
  • Moderate revision volatility (Noise/Signal = 0.202)
id=US, release=release_2:
  • No significant bias detected (p = 0.507)
  • Moderate revision volatility (Noise/Signal = 0.205)
id=US, release=release_3:
  • No significant bias detected (p = 0.851)
  • Moderate revision volatility (Noise/Signal = 0.194)
id=US, release=release_4:
  • No significant bias detected (p = 0.326)
  • Moderate revision volatility (Noise/Signal = 0.157)
id=US, release=release_5:
  • No significant bias detected (p = 0.181)
  • Moderate revision volatility (Noise/Signal = 0.152)
id=US, release=release_6:
  • No significant bias detected (p = 0.202)
  • Moderate revision volatility (Noise/Signal = 0.13)
This is where reviser extends beyond reshaping and plotting. The revision summary provides metrics that applied research frequently requires but typically rebuilds case-by-case: mean bias, quantiles, volatility, noise-to-signal ratios, and hypothesis tests for bias, serial correlation, and news-versus-noise interpretations.
In this example, early U.S. GDP releases show no evidence of systematic bias relative to the later benchmark. The package also supports efficient-release diagnostics that test not just whether revisions exist, but when additional revisions cease adding meaningful information:
efficient_release <- get_first_efficient_release(early_releases, final_release)
summary(efficient_release)

Efficient release: 0

Model summary:

Call:
stats::lm(formula = formula, data = df_wide)

Residuals:
     Min       1Q   Median       3Q      Max
-0.89186 -0.12669  0.02046  0.11475  0.97986

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.00299    0.02223   0.134    0.893
release_0    0.97412    0.01692  57.567   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.2518 on 166 degrees of freedom
  (10 observations deleted due to missingness)
Multiple R-squared:  0.9523,    Adjusted R-squared:  0.952
F-statistic:  3314 on 1 and 166 DF,  p-value: < 2.2e-16

Test summary:

Linear hypothesis test:
(Intercept) = 0
release_0 = 1

Model 1: restricted model
Model 2: final ~ release_0

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    168
2    166  2 1.9283 0.1486
This result is difficult to extract from a revision triangle alone but straightforward to formalize with a standardized workflow. In this sample, the first release is already statistically close to the later benchmark, suggesting subsequent revisions add relatively little systematic information.
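The machinery behind this test is visible in the output above: a regression of the final value on the candidate release, followed by a joint test of intercept = 0 and slope = 1. For readers who want to see the bones of it, here is a base-R sketch on simulated data (an illustration only; reviser's own implementation additionally supplies a robust coefficient covariance, as the note in its output indicates):

```r
# Base-R illustration of the efficiency regression: final ~ release,
# with a joint Wald test of intercept = 0 and slope = 1. Simulated data;
# reviser wraps this logic in get_first_efficient_release().
set.seed(42)
n        <- 180
release0 <- rnorm(n)
final    <- release0 + rnorm(n, sd = 0.25)   # first release already efficient

fit <- lm(final ~ release0)
coef(summary(fit))

# Joint Wald test of the restriction (Intercept, slope) = (0, 1),
# here with the plain OLS covariance rather than a robust one.
R    <- diag(2)
r    <- c(0, 1)
beta <- coef(fit)
V    <- vcov(fit)
wald <- t(R %*% beta - r) %*% solve(R %*% V %*% t(R)) %*% (R %*% beta - r)
pval <- pchisq(drop(wald), df = 2, lower.tail = FALSE)
pval  # small values reject efficiency of the candidate release
```

A non-rejection, as in the GDP example above, says the candidate release is already an unbiased, correctly scaled predictor of the benchmark.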
From descriptive analysis to revision forecasting
While revision summaries are the primary use case for many analysts, reviser also includes model-based tools that treat revisions as an explicit latent-data problem. This matters when decisions must be made on preliminary data and you need a structured way to estimate how those figures are likely to change.
Two vignettes demonstrate nowcasting revisions with:
- the generalized Kishor-Koenig family via kk_nowcast()
- the Jacobs-Van Norden model via jvn_nowcast()
Both approaches formulate the revision problem in state-space form, enabling estimation of news and noise dynamics across successive data releases. For technical users, this transforms revision analysis from retrospective diagnosis into a forecasting framework.
Here's a compact kk_nowcast() example following the Kishor-Koenig workflow from the vignette for the Euro Area. The approach first identifies an efficient release e, then estimates the revision system on the corresponding release panel. In this euro area example, the efficient-release step selects e = 2, treating the third published release as the earliest one already close to the later benchmark. This substantive finding suggests most economically relevant signal arrives within the first few releases, while later revisions represent smaller adjustments:
gdp_ea <- reviser::gdp |>
tsbox::ts_pc() |>
dplyr::filter(
id == "EA",
time >= min(pub_date),
time <= as.Date("2020-01-01")
)
releases <- get_nth_release(gdp_ea, n = 0:14)
final_release <- get_nth_release(gdp_ea, n = 15)
efficient_release <- get_first_efficient_release(releases, final_release)
fit_kk <- kk_nowcast(
df = efficient_release$data,
e = efficient_release$e,
model = "KK",
method = "MLE"
)
summary(fit_kk)
=== Kishor-Koenig Model ===
Convergence: Success
Log-likelihood: 125.7
AIC: -231.41
BIC: -198.23
Parameter Estimates:
Parameter Estimate Std.Error
F0 0.633 0.131
G0_0 0.950 0.031
G0_1 -0.037 0.152
G0_2 -0.181 0.220
G1_0 -0.009 0.011
G1_1 0.594 0.061
G1_2 0.194 0.092
v0 0.380 0.068
eps0 0.008 0.001
eps1 0.001 0.000
plot(fit_kk)
Diagnostic plot from the Kishor-Koenig nowcast example.
The fitted object contains estimated parameters, filtered and smoothed latent states, and built-in plotting methods for the implied efficient-release path — giving you a direct route from descriptive revision analysis to a state-space nowcast of future revisions. For practitioners, the headline result isn't any individual coefficient. What matters is that the model converges cleanly on this sample, compresses the revision process into a compact latent-state representation, and offers a principled way to assess whether a new data release is likely to be revised materially down the line. By separating persistent signal from transitory revision noise, the output becomes genuinely actionable whenever you need to decide how much weight to place on the latest figures.
What reviser adds
What distinguishes reviser isn't any single function — it's the coherence of the end-to-end workflow:
- Explicit, consistent conventions for handling vintage data;
- Descriptive revision analysis and formal statistical testing within a single unified API;
- Efficient-release analysis tied directly to the applied question of which release to trust;
- Nowcasting tools that extend the same workflow, rather than requiring a separate modeling stack.
For anyone working with real-time macroeconomic data, that integration matters. Revision analysis has traditionally been fragmented — scattered across custom scripts, one-off spreadsheets, and ad hoc package combinations that share no common data structure. reviser addresses that friction directly.
Try it and push it further
Install the package from the rOpenSci R-universe:
install.packages(
"reviser",
repos = c("https://ropensci.r-universe.dev", "https://cloud.r-project.org")
)
Then explore the documentation and source:
Feedback from users working with different datasets would be particularly valuable. A real-time dataset with an unusual release structure would serve as a useful stress test. If you encounter gaps in the workflow or have a use case worth sharing, open an issue or contribute an example. Revision analysis becomes more powerful — and more reproducible — when workflows can be compared and adapted across datasets rather than rebuilt from scratch each time.