Building Better R Tools: What Teaching R Reveals About Software Design

Apr 09, 2026
This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.

As Jumping Rivers approaches its tenth anniversary—having been founded in 2016—it's a fitting moment to reflect on a decade of growth and share some of the operational lessons learned along the way.

Regular readers will know that Jumping Rivers is a data science consultancy and training provider. What may be less well known is that the company offers a catalogue of over 50 courses spanning R, Python, Git, SQL, and more.

This post pulls back the curtain on the internal processes that make managing that catalogue tractable. It covers three practical techniques (automated reporting, code packaging, and CI/CD) that are relevant to any large-scale coding project.

The challenge

Maintaining a catalogue of this size creates three distinct operational challenges.

1. Multilingual support

The course catalogue breaks down roughly as follows:

  • 50% R
  • 30% Python
  • 5% R and Python
  • 15% other (Git, SQL, Tableau, Posit, and more)

Any standardisation solution must be compatible with both R and Python at a minimum, and ideally extend to other tools such as SQL and Git as well.

2. Maintenance

The R and Python ecosystems move fast. The languages themselves are updated regularly, and the package landscapes on CRAN and PyPI evolve continuously. Code that ran without issue a year ago—or even yesterday—can break silently when a dependency is updated. Tracking this across 50+ courses without automation would be unmanageable.
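One way to make this kind of drift visible is to compare the package versions a course was last tested against with what is currently installed. The following is a minimal Python sketch of that idea, not the Jumping Rivers tooling: the function name `find_drift` and the shape of the `pinned` dictionary are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def find_drift(pinned, installed_version=version):
    """Return {package: (pinned, installed)} for every package whose
    installed version differs from the pinned one (None if not installed).

    `pinned` is a hypothetical mapping of package name to the version
    the course was last tested with.
    """
    drift = {}
    for package, wanted in pinned.items():
        try:
            found = installed_version(package)
        except PackageNotFoundError:
            found = None
        if found != wanted:
            drift[package] = (wanted, found)
    return drift
```

Calling `find_drift({"pandas": "2.2.0"})` (a made-up pin) returns an empty dictionary when the environment still matches, and otherwise lists each package alongside the tested and installed versions.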

3. Demand

Jumping Rivers delivers over 100 training events per year. For a relatively small team, that workload demands efficiency. Ideally, the mechanics of building course materials, provisioning cloud training environments, and handling course administration should be automated, freeing trainers to focus entirely on delivery quality.

The solution

The team draws on techniques it already applies for clients—automated reporting pipelines and packaging of reusable code—and turns them inward on its own training infrastructure.

Automated reporting

Consider the common scenario of a document that must be refreshed regularly—a monthly revenue presentation, for instance. Manually regenerating charts and tables and pasting them into a report is error-prone and time-consuming. A better approach is to use an automated reporting framework, and two strong open-source options exist: R Markdown and Quarto.

Both follow the same core pattern:

  • A YAML header at the top of the document specifies configuration and formatting:

    ---
    title: "Introduction to Python"
    authors:
    - "Myles Mitchell"
    date: "2026-04-02"
    format: pdf  # Quarto syntax; in R Markdown use `output: pdf_document`
    ---
    
  • The document body is written in Markdown and can contain executable code chunks alongside prose:

    ## Introduction
    At its most basic, Python is essentially a calculator.
    We can run basic calculations as follows:
    ```{python}
    2 + 1
    ```
    We can also assign the output of a calculation to a
    variable so that it can be reused later:
    ```{python}
    x = 2 + 1
    print(x)
    ```
    

Code chunks can be configured to execute at render time, embedding any resulting outputs—plots, tables, summary statistics—directly into the final document. When the underlying data changes, regenerating the report is a single action.

Jumping Rivers applies this same principle to course notes and presentation slides. Embedding live code directly into course materials introduces a useful forcing function: any faulty or outdated example will produce a visible error at build time, before it ever reaches a learner.
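The forcing function can be illustrated with a toy build step written in plain Python. This is only a sketch of the principle: real rendering is done by knitr or Jupyter, which do far more, and the names `run_chunks` and `CHUNK_PATTERN` are invented for this example.

```python
import re

FENCE = "`" * 3  # the chunk delimiter, built this way to avoid a literal fence

# Matches executable chunks that open with the fence followed by {python}.
CHUNK_PATTERN = re.compile(FENCE + r"\{python\}\n(.*?)" + FENCE, re.DOTALL)


def run_chunks(source):
    """Execute each Python chunk in order, sharing one namespace, and
    raise on the first failure so a broken example stops the build."""
    namespace = {}
    chunks = CHUNK_PATTERN.findall(source)
    for number, code in enumerate(chunks, start=1):
        try:
            exec(code, namespace)
        except Exception as error:
            raise RuntimeError(f"chunk {number} failed: {error!r}") from error
    return len(chunks)


notes = "\n".join([
    "We can assign the output of a calculation to a variable:",
    FENCE + "{python}",
    "x = 2 + 1",
    FENCE,
    "and reuse it later:",
    FENCE + "{python}",
    "print(x)",
    FENCE,
])
```

Running `run_chunks(notes)` executes both chunks and returns the number of chunks run. If an example refers to a name that no longer exists, the call raises instead, which is the build-time failure described above.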

Critically, both R Markdown and Quarto support R, Python, SQL, and Git syntax highlighting, and can render to HTML, PDF, and other formats—covering the full breadth of the course catalogue.

Flow chart illustrating the automated reporting workflow with Quarto. Starting with a text-based .qmd file, this is converted into a Markdown format using Jupyter or knitr. Pandoc is then used to convert this into a variety of output formats including HTML, PDF and Word.

Internal R packages

Automated reporting solves the build problem, but with 50+ courses, setting up each one from scratch would quickly become repetitive. A core principle in software engineering is to avoid duplication: shared logic belongs in functions, and functions belong in packages, which are self-contained, reusable, and testable units that reduce the number of places where bugs can hide.

Jumping Rivers applies this philosophy to its training infrastructure by packaging all reusable course assets—logos, template files, and styling—into a collection of internal R packages. When a new course is created, the developer focuses only on what is unique to that course:

  • Code examples
  • Notes
  • Exercises
  • Presentation slides

Everything else is handled automatically: the visual appearance of the materials, and the build routines that convert R Markdown or Quarto source files into finished HTML.
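The division of labour between shared package and individual course might look something like the sketch below. It is written in Python for consistency with the earlier example, whereas the internal packages described here are R packages. The defaults, the file names, and the `course_config` function are all hypothetical.

```python
# Hypothetical house defaults that would live in one shared package.
HOUSE_DEFAULTS = {
    "format": "html",
    "theme": "house-style",
    "logo": "logo.svg",
}


def course_config(title, authors, **overrides):
    """Combine the shared defaults with the fields unique to one course.

    Keyword overrides let a single course deviate from a default
    without editing the shared package.
    """
    config = dict(HOUSE_DEFAULTS)  # copy, so the defaults stay untouched
    config.update(overrides)
    config["title"] = title
    config["authors"] = list(authors)
    return config


python_intro = course_config("Introduction to Python", ["Myles Mitchell"])
```

A change to the house style is then made once, in `HOUSE_DEFAULTS`, and every course picks it up at its next build.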

Separate internal packages handle the administrative side of training, including:

  • Generating pricing quotes for clients.
  • Producing post-course completion certificates.
  • Spinning up bespoke Posit Workbench environments for each course.
  • Summarising and reporting on attendee feedback.

GitLab CI/CD

Packaging and automated reporting create standardised, repeatable processes—but they don't address the challenge of monitoring 50+ live courses for issues. That's where CI/CD (Continuous Integration / Continuous Delivery and Deployment) becomes essential.

A CI/CD framework provides structure for software development through:

  • Automated unit testing.
  • Branching strategies and code review workflows.
  • Versioning and deployment of software.

Cloud platforms like GitLab and GitHub extend version control with collaborative development tools, including CI/CD pipelines for automated testing and deployment, branch protection rules that enforce code review and testing gates, and source code tagging and versioning.

Each course at Jumping Rivers lives in its own GitLab repository. CI/CD pipeline definitions, along with the internal R packages, are maintained in a separate central repository and propagated downstream to all course repositories.

Flow chart illustrating how we have standardised our GitLab training repositories. The templates are defined in a central repository and pushed downstream to our course repositories.

When a new course repository is created, it is automatically seeded with the standard CI/CD configuration. Every course is then subject to the same checks:

  • Verifying that course notes build without errors.
  • Requiring code review before any changes are merged into the main branch.
  • Building and archiving the rendered HTML notes and scripts as versioned artifacts.

Pipelines are triggered by any commit to a course repository, and are also scheduled to run monthly across all courses—with failures surfaced immediately to the relevant trainer.
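The first of these checks can be approximated by a short script that a pipeline job runs on each commit and on the monthly schedule. The sketch below assumes the course sources are Quarto `.qmd` files and that the `quarto` command-line tool is available in the job; the function names are invented here, and the actual Jumping Rivers pipeline may work quite differently.

```python
import subprocess
from pathlib import Path


def render(path):
    """Render one source file with the Quarto CLI; True on success."""
    result = subprocess.run(["quarto", "render", str(path)])
    return result.returncode == 0


def check_course(root, render_one=render):
    """Try to build every .qmd file under root and return the failures."""
    sources = sorted(Path(root).rglob("*.qmd"))
    return [path for path in sources if not render_one(path)]


def main(root="."):
    """Report failed builds and return an exit code for the CI job."""
    failures = check_course(root)
    for path in failures:
        print(f"FAILED: {path}")
    # A non-zero exit code is what marks the job as failed.
    return 1 if failures else 0
```

A pipeline job would end with `raise SystemExit(main())`. Passing the renderer in as `render_one` keeps the file discovery logic testable without Quarto installed.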

GitLab's hierarchical group structure provides an additional benefit. Within the Jumping Rivers GitLab group, all training repositories sit within a dedicated "training" subgroup. Any environment variables or settings defined at the subgroup level cascade automatically to every course repository beneath it, eliminating the need to configure each one individually.
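The cascading behaviour can be pictured as a layered lookup in which the most specific level wins. The Python sketch below models that idea only; it does not call the GitLab API, and every variable name and value in it is made up for illustration.

```python
from collections import ChainMap

# Hypothetical variables defined at each level of the hierarchy.
group_vars = {"RENDER_FORMAT": "html", "NOTIFY": "training-team"}
training_subgroup_vars = {"WORKBENCH_IMAGE": "training-default"}
course_vars = {"NOTIFY": "course-lead"}


def effective_variables(*levels):
    """Merge variable sets ordered from most general to most specific.

    ChainMap searches its first mapping first, so the levels are
    reversed to let the most specific definition take precedence.
    """
    return dict(ChainMap(*reversed(levels)))


resolved = effective_variables(group_vars, training_subgroup_vars, course_vars)
```

Here `resolved` inherits `RENDER_FORMAT` and `WORKBENCH_IMAGE` from the levels above, while the course-level `NOTIFY` replaces the group-wide value.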

In summary

The principles behind Jumping Rivers' training infrastructure are broadly applicable to any large coding project:

  • Avoid duplication: migrate reusable logic and assets into standalone packages.
  • Adopt CI/CD: use GitLab, GitHub, or a comparable platform to automate testing, review, and deployment.
  • Automate the routine: free up human attention for the work that actually requires it.

This infrastructure has been a decade in the making and continues to evolve. For a more detailed walkthrough, see this talk by Myles at SatRdays London 2024.
