{"slug": "claude-r-tidyverse-expert", "title": "Claude R Tidyverse Expert", "summary": "This document outlines current best practices for R development using modern tidyverse patterns, emphasizing the use of the native pipe (`|>`) over the legacy magrittr pipe, and recommending `join_by()` for joins. It also covers key techniques such as embracing function arguments with `{{ }}`, using `.by` for per-operation grouping, and employing `pick()`, `across()`, and `reframe()` for efficient and readable code.", "body_md": "# Modern R Development Guide\n\n*This document captures current best practices for R development, emphasizing modern tidyverse patterns, performance, and style. Last updated: August 2025*\n\n## Core Principles\n\n1. **Use modern tidyverse patterns** - Prioritize dplyr 1.1+ features, native pipe, and current APIs\n2. **Profile before optimizing** - Use profvis and bench to identify real bottlenecks\n3. **Write readable code first** - Optimize only when necessary and after profiling\n4. 
**Follow the tidyverse style guide** - Consistent naming, spacing, and structure\n\n## Modern Tidyverse Patterns\n\n### Pipe Usage (`|>` not `%>%`)\n- **Always use the native pipe `|>` instead of magrittr `%>%`**\n- The native pipe needs R 4.1+; the `_` placeholder needs R 4.2+, and R 4.3+ also allows `_` at the head of an extraction chain (e.g. `_$coef`)\n``` r\n# Good - Modern native pipe\ndata |>\n  filter(year >= 2020) |>\n  summarise(mean_value = mean(value))\n\n# Avoid - Legacy magrittr pipe\ndata %>%\n  filter(year >= 2020) %>%\n  summarise(mean_value = mean(value))\n```\n\n### Join Syntax (dplyr 1.1+)\n- **Use `join_by()` instead of character vectors for joins**\n- **`join_by()` also supports inequality, rolling, and overlap joins**\n``` r\n# Good - Modern join syntax\ntransactions |>\n  inner_join(companies, by = join_by(company == id))\n\n# Good - Inequality joins\ntransactions |>\n  inner_join(companies, by = join_by(company == id, year >= since))\n\n# Good - Rolling joins (closest match)\ntransactions |>\n  inner_join(companies, by = join_by(company == id, closest(year >= since)))\n\n# Avoid - Old character vector syntax\ntransactions |>\n  inner_join(companies, by = c(\"company\" = \"id\"))\n```\n\n### Multiple Match Handling\n- **Use the `relationship`, `multiple`, and `unmatched` arguments for quality control (dplyr 1.1.1+)**\n``` r\n# Expect 1:1 matches, error otherwise\ninner_join(x, y, by = join_by(id), relationship = \"one-to-one\")\n\n# Allow many-to-many matches explicitly (silences the warning)\ninner_join(x, y, by = join_by(id), relationship = \"many-to-many\")\n\n# Keep only the first match for each row of x\ninner_join(x, y, by = join_by(id), multiple = \"first\")\n\n# Ensure all rows match\ninner_join(x, y, by = join_by(id), unmatched = \"error\")\n```\n\n### Data Masking and Tidy Selection\n- **Understand the difference between data masking and tidy selection**\n- **Use `{{ }}` (embrace) for function arguments**\n- **Use `.data[[]]` for column names stored in character vectors**\n\n``` r\n# Data masking functions: arrange(), filter(), mutate(), summarise()\n# Tidy selection: select(), relocate(), and the .cols argument of across()\n\n# Function arguments - embrace with {{ }}\nmy_summary <- function(data, group_var, summary_var) {\n  data |>\n    group_by({{ group_var }}) |>\n    summarise(mean_val = mean({{ summary_var }}))\n}\n\n# Character vectors - use .data[[]]\nfor (var in names(mtcars)) {\n  mtcars |> count(.data[[var]]) |> print()\n}\n\n# Multiple columns - use across() with an embraced function argument\nsummarise_means <- function(data, summary_vars) {\n  data |>\n    summarise(across({{ summary_vars }}, ~ mean(.x, na.rm = TRUE)))\n}\n```\n\n### Modern Grouping and Column Operations\n- **Use `.by` for per-operation grouping (dplyr 1.1+)**\n- **Use `pick()` for column selection inside data-masking functions**\n- **Use `across()` for applying functions to multiple columns**\n- **Use `reframe()` for multi-row summaries**\n\n``` r\n# Good - Per-operation grouping (always returns ungrouped)\ndata |>\n  summarise(mean_value = mean(value), .by = category)\n\n# Good - Multiple grouping variables\ndata |>\n  summarise(total = sum(revenue), .by = c(company, year))\n\n# Good - pick() for column selection\ndata |>\n  summarise(\n    n_x_cols = ncol(pick(starts_with(\"x\"))),\n    n_y_cols = ncol(pick(starts_with(\"y\")))\n  )\n\n# Good - across() for applying functions\ndata |>\n  summarise(across(where(is.numeric), mean, .names = \"mean_{.col}\"), .by = group)\n\n# Good - reframe() for multi-row results\ndata |>\n  reframe(quantiles = quantile(x, c(0.25, 0.5, 0.75)), .by = group)\n\n# Avoid - Old persistent grouping pattern\ndata |>\n  group_by(category) |>\n  summarise(mean_value = mean(value)) |>\n  ungroup()\n```\n\n## Modern rlang Patterns for Data-Masking\n\n### Core Concepts\n\n**Data-masking** allows R expressions to refer to data frame columns as if they were variables in the environment. rlang provides the metaprogramming framework that powers tidyverse data-masking.\n\n#### Key rlang Tools\n- **Embracing `{{ }}`** - Forward function arguments to data-masking functions\n- **Injection `!!`** - Inject single expressions or values\n- **Splicing `!!!`** - Inject multiple arguments from a list\n- **Dynamic dots** - Programmable `...` with injection support\n- **Pronouns `.data`/`.env`** - Explicit disambiguation between data and environment variables\n\n### Function Argument Patterns\n\n#### Forwarding with `{{ }}`\n**Use `{{ }}` to forward function arguments to data-masking functions:**\n\n``` r\n# Single argument forwarding\nmy_summarise <- function(data, var) {\n  data |> dplyr::summarise(mean = mean({{ var }}))\n}\n\n# Works with any data-masking expression\nmtcars |> my_summarise(cyl)\nmtcars |> my_summarise(cyl * am)\nmtcars |> my_summarise(.data$cyl)  # pronoun syntax supported\n```\n\n#### Forwarding `...` (No Special Syntax Needed)\n``` r\n# Simple dots forwarding\nmy_group_by <- function(.data, ...) {\n  .data |> dplyr::group_by(...)\n}\n\n# Works with tidy selections too\nmy_select <- function(.data, ...) {\n  .data |> dplyr::select(...)\n}\n\n# If the target takes a single tidy-select argument (here `cols`), wrap ... in c()\nmy_pivot_longer <- function(.data, ...) 
{\n  .data |> tidyr::pivot_longer(c(...))\n}\n```\n\n#### Names Patterns with `.data`\n**Use the `.data` pronoun for programmatic column access:**\n\n``` r\n# Single column by name\nmy_mean <- function(data, var) {\n  data |> dplyr::summarise(mean = mean(.data[[var]]))\n}\n\n# Usage - completely insulated from data-masking\nmtcars |> my_mean(\"cyl\")  # No ambiguity, works like a regular function\n\n# Multiple columns with all_of()\nmy_select_vars <- function(data, vars) {\n  data |> dplyr::select(all_of(vars))\n}\n\nmtcars |> my_select_vars(c(\"cyl\", \"am\"))\n```\n\n### Injection Operators\n\n#### When to Use Each Operator\n\n| Operator | Use Case | Example |\n|----------|----------|---------|\n| `{{ }}` | Forward function arguments | `summarise(mean = mean({{ var }}))` |\n| `!!` | Inject single expression/value | `summarise(mean = mean(!!sym(var)))` |\n| `!!!` | Inject multiple arguments | `group_by(!!!syms(vars))` |\n| `.data[[]]` | Access columns by name | `mean(.data[[var]])` |\n\n#### Advanced Injection with `!!`\n``` r\n# Create symbols from strings\nvar <- \"cyl\"\nmtcars |> dplyr::summarise(mean = mean(!!sym(var)))\n\n# Inject values to avoid name collisions\ndf <- data.frame(x = 1:3)\nx <- 100\ndf |> dplyr::mutate(scaled = x / !!x)  # Column x divided by the injected env value (100)\n\n# data_sym() builds .data$cyl, so it can never match an env variable\nmtcars |> dplyr::summarise(mean = mean(!!rlang::data_sym(var)))\n```\n\n#### Splicing with `!!!`\n``` r\n# Multiple symbols from character vector\nvars <- c(\"cyl\", \"am\")\nmtcars |> dplyr::group_by(!!!syms(vars))\n\n# Or use data_syms() for the .data-pronoun equivalent\nmtcars |> dplyr::group_by(!!!rlang::data_syms(vars))\n\n# Splice lists of arguments\nargs <- list(na.rm = TRUE, trim = 0.1)\nmtcars |> dplyr::summarise(mean = mean(cyl, !!!args))\n```\n\n### Dynamic Dots Patterns\n\n#### Using `list2()` for Dynamic Dots Support\n``` r\nmy_function <- function(...) 
{\n  # Collect with list2() instead of list() for dynamic features\n  dots <- list2(...)\n  # Process dots...\n}\n\n# Enables these features:\nmy_function(a = 1, b = 2)           # Normal usage\nmy_function(!!!list(a = 1, b = 2))  # Splice a list\nmy_function(\"{name}\" := value)      # Name injection\nmy_function(a = 1, )                # Trailing commas OK\n```\n\n#### Name Injection with Glue Syntax\n``` r\n# Basic name injection\nname <- \"result\"\nlist2(\"{name}\" := 1)  # Creates list(result = 1)\n\n# In function arguments with {{ }}\nmy_mean <- function(data, var) {\n  data |> dplyr::summarise(\"mean_{{ var }}\" := mean({{ var }}))\n}\n\nmtcars |> my_mean(cyl)        # Creates column \"mean_cyl\"\nmtcars |> my_mean(cyl * am)   # Creates column \"mean_cyl * am\"\n\n# Allow custom names with englue()\nmy_mean <- function(data, var, name = englue(\"mean_{{ var }}\")) {\n  data |> dplyr::summarise(\"{name}\" := mean({{ var }}))\n}\n\n# User can override default\nmtcars |> my_mean(cyl, name = \"cylinder_mean\")\n```\n\n### Pronouns for Disambiguation\n\n#### `.data` and `.env` Best Practices\n``` r\n# Explicit disambiguation prevents masking issues\ncyl <- 1000  # Environment variable\n\nmtcars |> dplyr::summarise(\n  data_cyl = mean(.data$cyl),    # Data frame column\n  env_cyl = mean(.env$cyl),      # Environment variable\n  ambiguous = mean(cyl)          # Resolves to the column: data masks the environment\n)\n\n# Use in loops and programmatic contexts\nvars <- c(\"cyl\", \"am\")\nfor (var in vars) {\n  result <- mtcars |> dplyr::summarise(mean = mean(.data[[var]]))\n  print(result)\n}\n```\n\n### Programming Patterns\n\n#### Bridge Patterns\n**Converting between data-masking and tidy selection behaviors:**\n\n``` r\n# pick() as selection-to-data-mask bridge (recommended over across() without .fns in dplyr 1.1+)\nmy_group_by <- function(data, vars) {\n  data |> dplyr::group_by(pick({{ vars }}))\n}\n\n# Works with tidy selection\nmtcars |> my_group_by(starts_with(\"c\"))\n\n# pick(all_of()) as names-to-data-mask bridge\nmy_group_by <- function(data, vars) {\n  data |> dplyr::group_by(pick(all_of(vars)))\n}\n\nmtcars |> my_group_by(c(\"cyl\", \"am\"))\n```\n\n#### Transformation Patterns\n``` r\n# Transform single arguments by wrapping\nmy_mean <- function(data, var) {\n  data |> dplyr::summarise(mean = mean({{ var }}, na.rm = TRUE))\n}\n\n# Transform dots with across()\nmy_means <- function(data, ...) {\n  data |> dplyr::summarise(across(c(...), ~ mean(.x, na.rm = TRUE)))\n}\n\n# Manual transformation (advanced)\nmy_means_manual <- function(.data, ...) {\n  vars <- enquos(..., .named = TRUE)\n  vars <- purrr::map(vars, ~ expr(mean(!!.x, na.rm = TRUE)))\n  .data |> dplyr::summarise(!!!vars)\n}\n```\n\n### Error-Prone Patterns to Avoid\n\n#### Don't Use These Dangerous Patterns\n``` r\n# Avoid - String parsing and eval (security risk)\nvar <- \"cyl\"\ncode <- paste(\"mean(\", var, \")\")\neval(parse(text = code))  # Dangerous!\n\n# Good - Symbol creation and injection inside a data-masking call\nmtcars |> dplyr::summarise(mean = mean(!!sym(var)))\n\n# Avoid - get() in data mask (name collisions)\nwith(mtcars, mean(get(var)))  # Collision-prone\n\n# Good - Explicit injection or .data\n# base with() does not process !!, so wrap the call in rlang::inject()\nrlang::inject(with(mtcars, mean(!!sym(var))))\n# or\nmtcars |> dplyr::summarise(mean(.data[[var]]))  # Even safer\n```\n\n#### Common Mistakes\n``` r\n# Don't use {{ }} on non-arguments\nmy_func <- function(x) {\n  x <- force(x)  # x is now a value, not an argument\n  quo(mean({{ x }}))  # Wrong! Captures value, not expression\n}\n\n# Don't mix injection styles unnecessarily\n# Pick one approach and stick with it:\n# Either: embrace pattern\nmy_func <- function(data, var) data |> summarise(mean = mean({{ var }}))\n# Or: defuse-and-inject pattern\nmy_func <- function(data, var) {\n  var <- enquo(var)\n  data |> summarise(mean = mean(!!var))\n}\n```\n\n### Package Development with rlang\n\n#### Import Strategy\n``` r\n# In DESCRIPTION:\nImports: rlang\n\n# In NAMESPACE (roxygen2 writes one directive per function):\nimportFrom(rlang,\":=\")\nimportFrom(rlang,enquo)\nimportFrom(rlang,enquos)\nimportFrom(rlang,expr)\n# !! and !!! need no import: they are interpreted when arguments are captured\n\n# Or, with roxygen2 tags:\n#' @importFrom rlang := enquo enquos expr\n```\n\n#### Documentation Tags\n``` r\n#' @param var <[`data-masked`][dplyr::dplyr_data_masking]> Column to summarize\n#' @param ... <[`dynamic-dots`][rlang::dyn-dots]> Additional grouping variables\n#' @param cols <[`tidy-select`][dplyr::dplyr_tidy_select]> Columns to select\n```\n\n#### Testing rlang Functions\n``` r\n# Assumes my_mean() is the englue() version above, which names its output \"mean_{{ var }}\"\n# Test data-masking behavior\ntest_that(\"function supports data masking\", {\n  result <- my_mean(mtcars, cyl)\n  expect_equal(names(result), \"mean_cyl\")\n\n  # Test with expressions\n  result2 <- my_mean(mtcars, cyl * 2)\n  expect_true(\"mean_cyl * 2\" %in% names(result2))\n})\n\n# Test injection behavior\ntest_that(\"function supports injection\", {\n  var <- \"cyl\"\n  result <- my_mean(mtcars, !!sym(var))\n  expect_true(nrow(result) > 0)\n})\n```\n\nThis modern rlang approach enables clean, safe metaprogramming while maintaining the intuitive data-masking experience users expect from tidyverse functions.\n\n## Performance Best Practices\n\n## Performance Tool Selection Guide\n\n### When to Use Each Performance Tool\n\n#### Profiling Tools Decision Matrix\n\n| Tool | Use When | Don't Use When | What It Shows |\n|------|----------|----------------|---------------|\n| **`profvis`** | Complex code, unknown bott", "url": "https://wpnews.pro/news/claude-r-tidyverse-expert", "canonical_source": 
"https://gist.github.com/sj-io/3828d64d0969f2a0f05297e59e6c15ad", "published_at": "2025-08-21 09:25:36+00:00", "updated_at": "2026-05-22 07:07:37.950908+00:00", "lang": "en", "topics": ["developer-tools", "data", "open-source"], "entities": ["tidyverse", "dplyr", "R", "profvis", "bench", "magrittr"], "alternates": {"html": "https://wpnews.pro/news/claude-r-tidyverse-expert", "markdown": "https://wpnews.pro/news/claude-r-tidyverse-expert.md", "text": "https://wpnews.pro/news/claude-r-tidyverse-expert.txt", "jsonld": "https://wpnews.pro/news/claude-r-tidyverse-expert.jsonld"}}