
Applies the DuckHTS `duckdb_munge(...)` table macro to rows from a SQL query or table expression, using either an upstream-style preset, a named column map, or a two-column mapping file. When no mapping mode is provided, the bundled `colheaders.tsv` alias file is used by default.

Usage

rduckhts_munge(
  con,
  query,
  fasta_ref = NULL,
  preset = NULL,
  column_map = NULL,
  column_map_file = NULL,
  iffy_tag = "IFFY",
  mismatch_tag = "REF_MISMATCH",
  ns = NULL,
  nc = NULL,
  ne = NULL
)

Arguments

con

A DuckDB connection with DuckHTS loaded

query

SQL query or table expression to normalize

fasta_ref

Path to the reference FASTA. When NULL (default), operates in fai-only mode: alleles pass through as-is without reference matching or allele swapping, matching upstream `--fai`-only behavior.

preset

Optional preset such as `"PLINK"`, `"PLINK2"`, `"REGENIE"`, `"SAIGE"`, `"BOLT"`, `"METAL"`, `"PGS"`, or `"SSF"`

column_map

Optional named character vector mapping canonical munge names such as `"CHR"`, `"BP"`, `"A1"`, `"A2"` to source column names
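
As a minimal sketch of the expected shape, the vector's names are the canonical munge names and its values are the matching columns in the source data. The source column names (`chrom`, `pos`, `effect_allele`, `other_allele`) and the table `my_sumstats` are hypothetical.

```r
# Names are canonical munge names; values are the columns in the source data.
# The source column names below are hypothetical.
column_map <- c(
  CHR = "chrom",
  BP  = "pos",
  A1  = "effect_allele",
  A2  = "other_allele"
)

# Assuming `con` is a DuckDB connection with DuckHTS loaded and `my_sumstats`
# is a hypothetical table, the call would look like:
# res <- rduckhts_munge(con, "SELECT * FROM my_sumstats", column_map = column_map)
```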

column_map_file

Optional path to a two-column TSV mapping file in the upstream `source<TAB>canonical` format
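
A mapping file in this layout could be written from R roughly as follows. This is a sketch under assumptions: the source column names are hypothetical, and the file is assumed to have no header row and no quoting.

```r
# Hypothetical source-to-canonical pairs: the first column is the source
# column name, the second is the canonical munge name.
mapping <- data.frame(
  source    = c("chrom", "pos", "effect_allele", "other_allele"),
  canonical = c("CHR", "BP", "A1", "A2")
)

map_path <- tempfile(fileext = ".tsv")

# Assumption: the file carries no header row and no quoting.
write.table(mapping, map_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)

# Inspect the tab-separated lines that were written.
map_lines <- readLines(map_path)
print(map_lines)

# The path would then be passed as `column_map_file = map_path`.
```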

iffy_tag

FILTER tag for ambiguous reference resolution

mismatch_tag

FILTER tag for reference mismatches

ns, nc, ne

Optional global overrides for the sample counts: number of samples (`ns`), number of cases (`nc`), and effective sample size (`ne`)

Value

A data frame with normalized GWAS-VCF-style variant/effect columns.
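
A call using a preset might be wrapped as in the sketch below. The helper `munge_regenie()`, the table `regenie_results`, and the `fasta` argument are hypothetical names introduced for illustration; only `rduckhts_munge()` and its arguments come from the usage above.

```r
# Sketch only: `con` must be a DuckDB connection with DuckHTS already loaded.
# `munge_regenie` and the table `regenie_results` are hypothetical.
munge_regenie <- function(con, fasta = NULL) {
  rduckhts_munge(
    con,
    query = "SELECT * FROM regenie_results",
    preset = "REGENIE",
    fasta_ref = fasta  # NULL keeps the fai-only mode described under fasta_ref
  )
}

# Usage, assuming such a connection and table exist:
# res <- munge_regenie(con)
# head(res)
```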