Compute Polygenic Scores — rduckhts

Calls the DuckHTS `bcftools_score(...)` table function to compute sample-level polygenic scores from one genotype VCF/BCF file and one summary-statistics file.

Usage

rduckhts_score(
  con,
  bcf_path,
  summary_path,
  use = NULL,
  columns = "PLINK",
  columns_file = NULL,
  q_score_thr = NULL,
  use_variant_id = FALSE,
  counts = FALSE,
  samples = NULL,
  force_samples = FALSE,
  regions = NULL,
  regions_file = NULL,
  regions_overlap = 1,
  targets = NULL,
  targets_file = NULL,
  targets_overlap = 0,
  apply_filters = NULL,
  include = NULL,
  exclude = NULL
)

Arguments

con: A DuckDB connection with DuckHTS loaded
bcf_path: Path to genotype VCF/BCF file
summary_path: Path to summary-statistics file
use: Optional dosage source (`"GT"`, `"DS"`, `"HDS"`, `"AP"`, `"GP"`, `"AS"`)
columns: Summary-statistics column preset; defaults to `"PLINK"`. One of `"PLINK"`, `"PLINK2"`, `"REGENIE"`, `"SAIGE"`, `"BOLT"`, `"METAL"`, `"PGS"`, `"SSF"`, `"GWAS-SSF"`
columns_file: Optional two-column summary header mapping file
q_score_thr: Optional comma-separated p-value thresholds (e.g. `"1e-8,1e-6,1e-4"`)
use_variant_id: Logical; if TRUE, match variants by ID instead of CHR+BP
counts: Logical; if TRUE, include per-threshold matched-variant counts
samples: Optional comma-separated list of sample names to subset (e.g. `"SAMP1,SAMP2"`)
force_samples: Logical; if TRUE, skip requested samples that are not present in the genotype file instead of raising an error
regions: Optional comma-separated region list (e.g. `"1:1000-2000,2:50-90"`)
regions_file: Optional path to a regions file
regions_overlap: Overlap mode for regions (`0`, `1`, or `2`). Default 1 (in the bcftools convention, records that overlap a region are included even if they start outside it).
targets: Optional comma-separated targets list
targets_file: Optional path to a targets file
targets_overlap: Overlap mode for targets (`0`, `1`, or `2`). Default 0 (record must start in region).
apply_filters: Optional comma-separated FILTER names to keep (e.g. `"PASS,."`)
include: Optional site expression (currently unsupported)
exclude: Optional site expression (currently unsupported)
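
Several of the arguments above take comma-separated strings rather than R vectors. The following is a minimal sketch of building those strings and passing them through, using only the arguments documented on this page. The helper `csv_arg()` and the wrapper `score_by_threshold()` are hypothetical names introduced for illustration; the wrapper is defined but not called, since it needs a live connection with DuckHTS loaded and real input files.

```r
# Collapse an R character vector into the comma-separated form the arguments take.
csv_arg <- function(x) paste(x, collapse = ",")

# Thresholds are kept as strings so they are passed through exactly as written.
thresholds <- csv_arg(c("1e-8", "1e-6", "1e-4"))
regions    <- csv_arg(c("1:1000-2000", "2:50-90"))

# Hypothetical wrapper: `con` must already be a DuckDB connection with DuckHTS
# loaded, and both paths must point at real files. Defined here, not called.
score_by_threshold <- function(con, bcf_path, summary_path) {
  rduckhts_score(
    con,
    bcf_path,
    summary_path,
    columns = "PLINK",          # summary-statistics preset (the default)
    q_score_thr = thresholds,   # p-value thresholds as one string
    counts = TRUE,              # also return matched-variant counts
    regions = regions           # restrict scoring to these regions
  )
}
```

Keeping the thresholds as character values avoids R reformatting them (for example to `1e-08`) before they reach the function.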

Value

A data frame with one row per sample, containing the score columns and, when `counts = TRUE`, the matched-variant count columns.
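
As a rough sketch of working with a result of this shape, the example below uses a mock data frame in place of real output. The column names (`sample`, `score_a`, `score_b`) are assumptions made for illustration; the actual names returned by `rduckhts_score()` are not stated here and should be checked with `names()` on a real result.

```r
# Mock of the documented shape: one row per sample plus numeric score columns.
# Column names are hypothetical, not taken from the package.
scores <- data.frame(
  sample  = c("SAMP1", "SAMP2", "SAMP3"),
  score_a = c(0.10, 0.30, 0.20),
  score_b = c(1.5, 0.5, 1.0),
  stringsAsFactors = FALSE
)

# Split a `samples`-style string back into names and look for any that are absent.
requested <- strsplit("SAMP1,SAMP2,SAMP3", ",", fixed = TRUE)[[1]]
missing_samples <- setdiff(requested, scores$sample)

# Confirm the one-row-per-sample shape before using the scores downstream.
stopifnot(anyDuplicated(scores$sample) == 0)

# Rank samples from highest to lowest on one score column.
ranked <- scores[order(scores$score_a, decreasing = TRUE), ]
```

A check like `missing_samples` is mainly useful with `force_samples = TRUE`, where requested samples that are not in the file would otherwise go unnoticed.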