gmlst typing¶
Type FASTA or FASTQ samples with MLST, cgMLST, or scheme-free workflows.
mlst¶
Type samples against MLST schemes only.
Usage¶
gmlst typing mlst [OPTIONS] SAMPLES...
Options¶
| Option | Description | Default |
|---|---|---|
-s, --scheme TEXT |
Scheme name to type against. Required. | - |
-b, --backend [blastn\|kma\|minimap2\|nucmer] |
Alignment backend. | blastn |
--min-id FLOAT |
Minimum percent identity for confident allele calls. | 95.0 |
--min-cov FLOAT |
Minimum coverage fraction for confident allele calls. | 0.95 |
--min-depth FLOAT |
Minimum depth threshold for depth-aware workflows. | 10.0 |
--format [tsv\|json\|pretty] |
Output format. | tsv |
-o, --output PATH |
Write results to a file instead of stdout. | - |
--cache-dir PATH |
Override the scheme cache directory. | - |
--force-reindex |
Rebuild cached indexes before typing. | False |
--no-header |
Omit the header row in TSV output. | False |
-t, --threads INTEGER |
Backend thread count. | 1 |
--max-workers INTEGER |
Number of samples to process in parallel. | 1 |
--count-same-copy |
Report same-allele multicopy events such as 1,1 when supported. |
False |
-q, --quiet |
Reduce console output. | False |
--novel-allele |
Save novel allele candidates during typing. | False |
--novel-profile |
Save complete novel profiles. Requires --novel-allele. |
False |
--data-dir, --output-dir PATH |
Directory for novel allele and profile outputs. | - |
-h, --help |
Show the help message and exit. | - |
Examples¶
gmlst typing mlst -s saureus_1 sample.fasta
gmlst typing mlst -s saureus_1 --format json samples/*.fasta -o results.json
gmlst typing mlst -s saureus_1 -b kma reads/sample_R1.fastq.gz reads/sample_R2.fastq.gz
gmlst typing mlst -s saureus_1 --novel-allele --novel-profile --data-dir novel_data sample.fasta
Notes¶
typing mlstaccepts assembled FASTA input and backend-supported FASTQ input.- Paired FASTQ files are auto-detected when names match common patterns such as
_R1/_R2,_1/_2, or.1/.2, including.fastq,.fq, and.gzvariants. --novel-allelesaves high-coverage non-public calls for later review.--novel-profileadds complete novel profiles and only works together with--novel-allele.- JSON output is the best input for downstream novel extraction with
gmlst utils extract.
cgmlst¶
Type samples against cgMLST/wgMLST schemes only.
Usage¶
gmlst typing cgmlst [OPTIONS] SAMPLES...
Options¶
| Option | Description | Default |
|---|---|---|
-s, --scheme TEXT |
Scheme name to type against. Required. | - |
-b, --backend [blastn\|kma\|minimap2\|nucmer] |
Alignment backend. | minimap2 |
--cgmlst-mode [standard\|chew-fast\|chew-ultrafast\|chew-bsr\|chew-balanced] |
Runtime mode for cgMLST FASTA workflows. | standard |
--min-id FLOAT |
Minimum percent identity for confident allele calls. | 95.0 |
--min-cov FLOAT |
Minimum coverage fraction for confident allele calls. | 0.95 |
--min-depth FLOAT |
Minimum depth threshold for depth-aware workflows. | 10.0 |
--format [tsv\|json\|pretty] |
Output format. | tsv |
-o, --output PATH |
Write results to a file instead of stdout. | - |
--cache-dir PATH |
Override the scheme cache directory. | - |
--force-reindex |
Rebuild cached indexes before typing. | False |
--no-header |
Omit the header row in TSV output. | False |
-t, --threads INTEGER |
Backend thread count. | 1 |
--max-workers INTEGER |
Number of samples to process in parallel. | 1 |
--count-same-copy |
Report same-allele multicopy events such as 1,1 when supported. |
False |
-q, --quiet |
Reduce console output. | False |
--prefilter-k INTEGER |
K-mer size used by the prefilter. | 31 |
--prefilter-top-n INTEGER |
Number of candidate loci to keep during prefiltering. | 20 |
--prefilter-min-loci-fraction FLOAT |
Minimum retained locus fraction needed to keep the prefilter active. | 0.3 |
--no-prefilter |
Disable prefiltering explicitly. | False |
--novel-allele |
Save novel allele candidates during typing. | False |
--novel-profile |
Save complete novel profiles. Requires --novel-allele. |
False |
--data-dir, --output-dir PATH |
Directory for novel allele and profile outputs. | - |
--cds-coordinates-out PATH |
Export predicted CDS coordinates as TSV. | - |
--call-policy [default\|chewbbaca] |
Per-locus reporting policy. | default |
--chew-cds-gate / --no-chew-cds-gate |
Enable or disable CDS-gated chewBBACA-style classification. | chew-cds-gate |
-h, --help |
Show the help message and exit. | - |
cgMLST Modes¶
| Mode | Description |
|---|---|
standard |
Conservative baseline. |
chew-fast |
Exact-hash + minimap2 prefilter with targeted rescue. |
chew-ultrafast |
Aggressive speed profile with bounded second-pass rescue. |
chew-bsr |
Protein-level exact-hash on top of chew-fast. |
chew-balanced |
Hash-first with targeted blastn fallback. |
Examples¶
gmlst typing cgmlst -s vparahaemolyticus_3 sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode chew-fast sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --prefilter-k 31 --prefilter-top-n 20 sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --call-policy chewbbaca --cds-coordinates-out cds.tsv sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 -b kma -t 8 reads_R1.fastq.gz reads_R2.fastq.gz
Notes¶
typing cgmlstis tuned for large locus sets, where the per-locus profile matters more than a compact legacy ST.- FASTA assemblies use the full cgMLST mode system. FASTQ input follows a KMA-first path. If you request
-b minimap2with FASTQ files,gmlstauto-switches to-b kmaand treats--cgmlst-modeas compatibility-onlystandardbehavior. --call-policy chewbbacarequires FASTA assemblies. Raw calls stay unchanged, while output labels follow chewBBACA-style classes.- CDS-gated classification is enabled by default for
--call-policy chewbbaca. Use--no-chew-cds-gateonly when you want classification from any matched sequence context. - Prefiltering is most useful for FASTA assembly workflows. For
-b kmaand the default-b minimap2path,gmlstcan skip prefiltering and use its persistent full-index route instead.
Environment Variables¶
| Variable | Purpose |
|---|---|
GMLST_MINIMAP2_FASTA_SPEED_PROFILE |
Select minimap2 FASTA speed profile for cgMLST FASTA runs. |
GMLST_CGMLST_MINIMAP2_ULTRA_SECOND_PASS_MAX_LOCI |
Control the rescue budget for the chew-ultrafast second pass. |
GMLST_CGMLST_FASTQ_KMA_AUTO_THREADS |
Auto-raise KMA threads for cgMLST FASTQ runs when possible. |
GMLST_CGMLST_KMA_FASTQ_MEM_MODE |
Toggle KMA -mem_mode for cgMLST FASTQ mapping. |
GMLST_CGMLST_KMA_FASTQ_MEM_CONFIRM_MAX_LOCI |
Limit strict KMA confirmation after the FASTQ mem-mode pass. |
GMLST_CGMLST_PREFILTER_MAX_LOCI |
Auto-skip prefiltering above a configured locus count. |
GMLST_CGMLST_EXACT_HASH_PREFILTER |
Enable chewBBACA-style DNA exact-hash pre-resolution. |
GMLST_CGMLST_MINIMAP2_HASH_PREFILTER |
Enable experimental minimap2 hash-first prefiltering. |
GMLST_CGMLST_CDS_PREDICTION_MODE |
Set Pyrodigal CDS prediction mode. |
GMLST_CGMLST_CDS_TRAINING_FILE |
Use a fixed Pyrodigal training file. |
GMLST_CGMLST_CDS_CLOSED_ENDS |
Control Pyrodigal closed-end CDS prediction behavior. |
GMLST_CGMLST_CDS_COORDINATES_OUT |
Export predicted CDS coordinates through an environment override. |
GMLST_CGMLST_MINIMAP2_HASH_REFINE_MAX_LOCI |
Limit second-pass minimap2 refinement when no mode override sets it. |
GMLST_CGMLST_EVIDENCE_FALLBACK_BACKEND |
Pick the targeted low-confidence fallback backend. |
GMLST_CGMLST_EVIDENCE_FALLBACK_MAX_LOCI |
Limit how many loci go through targeted fallback. |
tgmlst¶
Run scheme-free typing pipeline (tgMLST).
Usage¶
gmlst typing tgmlst [OPTIONS] SAMPLES...
Options¶
| Option | Description | Default |
|---|---|---|
--format [tsv\|json\|pretty] |
Output format. | tsv |
-o, --output PATH |
Write results to a file instead of stdout. | - |
--no-header |
Omit the header row in TSV output. | False |
-q, --quiet |
Reduce console output. | False |
--hash-strategy [safe\|fast\|ultra\|strict\|blast] |
Hash and rescue strategy for scheme-free calling. | safe |
--save-scheme PATH |
Save the discovered scheme for reuse. | - |
--load-scheme PATH |
Reuse a previously saved tgMLST scheme. | - |
--stats |
Print extra summary statistics. | False |
--max-workers INTEGER |
Number of samples to process in parallel. | - |
-t, --threads INTEGER |
Backend thread count. | - |
--assemble-timeout FLOAT |
Timeout for assembly-related steps. | - |
--error-report PATH |
Write per-sample errors to a report file. | - |
--fail-on-error |
Stop on the first processing error. | False |
--summary-report PATH |
Write a summary report file. | - |
-h, --help |
Show the help message and exit. | - |
Examples¶
gmlst typing tgmlst sample.fna
gmlst typing tgmlst --stats sample.fna
gmlst typing tgmlst --save-scheme discovered_scheme.json sample.fna
gmlst typing tgmlst --load-scheme discovered_scheme.json another_sample.fna --format json
Notes¶
tgmlstis scheme-free. It discovers loci from the data instead of requiring a named public scheme up front.--save-schemeand--load-schememake the workflow reusable. A common pattern is to discover a scheme on one batch, save it, then load it for later related samples.--format json,--summary-report, and--error-reportare the best fit when you need structured downstream processing.
Common typing notes¶
Output format details¶
typing mlst and typing cgmlst use compact allele markers in TSV output.
| Marker | Meaning |
|---|---|
23 |
Exact known allele call. |
~23 |
Non-exact high-coverage call, usually the closest allele or a novel-style locus represented by the nearest allele ID. |
15? |
Partial call with insufficient coverage for a confident full call. |
- |
Locus not found. |
JSON output is the best choice when you need structured per-locus metadata, including novel_sequence fields for downstream extraction.
Multicopy loci¶
- Conflicting multicopy calls can appear as comma-separated values such as
1,2. - Same-allele copy counts such as
1,1are optional and currently exposed through--count-same-copy. - When conflicting multicopy loci are present,
STis reported as-so the final profile is not overcalled. - A practical review pattern is a fast first pass, followed by a targeted
blastnrerun with--count-same-copyfor flagged samples.