FAQ and Troubleshooting¶
This page collects common questions and practical fixes for gmlst. For installation and first-run instructions, see installation.md and quickstart.md. For command syntax, see commands.md.
General Questions¶
What is gmlst?¶
gmlst is a Python CLI for bacterial genome typing. It supports classical MLST, larger cgMLST and wgMLST schemes, and scheme-free typing with gmlst typing tgmlst.
What is the difference between MLST and cgMLST?¶
- MLST usually uses a small number of housekeeping loci, often 7
- cgMLST uses a much larger set of core-genome loci
- wgMLST goes broader still, often including accessory loci
In practice, MLST is lighter and easier to compare quickly. cgMLST gives finer resolution, but needs more compute and more complete locus recovery.
Why does gmlst support multiple alignment backends?¶
Different inputs and workloads need different tools.
blastnis a good baseline for assembly-based MLSTkmais well suited to read-based workflows and cgMLST FASTQ pathsminimap2is fast and flexible for assemblies and some read workflowsnucmercan be useful for assembly comparison and alternate evidence
The project keeps the output model normalized so downstream calling logic can stay consistent across backends.
Installation Issues¶
pixi: command not found¶
Pixi is not installed, or your shell has not picked up the new PATH yet.
curl -fsSL https://pixi.sh/install.sh | bash
Then open a new terminal and verify:
pixi --version
Python 3.12 is required¶
Your Python is too old for the package.
Check:
python --version
python3 --version
If you want the easiest path, use pixi instead of a system Python install:
git clone https://github.com/indexofire/gmlst.git
cd gmlst
pixi install
pixi run gmlst --version
pip install gmlst fails¶
Common reasons:
- Python is not 3.12
- build tools or environment are inconsistent
- external tools are missing and later commands fail
Try a clean virtual environment:
python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install gmlst
If you still hit environment problems, use pixi.
blastn not found¶
blastn is part of BLAST+. If you installed with pixi, run inside the pixi environment. If you used pip, install BLAST+ separately.
Check:
gmlst utils check -b blastn
Or:
pixi run gmlst utils check -b blastn
I get permission errors during install¶
Do not use sudo pip or sudo pixi for normal local installs. Use a user-local environment.
Good options:
- pixi environment
- Python virtual environment
- conda or mamba environment
Scheme Management Issues¶
Scheme 'X' not found in catalog¶
The name must match the canonical scheme_name, not just the organism name.
Check available schemes:
gmlst scheme list
gmlst scheme list -p pubmlst
gmlst scheme list -p enterobase -t cgmlst
Then use the exact scheme name, for example:
gmlst scheme download -s saureus_1
Scheme download fails¶
Typical causes:
- temporary network failure
- provider-side outage or rate limit
- a download tool that does not work well in your environment
Try a different download tool:
gmlst scheme download -s saureus_1 --download-tool curl
gmlst scheme download -s saureus_1 --download-tool requests
gmlst scheme download -s saureus_1 --download-tool wget
If you are behind a proxy or firewall, test provider access separately.
Where is the cache stored?¶
The cache root is auto-detected: $CONDA_PREFIX/share/gmlst in conda environments, $VIRTUAL_ENV/.cache/gmlst in virtualenvs, or ~/.cache/gmlst by default. Each environment gets its own isolated cache.
You can override it:
export GMLST_CACHE_DIR="$HOME/work/gmlst-cache"
gmlst scheme list
To share a cache across conda environments:
export GMLST_CACHE_DIR="$HOME/.cache/gmlst"
I want to use a different cache directory for one command¶
Some commands also support --cache-dir.
gmlst scheme list --cache-dir /tmp/gmlst-cache
gmlst scheme download -s saureus_1 --cache-dir /tmp/gmlst-cache
A scheme is missing from scheme list¶
It may be blocked by gmlst/data/blocked_schemes.json.
That file is used to hide or reject selected schemes in list, show, download, and update flows. If you are maintaining the project and need to change block policy, edit that JSON file carefully.
Provider URL overrides for self-hosted BIGSdb¶
You can point gmlst at a custom BIGSdb host with environment variables.
export GMLST_PRIVATE_BIGSDB_URL="http://127.0.0.1:9000/api/db"
export GMLST_PRIVATE_BIGSDB_NAME="labdb"
export GMLST_PRIVATE_BIGSDB_LABEL="Lab BIGSdb"
gmlst scheme list -p labdb
For built-in hosts you can also override base URLs:
export GMLST_PUBMLST_BASE_URL="https://rest.pubmlst.org/db"
export GMLST_PASTEUR_BASE_URL="https://bigsdb.pasteur.fr/api/db"
Typing Issues¶
ST is -¶
An ST of - means there is no exact profile match. Common reasons:
- one or more loci are missing
- one or more loci are partial
- one or more loci are closest or novel-like instead of exact
- multicopy conflicts exist, such as
1,2
What to do next:
gmlst typing mlst -s saureus_1 --format json sample.fasta -o sample.json
gmlst typing mlst -s saureus_1 --format pretty sample.fasta
The JSON output gives more detail for each locus.
I see multicopy alleles like 1,2¶
That means multiple conflicting hits were found for the locus. This can happen with duplicated loci, repeated sequence, or ambiguous mapping.
If you are reviewing assemblies with blastn, try:
gmlst typing mlst -s saureus_1 -b blastn --count-same-copy sample.fasta
This can expose same-allele copy notation such as 1,1 in supported cases.
Many loci are missing¶
Common causes:
- wrong scheme for the organism
- low assembly quality
- truncated contigs
- FASTQ data used with a backend that does not support FASTQ
Check the scheme first:
gmlst scheme show -s saureus_1
gmlst scheme list -n aureus
Then try a backend that fits the input type:
- FASTA:
blastn,minimap2,nucmer,kma - FASTQ:
kmaorminimap2
Typing is slower than expected¶
Try these checks:
gmlst typing mlst -s saureus_1 --max-workers 4 samples/*.fasta -o results.tsv
gmlst typing cgmlst -s vparahaemolyticus_3 -t 8 sample.fasta
More tips are in the Performance Tips section below.
FASTQ input fails with blastn or nucmer¶
That is expected. blastn and nucmer are assembly-oriented in this project. Use kma or minimap2 for FASTQ input.
Example:
gmlst typing mlst -s saureus_1 -b kma reads_R1.fastq.gz reads_R2.fastq.gz
gmlst typing mlst -s saureus_1 -b minimap2 reads_R1.fastq.gz reads_R2.fastq.gz
FASTQ-specific Issues¶
How does paired-end detection work?¶
gmlst detects common naming patterns such as:
_R1and_R2_1and_2.1and.2
Supported file extensions include .fastq, .fq, and .gz variants.
My FASTQ files are not being treated as a pair¶
Rename them to a standard pair pattern, or pass the pair explicitly in matching order.
Example:
gmlst typing mlst -s saureus_1 -b kma sample_R1.fastq.gz sample_R2.fastq.gz
Which backends support FASTQ?¶
- supported:
kma,minimap2 - not supported for FASTQ typing here:
blastn,nucmer
cgMLST-specific Issues¶
Why did typing cgmlst switch my FASTQ run to KMA?¶
This is expected. For FASTQ input, the CLI enforces a KMA-first policy. If you request -b minimap2 for cgMLST FASTQ, gmlst switches to kma and treats --cgmlst-mode as compatibility-only, effectively standard.
This behavior is documented in architecture.md and commands.md.
Which cgMLST mode should I use?¶
Start with:
standardif you want the most conservative default behaviorchew-fastfor common FASTA cgMLST workchew-ultrafastfor larger FASTA batches where speed matters morechew-balancedif you want more fallback reviewchew-bsrif protein-level exact matching matters for your analysis
Examples:
gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode standard sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode chew-fast sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode chew-ultrafast sample.fna
Large cgMLST schemes run very slowly¶
For large schemes, increase threads for backends that benefit from them, especially KMA.
gmlst typing cgmlst -s vparahaemolyticus_3 -b kma -t 8 sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 -b kma -t 16 sample.fna
Output Questions¶
What do TSV markers mean?¶
23, exact allele call~23, closest or non-exact high-coverage call15?, partial call-, missing locus
Should I use JSON or TSV?¶
Use TSV when you want compact, table-friendly output. Use JSON when you need full structured details such as per-locus metadata and novel_sequence fields.
Example:
gmlst typing mlst -s saureus_1 --format tsv sample.fasta -o result.tsv
gmlst typing mlst -s saureus_1 --format json sample.fasta -o result.json
GrapeTree export does not work for my scheme¶
GrapeTree export is intended for public or custom schemes that have the expected profile structure. If export fails, confirm the scheme is a downloaded public scheme or a custom scheme built through gmlst scheme create.
Example workflow:
gmlst scheme create -t mlst -s saureus_1 --data-dir novel_data --desc "Lab collection"
gmlst scheme export -s custom_1 --format grapetree -o mst.tsv
Configuration Questions¶
Which environment variables matter most?¶
Common ones are:
GMLST_CACHE_DIR, override cache root (auto-detected from conda/venv by default)GMLST_TMPDIR, override temporary working directoryGMLST_MINIMAP2_KMER_ENGINE, choosepython,kmc, orauto- provider URL overrides, such as
GMLST_PUBMLST_BASE_URL
Example:
export GMLST_CACHE_DIR="$HOME/.cache/gmlst"
export GMLST_TMPDIR="$PWD/.tmp/gmlst"
export GMLST_MINIMAP2_KMER_ENGINE=auto
Where are temporary files written?¶
By default they go to the system temporary area. Override with:
export GMLST_TMPDIR="$PWD/.tmp/gmlst"
Performance Tips¶
Which backend is usually fastest?¶
There is no single answer, but these are good starting points:
- assembly MLST: start with
blastnorminimap2 - FASTQ typing: start with
kma - FASTA cgMLST: start with
minimap2and a chew-style mode if appropriate - large FASTQ cgMLST: expect KMA-first behavior
How should I tune threads?¶
Use sample-level parallelism for many independent samples and backend threads for expensive per-sample work.
Examples:
gmlst typing mlst -s saureus_1 --max-workers 8 samples/*.fasta -o results.tsv
gmlst typing cgmlst -s vparahaemolyticus_3 -t 8 sample.fna
Should I process samples in batches?¶
Yes, especially for large cohorts. Reusing the cache and indexes is one of the easiest ways to save time.
Error Messages¶
Unknown backend 'X'¶
You requested a backend that is not registered in gmlst/aligners/__init__.py. Check supported backend names:
gmlst typing mlst --help
gmlst utils check -b blastn
Unknown provider 'X'¶
The provider is not registered in gmlst/database/providers/__init__.py.
Check available providers with:
gmlst scheme list
No typing result produced for input sample¶
This usually means the run failed before a valid typing result was assembled. Check input format, backend compatibility, scheme availability, and command output just before the failure.
Failed to download errors¶
These usually point to network or provider-side issues. Retry with a different --download-tool, then verify the remote host is reachable.
--verbose and --quiet cannot be used together¶
These flags are mutually exclusive. Pick one:
gmlst --verbose scheme list
gmlst --quiet scheme list
Still Stuck?¶
When reporting a bug or opening an issue, include:
- the exact command you ran
- your install method, pixi or pip
- input type, FASTA or FASTQ
- backend name
- scheme name
- relevant stderr output
- whether the problem happens again with
--format json
That makes it much easier to reproduce and fix the issue.