FAQ and Troubleshooting¶

This page collects common questions and practical fixes for gmlst. For installation and first-run instructions, see installation.md and quickstart.md. For command syntax, see commands.md.

General Questions¶

What is gmlst?¶

gmlst is a Python CLI for bacterial genome typing. It supports classical MLST, larger cgMLST and wgMLST schemes, and scheme-free typing with gmlst typing tgmlst.

What is the difference between MLST and cgMLST?¶

MLST usually uses a small number of housekeeping loci, often 7
cgMLST uses a much larger set of core-genome loci
wgMLST goes broader still, often including accessory loci

In practice, MLST is lighter and easier to compare quickly. cgMLST gives finer resolution, but needs more compute and more complete locus recovery.

Why does gmlst support multiple alignment backends?¶

Different inputs and workloads need different tools.

blastn is a good baseline for assembly-based MLST
kma is well suited to read-based workflows and cgMLST FASTQ paths
minimap2 is fast and flexible for assemblies and some read workflows
nucmer can be useful for assembly comparison and alternate evidence

The project keeps the output model normalized so downstream calling logic can stay consistent across backends.

Installation Issues¶

`pixi: command not found`¶

Pixi is not installed, or your shell has not picked up the new PATH yet.

curl -fsSL https://pixi.sh/install.sh | bash

Then open a new terminal and verify:

pixi --version

`Python 3.12 is required`¶

Your Python is too old for the package.

Check:

python --version
python3 --version

If you want the easiest path, use pixi instead of a system Python install:

git clone https://github.com/indexofire/gmlst.git
cd gmlst
pixi install
pixi run gmlst --version

`pip install gmlst` fails¶

Common reasons:

Python is not 3.12
build tools or environment are inconsistent
external tools are missing and later commands fail

Try a clean virtual environment:

python3.12 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install gmlst

If you still hit environment problems, use pixi.

`blastn` not found¶

blastn is part of BLAST+. If you installed with pixi, run inside the pixi environment. If you used pip, install BLAST+ separately.

Check:

gmlst utils check -b blastn

Or:

pixi run gmlst utils check -b blastn

I get permission errors during install¶

Do not use sudo pip or sudo pixi for normal local installs. Use a user-local environment.

Good options:

pixi environment
Python virtual environment
conda or mamba environment

Scheme Management Issues¶

`Scheme 'X' not found in catalog`¶

The name must match the canonical scheme_name, not just the organism name.

Check available schemes:

gmlst scheme list
gmlst scheme list -p pubmlst
gmlst scheme list -p enterobase -t cgmlst

Then use the exact scheme name, for example:

gmlst scheme download -s saureus_1

Scheme download fails¶

Typical causes:

temporary network failure
provider-side outage or rate limit
a download tool that does not work well in your environment

Try a different download tool:

gmlst scheme download -s saureus_1 --download-tool curl
gmlst scheme download -s saureus_1 --download-tool requests
gmlst scheme download -s saureus_1 --download-tool wget

If you are behind a proxy or firewall, test provider access separately.

Where is the cache stored?¶

The cache root is auto-detected: $CONDA_PREFIX/share/gmlst in conda environments, $VIRTUAL_ENV/.cache/gmlst in virtualenvs, or ~/.cache/gmlst by default. Each environment gets its own isolated cache.

You can override it:

export GMLST_CACHE_DIR="$HOME/work/gmlst-cache"
gmlst scheme list

To share a cache across conda environments:

export GMLST_CACHE_DIR="$HOME/.cache/gmlst"

I want to use a different cache directory for one command¶

Some commands also support --cache-dir.

gmlst scheme list --cache-dir /tmp/gmlst-cache
gmlst scheme download -s saureus_1 --cache-dir /tmp/gmlst-cache

A scheme is missing from `scheme list`¶

It may be blocked by gmlst/data/blocked_schemes.json.

That file is used to hide or reject selected schemes in list, show, download, and update flows. If you are maintaining the project and need to change block policy, edit that JSON file carefully.

Provider URL overrides for self-hosted BIGSdb¶

You can point gmlst at a custom BIGSdb host with environment variables.

export GMLST_PRIVATE_BIGSDB_URL="http://127.0.0.1:9000/api/db"
export GMLST_PRIVATE_BIGSDB_NAME="labdb"
export GMLST_PRIVATE_BIGSDB_LABEL="Lab BIGSdb"
gmlst scheme list -p labdb

For built-in hosts you can also override base URLs:

export GMLST_PUBMLST_BASE_URL="https://rest.pubmlst.org/db"
export GMLST_PASTEUR_BASE_URL="https://bigsdb.pasteur.fr/api/db"

Typing Issues¶

ST is `-`¶

An ST of - means there is no exact profile match. Common reasons:

one or more loci are missing
one or more loci are partial
one or more loci are closest or novel-like instead of exact
multicopy conflicts exist, such as 1,2

What to do next:

gmlst typing mlst -s saureus_1 --format json sample.fasta -o sample.json
gmlst typing mlst -s saureus_1 --format pretty sample.fasta

The JSON output gives more detail for each locus.

I see multicopy alleles like `1,2`¶

That means multiple conflicting hits were found for the locus. This can happen with duplicated loci, repeated sequence, or ambiguous mapping.

If you are reviewing assemblies with blastn, try:

gmlst typing mlst -s saureus_1 -b blastn --count-same-copy sample.fasta

This can expose same-allele copy notation such as 1,1 in supported cases.

Many loci are missing¶

Common causes:

wrong scheme for the organism
low assembly quality
truncated contigs
FASTQ data used with a backend that does not support FASTQ

Check the scheme first:

gmlst scheme show -s saureus_1
gmlst scheme list -n aureus

Then try a backend that fits the input type:

FASTA: blastn, minimap2, nucmer, kma
FASTQ: kma or minimap2

Typing is slower than expected¶

Try these checks:

gmlst typing mlst -s saureus_1 --max-workers 4 samples/*.fasta -o results.tsv
gmlst typing cgmlst -s vparahaemolyticus_3 -t 8 sample.fasta

More tips are in the Performance Tips section below.

FASTQ input fails with `blastn` or `nucmer`¶

That is expected. blastn and nucmer are assembly-oriented in this project. Use kma or minimap2 for FASTQ input.

Example:

gmlst typing mlst -s saureus_1 -b kma reads_R1.fastq.gz reads_R2.fastq.gz
gmlst typing mlst -s saureus_1 -b minimap2 reads_R1.fastq.gz reads_R2.fastq.gz

FASTQ-specific Issues¶

How does paired-end detection work?¶

gmlst detects common naming patterns such as:

_R1 and _R2
_1 and _2
.1 and .2

Supported file extensions include .fastq, .fq, and .gz variants.

My FASTQ files are not being treated as a pair¶

Rename them to a standard pair pattern, or pass the pair explicitly in matching order.

Example:

gmlst typing mlst -s saureus_1 -b kma sample_R1.fastq.gz sample_R2.fastq.gz

Which backends support FASTQ?¶

supported: kma, minimap2
not supported for FASTQ typing here: blastn, nucmer

cgMLST-specific Issues¶

Why did `typing cgmlst` switch my FASTQ run to KMA?¶

This is expected. For FASTQ input, the CLI enforces a KMA-first policy. If you request -b minimap2 for cgMLST FASTQ, gmlst switches to kma and treats --cgmlst-mode as compatibility-only, effectively standard.

This behavior is documented in architecture.md and commands.md.

Which cgMLST mode should I use?¶

Start with:

standard if you want the most conservative default behavior
chew-fast for common FASTA cgMLST work
chew-ultrafast for larger FASTA batches where speed matters more
chew-balanced if you want more fallback review
chew-bsr if protein-level exact matching matters for your analysis

Examples:

gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode standard sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode chew-fast sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 --cgmlst-mode chew-ultrafast sample.fna

Large cgMLST schemes run very slowly¶

For large schemes, increase threads for backends that benefit from them, especially KMA.

gmlst typing cgmlst -s vparahaemolyticus_3 -b kma -t 8 sample.fna
gmlst typing cgmlst -s vparahaemolyticus_3 -b kma -t 16 sample.fna

Output Questions¶

What do TSV markers mean?¶

23, exact allele call
~23, closest or non-exact high-coverage call
15?, partial call
-, missing locus

Should I use JSON or TSV?¶

Use TSV when you want compact, table-friendly output. Use JSON when you need full structured details such as per-locus metadata and novel_sequence fields.

Example:

gmlst typing mlst -s saureus_1 --format tsv sample.fasta -o result.tsv
gmlst typing mlst -s saureus_1 --format json sample.fasta -o result.json

GrapeTree export does not work for my scheme¶

GrapeTree export is intended for public or custom schemes that have the expected profile structure. If export fails, confirm the scheme is a downloaded public scheme or a custom scheme built through gmlst scheme create.

Example workflow:

gmlst scheme create -t mlst -s saureus_1 --data-dir novel_data --desc "Lab collection"
gmlst scheme export -s custom_1 --format grapetree -o mst.tsv

Configuration Questions¶

Which environment variables matter most?¶

Common ones are:

GMLST_CACHE_DIR, override cache root (auto-detected from conda/venv by default)
GMLST_TMPDIR, override temporary working directory
GMLST_MINIMAP2_KMER_ENGINE, choose python, kmc, or auto
provider URL overrides, such as GMLST_PUBMLST_BASE_URL

Example:

export GMLST_CACHE_DIR="$HOME/.cache/gmlst"
export GMLST_TMPDIR="$PWD/.tmp/gmlst"
export GMLST_MINIMAP2_KMER_ENGINE=auto

Where are temporary files written?¶

By default they go to the system temporary area. Override with:

export GMLST_TMPDIR="$PWD/.tmp/gmlst"

Performance Tips¶

Which backend is usually fastest?¶

There is no single answer, but these are good starting points:

assembly MLST: start with blastn or minimap2
FASTQ typing: start with kma
FASTA cgMLST: start with minimap2 and a chew-style mode if appropriate
large FASTQ cgMLST: expect KMA-first behavior

How should I tune threads?¶

Use sample-level parallelism for many independent samples and backend threads for expensive per-sample work.

Examples:

gmlst typing mlst -s saureus_1 --max-workers 8 samples/*.fasta -o results.tsv
gmlst typing cgmlst -s vparahaemolyticus_3 -t 8 sample.fna

Should I process samples in batches?¶

Yes, especially for large cohorts. Reusing the cache and indexes is one of the easiest ways to save time.

Error Messages¶

`Unknown backend 'X'`¶

You requested a backend that is not registered in gmlst/aligners/__init__.py. Check supported backend names:

gmlst typing mlst --help
gmlst utils check -b blastn

`Unknown provider 'X'`¶

The provider is not registered in gmlst/database/providers/__init__.py.

Check available providers with:

gmlst scheme list

`No typing result produced for input sample`¶

This usually means the run failed before a valid typing result was assembled. Check input format, backend compatibility, scheme availability, and command output just before the failure.

`Failed to download` errors¶

These usually point to network or provider-side issues. Retry with a different --download-tool, then verify the remote host is reachable.

`--verbose and --quiet cannot be used together`¶

These flags are mutually exclusive. Pick one:

gmlst --verbose scheme list
gmlst --quiet scheme list

Still Stuck?¶

When reporting a bug or opening an issue, include:

the exact command you ran
your install method, pixi or pip
input type, FASTA or FASTQ
backend name
scheme name
relevant stderr output
whether the problem happens again with --format json

That makes it much easier to reproduce and fix the issue.

FAQ and Troubleshooting¶

General Questions¶

What is gmlst?¶

What is the difference between MLST and cgMLST?¶

Why does gmlst support multiple alignment backends?¶

Installation Issues¶

pixi: command not found¶

Python 3.12 is required¶

pip install gmlst fails¶

blastn not found¶

I get permission errors during install¶

Scheme Management Issues¶

Scheme 'X' not found in catalog¶

Scheme download fails¶

Where is the cache stored?¶

I want to use a different cache directory for one command¶

A scheme is missing from scheme list¶

Provider URL overrides for self-hosted BIGSdb¶

Typing Issues¶

ST is -¶

I see multicopy alleles like 1,2¶

Many loci are missing¶

Typing is slower than expected¶

FASTQ input fails with blastn or nucmer¶

FASTQ-specific Issues¶

How does paired-end detection work?¶

My FASTQ files are not being treated as a pair¶

Which backends support FASTQ?¶

cgMLST-specific Issues¶

Why did typing cgmlst switch my FASTQ run to KMA?¶

Which cgMLST mode should I use?¶

Large cgMLST schemes run very slowly¶

Output Questions¶

What do TSV markers mean?¶

Should I use JSON or TSV?¶

GrapeTree export does not work for my scheme¶

Configuration Questions¶

Which environment variables matter most?¶

Where are temporary files written?¶

Performance Tips¶

Which backend is usually fastest?¶

How should I tune threads?¶

Should I process samples in batches?¶

Error Messages¶

Unknown backend 'X'¶

Unknown provider 'X'¶

No typing result produced for input sample¶

Failed to download errors¶

--verbose and --quiet cannot be used together¶

Still Stuck?¶

`pixi: command not found`¶

`Python 3.12 is required`¶

`pip install gmlst` fails¶

`blastn` not found¶

`Scheme 'X' not found in catalog`¶

A scheme is missing from `scheme list`¶

ST is `-`¶

I see multicopy alleles like `1,2`¶

FASTQ input fails with `blastn` or `nucmer`¶

Why did `typing cgmlst` switch my FASTQ run to KMA?¶

`Unknown backend 'X'`¶

`Unknown provider 'X'`¶

`No typing result produced for input sample`¶

`Failed to download` errors¶

`--verbose and --quiet cannot be used together`¶