Reproducible Bioinformatics with Pixi, Nextflow & Snakemake
1 Session 1 — Introduction
1.1 What is reproducibility?
- Ability to re-run analyses and get the same results
- Requires: environments, versions, workflows, metadata
- Reproducibility in bioinformatics is notoriously difficult.
- Traditional Conda/Mamba environments help, but they often drift over time, break across platforms, or install different dependency versions depending on when and where they are solved.
1.2 Why Pixi?
Pixi solves these issues through:
✅ Automatic lockfiles ensuring identical environments everywhere
✅ Multi-platform resolution (Linux, macOS Intel/ARM, Windows)
✅ Local project‑scoped environments for clean reproducible analysis
✅ Task runner replacing Makefiles, bash scripts & fragile command chains
✅ Rust‑based solver, typically much faster than classic Conda
This makes Pixi ideal for:
- Bioinformatics pipelines
- Teaching environments
- HPC systems
- Snakemake / Nextflow workflows
- Collaborative research groups
1.3 Conda vs Pixi: A Quick Comparison
| Conda | Pixi |
|---|---|
| No strict version pinning | Strict lockfiles |
| Environment drift | No re-solving while the lockfile is up to date |
| Slow solving | Fast, deterministic builds |
| Not fully reproducible | Well suited to workflows |
2 Session 2 — Pixi Basics
2.1 Install Pixi
On Linux:
# Install Pixi via the official script
curl -fsSL https://pixi.sh/install.sh | bash
# Restart your shell or source the profile to update PATH
source ~/.bashrc # or ~/.zshrc
# Verify installation
pixi --version
If the version prints, you're ready.
2.2 Create a Pixi Project
We want to analyse RNA‑seq data, so let's create a project for it. In your terminal:
# Create a new directory for your project
mkdir rnaseq-qc
# Navigate into the project directory
cd rnaseq-qc
# Initialize a new Pixi project with the conda-forge and bioconda channels
pixi init --channel conda-forge --channel bioconda
This creates:
| File | Purpose |
|---|---|
| pixi.toml | Human-edited project configuration |
2.3 Add your platform
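Pixi platform names follow conda's subdirectory naming (linux-64, linux-aarch64, osx-64, osx-arm64, win-64). pixi info should also report the platform of the current machine; as a hedged illustration of how uname output maps to these names, here is a small bash helper. pixi_platform is hypothetical and not part of Pixi, and it assumes the usual uname values (for example, Git Bash on Windows reports an OS name starting with MINGW64_NT).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pixi): map `uname -s` / `uname -m`
# output to the platform names used by `pixi platform add`.
pixi_platform() {
  local os="${1:-$(uname -s)}" arch="${2:-$(uname -m)}"
  case "$os/$arch" in
    Linux/x86_64)  echo "linux-64" ;;
    Linux/aarch64) echo "linux-aarch64" ;;
    Darwin/arm64)  echo "osx-arm64" ;;
    Darwin/x86_64) echo "osx-64" ;;
    MINGW*/x86_64 | MSYS*/x86_64 | CYGWIN*/x86_64) echo "win-64" ;;
    *)
      echo "unknown platform: $os/$arch" >&2
      return 1
      ;;
  esac
}

# With no arguments, inspect the current machine
echo "This machine: $(pixi_platform)"
```

Passing an OS and architecture explicitly (as uname -s and uname -m would print them) lets you check other machines, e.g. pixi_platform Darwin arm64.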
pixi platform add linux-64
This ensures that the environment is solved for your platform (pixi init normally adds the platform you run it on already). If you want to share the project with collaborators on other platforms, add those too:
pixi platform add osx-arm64
pixi platform add osx-64
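pixi platform add win-64
Note: Bioconda does not build packages for Windows, so with win-64 in the platform list, adding Bioconda tools such as samtools will likely fail to solve. Leave win-64 out unless every dependency is available for Windows.
2.4 Add tools or packages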
Add bioinformatics tools from conda channels:
pixi add fastqc samtools python=3.11
Add tools from PyPI:
pixi add --pypi multiqc
Pixi automatically updates the lockfile and installs the environment. To install the environment on another machine (for example after cloning the project), run:
# Install the environment based on pixi.toml and pixi.lock
pixi install
After adding packages and installing, the project also contains:
| File | Purpose |
|---|---|
| pixi.lock | Automatically generated lockfile |
| .pixi/ | Directory containing the installed environment(s) |
This creates a fully reproducible environment in .pixi/ with the exact versions of all dependencies. You can share the pixi.toml and pixi.lock files with collaborators to ensure they get the same environment.
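Collaborators (or a CI job) can check that the committed lockfile still matches pixi.toml before relying on the environment. A minimal sketch, assuming pixi is on PATH; verify_pixi_project is a hypothetical helper name, and it relies on pixi install --locked, which installs from pixi.lock and aborts if the lockfile is out of date with the manifest.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Pixi): check that a project directory has
# both Pixi files and that pixi.lock still matches pixi.toml.
# Assumes `pixi` is on PATH.
verify_pixi_project() {
  local dir="${1:-.}" f
  for f in pixi.toml pixi.lock; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $f in $dir" >&2
      return 1
    fi
  done
  # --locked: install from pixi.lock, abort if it is out of date with pixi.toml.
  # Runs in a subshell so the caller's working directory is unchanged.
  (cd "$dir" && pixi install --locked)
}
```

For example, run verify_pixi_project rnaseq-qc right after cloning; a non-zero exit status means a file is missing or the lockfile needs to be regenerated and committed.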
2.4.1 Test the tools
To confirm the tools are installed correctly, run:
# Check versions to confirm correct installation
pixi run fastqc --version
pixi run samtools --version
2.4.2 Interactive shell
pixi shell
# Inside the shell, you can run any command with the environment activated
fastqc --version
samtools --version
To quit the shell when done:
exit
2.5 Use the Pixi environment for data analysis
Now that your environment is set up, you can run your bioinformatics analyses with pixi run to ensure reproducibility. First, let's create some sample data to work with.
Create sample data (for the demo)
mkdir -p data results
Create small dummy FASTQ and reference files for testing. They are useful for RNA‑seq, WGS, metagenomics, QC teaching, and alignment modules. (results/ is created now because FastQC does not create its output directory.)
Create the forward-read (R1) FASTQ file:
cat << 'EOF' | gzip > data/sample_R1.fastq.gz
@READ_0001/1
ACGTTGACCTGATCGTAGGCTAATCGTAGGCTATGCTAGCTAGCA
+
IIIIIIIIHIIIHIIIIIIIIIIIIHIIIGIIIIHIIIIIIIII
@READ_0002/1
TTGACCGTAGCTAGCTAGGATCGTAGCATGATGCTAGCTAGGTCA
+
IIIIIIHIGIIHIIIIIIIIIIIIHIIIIIIGIIIIHIIIIIII
EOF
Create the reverse-read (R2) FASTQ file:
cat << 'EOF' | gzip > data/sample_R2.fastq.gz
@READ_0001/2
TGCTAGCTAGCATAGCCTACGATTAGCCTACGATCAGGTCAACGT
+
IIIIIIIIIIIIHIIIHIIHIIIIIIIHIIIIIGIIIIHIIIII
@READ_0002/2
TGACCTAGCTAGCATGCTACGATCCTAGCTAGCTAGCTACGGCAA
+
IIIIHIIIIIIIIIIIHIIIIGIIIIIIHIIIIHIIIIIIIIII
EOF
Create the reference FASTA file:
cat << 'EOF' > data/reference.fa
>chrDemo
ATGCGTACGTTAGCGTACGTAGCTAGCTAGGCTAGCTAGGCGTACGATCGTAGGCTAACGTTAGCGATCGTAGCTAGCTAGGATCGTACGATCGTACGATCGTAGCTAGCGTTA
EOF
2.5.1 Run analyses with Pixi
Run tools with pixi run so they use exactly the same environment every time, regardless of where or when you run them.
Recommended (reproducible):
pixi run fastqc data/sample_R1.fastq.gz
pixi run fastqc data/sample_R2.fastq.gz
Interactive debugging:
pixi shell
This opens an interactive shell with the environment activated, which is useful for testing commands or exploring the environment.
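Because the demo reads above are typed by hand, it is worth checking their structure if FastQC rejects a file: FASTQ records are four lines long, and each quality string must be exactly as long as its sequence. The bash function below is a minimal sketch of that check, assuming GNU-style gzip and awk; check_fastq is a hypothetical helper, not a full FASTQ validator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: minimal structural check of a FASTQ file (gzipped or
# plain). Checks 4-line records, '@' headers, '+' separators and equal
# sequence/quality lengths. It does NOT validate bases or quality characters.
check_fastq() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  # gzip -cdf decompresses .gz input and passes plain text through unchanged
  gzip -cdf -- "$file" | awk -v f="$file" '
    NR % 4 == 1 && substr($0, 1, 1) != "@" {
      printf "%s:%d: header does not start with @\n", f, NR; bad = 1
    }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" {
      printf "%s:%d: separator does not start with +\n", f, NR; bad = 1
    }
    NR % 4 == 0 && length($0) != seqlen {
      printf "%s:%d: quality length %d != sequence length %d\n", f, NR, length($0), seqlen
      bad = 1
    }
    END {
      if (NR % 4 != 0) { printf "%s: %d lines is not a multiple of 4\n", f, NR; bad = 1 }
      if (!bad) printf "%s: OK (%d reads)\n", f, NR / 4
      exit (bad ? 1 : 0)
    }'
}
```

Usage: for f in data/*.fastq.gz; do check_fastq "$f"; done. A reported length mismatch means that quality line has to be fixed before FastQC will parse the file.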
2.6 Add Tasks in pixi.toml
Add tasks to automate your workflow. Edit pixi.toml:
[tasks]
qc = "mkdir -p results && fastqc data/*.fastq.gz -o results/"
report = "multiqc results/ -o reports/"
Or add tasks from the command line:
# Add tasks to pixi.toml
pixi task add qc "mkdir -p results && fastqc data/*.fastq.gz -o results/"
# Add a report task (it summarises the FastQC output in results/)
pixi task add report "multiqc results/ -o reports/"
# Add a clean task to remove results (optional)
pixi task add clean "rm -rf results/*"
2.6.1 Run tasks individually
# Run individual tasks
pixi run qc
# Run the report task (it summarises the qc output in results/)
pixi run report
2.7 Hands-on exercise: running tasks
- Run the qc task to perform quality control on the sample FASTQ files. Check the results/ directory to see the output.
- Run the report task to generate a MultiQC report from the QC results. Check the reports/ directory for the report.
- Try running the report task without running qc first. What happens? (Hint: it should fail or find nothing to report, because report needs the output of qc.)
- Next session: we add a pipeline task that depends on both qc and report, so the entire workflow runs with one command.
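To check the outcome of the qc step in the exercise, you can confirm that every input file has a matching FastQC report. The sketch below assumes FastQC's usual naming (sample_R1.fastq.gz produces sample_R1_fastqc.html and sample_R1_fastqc.zip) and the data/ and results/ layout used above; check_qc_outputs is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FastQC or Pixi): report which input FASTQ
# files have a FastQC HTML report. Assumes FastQC's usual naming, where
# sample_R1.fastq.gz produces sample_R1_fastqc.html.
check_qc_outputs() {
  local data_dir="${1:-data}" results_dir="${2:-results}" missing=0 fq base
  for fq in "$data_dir"/*.fastq.gz; do
    # An unmatched glob stays literal, so this also catches an empty data dir
    if [ ! -e "$fq" ]; then
      echo "no .fastq.gz files in $data_dir" >&2
      return 1
    fi
    base=$(basename "$fq" .fastq.gz)
    if [ -f "$results_dir/${base}_fastqc.html" ]; then
      echo "ok: $base"
    else
      echo "missing report: $results_dir/${base}_fastqc.html" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run check_qc_outputs from the project directory after pixi run qc; it returns a non-zero status if any report is missing.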
3 Session 3 — Pipelines
3.1 Automation with Pipelines
You can define a pipeline task that depends on multiple tasks to run them in the correct order.
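Conceptually, a dependency chain means a later step only starts once the earlier step has succeeded, so the pipeline stops at the first failure. The plain-bash sketch below illustrates that behaviour with hypothetical stand-in functions (run_step, qc_step, report_step); it is not how Pixi implements depends-on, which resolves the task graph itself.

```shell
#!/usr/bin/env bash
# Illustration only: run_step, qc_step and report_step are hypothetical.
# Each step runs only if the previous one succeeded, mirroring depends-on.
run_step() {
  local name="$1"
  shift
  echo "[pipeline] running $name"
  if ! "$@"; then
    echo "[pipeline] $name failed, stopping" >&2
    return 1
  fi
}

pipeline() {
  run_step qc qc_step && run_step report report_step
}

# Stand-ins that only print the command the real task would run
qc_step()     { echo "  (stand-in for: fastqc data/*.fastq.gz -o results/)"; }
report_step() { echo "  (stand-in for: multiqc results/ -o reports/)"; }

pipeline
```

If qc_step returns a non-zero status, report_step is never called, which is the ordering guarantee the depends-on entries below express.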
3.1.1 Add Tasks as dependencies in a pipeline:
# In pixi.toml, add dependencies to ensure tasks run in the correct order
report = { cmd = "multiqc results/ -o reports/", depends-on = ["qc"] }
# This ensures that 'report' will only run after 'qc' has successfully completed.
pipeline = { depends-on = ["qc", "report"] }
3.2 Run the pipeline (all tasks in order)
# Now you can run the entire pipeline with one command, and Pixi will handle the task dependencies for you.
pixi run pipeline
Alternatively, edit pixi.toml directly:
nano pixi.toml
Copy and paste the following into the [tasks] section of pixi.toml:
[tasks]
qc = "mkdir -p results && fastqc data/*.fastq.gz -o results/"
report = { cmd = "multiqc results/ -o reports/", depends-on = ["qc"] }
pipeline = { depends-on = ["qc", "report"] }
Now run the pipeline to execute all tasks in the correct order:
pixi run pipeline
4 Session 4 — Nextflow + Pixi
The Nextflow run is covered on its own page of this tutorial; open it from the site navigation.
5 Session 5 — Snakemake + Pixi
The Snakemake run is covered on its own page of this tutorial; open it from the site navigation.
6 Wrap-up — Sharing & Best Practices
6.2 Multi-Environment Features
Example:
# QC tools feature
[feature.qc.dependencies]
fastqc = "*"
python = ">=3.11"
[feature.qc.pypi-dependencies]
multiqc = "*"
# Alignment tools feature
[feature.alignment.dependencies]
bwa = "*"
samtools = "*"
star = "*"
# Python 2 legacy tool (conflicts with QC)
[feature.legacy.dependencies]
python = "2.7.*"
htseq = "*"
# Define environments from features
[environments]
qc = ["qc"] # Just QC tools
alignment = ["alignment"] # Just alignment tools
legacy = ["legacy"] # Just Python 2 tools
default = ["qc", "alignment"] # Everything except legacy
Run in a specific environment:
pixi run -e qc multiqc results/
pixi run -e alignment bwa index data/reference.fa
pixi run -e legacy htseq-count # Uses Python 2
6.3 Best Practices
- Commit pixi.toml + pixi.lock
- Do not commit .pixi/
- Use tasks for all analyses
- Use features when tools conflict
- Use pixi run, not pixi shell, in workflows
- Use distinct environments for teaching modules
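Driver scripts (for example, one submitted to an HPC scheduler) can follow the same practices: every command goes through pixi run -e with an explicit environment instead of an activated pixi shell, and the script stops at the first failure. A minimal bash sketch, assuming the qc and alignment environments from 6.2 and the data/reference.fa file from Session 2; run_in_env and run_all are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical driver helpers (not part of Pixi). Every step is run through
# `pixi run -e <env>` so it uses the locked environment, never `pixi shell`.
run_in_env() {
  local env="$1"
  shift
  echo "+ pixi run -e $env $*"   # log the exact command for the record
  pixi run -e "$env" "$@"
}

# Steps are chained with && so the first failing step stops the run.
# Environment names (qc, alignment) assume the [environments] table in 6.2.
run_all() {
  run_in_env qc fastqc --version &&
    run_in_env alignment samtools --version &&
    run_in_env alignment bwa index data/reference.fa
}
```

Call run_all from the project directory; because each command names its environment, the script does not depend on whatever shell happens to be active.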
6.4 Some shortcuts
# List all packages:
pixi list
# Search for a package:
pixi search samtools
# By default, pixi search doesn’t list all available versions. To list more package versions, use the -l <int> flag
pixi search -l 40 samtools
# Update a specific package:
pixi update samtools
# Update all packages:
pixi update
# Upgrade packages beyond the current constraints (also rewrites the version specs in pixi.toml):
pixi upgrade
# Remove a package:
pixi remove samtools
# Remove environment binaries (will be recreated from lock file):
pixi clean
# Remove downloaded package cache:
pixi clean cache
# Remove both the environments and the package cache (use with caution):
pixi clean && pixi clean cache
# Update Pixi itself (when it was installed with the install script):
pixi self-update
# Get help on any command:
pixi --help
# Get help on a specific command:
pixi run --help
# List the tasks defined in pixi.toml:
pixi task list
# Show project, platform and environment information:
pixi info
# List the packages in a specific environment:
pixi list -e qc
# Try a specific package version in a temporary environment, outside the project:
pixi exec -c conda-forge -c bioconda --spec samtools=1.16 samtools --version
# Run a command in an interactive shell with the environment activated:
pixi shell
# Inside the shell, you can run any command with the environment activated
fastqc --version
samtools --version
# Exit the shell when done
exit
6.5 Git
# Manage git tracking for Pixi files.
# Safe to delete and never commit (regenerated from pixi.lock): the .pixi/ directory
# Ignore it in git (pixi init usually adds this entry already):
echo ".pixi/" >> .gitignore
# Must keep (commit to git): pixi.toml (your configuration) and pixi.lock (exact package versions)
git add pixi.toml pixi.lock .gitignore
6.6 Resources
Documentation: https://pixi.sh/latest
GitHub: https://github.com/prefix-dev/pixi
Examples: https://github.com/prefix-dev/pixi/tree/main/examples
Tutorial: https://pixi.sh/latest/tutorials/python/