Processing

Using the launcher instead?

Double-click it and a browser tab opens where you can set up your project and start processing, no terminal needed. The rest of this page covers the CLI workflow.

pixel-patrol process is the first step in the Pixel Patrol workflow. It scans your images and produces a .parquet report file containing everything Pixel Patrol knows about your dataset - file metadata, image dimensions, pixel statistics, quality metrics, and thumbnails.

Answer the questions below and we'll walk you through each decision together, building your command as we go. By the end you'll understand not just what to run, but why each flag is there.

Make sure you're inside a virtual environment

Before running any pixel-patrol command, activate the virtual environment where you installed it. If you followed the uv instructions, run:

source .venv/bin/activate        # macOS / Linux
.venv\Scripts\Activate.ps1       # Windows (PowerShell)
.venv\Scripts\activate.bat       # Windows (Command Prompt)
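To confirm the activation worked, you can ask the shell whether it finds the pixel-patrol executable. Below is a minimal sketch for macOS / Linux shells; check_pixel_patrol is a hypothetical helper, not part of Pixel Patrol, and it assumes the standard venv activate script, which sets VIRTUAL_ENV.

```shell
# check_pixel_patrol: hypothetical helper, not part of Pixel Patrol.
# Succeeds if the pixel-patrol executable is on PATH; fails otherwise.
check_pixel_patrol() {
  if command -v pixel-patrol >/dev/null 2>&1; then
    echo "pixel-patrol found at: $(command -v pixel-patrol)"
    # VIRTUAL_ENV is set by the standard venv activate script
    echo "active virtual environment: ${VIRTUAL_ENV:-none}"
    return 0
  fi
  echo "pixel-patrol not found on PATH - activate your virtual environment first" >&2
  return 1
}
```

Run check_pixel_patrol after activating; a non-zero exit status means either the environment isn't active or the package isn't installed in it.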

Not sure? Go through the Installation tutorial first.

Explore all options first

Before going through the questions, it's worth running pixel-patrol process --help to see every available flag and its default. The wizard covers all of them, but --help gives you the full picture at a glance.

Your command

pixel-patrol process <path/to/images/> \
  -o report.parquet

⚠️ There's no Python-API equivalent for SLURM clusters: pixel-patrol-slurm is a CLI launcher around dask_jobqueue.SLURMCluster. Use the CLI command for a ready-to-run SLURM setup, or see Connecting to an external Dask cluster to wire up a SLURMCluster by hand from a script.

📁 Where are your images?

This is the root folder of your dataset - the BASE_DIRECTORY argument in the command. Pixel Patrol scans it recursively, so you don't need to list subdirectories separately. Use an absolute path (e.g. /data/my-experiment/) or a path relative to the directory you run the command from.

No images of your own yet? The repo ships a small example dataset at examples/datasets/WHOI_processed_color/ (40 plankton images, four tampered variants of the same originals, ~1.3 MB total). Point BASE_DIRECTORY there to follow along; it's the same dataset behind the example report used in the next tutorial.

🔬 What format are your images?

Pixel Patrol uses loaders to open image files and extract their content. This sets the --loader flag and determines what ends up in your report: without a loader you only get basic file system info (names, sizes, extensions); with one you also get image dimensions, pixel type, acquisition metadata, and the pixel data needed for statistics and thumbnails. Choose the one that matches your file format.

TIFF, CZI, ND2, LIF, PNG, JPG, or other common formats: --loader bioio

Recommended for most microscopy and general image datasets.

Zarr datasets: --loader zarr

TIFF only, lightweight loader: --loader tifffile

Faster when your dataset consists exclusively of TIFF files.

Just basic file info, no image reading: omit --loader

Collects file names, sizes, and extensions only. No loader required.
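Not sure which formats your folder actually holds? A quick inventory of file extensions can help you pick a loader. This is a generic POSIX shell sketch using find and awk; list_extensions is a hypothetical helper name, and extensions are lower-cased so .TIF and .tif are counted together.

```shell
# list_extensions: hypothetical helper. Counts files per extension under a
# folder (recursively), most common first. Extensions are lower-cased;
# files without a dot in their name are counted as "(none)".
list_extensions() {
  find "$1" -type f | awk -F/ '
    {
      n = split($NF, parts, ".")          # $NF is the file name
      ext = (n > 1) ? tolower(parts[n]) : "(none)"
      count[ext]++
    }
    END { for (e in count) print count[e], e }' | sort -rn
}
```

For example, list_extensions /data/my-experiment/ prints one "count extension" pair per line, most common first. Zarr stores are directories rather than single files, so their internal chunk files show up here as other entries; find /data/my-experiment/ -type d -name '*.zarr' lists the stores themselves.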

🗃️ Restrict to specific file extensions? (optional)

By default the loader processes all file formats it supports. If your folder contains mixed file types and you only want to process some of them, list the extensions here - each becomes a -e flag. Leave blank to process everything the loader supports. Example: tif, nd2, czi
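The wizard turns your comma-separated list into one -e flag per extension. Here is a minimal POSIX shell sketch of that conversion; extension_flags is a hypothetical helper, not part of Pixel Patrol, and it simply strips whitespace and skips empty entries.

```shell
# extension_flags: hypothetical helper. Turns "tif, nd2, czi" into
# "-e tif -e nd2 -e czi", skipping blanks and empty entries.
extension_flags() {
  flags=""
  old_ifs=$IFS
  IFS=','                                  # split the list on commas only
  for ext in $1; do
    ext=$(printf '%s' "$ext" | tr -d '[:space:]')
    [ -n "$ext" ] && flags="$flags -e $ext"
  done
  IFS=$old_ifs
  printf '%s\n' "${flags# }"               # drop the leading space
}

# Usage (the unquoted $(...) is split into separate arguments on purpose):
# pixel-patrol process /data/my-experiment/ --loader bioio \
#   $(extension_flags "tif, nd2, czi") -o report.parquet
```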

🗂️ Do you have multiple experimental conditions or groups?

Use this if your images are organized into subfolders, one per condition, batch, or timepoint. Specifying subfolders does two things: it limits processing to those folders (others are ignored), and it makes each one a labeled group in the report, shown in its own color for easy comparison. This grouping is only the report's initial view; you can regroup interactively in the viewer later. If you skip this, all images under the base folder are processed as one group.

No - process everything as one group

Yes - I have subfolders to compare

Which subfolders should be compared?

Comma-separated paths relative to your base directory - each becomes a -p flag and a labeled group. Can be immediate subfolders or deeper (e.g. batch_1/control). Only include the ones you want to compare. Example: control, treated_a, treated_b
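Before a long run, it's cheap to check up front that every subfolder you list exists under the base directory. The sketch below validates each path and prints the matching -p flags, the same one-flag-per-group mapping the wizard uses; subfolder_flags is a hypothetical helper, not part of Pixel Patrol.

```shell
# subfolder_flags: hypothetical helper. Checks that each comma-separated
# subfolder exists under the base directory and prints the -p flags.
# Returns 1 (naming each missing folder on stderr) if any path is absent.
subfolder_flags() {
  base=$1
  flags=""
  missing=0
  old_ifs=$IFS
  IFS=','
  for sub in $2; do
    # trim surrounding whitespace, keep inner slashes (e.g. batch_1/control)
    sub=$(printf '%s' "$sub" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
    [ -z "$sub" ] && continue
    if [ -d "$base/$sub" ]; then
      flags="$flags -p $sub"
    else
      echo "missing subfolder: $base/$sub" >&2
      missing=1
    fi
  done
  IFS=$old_ifs
  printf '%s\n' "${flags# }"
  return "$missing"
}
```

If it returns 1, fix the listed paths before running. Folder names containing spaces should be passed to -p by hand, since an unquoted $(subfolder_flags ...) expansion would split them.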

💾 Where should the output report be saved?

A path (relative or absolute) for the output .parquet file - set by -o. This file holds all image metadata, pixel statistics, and thumbnails, and can be shared with collaborators who can open it in the online viewer without installing anything.

🏷️ Give your project a name (optional)

Sets --name. Shown in the viewer header and embedded in the report file. If left empty, the name defaults to the base directory folder name.

📝 Add a description (optional)

Sets --description. Free-form text shown below the title in the viewer and embedded in the report. Useful for recording what the dataset is, who processed it, or any caveats.
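Multi-word values for --name and --description need quoting so the shell passes each one as a single argument. This sketch builds the command as the shell's positional parameters and prints one argument per line instead of running it; the path and texts are placeholders. The basename line approximates the documented fallback for --name, assuming "folder name" means the last component of the base path.

```shell
# Placeholder values - replace with your own.
BASE_DIR="/data/my-experiment/"
NAME="My experiment"
DESCRIPTION="Pilot run, batch 3. Two files known to be out of focus."

# Approximates the documented fallback when --name is omitted:
DEFAULT_NAME=$(basename "$BASE_DIR")     # my-experiment

# Build the command as positional parameters; quoting keeps each
# multi-word value a single argument.
set -- pixel-patrol process "$BASE_DIR" -o report.parquet \
  --name "$NAME" --description "$DESCRIPTION"

printf '%s\n' "$@"     # dry run: one argument per line
# "$@"                 # uncomment to actually run the command
```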

🖥️ Are you running on an HPC cluster?

Pixel Patrol processes images in parallel using Dask. On a local machine it auto-detects a sensible number of workers based on your CPUs and RAM - no configuration needed. On a cluster you can harness many more resources, which makes a real difference for large datasets with thousands of images or very large volumes.

No - running locally

Yes - using a cluster

Is your cluster managed by SLURM?

SLURM is the most widely used job scheduler on HPC clusters. If your cluster uses it, pixel-patrol-slurm handles everything: it submits worker jobs, waits for them to come online, runs the processing, and cleans up - all in one command. If you're using a different setup (e.g. you already have a running Dask cluster), choose the second option and provide the scheduler address instead.

Yes - SLURM

No - I have a Dask scheduler URL

e.g. from a manually started Dask cluster

SLURM cluster settings

First, install the SLURM wrapper in the same environment:
pip install pixel-patrol-slurm

Choose the number of jobs, cores per job, memory per job, partition (optional), and wall time for the worker jobs.
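The flag names pixel-patrol-slurm uses for these settings aren't listed here, so this sketch does not call it; it only does the arithmetic worth checking before you submit. Each job is a separate allocation, so totals scale with the number of jobs. The memory-per-core line assumes each core runs one Dask task at a time, in which case it should comfortably exceed --mb-per-task (512 MB by default). slurm_totals is a hypothetical helper.

```shell
# slurm_totals: hypothetical helper for sizing a SLURM request.
# Arguments: number of jobs, cores per job, memory per job in GB.
slurm_totals() {
  n_jobs=$1 cores=$2 mem_gb=$3
  echo "total cores: $((n_jobs * cores))"
  echo "total memory: $((n_jobs * mem_gb)) GB"
  # integer division: rounds down
  echo "memory per core: $((mem_gb * 1024 / cores)) MB"
}

slurm_totals 4 8 32
# total cores: 32
# total memory: 128 GB
# memory per core: 4096 MB
```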

Dask scheduler URL

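The address of your running Dask scheduler, set by --scheduler. Dask schedulers listen on port 8786 by default, so the address usually looks like tcp://<scheduler-host>:8786.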


Advanced options

These options are rarely needed for a first run. They let you fine-tune processing behavior for specific scenarios.

Slice size --slice-size (optional)

Controls the per-dimension granularity of statistics in the report. By default, every dimension other than the XY plane (Z, T, C, S) steps by 1, giving one row of stats per slice. Set a higher step (e.g. Z=5) for coarser, smaller output, or -1 to collapse a dimension entirely. Only relevant if you have multidimensional data and care about per-slice statistics. Separate multiple dimensions with commas. Example: Z=5, C=1
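To get a feel for how a step size changes output volume, here is a rough estimate, assuming a dimension of length N split into steps of S yields ceil(N / S) slices and that row counts multiply across dimensions. Pixel Patrol's actual row count may differ; slices is a hypothetical helper.

```shell
# slices: hypothetical helper. Number of slices for one dimension of
# length N with step S: ceil(N / S), or 1 when S is -1 (collapsed).
slices() {
  size=$1 step=$2
  if [ "$step" -eq -1 ]; then
    echo 1
  else
    echo $(( (size + step - 1) / step ))   # integer ceiling division
  fi
}

# Example: a stack with 50 Z planes and 3 channels, slice size Z=5, C=1
z=$(slices 50 5)     # 10
c=$(slices 3 1)      # 3
echo "approx. stats rows per image: $((z * c))"   # 30
```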

Run only these processors --processors-include (optional)

By default all installed processors run: raster-basic, raster-histogram, thumbnail, raster-quality, raster-compression. List specific ones here to run only those; this takes precedence over --processors-exclude. Useful for speeding up processing when you only need a subset of metrics.

Skip these processors --processors-exclude (optional)

Exclude specific processors while running all others. Ignored if --processors-include is set. Example: raster-quality to skip the quality metrics and speed up a large run.
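The precedence rule between the two flags can be written out as a small sketch: an include list wins outright, otherwise everything installed runs minus the excludes. The processor names come from the default list above; selected_processors is a hypothetical helper that only mirrors the documented behavior.

```shell
# selected_processors: hypothetical helper mirroring the documented rule.
# Arguments: space-separated include list, space-separated exclude list.
ALL_PROCESSORS="raster-basic raster-histogram thumbnail raster-quality raster-compression"

selected_processors() {
  include=$1 exclude=$2
  if [ -n "$include" ]; then
    echo "$include"              # include wins; exclude is ignored
    return 0
  fi
  result=""
  for p in $ALL_PROCESSORS; do
    case " $exclude " in
      *" $p "*) ;;               # listed in exclude: skip it
      *) result="$result $p" ;;
    esac
  done
  echo "${result# }"
}

selected_processors "" "raster-quality"
# raster-basic raster-histogram thumbnail raster-compression
```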

Worker count --max-workers (optional)

Number of parallel Dask workers (default: auto, based on CPUs and RAM). Lower this if processing causes out-of-memory errors. Use 1 to disable parallelism entirely, which is useful for debugging.

MB per task --mb-per-task (default: 512)

Memory budget per Dask task in MB. Controls how files are batched: increase (e.g. 2048) for datasets with many tiny files to reduce scheduling overhead; decrease (e.g. 128) for large 3D volumes or container files with large sub-images to keep individual tasks short and prevent memory spikes.

Max images per task --max-images-per-task (default: 200)

Maximum number of files (or sub-images) grouped into a single task. Lower this if individual tasks are timing out or you want finer-grained progress reporting.
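To build intuition for how --mb-per-task and --max-images-per-task interact, here is a greedy batching sketch: files join the current task until the next one would exceed the memory budget or the task already holds the maximum number of files. This is an assumption about the general idea, not Pixel Patrol's scheduling code, and it treats file size as a stand-in for memory use (decompressed pixel data is often larger). batch_sizes is a hypothetical helper.

```shell
# batch_sizes: hypothetical helper. Reads one file size in MB per line and
# prints how many files land in each task, using greedy batching:
# start a new task when the next file would exceed the MB budget or the
# current task already holds max_n files. A file larger than the budget
# still gets a task of its own.
batch_sizes() {
  awk -v budget="$1" -v max_n="$2" '
    {
      if (n > 0 && (mb + $1 > budget || n >= max_n)) {
        print n                  # close the current task
        n = 0; mb = 0
      }
      n++; mb += $1
    }
    END { if (n > 0) print n }'
}

# 1000 files of ~10 MB each, with the default 200-file cap:
awk 'BEGIN { for (i = 0; i < 1000; i++) print 10 }' |
  batch_sizes 512 200 | awk 'END { print NR " tasks" }'    # 20 tasks
awk 'BEGIN { for (i = 0; i < 1000; i++) print 10 }' |
  batch_sizes 2048 200 | awk 'END { print NR " tasks" }'   # 5 tasks
```

In this model, raising the budget from 512 to 2048 cuts the task count from 20 to 5 for ~10 MB files, the scheduling-overhead saving the tip above describes; beyond that point the 200-file cap becomes the limit.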

Rows per part --rows-per-part (default: 10000)

Number of result rows buffered in memory before being flushed to a temporary file on disk. Only relevant for very large datasets where memory is tight. Rarely needs changing.

Write a debug log --log-file

Writes a detailed debug log file alongside the output parquet. Useful for diagnosing slow or failed runs.

Enable debug logging

Next step

Once processing finishes, open your report with:

pixel-patrol view report.parquet

Or drag the .parquet file into the online viewer. It needs no installation, so anyone you share the report with can open it the same way.