Skip to content

Python API

All public functions are in pixel_patrol_base.api.

from pixel_patrol_base import api

create_project

api.create_project(name, base_dir, loader=None, output_path=None) -> Project

Creates a new project.

Parameter Type Description
name str Project name, embedded in the report. Required - cannot be empty or whitespace-only.
base_dir str \| Path Root directory containing your dataset.
loader str \| None Loader plugin ID, e.g. "bioio". None = basic file info only.
output_path str \| Path \| None Where to save the .parquet file. Defaults to <base_dir>/<name>.parquet.
project = api.create_project(
    "my-experiment",
    base_dir="data/",
    loader="bioio",
    output_path="reports/my-experiment.parquet",
)

add_paths

api.add_paths(project, paths) -> Project

Adds subdirectory paths to the project. If specified, only files within those paths are processed and they become the default grouping when the report opens. Paths are relative to base_dir or absolute.

If not called, all supported files under base_dir are processed together.

api.add_paths(project, ["control", "treated"])
api.add_paths(project, "/abs/path/to/condition")  # absolute path also works

process_files

api.process_files(project, **kwargs) -> Project

Processes all files in the project paths and writes the .parquet report. Most parameters correspond directly to the CLI options. API-specific notes:

  • slice_size - dict mapping dimension name to block size, e.g. {"Z": 1, "Y": 512}. See slice-size.
  • processors_included / processors_excluded - sets of processor IDs, e.g. {"raster-basic", "thumbnail"}. See Available processors.
  • selected_file_extensions - set of extensions, e.g. {"tif", "nd2"}, or "all".
  • progress_callback - Callable[[int, int], None] called with (done, total) after each completed record. total is -1 until the full count is known.
api.process_files(
    project,
    selected_file_extensions={"tif", "nd2"},
    max_workers=8,
    mb_per_task=256,
    description="Batch 3 - fluorescence dataset",
)

External Dask cluster

from dask.distributed import Client

with Client("tcp://host:8786"):
    api.process_files(project)

view

api.view(source, port=8052, open_browser=True, **kwargs) -> None

Opens a report in the interactive viewer, backed by a local DuckDB server. source can be a processed Project object or a path to an existing .parquet file. Parameters correspond to the CLI options. API-specific notes:

  • filter_by - dict of the form {"col": {"op": "eq"|"in"|"gt"|..., "value": "val"}}.
  • dimensions - dict of dimension letter to slice index string, e.g. {"z": "0", "t": "0"}.
  • widgets_excluded - set of widget IDs, e.g. {"histogram", "mosaic"}. See Available widgets.
api.view(
    "report.parquet",
    group_col="path",
    filter_by={"file_extension": {"op": "in", "value": "tif,nd2"}},
    dimensions={"z": "0"},
    widgets_excluded={"histogram"},
)

load

api.load(src) -> tuple[DataFrame, ProjectMetadata]

Loads a saved .parquet report. Returns a Polars DataFrame and a metadata object.

records_df, metadata = api.load("report.parquet")
print(f"{metadata.project_name}: {len(records_df)} records")
print(records_df.head())

build_viewer

api.build_viewer(output) -> Path

Builds a static viewer for sharing or hosting.

  • If output ends in .html / .htm: writes a single self-contained HTML file.
  • Otherwise: writes a GitHub Pages-style site folder.
api.build_viewer("viewer.html")     # single file
api.build_viewer("gh-pages-out/")  # site folder

Warning

The static viewer runs entirely in the browser and may not be able to load very large parquet files (e.g. 5 GB+). For large reports use api.view() instead, which is backed by a local Python server with native DuckDB.