Create an Extension


Pixel Patrol is built to be extended - without forking it. An extension is a regular, installable Python package that can add any combination of:

a custom loader - read a file format Pixel Patrol doesn't support out of the box
a custom processor - compute new metrics and add them as report columns
custom viewer plugins - visualize anything in the report with your own widgets

All three are optional, and one package can mix and match freely. The contracts for each (PixelPatrolLoader, PixelPatrolProcessor, the plugin object shape) are defined as typing.Protocol classes in pixel_patrol_base.core.contracts, not as base classes: your classes only need to match the expected shape (the right NAME, methods, and attributes), with no import or inheritance from pixel_patrol_base required. That's what keeps extensions standalone, decoupled packages.
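To make the structural-typing point concrete, here is a minimal sketch. It uses a hypothetical stand-in protocol (`HasNameAndLoad` is invented for illustration and is not the real `PixelPatrolLoader`) to show that a class satisfies a `typing.Protocol` purely by shape, without inheriting from anything:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasNameAndLoad(Protocol):
    """Hypothetical stand-in protocol; NOT the real PixelPatrolLoader."""

    NAME: str

    def load(self, path): ...


class StandaloneLoader:
    """No base class, and no import from the package defining the protocol."""

    NAME = "standalone"

    def load(self, path):
        return {"path": str(path)}


class Incomplete:
    NAME = "incomplete"  # has NAME, but no load() method


# isinstance() against a runtime-checkable protocol only checks that the
# required members exist; it does not check signatures or types.
matches = isinstance(StandaloneLoader(), HasNameAndLoad)
rejected = isinstance(Incomplete(), HasNameAndLoad)
print(matches, rejected)
```

The same idea is what lets an extension class live in its own package: matching the member names is enough, while a static type checker can still verify signatures against the protocol.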

This page walks through all three pieces using Pixel HAI Watch - a complete, working, slightly playful example bundled with Pixel Patrol. Its twist: there are no real images. .parquet tables are read as if they were tiny snapshots from a deep-sea shark camera - each table's numeric columns become a pixel grid, and a key/value pair tucked into the file's metadata stands in for the kind of instrument metadata real loaders extract (channel names, pixel sizes, acquisition stamps, ...). Every snippet below is taken directly from it - open examples/minimal-extension/ alongside this page and follow along.

⚙ What are you building?

The three questions below map to the three pieces of an extension (loader, processor, viewer plugin) - focus on the sections whose question you answer with "yes".

Do you need to read a file format Pixel Patrol doesn't support yet?

Do you want to compute your own metrics from the image data?

Do you want a custom chart or visualization in the report viewer?


📦 Anatomy of an Extension (always relevant)

Any extension is a regular, installable Python package. Here's Pixel HAI Watch's layout - the role of each file is the blueprint for your own:

pixel_patrol_hai_watch/
├── my_loader.py             custom loader      - reads .parquet "dive patches"
├── my_processor.py          custom processor   - counts the glows in each patch
├── plugin_registry.py       registers loader, processor, and viewer extension
└── viewer/
    ├── extension.json                manifest listing the viewer plugins
    ├── plugin_dives_logged.js        metadata widget    - dives logged, by depth zone & site
    └── plugin_glow_by_depth.js       image-data widget  - glow sightings, by depth zone

Pixel Patrol finds all of this through Python entry points: three optional groups in pyproject.toml, each pointing at a function in your plugin_registry module.

[project.entry-points."pixel_patrol.loader_plugins"]
my_extension_loaders = "my_package.plugin_registry:register_loader_plugins"

[project.entry-points."pixel_patrol.processor_plugins"]
my_extension_processors = "my_package.plugin_registry:register_processor_plugins"

[project.entry-points."pixel_patrol.viewer_extensions"]
my_extension_viewer = "my_package.plugin_registry:get_viewer_extension_dir"

# plugin_registry.py
from pathlib import Path
from my_package.my_loader import MyLoader
from my_package.my_processor import MyProcessor

def register_loader_plugins():
    return [MyLoader]

def register_processor_plugins():
    return [MyProcessor]

def get_viewer_extension_dir():
    return Path(__file__).parent / "viewer"

💡

You only need to declare the entry-point groups your extension actually uses. A viewer-only extension can omit the loader/processor groups entirely - and a loader-only one can skip the viewer group just as easily.

✅

Once your package is installed in the same environment as pixel_patrol_base, everything is discovered automatically at runtime - no explicit registration or path needed when you call create_project(..., loader="your-loader-name") or serve_viewer(report_path).

📥 The Loader

🧭

Build this if Pixel Patrol can't read your file format - or doesn't read it (and its metadata) the way you want. Maybe your images live in a proprietary instrument format, a database export, or - like here - something delightfully unconventional.

A loader turns a file into a Record - pixel data plus metadata - that the rest of the pipeline can work with. Implement the PixelPatrolLoader protocol:

| Member | Type | Required? | Purpose |
| --- | --- | --- | --- |
| `NAME` | `str` | yes | unique identifier passed to `create_project(..., loader=...)` |
| `SUPPORTED_EXTENSIONS` | `set[str]` | yes | file extensions this loader can read (lower-case, no dot) |
| `OUTPUT_SCHEMA` | `dict[str, type]` | yes | extra metadata columns this loader adds to the report, with their types |
| `read_header(path)` | `(Path) -> FileInfo` | yes | cheap shape/dtype/dim-order probe, no pixel data loaded |
| `load(path)` | `(Path) -> Record` | yes | loads one image and returns a `Record` |
| `load_range(path, start, stop)` | `(Path, int, int) -> Iterator[(str, Record)]` | yes | yields sub-images for container formats; raise `NotImplementedError` otherwise |
| `FOLDER_EXTENSIONS` | `set[str]` | no | "extensions" that mark a folder as one loadable unit (e.g. OME-Zarr stores); defaults to empty |
| `CONTAINER_EXTENSIONS` | `set[str]` | no | extensions that may contain more than one image (multi-series OME-TIFF, LMDB, ...); defaults to empty |
| `OUTPUT_SCHEMA_PATTERNS` | `list[tuple[str, type]]` | no | regex/type pairs for dynamically-named metadata columns (e.g. `pixel_size_X`); defaults to empty |
| `is_folder_supported(path)` | `(Path) -> bool` | no | whether a folder (not a file) should be treated as one image; only relevant if `FOLDER_EXTENSIONS` is non-empty |

SharkCamLoader (NAME = "shark-cam") reads each table with pyarrow.parquet, stacks its columns into a 2-D array, decodes one field out of the schema metadata, and wraps it all with record_from(...):

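import numpy as np
import pyarrow.parquet as pq
# FileInfo and record_from are Pixel Patrol helpers; their import lines are
# omitted here (see my_loader.py in examples/minimal-extension/).

class SharkCamLoader: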
    NAME = "shark-cam"

    SUPPORTED_EXTENSIONS = {"parquet"}
    FOLDER_EXTENSIONS    = set()
    CONTAINER_EXTENSIONS = set()

    OUTPUT_SCHEMA          = {"depth_zone": str}
    OUTPUT_SCHEMA_PATTERNS = []

    def is_folder_supported(self, path):
        return False

    def read_header(self, file_path):
        meta = pq.ParquetFile(file_path).metadata
        return FileInfo(shape=(meta.num_rows, meta.num_columns), dtype=np.uint8, dim_order=("Y", "X"))

    def load(self, file_path):
        table = pq.read_table(file_path)

        # Each column is one pixel column (X); stacking them rebuilds the YX grid.
        columns = [table.column(name).to_numpy(zero_copy_only=False) for name in table.column_names]
        pixels = np.column_stack(columns).astype(np.uint8)

        raw_meta = table.schema.metadata or {}
        log_entry = {k.decode(): v.decode() for k, v in raw_meta.items()}
        meta = {
            "depth_zone": log_entry.get("depth_zone", "unknown"),
            "dim_order":  "YX",
        }
        return record_from(pixels, meta, kind="intensity")

    def load_range(self, file_path, start, stop):
        raise NotImplementedError("shark-cam is not a container format")

🔬 How a parquet table becomes a "dive patch"

Each file holds a small grid of uint8 columns - read column-by-column and stacked side by side, the table is the pixel grid (rows → Y, columns → X). The playful field, depth_zone (sunlit/twilight/midnight/abyss - which layer of the ocean the snapshot was taken in), is decoded straight out of table.schema.metadata - exactly the slot real formats (OME-XML in TIFFs, EXIF in JPEGs, ...) use to carry instrument and acquisition info.
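The two steps described above, stacking columns into a grid and decoding byte-valued metadata, can be tried in isolation. This sketch replaces the parquet file with hand-made stand-ins (the pixel values and the `twilight` entry are invented for illustration), so it needs only NumPy:

```python
import numpy as np

# Stand-ins for what the parquet reader hands back (values are made up).
columns = [
    np.array([0, 10, 20], dtype=np.uint8),  # pixel column x = 0
    np.array([1, 11, 21], dtype=np.uint8),  # pixel column x = 1
]
raw_meta = {b"depth_zone": b"twilight"}     # schema metadata arrives as bytes

# Each 1-D array becomes one column of the result: rows -> Y, columns -> X.
pixels = np.column_stack(columns)

# Decode keys and values to str, then fall back to "unknown" if absent.
log_entry = {k.decode(): v.decode() for k, v in raw_meta.items()}
depth_zone = log_entry.get("depth_zone", "unknown")

print(pixels.shape, pixels[2, 1], depth_zone)
```

Three rows and two columns give a `(3, 2)` array indexed as `pixels[y, x]`, which is the `YX` dimension order the loader reports.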

✅

Declaring kind="intensity" and dim_order="YX" is what makes the built-in processors (basic metrics, histogram, thumbnail) pick the patches up automatically, right alongside your custom one. To a Pixel Patrol pipeline, a "dive patch" parquet table behaves just like any other 2-D image - that's the whole point of the exercise.

💡

read_header is called for every file during the initial scan and must stay cheap - it's your chance to report shape, dtype, and dimension order without paying the cost of loading pixel data.

⚙️ The Processor

🧭

Build this if you want to compute a metric on images - any images, regardless of who loaded them. Quality scores, object counts, anything beyond what the built-in processors already cover.

A processor receives loaded records and returns derived values that get merged into the report as new columns. Implement the PixelPatrolProcessor protocol - every member below is required:

| Member | Type | Purpose |
| --- | --- | --- |
| `NAME` | `str` | unique identifier (shown in pipeline logs) |
| `CHUNK_KIND` | `ChunkKind` | `LEAF` (user-configured tiles/slices) or `MEMORY` (whole record, memory-safe chunking) |
| `INPUT` | `RecordSpec` | which records this processor runs on (`kinds`, `axes`, `capabilities`, ...) |
| `OUTPUT` | `"features" \| "record"` | whether `run_chunk` returns columns to merge, or a brand-new `Record` |
| `OUTPUT_SCHEMA` | `dict[str, type]` | the columns this processor adds, with their types |
| `run_chunk(record)` | `(Record) -> dict` | does the actual computation on one chunk |
| `get_aggregation(name)` | `(str) -> Callable \| None` | how to combine multiple chunks' values for column `name` into the per-image value |

GlowSpotterProcessor (NAME = "glow-spotter") runs on every intensity record with X/Y axes, sets CHUNK_KIND = ChunkKind.LEAF, and adds one column - glow_count:

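import numpy as np
# ChunkKind and RecordSpec are Pixel Patrol types; their import lines are
# omitted here (see my_processor.py in examples/minimal-extension/).

class GlowSpotterProcessor: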
    NAME       = "glow-spotter"
    CHUNK_KIND = ChunkKind.LEAF
    INPUT      = RecordSpec(axes={"X", "Y"}, kinds={"intensity"})
    OUTPUT     = "features"

    OUTPUT_SCHEMA          = {"glow_count": int}
    OUTPUT_SCHEMA_PATTERNS = []

    def run_chunk(self, record):
        arr = record.data.compute() if hasattr(record.data, "compute") else np.asarray(record.data)
        arr = arr.astype(np.float32)

        threshold = np.median(arr) + 60.0
        glow_count = int(np.sum(arr > threshold))
        return {"glow_count": glow_count}

    def get_aggregation(self, col):
        if col != "glow_count":
            return None
        # Glows are independent per pixel, so chunk counts simply add up.
        return lambda rows, g_dims: sum(r["glow_count"] for r in rows)

🔬 How "glows" get counted

A pixel counts as part of a glow when it stands out clearly from the overall brightness around it - brighter than the median by more than 60. The example dataset is built so that sunlit patches have almost no glows and deeper, darker patches have more, mimicking how real bioluminescence concentrates in the dark. get_aggregation sums each chunk's glow_count into the per-image total - a pattern that works whenever the thing you're counting is independent per pixel, so splitting an image into pieces and adding the pieces' counts back up reconstructs the whole. One caveat: run_chunk derives its threshold from the median of the chunk it receives, so for an image split into several chunks the summed count matches a whole-image count only when the chunks share a similar median; patches that arrive as a single chunk are unaffected.
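The additive aggregation can be checked with a small NumPy sketch. The `count_glows` helper and the array values are invented for illustration; the threshold rule mirrors `run_chunk`. It shows that counts add up exactly when every chunk uses the same threshold, and can differ when each chunk computes its own median:

```python
import numpy as np


def count_glows(arr, threshold=None):
    """Count pixels above a threshold (default: median + 60, as in run_chunk)."""
    arr = np.asarray(arr, dtype=np.float32)
    if threshold is None:
        threshold = np.median(arr) + 60.0
    return int(np.sum(arr > threshold))


# A 4x4 image: dark top half with one brighter pixel, uniformly bright bottom half.
image = np.full((4, 4), 10, dtype=np.uint8)
image[2:, :] = 200
image[0, 0] = 100
chunks = [image[:2, :], image[2:, :]]  # two tiles that together cover the image

# Shared threshold: per-chunk counts sum to the whole-image count.
fixed_whole = count_glows(image, threshold=70.0)
fixed_sum = sum(count_glows(c, threshold=70.0) for c in chunks)

# Per-chunk median: each tile sets its own threshold, so the sum can differ.
median_whole = count_glows(image)
median_sum = sum(count_glows(c) for c in chunks)

print(fixed_whole, fixed_sum, median_whole, median_sum)
```

If exact whole-image semantics matter for your metric, one option is to compute the threshold from a statistic that is the same for every chunk; whether that is worth the extra pass depends on the metric.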

💡

CHUNK_KIND shapes how your data arrives, and which unit your computation needs to handle. LEAF - the more common pick for metric processors - tiles large images into memory-safe pieces and runs your computation on each one; MEMORY hands you the whole record at once, which is only safe when you know it comfortably fits in memory.

💡

OUTPUT = "features" merges your columns into the existing report - the right choice for almost any custom metric. "record" is for processors that produce a brand-new derived image instead (a mask, a projection, ...).

📊 The Viewer Plugin

🧭

Build this if you want to visualise report data in the browser - your own extension's columns, anyone else's, or any mix - with a chart the built-in widgets don't cover.

A viewer plugin is a small JavaScript module that renders a custom widget in the report viewer's sidebar, with full access to the report's data through an in-browser DuckDB instance (the table is always called pp_data). It exports one default object:

export default {
  id:    'my-widget',          // unique across all loaded plugins
  label: 'My Widget',          // shown in the sidebar widget list
  group: 'My Extension Name',  // optional - gives the widget its own sidebar section
  scope: 'image',              // optional - 'file' | 'image' | 'slice', shown as a badge
                               // describing what one datapoint in this widget represents

  requires(schema) {
    // return false to hide the widget when its columns are absent
    return schema.allCols.includes('my_column');
  },

  async render(container, ctx) {
    const rows = await ctx.queryRows(`
      SELECT my_column, COUNT(*) AS cnt
      FROM pp_data
      ${ctx.where}
      GROUP BY 1 ORDER BY 2 DESC
    `);
    // write into `container` using plain DOM, Plotly (window.Plotly), or any CDN library
  },
};

| `ctx` field | Type | Description |
| --- | --- | --- |
| `ctx.queryRows(sql)` | `async → object[]` | query returning plain JS objects |
| `ctx.query(sql)` | `async → Arrow Table` | raw Arrow result (for binary/blob columns) |
| `ctx.querySample(cols, n)` | `async → object[]` | sampled scalar shorthand |
| `ctx.schema` | `object` | `{ metricCols, groupCols, dimensionInfo, allCols, blobCols }` |
| `ctx.state` | `object` | `{ palette, groupCol, filter, dimensions }` |
| `ctx.colorMap` | `object` | `{ groupValue: hexColor }` - matches the colors used everywhere else in the report |
| `ctx.color.getColors(palette, n)` | `(string, number) -> string[]` | `n` colors from the named palette - for ad-hoc groupings (e.g. a column other than the active group-by) not covered by `colorMap` |
| `ctx.color.getPaletteNames()` | `() -> string[]` | palette names accepted by `ctx.color.getColors` |
| `ctx.where` | `string` | SQL `WHERE` clause for the active filter (or `''`) - merge with `AND` if your query needs its own |
| `ctx.groups` | `string[]` | distinct values of the active group column |
| `ctx.filteredCount` / `ctx.totalRows` | `number` | row counts |

Pixel HAI Watch ships two plugins on purpose - one per kind of data a loader can surface. plugin_glow_by_depth.js plots glow_count (computed by the processor, from real pixel data) against depth_zone (read straight from the loader's metadata), as a jittered scatter colored by site:

const DEPTH_ORDER = ['sunlit', 'twilight', 'midnight', 'abyss'];

export default {
  id:    'glow-by-depth',
  label: 'Glow Sightings by Depth',
  group: 'Pixel HAI Watch',
  scope: 'image',

  requires(schema) {
    return ['depth_zone', 'glow_count'].every(c => schema.allCols.includes(c));
  },

  async render(container, ctx) {
    const rows = await ctx.queryRows(`
      SELECT "depth_zone" AS depth_zone, "imported_path_short" AS site, "glow_count" AS glows
      FROM pp_data
      WHERE "depth_zone" IS NOT NULL AND "glow_count" IS NOT NULL
        ${ctx.where ? 'AND ' + ctx.where.replace(/^WHERE\s+/i, '') : ''}
    `);

    const zones   = DEPTH_ORDER.filter(z => rows.some(r => r.depth_zone === z));
    const sites   = [...new Set(rows.map(r => r.site))].sort();
    const xJitter = () => (Math.random() - 0.5) * 0.5;   // keeps overlapping points visible

    // A scatter (rather than a box/violin) because each category holds only a
    // handful of points - exactly where distributional summaries would mislead.
    Plotly.newPlot(container, sites.map(site => {
      const sub = rows.filter(r => r.site === site);
      return {
        type: 'scatter', mode: 'markers', name: site,
        x: sub.map(r => zones.indexOf(r.depth_zone) + xJitter()),
        y: sub.map(r => Number(r.glows)),
        marker: { size: 12, color: ctx.colorMap[site] ?? '#888' },
      };
    }), { title: { text: 'How much bioluminescent glow shows up at each depth?' } });
  },
};

Both plugins are listed in a small manifest, loaded automatically by the viewer:

{
  "name": "Pixel HAI Watch Extension",
  "plugins": ["./plugin_dives_logged.js", "./plugin_glow_by_depth.js"]
}
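A typo in the manifest is an easy way to lose a widget, so a quick pre-packaging check can help. The helper below is a hypothetical sketch (`missing_plugins` is not part of Pixel Patrol) and assumes that relative plugin paths are resolved against the folder containing extension.json:

```python
import json
import tempfile
from pathlib import Path


def missing_plugins(viewer_dir):
    """Return manifest entries that do not resolve to a file in viewer_dir."""
    viewer_dir = Path(viewer_dir)
    manifest = json.loads((viewer_dir / "extension.json").read_text(encoding="utf-8"))
    return [p for p in manifest.get("plugins", []) if not (viewer_dir / p).is_file()]


# Demo on a throwaway folder: one listed plugin exists, the other does not.
with tempfile.TemporaryDirectory() as tmp:
    viewer = Path(tmp)
    (viewer / "plugin_dives_logged.js").write_text("export default {};", encoding="utf-8")
    (viewer / "extension.json").write_text(
        json.dumps({
            "name": "Pixel HAI Watch Extension",
            "plugins": ["./plugin_dives_logged.js", "./plugin_glow_by_depth.js"],
        }),
        encoding="utf-8",
    )
    missing = missing_plugins(viewer)

print(missing)
```

Pointed at a real extension, the same function could be called with the path returned by `get_viewer_extension_dir()`; an empty list means every listed plugin file is present.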

✅

Both plugins declare group: 'Pixel HAI Watch', so they get their own named section in the sidebar instead of being lumped under "Other Widgets" - a small touch that makes an extension feel like a first-class part of the report.

💡

See the viewer README for the full plugin-writing guide, the complete ctx reference, and the extension-manifest format.

🚀 Run, Package & Share (always relevant)

Pixel Patrol discovers loaders, processors, and viewer extensions through Python entry points - which only works if your package is installed in the same environment as pixel_patrol_base. So first, make sure you're in that environment, then install this package into it:

uv pip install -e .

Then try it locally - this (re)generates the tiny dataset (if missing), processes it with the custom loader and processor, and opens the viewer with both widgets loaded:

uv run python create_and_show_report.py

Once it works, two ways to get it in front of others:

📦 Pip package

Because the JS viewer plugins are bundled inside the Python package, a recipient just installs it - pip install pixel-patrol-hai-watch - and any report opened with serve_viewer(...) picks up the plugins automatically. No extra arguments, no separate hosting.

🌐 GitHub Pages

No Python required on the recipient's side. A bundled GitHub Actions workflow deploys the viewer/ folder; the manifest then lives at a public URL you can pass straight to the hosted viewer:
?extension=https://<org>.github.io/<repo>/extension.json (repeat &extension= to chain several).
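When chaining several manifests, building the query string programmatically avoids escaping mistakes. In this sketch the viewer address and the two manifest URLs are placeholders invented for illustration; it assumes the hosted viewer accepts standard percent-encoded parameter values:

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def viewer_url(viewer_base, manifest_urls):
    """Append one extension= parameter per manifest URL."""
    query = urlencode([("extension", url) for url in manifest_urls])
    return f"{viewer_base}?{query}"


manifests = [
    "https://org-a.github.io/repo-a/extension.json",  # placeholder
    "https://org-b.github.io/repo-b/extension.json",  # placeholder
]
url = viewer_url("https://example.org/viewer/", manifests)  # placeholder base

# Round-trip check: parsing the query recovers the manifests in order.
recovered = parse_qs(urlsplit(url).query)["extension"]
print(url)
print(recovered == manifests)
```

The first parameter follows `?` and each further one follows `&`, matching the "repeat &extension=" pattern described above.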

🌱

Ready to grow your own? Copy examples/minimal-extension/, decide which piece(s) you actually need (see the questions at the top of this page), update the pyproject.toml metadata, and replace the example identifiers with your own - one piece at a time. The protocols will tell you exactly what's still missing as you go, and nothing stops you from running an unfinished extension while you build it out.