Create an Extension

Pixel Patrol is built to be extended - without forking it. An extension is a regular, installable Python package that can add any combination of:
- a custom loader - read a file format Pixel Patrol doesn't support out of the box
- a custom processor - compute new metrics and add them as report columns
- custom viewer plugins - visualize anything in the report with your own widgets
All three are optional, and one package can mix and match freely. The contracts for each (PixelPatrolLoader, PixelPatrolProcessor, the plugin object shape) are defined as typing.Protocols in pixel_patrol_base.core.contracts, not base classes - your classes just need to match the expected shape (the right NAME, methods, attributes, ...), with no import or inheritance from pixel_patrol_base required. That's what keeps extensions standalone, decoupled packages.
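To see why matching the shape is enough, here is a minimal sketch of structural typing with typing.Protocol. The LoaderShape protocol below is a hypothetical, cut-down stand-in written for this illustration; it is not the real contract from pixel_patrol_base.core.contracts, and the class names are made up.

```python
from pathlib import Path
from typing import Protocol, runtime_checkable


@runtime_checkable
class LoaderShape(Protocol):
    """Hypothetical stand-in for a loader contract (not the real one)."""

    NAME: str

    def load(self, path: Path) -> object: ...


class StandaloneLoader:
    """Imports nothing from the framework and inherits from nothing."""

    NAME = "standalone"

    def load(self, path: Path) -> object:
        return {"path": str(path)}


class NotALoader:
    NAME = "broken"  # has the attribute, but no load() method


# isinstance() against a runtime-checkable protocol only checks that the
# members exist, not their signatures or types.
print(isinstance(StandaloneLoader(), LoaderShape))  # True
print(isinstance(NotALoader(), LoaderShape))        # False
```

A static type checker goes further and also compares signatures, which is usually where a missing or misnamed member shows up first.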
This page walks through all three pieces using Pixel HAI Watch - a complete, working, slightly playful example bundled with Pixel Patrol. Its twist: there are no real images. .parquet tables are read as if they were tiny snapshots from a deep-sea shark camera - each table's numeric columns become a pixel grid, and a key/value pair tucked into the file's metadata stands in for the kind of instrument metadata real loaders extract (channel names, pixel sizes, acquisition stamps, ...). Every snippet below is taken directly from it - open examples/minimal-extension/ alongside this page and follow along.
Any extension is a regular, installable Python package. Here's Pixel HAI Watch's layout - the role of each file is the blueprint for your own:
pixel_patrol_hai_watch/
├── my_loader.py                 custom loader - reads .parquet "dive patches"
├── my_processor.py              custom processor - counts the glows in each patch
├── plugin_registry.py           registers loader, processor, and viewer extension
└── viewer/
    ├── extension.json           manifest listing the viewer plugins
    ├── plugin_dives_logged.js   metadata widget - dives logged, by depth zone & site
    └── plugin_glow_by_depth.js  image-data widget - glow sightings, by depth zone
Pixel Patrol finds all of this through Python entry points: three optional groups in pyproject.toml, each pointing at a function in your plugin_registry module.
[project.entry-points."pixel_patrol.loader_plugins"]
my_extension_loaders = "my_package.plugin_registry:register_loader_plugins"
[project.entry-points."pixel_patrol.processor_plugins"]
my_extension_processors = "my_package.plugin_registry:register_processor_plugins"
[project.entry-points."pixel_patrol.viewer_extensions"]
my_extension_viewer = "my_package.plugin_registry:get_viewer_extension_dir"
# plugin_registry.py
from pathlib import Path

from my_package.my_loader import MyLoader
from my_package.my_processor import MyProcessor

def register_loader_plugins():
    return [MyLoader]

def register_processor_plugins():
    return [MyProcessor]

def get_viewer_extension_dir():
    return Path(__file__).parent / "viewer"
Once your package is installed in the same environment as pixel_patrol_base, everything is discovered automatically at runtime - no explicit registration or path is needed when you call create_project(..., loader="your-loader-name") or serve_viewer(report_path).

A loader turns a file into a Record - pixel data plus metadata - that the rest of the pipeline can work with. Implement the PixelPatrolLoader protocol:
| Member | Type | Required? | Purpose |
|---|---|---|---|
| NAME | str | yes | unique identifier passed to create_project(..., loader=...) |
| SUPPORTED_EXTENSIONS | set[str] | yes | file extensions this loader can read (lower-case, no dot) |
| OUTPUT_SCHEMA | dict[str, type] | yes | extra metadata columns this loader adds to the report, with their types |
| read_header(path) | (Path) -> FileInfo | yes | cheap shape/dtype/dim-order probe, no pixel data loaded |
| load(path) | (Path) -> Record | yes | loads one image and returns a Record |
| load_range(path, start, stop) | (Path, int, int) -> Iterator[(str, Record)] | yes | yields sub-images for container formats; raise NotImplementedError otherwise |
| FOLDER_EXTENSIONS | set[str] | no | "extensions" that mark a folder as one loadable unit (e.g. OME-Zarr stores); defaults to empty |
| CONTAINER_EXTENSIONS | set[str] | no | extensions that may contain more than one image (multi-series OME-TIFF, LMDB, ...); defaults to empty |
| OUTPUT_SCHEMA_PATTERNS | list[tuple[str, type]] | no | regex/type pairs for dynamically-named metadata columns (e.g. pixel_size_X); defaults to empty |
| is_folder_supported(path) | (Path) -> bool | no | whether a folder (not a file) should be treated as one image; only relevant if FOLDER_EXTENSIONS is non-empty |
SharkCamLoader (NAME = "shark-cam") reads each table with pyarrow.parquet, stacks its columns into a 2-D array, decodes one field out of the schema metadata, and wraps it all with record_from(...):
import numpy as np
import pyarrow.parquet as pq
# FileInfo and record_from are helpers from Pixel Patrol; their import is omitted in this excerpt.

class SharkCamLoader:
    NAME = "shark-cam"
    SUPPORTED_EXTENSIONS = {"parquet"}
    FOLDER_EXTENSIONS = set()
    CONTAINER_EXTENSIONS = set()
    OUTPUT_SCHEMA = {"depth_zone": str}
    OUTPUT_SCHEMA_PATTERNS = []

    def is_folder_supported(self, path):
        return False

    def read_header(self, file_path):
        # Only the parquet file metadata is read here; no pixel data is loaded.
        meta = pq.ParquetFile(file_path).metadata
        return FileInfo(shape=(meta.num_rows, meta.num_columns), dtype=np.uint8, dim_order=("Y", "X"))

    def load(self, file_path):
        table = pq.read_table(file_path)
        # Each column is one pixel column (X); stacking them rebuilds the YX grid.
        columns = [table.column(name).to_numpy(zero_copy_only=False) for name in table.column_names]
        pixels = np.column_stack(columns).astype(np.uint8)
        # Schema metadata is a bytes-to-bytes mapping (or None when absent).
        raw_meta = table.schema.metadata or {}
        log_entry = {k.decode(): v.decode() for k, v in raw_meta.items()}
        meta = {
            "depth_zone": log_entry.get("depth_zone", "unknown"),
            "dim_order": "YX",
        }
        return record_from(pixels, meta, kind="intensity")

    def load_range(self, file_path, start, stop):
        raise NotImplementedError("shark-cam is not a container format")
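The two transformations inside load (stacking columns into a YX grid, and decoding bytes metadata) can be tried without pyarrow or NumPy. The sketch below uses plain Python lists and a hand-written bytes dict as stand-ins; the helper names stack_columns and decode_metadata are made up for this illustration and do not exist in Pixel Patrol.

```python
def stack_columns(columns):
    """Place equal-length columns side by side: result[y][x] is row y of column x."""
    lengths = {len(col) for col in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return [list(row) for row in zip(*columns)]


def decode_metadata(raw_meta):
    """Turn a bytes-to-bytes mapping (or None) into a str-to-str dict."""
    return {k.decode(): v.decode() for k, v in (raw_meta or {}).items()}


# Three columns of four rows each -> a grid with 4 rows (Y) and 3 columns (X).
columns = [[0, 10, 20, 30], [1, 11, 21, 31], [2, 12, 22, 32]]
grid = stack_columns(columns)
log_entry = decode_metadata({b"depth_zone": b"twilight"})

print(len(grid), len(grid[0]))                             # 4 3
print(grid[1])                                             # [10, 11, 12]
print(log_entry.get("depth_zone", "unknown"))              # twilight
print(decode_metadata(None).get("depth_zone", "unknown"))  # unknown
```

The last line mirrors the loader's fallback: a file without the metadata key still loads, and simply reports "unknown".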
🔬 How a parquet table becomes a "dive patch"
The table holds uint8 columns: read column-by-column and stacked side by side, the table is the pixel grid (rows → Y, columns → X). The playful field, depth_zone (sunlit/twilight/midnight/abyss - which layer of the ocean the snapshot was taken in), is decoded straight out of table.schema.metadata - exactly the slot real formats (OME-XML in TIFFs, EXIF in JPEGs, ...) use to carry instrument and acquisition info.

Setting kind="intensity" and dim_order="YX" is what makes the built-in processors (basic metrics, histogram, thumbnail) pick the patches up automatically, right alongside your custom one. To a Pixel Patrol pipeline, a "dive patch" parquet table behaves just like any other 2-D image - that's the whole point of the exercise.

read_header is called for every file during the initial scan and must stay cheap - it's your chance to report shape, dtype, and dimension order without paying the cost of loading pixel data.

A processor receives loaded records and returns derived values that get merged into the report as new columns. Implement the PixelPatrolProcessor protocol - every member below is required:
| Member | Type | Purpose |
|---|---|---|
| NAME | str | unique identifier (shown in pipeline logs) |
| CHUNK_KIND | ChunkKind | LEAF (user-configured tiles/slices) or MEMORY (whole record, memory-safe chunking) |
| INPUT | RecordSpec | which records this processor runs on (kinds, axes, capabilities, ...) |
| OUTPUT | "features" \| "record" | whether run_chunk returns columns to merge, or a brand-new Record |
| OUTPUT_SCHEMA | dict[str, type] | the columns this processor adds, with their types |
| run_chunk(record) | (Record) -> dict | does the actual computation on one chunk |
| get_aggregation(name) | (str) -> Callable \| None | how to combine multiple chunks' values for column name into the per-image value |
GlowSpotterProcessor (NAME = "glow-spotter") runs on every intensity record with X/Y axes, uses ChunkKind.LEAF, and adds one column - glow_count:
import numpy as np
# ChunkKind and RecordSpec are provided by Pixel Patrol; their import is omitted in this excerpt.

class GlowSpotterProcessor:
    NAME = "glow-spotter"
    CHUNK_KIND = ChunkKind.LEAF
    INPUT = RecordSpec(axes={"X", "Y"}, kinds={"intensity"})
    OUTPUT = "features"
    OUTPUT_SCHEMA = {"glow_count": int}
    OUTPUT_SCHEMA_PATTERNS = []

    def run_chunk(self, record):
        # Materialize lazy arrays (anything exposing .compute()) before doing the math.
        arr = record.data.compute() if hasattr(record.data, "compute") else np.asarray(record.data)
        arr = arr.astype(np.float32)
        # A pixel "glows" when it is more than 60 intensity levels above the chunk's median.
        threshold = np.median(arr) + 60.0
        glow_count = int(np.sum(arr > threshold))
        return {"glow_count": glow_count}

    def get_aggregation(self, col):
        if col != "glow_count":
            return None
        # Glows are independent per pixel, so chunk counts simply add up.
        return lambda rows, g_dims: sum(r["glow_count"] for r in rows)
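The additive aggregation can be checked in isolation. The sketch below uses plain Python lists in place of records, with made-up pixel values and hypothetical helper names (count_glows, aggregate); it is an illustration of the summing pattern, not Pixel Patrol code. It also shows one thing worth keeping in mind: chunk counts add up to the whole-image count when every chunk is measured against the same threshold, whereas a threshold derived from each chunk's own median (as in run_chunk above) gives the sum of per-chunk counts, which need not match.

```python
from statistics import median


def count_glows(pixels, threshold):
    """Number of values strictly above the threshold."""
    return sum(1 for value in pixels if value > threshold)


def aggregate(rows, g_dims=None):
    """Same shape as the aggregation callable: sum the per-chunk counts."""
    return sum(r["glow_count"] for r in rows)


image = [10, 12, 11, 200, 9, 13, 180, 12, 190, 185, 195, 11]
chunks = [image[:6], image[6:]]

# One shared threshold: per-chunk counts add back up to the whole-image count.
shared = median(image) + 60.0
whole = count_glows(image, shared)
rows_shared = [{"glow_count": count_glows(c, shared)} for c in chunks]
print(whole, aggregate(rows_shared))  # 5 5

# Per-chunk thresholds: each chunk is judged against its own median.
rows_local = [{"glow_count": count_glows(c, median(c) + 60.0)} for c in chunks]
print(aggregate(rows_local))  # 1
```

Whether the per-chunk behaviour is what you want depends on what a chunk represents in your data; if each chunk is a complete 2-D plane, a per-plane threshold is often the natural definition.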
🔬 How "glows" get counted
get_aggregation sums each chunk's glow_count into the per-image total - a pattern that works whenever the thing you're counting is independent per pixel, so splitting an image into pieces and adding the pieces' counts back up reconstructs the whole.

CHUNK_KIND shapes how your data arrives, and which unit your computation needs to handle. LEAF - the more common pick for metric processors - tiles large images into memory-safe pieces and runs your computation on each one; MEMORY hands you the whole record at once, which is only safe when you know it comfortably fits in memory.

OUTPUT = "features" merges your columns into the existing report - the right choice for almost any custom metric. "record" is for processors that produce a brand-new derived image instead (a mask, a projection, ...).

A viewer plugin is a small JavaScript module that renders a custom widget in the report viewer's sidebar, with full access to the report's data through an in-browser DuckDB instance (the table is always called pp_data). It exports one default object:
export default {
  id: 'my-widget',             // unique across all loaded plugins
  label: 'My Widget',          // shown in the sidebar widget list
  group: 'My Extension Name',  // optional - gives the widget its own sidebar section
  scope: 'image',              // optional - 'file' | 'image' | 'slice', shown as a badge
                               // describing what one datapoint in this widget represents

  requires(schema) {
    // return false to hide the widget when its columns are absent
    return schema.allCols.includes('my_column');
  },

  async render(container, ctx) {
    const rows = await ctx.queryRows(`
      SELECT my_column, COUNT(*) AS cnt
      FROM pp_data
      ${ctx.where}
      GROUP BY 1 ORDER BY 2 DESC
    `);
    // write into `container` using plain DOM, Plotly (window.Plotly), or any CDN library
  },
};
| ctx field | Type | Description |
|---|---|---|
| ctx.queryRows(sql) | async → object[] | query returning plain JS objects |
| ctx.query(sql) | async → Arrow Table | raw Arrow result (for binary/blob columns) |
| ctx.querySample(cols, n) | async → object[] | sampled scalar shorthand |
| ctx.schema | object | { metricCols, groupCols, dimensionInfo, allCols, blobCols } |
| ctx.state | object | { palette, groupCol, filter, dimensions } |
| ctx.colorMap | object | { groupValue: hexColor } - matches the colors used everywhere else in the report |
| ctx.color.getColors(palette, n) | (string, number) -> string[] | n colors from the named palette - for ad-hoc groupings (e.g. a column other than the active group-by) not covered by colorMap |
| ctx.color.getPaletteNames() | () -> string[] | palette names accepted by ctx.color.getColors |
| ctx.where | string | SQL WHERE clause for the active filter (or '') - merge with AND if your query needs its own |
| ctx.groups | string[] | distinct values of the active group column |
| ctx.filteredCount / ctx.totalRows | number | row counts |
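The ctx.where row above says to merge with AND when a query has its own condition. The string handling is sketched here in Python (to keep this page's added examples in one language); merge_where is a hypothetical helper, and a real plugin would do the equivalent in JavaScript. Wrapping the incoming filter in parentheses is an addition of this sketch, meant to keep an OR inside the filter from changing the meaning of the combined clause.

```python
import re


def merge_where(own_condition, ctx_where):
    """Combine a query's own condition with an optional 'WHERE ...' filter clause."""
    # Strip the leading WHERE keyword (any case) so only the bare condition remains.
    extra = re.sub(r"^\s*WHERE\s+", "", ctx_where or "", flags=re.IGNORECASE).strip()
    if not extra:
        return f"WHERE {own_condition}"
    return f"WHERE {own_condition} AND ({extra})"


print(merge_where('"glow_count" IS NOT NULL', ""))
# WHERE "glow_count" IS NOT NULL
print(merge_where('"glow_count" IS NOT NULL', "WHERE site = 'reef' OR site = 'trench'"))
# WHERE "glow_count" IS NOT NULL AND (site = 'reef' OR site = 'trench')
```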
Pixel HAI Watch ships two plugins on purpose - one per kind of data a loader can surface. plugin_glow_by_depth.js plots glow_count (computed by the processor, from real pixel data) against depth_zone (read straight from the loader's metadata), as a jittered scatter colored by site:
const DEPTH_ORDER = ['sunlit', 'twilight', 'midnight', 'abyss'];

export default {
  id: 'glow-by-depth',
  label: 'Glow Sightings by Depth',
  group: 'Pixel HAI Watch',
  scope: 'image',

  requires(schema) {
    return ['depth_zone', 'glow_count'].every(c => schema.allCols.includes(c));
  },

  async render(container, ctx) {
    const rows = await ctx.queryRows(`
      SELECT "depth_zone" AS depth_zone, "imported_path_short" AS site, "glow_count" AS glows
      FROM pp_data
      WHERE "depth_zone" IS NOT NULL AND "glow_count" IS NOT NULL
      ${ctx.where ? 'AND ' + ctx.where.replace(/^WHERE\s+/i, '') : ''}
    `);
    const zones = DEPTH_ORDER.filter(z => rows.some(r => r.depth_zone === z));
    const sites = [...new Set(rows.map(r => r.site))].sort();
    const xJitter = () => (Math.random() - 0.5) * 0.5; // keeps overlapping points visible

    // A scatter (rather than a box/violin) because each category holds only a
    // handful of points - exactly where distributional summaries would mislead.
    Plotly.newPlot(container, sites.map(site => {
      const sub = rows.filter(r => r.site === site);
      return {
        type: 'scatter', mode: 'markers', name: site,
        x: sub.map(r => zones.indexOf(r.depth_zone) + xJitter()),
        y: sub.map(r => Number(r.glows)),
        marker: { size: 12, color: ctx.colorMap[site] ?? '#888' },
      };
    }), { title: { text: 'How much bioluminescent glow shows up at each depth?' } });
  },
};
Both plugins are listed in a small manifest, loaded automatically by the viewer:
{
  "name": "Pixel HAI Watch Extension",
  "plugins": ["./plugin_dives_logged.js", "./plugin_glow_by_depth.js"]
}
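A quick sanity check before shipping is to confirm that every file named in the manifest actually exists. The sketch below assumes plugin paths are resolved relative to the manifest's own folder (which the "./" prefixes suggest, but which is not confirmed here); missing_plugins is a hypothetical helper, and the demo builds a throwaway folder in which one of the two plugin files is deliberately absent.

```python
import json
import tempfile
from pathlib import Path


def missing_plugins(manifest_path):
    """Return the plugin paths listed in the manifest that do not exist on disk."""
    manifest_path = Path(manifest_path)
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    base = manifest_path.parent  # assumption: paths are relative to the manifest
    return [rel for rel in manifest.get("plugins", []) if not (base / rel).is_file()]


# Demo in a temporary directory: only one of the two listed plugins is created.
with tempfile.TemporaryDirectory() as tmp:
    viewer = Path(tmp)
    (viewer / "plugin_dives_logged.js").write_text("export default {};", encoding="utf-8")
    (viewer / "extension.json").write_text(json.dumps({
        "name": "Pixel HAI Watch Extension",
        "plugins": ["./plugin_dives_logged.js", "./plugin_glow_by_depth.js"],
    }), encoding="utf-8")
    demo_missing = missing_plugins(viewer / "extension.json")

print(demo_missing)  # ['./plugin_glow_by_depth.js']
```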
Both plugins set group: 'Pixel HAI Watch', so they get their own named section in the sidebar instead of being lumped under "Other Widgets" - a small touch that makes an extension feel like a first-class part of the report.

Pixel Patrol discovers loaders, processors, and viewer extensions through Python entry points - which only works if your package is installed in the same environment as pixel_patrol_base. So first, make sure you're in that environment, then install this package into it:
Then try it locally - this (re)generates the tiny dataset (if missing), processes it with the custom loader and processor, and opens the viewer with both widgets loaded:
Once it works, two ways to get it in front of others:
- As an installable package: users run pip install pixel-patrol-hai-watch, and any report opened with serve_viewer(...) picks up the plugins automatically. No extra arguments, no separate hosting.
- As a hosted viewer extension: publish the viewer/ folder as static files; the manifest then lives at a public URL you can pass straight to the hosted viewer: ?extension=https://<org>.github.io/<repo>/extension.json (repeat &extension= to chain several).

To build your own, start from examples/minimal-extension/, decide which piece(s) you actually need, update the pyproject.toml metadata, and replace the example identifiers with your own - one piece at a time. The protocols will tell you exactly what's still missing as you go, and nothing stops you from running an unfinished extension while you build it out.
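For the hosted-viewer route, chaining several manifests means repeating the extension query parameter. A minimal sketch of building such a link follows; VIEWER_URL and the manifest URLs are placeholders invented for this example (the real hosted-viewer address is not given on this page), and viewer_link is a hypothetical helper. The sketch percent-encodes the manifest URLs, which is standard for query strings and decodes back to the original values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical placeholder: substitute the real hosted-viewer address.
VIEWER_URL = "https://viewer.example/"


def viewer_link(manifest_urls, viewer_url=VIEWER_URL):
    """Append one extension=<url> query parameter per manifest URL."""
    query = urlencode([("extension", url) for url in manifest_urls])
    return f"{viewer_url}?{query}" if query else viewer_url


manifests = [
    "https://org-a.example/repo-a/extension.json",
    "https://org-b.example/repo-b/extension.json",
]
link = viewer_link(manifests)
print(link)
# Round trip: the query string decodes back to the original manifest URLs.
print(parse_qs(urlsplit(link).query)["extension"] == manifests)  # True
```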