Xpublish

Publish Xarray datasets via a REST API, enabling web-based access to scientific data stored in NetCDF, Zarr, and other formats.

Technologies

  • Python
  • FastAPI
  • Xarray
  • Zarr
  • NetCDF
  • REST API

Overview

Xpublish is a lightweight Python library that enables scientists to publish Xarray datasets through a REST API. It bridges the gap between data stored in files (NetCDF, Zarr, etc.) and web-based data access, making scientific datasets accessible via standard HTTP requests.

The Problem

Scientific datasets are often stored in domain-specific formats (NetCDF, HDF5, Zarr) that require specialized software to access. This creates barriers for:

  • Web-based visualization applications
  • Remote data access and analysis
  • Integration with modern web services
  • Sharing data with collaborators

The Solution

Xpublish wraps Xarray datasets in a FastAPI server, automatically creating REST endpoints that:

  • Serve dataset metadata
  • Provide data subsetting capabilities
  • Support multiple output formats
  • Enable efficient remote data access

Key Features

  • Zero Configuration: Publish any Xarray dataset with minimal code
  • Zarr Support: Serve datasets in cloud-optimized Zarr format
  • OPeNDAP-like Access: Subset and slice remote data, either client-side through the Zarr endpoint or with URL parameters on plugin routes
  • Plugin System: Extend functionality with custom routers
  • Production Ready: Built on FastAPI for performance and reliability
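With Zarr support, a client first reads a single consolidated metadata document (`.zmetadata`) and can then request chunks one at a time. The sketch below shows what a client can learn from that document. It uses a made-up, trimmed payload in the Zarr v2 consolidated-metadata layout; a real `.zarray` entry also carries fields such as `compressor` and `fill_value`. The helper name `summarize_arrays` is hypothetical.

```python
import json
import math

# Made-up, trimmed payload in the Zarr v2 consolidated-metadata layout
ZMETADATA = json.loads("""
{
  "zarr_consolidated_format": 1,
  "metadata": {
    ".zgroup": {"zarr_format": 2},
    "temperature/.zarray": {
      "shape": [365, 180, 360],
      "chunks": [30, 90, 90],
      "dtype": "<f4",
      "zarr_format": 2
    },
    "temperature/.zattrs": {"_ARRAY_DIMENSIONS": ["time", "lat", "lon"]}
  }
}
""")


def summarize_arrays(zmetadata: dict) -> dict:
    """Map each array name to its dimensions, shape, chunking, and chunk count."""
    meta = zmetadata["metadata"]
    summary = {}
    for key, zarray in meta.items():
        if not key.endswith("/.zarray"):
            continue
        name = key[: -len("/.zarray")]
        # Xarray records dimension names in the _ARRAY_DIMENSIONS attribute
        attrs = meta.get(f"{name}/.zattrs", {})
        shape, chunks = zarray["shape"], zarray["chunks"]
        summary[name] = {
            "dims": attrs.get("_ARRAY_DIMENSIONS", []),
            "shape": shape,
            "chunks": chunks,
            # Each chunk is a separate object that a client can request on its own
            "n_chunks": math.prod(math.ceil(s / c) for s, c in zip(shape, chunks)),
        }
    return summary


arrays = summarize_arrays(ZMETADATA)
```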

Quick Example

import xarray as xr
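import xpublish  # importing registers the .rest accessor on datasets

# Load dataset (opened lazily; values are read on demand)
ds = xr.open_dataset('data.nc')

# Publish via REST API (serves on port 9000 by default)
ds.rest.serve()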

Now the dataset is accessible at http://localhost:9000 with endpoints for:

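  • Metadata: / (HTML summary), /keys, and /info
  • Zarr: /zarr/ (consolidated metadata at /zarr/.zmetadata)
  • Custom queries added through plugins, e.g. /subset?var=temperature&time=2020-01-01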
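The `/subset` route is not part of core Xpublish. It stands for a route that a plugin would add. The sketch below shows how such a route might turn its query string into arguments for `Dataset.sel`. The helper `parse_subset_query` and the convention that `var` names the variable are hypothetical. The sketch uses only the standard library.

```python
from urllib.parse import parse_qs, urlsplit


def parse_subset_query(url: str) -> tuple[str, dict[str, str]]:
    """Split a /subset URL into a variable name and label-based indexers."""
    params = parse_qs(urlsplit(url).query)
    if "var" not in params:
        raise ValueError("missing required 'var' parameter")
    # 'var' selects the variable; every other parameter is treated as a dimension label
    var = params.pop("var")[0]
    indexers = {dim: values[0] for dim, values in params.items()}
    return var, indexers


var, indexers = parse_subset_query(
    "http://localhost:9000/subset?var=temperature&time=2020-01-01"
)
# A route handler could then respond with ds[var].sel(**indexers)
```

Passing label strings directly to `.sel` relies on Xarray to interpret them; for example, date strings work on a datetime index. A production route would also check that each parameter names a real dimension before selecting.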

Technical Architecture

  • Web Framework: FastAPI, providing a high-performance async API
  • Data Layer: Xarray for multi-dimensional array operations
  • Storage Backends: NetCDF, Zarr, HDF5, in-memory
  • Serialization: JSON for metadata and Zarr for array chunks, with other formats (such as NetCDF subsets) available through plugins

Use Cases

Remote Data Access

Enable users to access large datasets remotely without downloading entire files:

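# Client-side
import xarray as xr

# Opening the store reads only metadata (HTTP URLs need fsspec and aiohttp)
ds = xr.open_zarr('http://server.com/zarr')
# Select lazily, then fetch only the chunks that the subset covers
subset = ds.sel(time='2020', lat=slice(30, 40)).load()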
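Only the chunks that overlap a selection are transferred. The sketch below shows that overlap calculation. It assumes a Zarr v2 array that uses the default `.` chunk-key separator. It also assumes the selection has already been converted from coordinate labels to integer index ranges, a step Xarray performs on the client. The function names are hypothetical.

```python
import itertools


def chunk_range(start: int, stop: int, chunk: int) -> range:
    """Indices of the chunks that overlap the half-open index range [start, stop)."""
    if stop <= start:
        return range(0)
    return range(start // chunk, (stop - 1) // chunk + 1)


def chunk_keys(selection: list[tuple[int, int]], chunks: list[int]) -> list[str]:
    """Zarr v2 chunk keys ("i.j.k") needed to read a selection of index ranges."""
    per_dim = [chunk_range(lo, hi, c) for (lo, hi), c in zip(selection, chunks)]
    return [".".join(map(str, idx)) for idx in itertools.product(*per_dim)]


# Hypothetical (time, lat, lon) array with chunks of (30, 90, 90): read the first
# 31 time steps, latitude rows 120-129, and every longitude
keys = chunk_keys([(0, 31), (120, 130), (0, 360)], [30, 90, 90])
```

Suppose the full array is 365 × 180 × 360. Then this selection needs 8 chunk requests out of 104 in total.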

Web Applications

Build interactive visualization tools that fetch data on-demand from Xpublish endpoints.
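An interactive tool typically downloads the consolidated metadata once. It then requests individual chunks as the user pans, zooms, or steps through time. The sketch below builds the resulting request URLs. It assumes a single dataset served with `ds.rest.serve()` and the `/zarr` route prefix used by recent Xpublish releases. `zarr_urls` is a hypothetical helper.

```python
from urllib.parse import quote


def zarr_urls(base_url: str, var: str, chunk_indices: list[tuple[int, ...]]) -> list[str]:
    """Metadata URL followed by one URL per chunk, assuming a /zarr route prefix."""
    root = f"{base_url.rstrip('/')}/zarr"
    urls = [f"{root}/.zmetadata"]  # fetched once, when the app starts
    for idx in chunk_indices:
        # Zarr v2 chunk keys join the per-dimension indices with "."
        urls.append(f"{root}/{quote(var)}/{'.'.join(map(str, idx))}")
    return urls


plan = zarr_urls("http://localhost:9000/", "temperature", [(0, 1, 2), (0, 1, 3)])
```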

Data Catalogs

Integrate with intake or other catalog systems to provide standardized data access.

Cloud Deployments

Deploy alongside datasets in cloud object storage for scalable data serving.

Contributions

As part of the xarray-contrib ecosystem, I contributed to:

  • Community standards and best practices
  • Integration with oceanographic workflows
  • Use cases from real-time data streaming
  • Testing with large-scale ocean observational datasets

Integration with Other Projects

Xpublish complements tools I've worked on:

  • Interactive Oceans: Used for serving OOI data via REST APIs
  • Echopype: Potential integration for ocean sonar data access
  • Pangeo Ecosystem: Part of the cloud-native data access stack

Community Impact

Xpublish enables:

  • Easier data sharing among researchers
  • Reduced data transfer overhead
  • Modern web-based data visualization
  • Cloud-native scientific workflows

Technical Innovation

  • Lazy Loading: Data is loaded only when a client requests it
  • Efficient Subsetting: Serve only the requested data slices
  • Format Agnostic: Works with any Xarray-compatible format
  • Extensible: Plugin system for custom functionality
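Lazy loading and caching can be pictured with a small stand-in for a chunk store. This is a hypothetical sketch rather than Xpublish's own code. The `loader` callable runs only the first time a key is requested. An LRU cache built with `functools.lru_cache` then serves repeat requests.

```python
from functools import lru_cache
from typing import Callable


class LazyChunkStore:
    """Hypothetical chunk store: reads happen on first request and are cached."""

    def __init__(self, loader: Callable[[str], bytes], maxsize: int = 128):
        self.loads = 0  # how many times the underlying loader actually ran
        self._loader = loader
        # Keep up to `maxsize` recently used chunks; older ones are evicted
        self._cached = lru_cache(maxsize=maxsize)(self._load)

    def _load(self, key: str) -> bytes:
        self.loads += 1
        return self._loader(key)

    def __getitem__(self, key: str) -> bytes:
        return self._cached(key)


# Stand-in loader; a real server would encode a slice of the dataset here
store = LazyChunkStore(lambda key: f"data for {key}".encode())
first = store["temperature/0.0.0"]  # triggers a read
again = store["temperature/0.0.0"]  # answered from the cache
```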

Future Directions

  • Enhanced authentication and authorization
  • Improved caching strategies
  • Better integration with cloud services
  • Standards compliance (OGC, OPeNDAP)