Getting started

Install

pip:

pip install pysdp                # core: catalog, raster, extract, download
pip install "pysdp[dask]"        # + lazy chunked COG reads via Dask
pip install "pysdp[stac]"        # + pystac-client and odc-stac for STAC
pip install "pysdp[exact]"       # + exactextract for fractional zonal stats
pip install "pysdp[download]"    # + obstore/fsspec for faster downloads
pip install "pysdp[hub]"         # + dask-gateway for JupyterHub clusters
pip install "pysdp[all]"         # everything

conda / mamba:

Conda-forge support lands alongside the first stable release. In the meantime, pip install pysdp inside a conda environment works well.

uv:

uv add pysdp                     # add to current project
uv pip install "pysdp[all]"      # into an existing env

Dependencies

Core runtime deps include xarray, rioxarray, rasterio, geopandas, pystac, scipy, xvec, and requests. These all have wheels on PyPI for Linux, macOS (Intel + Apple Silicon), and Windows — no GDAL system install needed.

Python 3.11, 3.12, and 3.13 are supported; pySDP follows SPEC 0 for version windows.
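
To see which optional extras are usable in the current environment, a small check like the one below can help. It is only a sketch: the mapping from extras to importable module names is an assumption inferred from the install comments above, and none of these helper names are part of pySDP.

```python
import sys
from importlib.util import find_spec

# Assumed mapping from pysdp extras to top-level modules they pull in.
# These module names are illustrative guesses, not part of pySDP's API.
ASSUMED_EXTRA_MODULES = {
    "dask": ["dask"],
    "stac": ["pystac_client"],
    "exact": ["exactextract"],
    "download": ["obstore", "fsspec"],
    "hub": ["dask_gateway"],
}


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def available_extras(extra_modules: dict[str, list[str]]) -> dict[str, bool]:
    """Map each extra to True when all of its assumed modules are importable."""
    return {
        extra: all(is_importable(name) for name in names)
        for extra, names in extra_modules.items()
    }


def python_supported(version_info=sys.version_info) -> bool:
    """Check against the supported versions listed above (3.11 through 3.13)."""
    return (3, 11) <= tuple(version_info[:2]) <= (3, 13)


if __name__ == "__main__":
    print("Python supported:", python_supported())
    for extra, ok in available_extras(ASSUMED_EXTRA_MODULES).items():
        print(f"pysdp[{extra}]: {'available' if ok else 'missing'}")
```

Using find_spec avoids importing heavy packages just to test for their presence.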

Quick start

Discover what's in the catalog

import pysdp

# All current (non-deprecated) SDP products
cat = pysdp.get_catalog()
cat.shape  # e.g. (140, 18)

# Narrow by domain, type, time-series shape
ug_climate_daily = pysdp.get_catalog(
    domains=["UG"],
    types=["Climate"],
    timeseries_types=["Daily"],
)
ug_climate_daily[["CatalogID", "Product", "Resolution"]]

Open a raster

# Single-layer product
dem = pysdp.open_raster("R3D009")   # UG bare-earth DEM, 3 m
dem

# Daily time-series, sliced by date range
tmax = pysdp.open_raster(
    "R4D004",
    date_start="2021-11-02",
    date_end="2021-11-04",
)
tmax.sizes   # {'time': 3, 'y': ..., 'x': ...}

open_raster() returns an xarray.Dataset with the following conventions:

The data variable is named from the product's canonical short name (e.g. UG_dem_3m_v1).
CRS is set to EPSG:32613 (UTM zone 13N) on every SDP raster.
For time-series, the time coordinate is a pandas.DatetimeIndex (Daily → actual date, Monthly → first-of-month, Yearly → Jan 1).
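
The time-coordinate convention in the list above can be sketched in plain Python. The function name time_label is hypothetical and this is not how pySDP builds its index; it only restates the Daily / Monthly / Yearly rule as code.

```python
from datetime import date


def time_label(timeseries_type: str, year: int, month: int = 1, day: int = 1) -> date:
    """Return the date a slice would carry under the convention described above."""
    if timeseries_type == "Daily":
        return date(year, month, day)   # the actual date
    if timeseries_type == "Monthly":
        return date(year, month, 1)     # first of the month
    if timeseries_type == "Yearly":
        return date(year, 1, 1)         # January 1
    raise ValueError(f"unsupported time-series type: {timeseries_type!r}")
```

For example, a Monthly slice covering November 2021 would be labeled 2021-11-01 regardless of which day within the month is passed in.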

Extract at points and polygons

import geopandas as gpd

sites = gpd.GeoDataFrame(
    {"site": ["Roaring Judy", "Gothic", "Galena Lake"]},
    geometry=gpd.points_from_xy(
        [-106.853186, -106.988934, -107.072569],
        [38.716995, 38.958446, 39.021644],
    ),
    crs="EPSG:4326",
)

# Bilinear interpolation at points (auto-reprojects to raster CRS)
elevations = pysdp.extract_points(dem, sites)

# Zonal mean over polygons (centroid-based; set exact=True for fractional coverage)
watersheds = gpd.read_file("my_watersheds.gpkg")
watershed_elev = pysdp.extract_polygons(dem, watersheds, stats="mean")
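
For readers unfamiliar with the bilinear interpolation mentioned in the comment above, here is a minimal sketch of the general technique on a small grid. It works in pixel coordinates, ignores CRS and nodata handling, and is not pySDP's implementation; the function name bilinear is hypothetical.

```python
def bilinear(grid, x, y):
    """Interpolate grid[row][col] at fractional column x and fractional row y."""
    rows, cols = len(grid), len(grid[0])
    if rows < 2 or cols < 2:
        raise ValueError("grid must be at least 2 x 2")
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        raise ValueError("point lies outside the grid")
    # Top-left cell of the 2 x 2 neighbourhood, clamped so that points on the
    # last row or column still have a valid neighbour to blend with.
    c0 = min(int(x), cols - 2)
    r0 = min(int(y), rows - 2)
    fx, fy = x - c0, y - r0
    # Blend along x on the two bracketing rows, then blend those along y.
    top = grid[r0][c0] * (1 - fx) + grid[r0][c0 + 1] * fx
    bottom = grid[r0 + 1][c0] * (1 - fx) + grid[r0 + 1][c0 + 1] * fx
    return top * (1 - fy) + bottom * fy


elevation = [
    [0.0, 10.0],
    [20.0, 30.0],
]
centre = bilinear(elevation, 0.5, 0.5)  # weighted equally by all four cells
```

The value at a point is a distance-weighted blend of the four surrounding cells, which is why point extraction can return values that appear in no single pixel.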

For time-series rasters, extraction output is long-form: one row per (geometry × time). Pivot to wide if you want the rSDP-style layout:

# tmax is the Daily time-series from above; extract at the 3 field sites
samples = pysdp.extract_points(tmax, sites)
wide = samples.pivot_table(index="site", columns="time", values="bayes_tmax_est")

Download to local disk

# By catalog_id (expands Yearly/Monthly to all catalog slices)
pysdp.download(
    catalog_ids=["R1D001", "R3D009"],
    output_dir="~/sdp-data",
)

# By URL (for hand-picked subsets — e.g. selective daily slices)
pysdp.download(
    urls=[
        "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0305_est.tif",
        "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0306_est.tif",
    ],
    output_dir="~/sdp-data",
)

download() returns a pandas.DataFrame status report with the columns url, dest, success, status, size, and error.
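
One possible use of the status report is to collect the URLs that failed and retry only those. The sketch below assumes the success column holds booleans, which the source does not state, and uses illustrative stand-in rows rather than real download results; failed_urls is a hypothetical helper.

```python
def failed_urls(records):
    """Collect the URLs of rows whose 'success' flag is false."""
    return [row["url"] for row in records if not row["success"]]


# Illustrative stand-in rows. With the real report, the same shape can be
# obtained from the DataFrame via report.to_dict("records").
example_rows = [
    {"url": "https://example.org/a.tif", "success": True, "error": None},
    {"url": "https://example.org/b.tif", "success": False, "error": "timeout"},
]

retry = failed_urls(example_rows)
# retry could then be passed back in as pysdp.download(urls=retry, output_dir=...)
```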

Browse the catalog visually

# Renders a thumbnail grid in Jupyter; outside notebooks str(...) returns raw HTML
pysdp.browse(domains=["UG"], types=["Vegetation"])

# Show deprecated products (current default hides them); deprecated cards get
# a tinted background and a "deprecated → NEWID" badge pointing at the successor.
pysdp.browse(types=["Snow"], include_deprecated=True)

# Surface open data-quality issues from rmbl-sdp/sdp-products as a per-card badge
pysdp.browse(domains=["UG"], with_issue_counts=True)

Discover available dates

# Yearly / Monthly / Daily products: computed deterministically from MinDate/MaxDate
pysdp.get_dates("R6D007")            # Yearly snow-persistence series

# Weekly drone-imagery products: discovered from the baked manifest (offline)
# or the live STAC catalog
pysdp.get_dates("R6D001")            # ~111 weekly flights
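
The idea of computing dates deterministically from MinDate and MaxDate, mentioned in the comment above, can be sketched as follows. This is an illustration under assumptions: expected_dates is a hypothetical helper, and it treats both endpoints as inclusive, which may not match what get_dates does.

```python
from datetime import date, timedelta


def expected_dates(min_date: date, max_date: date, timeseries_type: str) -> list[date]:
    """Enumerate slice dates between two bounds (both assumed inclusive)."""
    if min_date > max_date:
        raise ValueError("min_date must not be after max_date")
    if timeseries_type == "Daily":
        n_days = (max_date - min_date).days
        return [min_date + timedelta(days=i) for i in range(n_days + 1)]
    if timeseries_type == "Monthly":
        # Count months since year 0 so that year boundaries need no special case.
        first = min_date.year * 12 + (min_date.month - 1)
        last = max_date.year * 12 + (max_date.month - 1)
        return [date(m // 12, m % 12 + 1, 1) for m in range(first, last + 1)]
    if timeseries_type == "Yearly":
        return [date(y, 1, 1) for y in range(min_date.year, max_date.year + 1)]
    raise ValueError(f"unsupported time-series type: {timeseries_type!r}")
```

No network access is needed for this kind of enumeration, which is what distinguishes it from the manifest- or STAC-based discovery used for irregular series such as the weekly flights.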

Report and discover data-quality issues

Dataset issues are tracked in the rmbl-sdp/sdp-products repository, separate from the pysdp package repo. report_issue() opens a prefilled Issue Form in the browser:

pysdp.report_issue("R4D004")                          # opens browser to the form
pysdp.report_issue("R3D009", type="metadata-error")   # pre-selects the issue type

Before reporting, check whether the problem is already known:

issues = pysdp.known_issues("R4D004")     # one row per open issue, cached 1h
issues[["number", "type", "severity", "status", "title"]]

Set GITHUB_TOKEN (or GITHUB_PAT) in your environment to bump the API rate limit from 60 to 5000 requests/hr. Pass refresh=True to bypass the cache. The cache lives under $XDG_CACHE_HOME/pysdp/ (or ~/.cache/pysdp/).
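
The environment-variable and cache-location rules above can be sketched with the standard library. The helper names are hypothetical, and preferring GITHUB_TOKEN over GITHUB_PAT when both are set is an assumption; the source only says either one is accepted.

```python
import os
from pathlib import Path


def cache_dir(env=os.environ) -> Path:
    """Resolve $XDG_CACHE_HOME/pysdp, falling back to ~/.cache/pysdp."""
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else Path.home() / ".cache"
    return base / "pysdp"


def github_token(env=os.environ):
    """Return a token from GITHUB_TOKEN or GITHUB_PAT, or None if neither is set."""
    # Assumed precedence: GITHUB_TOKEN wins when both variables are present.
    return env.get("GITHUB_TOKEN") or env.get("GITHUB_PAT") or None


def is_fresh(age_seconds: float, ttl_seconds: float = 3600.0) -> bool:
    """True while a cached entry is younger than the one-hour lifetime."""
    return age_seconds < ttl_seconds
```

Passing the environment in as a mapping keeps the helpers easy to test without modifying the real process environment.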

Coming from rSDP?

pySDP is a direct port of the rSDP R package. The API mirrors rSDP closely, with Python-idiomatic adjustments:

| rSDP (R) | pySDP (Python) |
| --- | --- |
| `sdp_get_catalog()` | `pysdp.get_catalog()` |
| `sdp_get_metadata()` | `pysdp.get_metadata()` |
| `sdp_get_dates()` | `pysdp.get_dates()` |
| `sdp_browse()` | `pysdp.browse()` |
| `sdp_get_raster()` | `pysdp.open_raster()` / `pysdp.open_stack()` |
| `sdp_extract_data(points)` | `pysdp.extract_points()` |
| `sdp_extract_data(polygons)` | `pysdp.extract_polygons()` |
| `download_data()` | `pysdp.download()` |
| `sdp_report_issue()` | `pysdp.report_issue()` |
| `sdp_known_issues()` | `pysdp.known_issues()` |
| `SpatRaster` | `xarray.Dataset` |
| `SpatVector` / `sf::sf` | `geopandas.GeoDataFrame` |

See the full behavioral mapping in SPEC §5.

Where to next

API reference — every public function with signatures and docstrings.
User guides — longer walkthroughs (porting the four rSDP vignettes to Python, one per 0.1.x release): cloud-data access, raster wrangling, field-site sampling, and pretty maps.
Roadmap — JupyterHub / Dask Gateway integration, distributed extraction, benchmarks.