Getting started¶
Install¶
pip install pysdp # core: catalog, raster, extract, download
pip install "pysdp[dask]" # + lazy chunked COG reads via Dask
pip install "pysdp[stac]" # + pystac-client and odc-stac for STAC
pip install "pysdp[exact]" # + exactextract for fractional zonal stats
pip install "pysdp[download]" # + obstore/fsspec for faster downloads
pip install "pysdp[hub]" # + dask-gateway for JupyterHub clusters
pip install "pysdp[all]" # everything
Conda-forge support lands alongside the first stable release. In the meantime, pip install pysdp inside a conda environment works well.
Dependencies¶
Core runtime deps include xarray, rioxarray, rasterio, geopandas, pystac, scipy, xvec, and requests. These all have wheels on PyPI for Linux, macOS (Intel + Apple Silicon), and Windows — no GDAL system install needed.
Python 3.11, 3.12, and 3.13 are supported; pySDP follows SPEC 0 for version windows.
Quick start¶
Discover what's in the catalog¶
import pysdp
# All current (non-deprecated) SDP products
cat = pysdp.get_catalog()
cat.shape # e.g. (140, 18)
# Narrow by domain, type, time-series shape
ug_climate_daily = pysdp.get_catalog(
    domains=["UG"],
    types=["Climate"],
    timeseries_types=["Daily"],
)
ug_climate_daily[["CatalogID", "Product", "Resolution"]]
Open a raster¶
# Single-layer product
dem = pysdp.open_raster("R3D009") # UG bare-earth DEM, 3 m
dem
# Daily time-series, sliced by date range
tmax = pysdp.open_raster(
    "R4D004",
    date_start="2021-11-02",
    date_end="2021-11-04",
)
tmax.sizes # {'time': 3, 'y': ..., 'x': ...}
pySDP returns an xarray.Dataset:
- The data variable is named from the product's canonical short name (e.g. UG_dem_3m_v1).
- CRS is set to EPSG:32613 (UTM zone 13N) on every SDP raster.
- For time-series, the time coordinate is a pandas.DatetimeIndex (Daily → actual date, Monthly → first-of-month, Yearly → Jan 1).
Extract at points and polygons¶
import geopandas as gpd
sites = gpd.GeoDataFrame(
    {"site": ["Roaring Judy", "Gothic", "Galena Lake"]},
    geometry=gpd.points_from_xy(
        [-106.853186, -106.988934, -107.072569],
        [38.716995, 38.958446, 39.021644],
    ),
    crs="EPSG:4326",
)
# Bilinear interpolation at points (auto-reprojects to raster CRS)
elevations = pysdp.extract_points(dem, sites)
# Zonal mean over polygons (centroid-based; set exact=True for fractional coverage)
watersheds = gpd.read_file("my_watersheds.gpkg")
watershed_elev = pysdp.extract_polygons(dem, watersheds, stats="mean")
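The centroid-based default and the exact=True option mentioned in the comment above differ in how partially covered cells are weighted. The sketch below illustrates the two rules on a tiny grid using only the standard library. It is not pySDP's implementation: zonal_mean and overlap_1d are hypothetical helpers, and the zone is assumed to be an axis-aligned rectangle over unit-square cells.

```python
# Illustrative only: compares two zonal-mean weighting rules on a tiny grid.
# Assumptions: cell (row, col) spans x in [col, col + 1] and y in [row, row + 1];
# the zone is an axis-aligned rectangle given as (xmin, ymin, xmax, ymax).

def overlap_1d(a0, a1, b0, b1):
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def zonal_mean(grid, zone, exact=False):
    """Weighted mean of grid values inside the zone (hypothetical helper)."""
    xmin, ymin, xmax, ymax = zone
    weighted_sum = 0.0
    total_weight = 0.0
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if exact:
                # Weight = fraction of the cell's area covered by the zone.
                weight = (overlap_1d(col, col + 1, xmin, xmax)
                          * overlap_1d(row, row + 1, ymin, ymax))
            else:
                # Weight = 1 if the cell centre falls inside the zone, else 0.
                cx, cy = col + 0.5, row + 0.5
                inside = xmin <= cx <= xmax and ymin <= cy <= ymax
                weight = 1.0 if inside else 0.0
            weighted_sum += weight * value
            total_weight += weight
    if total_weight == 0:
        raise ValueError("zone does not cover any cell")
    return weighted_sum / total_weight


grid = [
    [10.0, 20.0],
    [30.0, 40.0],
]
# Covers all of cell (0, 0) and a quarter of cell (0, 1).
zone = (0.0, 0.0, 1.25, 1.0)

centroid_mean = zonal_mean(grid, zone)           # only cell (0, 0) counts
exact_mean = zonal_mean(grid, zone, exact=True)  # (1 * 10 + 0.25 * 20) / 1.25
```

Here the centroid rule gives 10.0 and the coverage-weighted rule gives 12.0, because the quarter-covered cell contributes only under the second rule. The gap matters most when polygons are small relative to the raster resolution.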
For time-series rasters, extraction output is long-form: one row per (geometry × time). Pivot to wide if you want the rSDP-style layout:
# tmax is the Daily time-series from above; extract at the 3 field sites
samples = pysdp.extract_points(tmax, sites)
wide = samples.pivot_table(index="site", columns="time", values="bayes_tmax_est")
Download to local disk¶
# By catalog_id (expands Yearly/Monthly to all catalog slices)
pysdp.download(
    catalog_ids=["R1D001", "R3D009"],
    output_dir="~/sdp-data",
)
# By URL (for hand-picked subsets — e.g. selective daily slices)
pysdp.download(
    urls=[
        "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0305_est.tif",
        "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0306_est.tif",
    ],
    output_dir="~/sdp-data",
)
Returns a pandas.DataFrame status report with [url, dest, success, status, size, error] columns.
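Because the report has one row per file, failed transfers can be picked out and retried. A minimal sketch, assuming the report has been converted to a list of row dictionaries (for example with DataFrame.to_dict("records")) and that success is a boolean. failed_urls is a hypothetical helper rather than part of pySDP, and the sample rows, including the status and error values, are made up for illustration.

```python
# Hypothetical helper: collect the URLs whose download did not succeed.
# Assumes each row carries the documented keys:
# url, dest, success, status, size, error.

def failed_urls(rows):
    """Return the URLs of rows where `success` is falsy, in report order."""
    return [row["url"] for row in rows if not row["success"]]


# Made-up rows standing in for a real status report.
report_rows = [
    {"url": "https://example.com/a.tif", "dest": "/tmp/a.tif",
     "success": True, "status": 200, "size": 1024, "error": None},
    {"url": "https://example.com/b.tif", "dest": "/tmp/b.tif",
     "success": False, "status": 503, "size": 0, "error": "Service Unavailable"},
]

retry = failed_urls(report_rows)
```

The resulting list could then be passed back through the urls argument of pysdp.download() for a second attempt.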
Browse the catalog visually¶
# Renders a thumbnail grid in Jupyter; outside notebooks str(...) returns raw HTML
pysdp.browse(domains=["UG"], types=["Vegetation"])
# Show deprecated products (current default hides them); deprecated cards get
# a tinted background and a "deprecated → NEWID" badge pointing at the successor.
pysdp.browse(types=["Snow"], include_deprecated=True)
# Surface open data-quality issues from rmbl-sdp/sdp-products as a per-card badge
pysdp.browse(domains=["UG"], with_issue_counts=True)
Discover available dates¶
# Yearly / Monthly / Daily products: computed deterministically from MinDate/MaxDate
pysdp.get_dates("R6D007") # Yearly snow-persistence series
# Weekly drone-imagery products: discovered from the baked manifest (offline)
# or the live STAC catalog
pysdp.get_dates("R6D001") # ~111 weekly flights
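For regularly spaced products, the list of dates follows from the catalog's MinDate and MaxDate alone. The sketch below shows one way such an enumeration could work using only the standard library. enumerate_dates and its kind argument are hypothetical names, inclusive endpoints are assumed, and the labeling convention (Monthly → first-of-month, Yearly → Jan 1) is taken from the raster section above; pysdp.get_dates() may differ in detail.

```python
from datetime import date, timedelta


def enumerate_dates(min_date, max_date, kind):
    """List the time steps between two dates (inclusive) for a regular series.

    kind is one of "Daily", "Monthly", "Yearly" (hypothetical argument).
    """
    if kind == "Daily":
        n_days = (max_date - min_date).days
        return [min_date + timedelta(days=i) for i in range(n_days + 1)]
    if kind == "Monthly":
        out = []
        year, month = min_date.year, min_date.month
        while (year, month) <= (max_date.year, max_date.month):
            out.append(date(year, month, 1))  # label each month by its first day
            month += 1
            if month > 12:
                year, month = year + 1, 1
        return out
    if kind == "Yearly":
        # Label each year by January 1.
        return [date(y, 1, 1) for y in range(min_date.year, max_date.year + 1)]
    raise ValueError(f"unsupported series type: {kind!r}")


daily = enumerate_dates(date(2021, 11, 2), date(2021, 11, 4), "Daily")
monthly = enumerate_dates(date(2020, 11, 1), date(2021, 2, 1), "Monthly")
yearly = enumerate_dates(date(1993, 1, 1), date(1995, 1, 1), "Yearly")
```

Irregular series such as the weekly drone flights cannot be derived this way, which is why they rely on a manifest or the STAC catalog instead.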
Report and discover data-quality issues¶
Dataset issues live in rmbl-sdp/sdp-products, separate from the pysdp package repo. pysdp.report_issue() opens a prefilled Issue Form in your browser:
pysdp.report_issue("R4D004") # opens browser to the form
pysdp.report_issue("R3D009", type="metadata-error") # pre-selects the issue type
Before reporting, check whether the problem is already known:
issues = pysdp.known_issues("R4D004") # one row per open issue, cached 1h
issues[["number", "type", "severity", "status", "title"]]
Set GITHUB_TOKEN (or GITHUB_PAT) in your environment to bump the API rate limit from 60 to 5000 requests/hr. Pass refresh=True to bypass the cache. The cache lives under $XDG_CACHE_HOME/pysdp/ (or ~/.cache/pysdp/).
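The lookup order described above can be sketched with the standard library, which is handy when you want to locate or clear the cache yourself. This illustrates the stated behavior and is not pySDP's code: resolve_cache_dir and resolve_token are hypothetical names, and giving GITHUB_TOKEN precedence over GITHUB_PAT is an assumption.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return $XDG_CACHE_HOME/pysdp if the variable is set, else ~/.cache/pysdp."""
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME")
    root = Path(base) if base else Path.home() / ".cache"
    return root / "pysdp"


def resolve_token(env=None):
    """Return GITHUB_TOKEN, falling back to GITHUB_PAT, or None if neither is set."""
    env = os.environ if env is None else env
    return env.get("GITHUB_TOKEN") or env.get("GITHUB_PAT") or None


# Explicit dictionaries stand in for the process environment in this example.
cache_dir = resolve_cache_dir({"XDG_CACHE_HOME": "/tmp/xdg"})
default_dir = resolve_cache_dir({})
token = resolve_token({"GITHUB_PAT": "example-token"})
```

Calling either helper with no argument reads the real process environment.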
Coming from rSDP?¶
pySDP is a direct port of the rSDP R package. The API mirrors rSDP closely, with Python-idiomatic adjustments:
| rSDP (R) | pySDP (Python) |
|---|---|
| sdp_get_catalog() | pysdp.get_catalog() |
| sdp_get_metadata() | pysdp.get_metadata() |
| sdp_get_dates() | pysdp.get_dates() |
| sdp_browse() | pysdp.browse() |
| sdp_get_raster() | pysdp.open_raster() / pysdp.open_stack() |
| sdp_extract_data(points) | pysdp.extract_points() |
| sdp_extract_data(polygons) | pysdp.extract_polygons() |
| download_data() | pysdp.download() |
| sdp_report_issue() | pysdp.report_issue() |
| sdp_known_issues() | pysdp.known_issues() |
| SpatRaster | xarray.Dataset |
| SpatVector / sf::sf | geopandas.GeoDataFrame |
See the full behavioral mapping in SPEC §5.
Where to next¶
- API reference — every public function with signatures and docstrings.
- User guides — longer walkthroughs (porting the four rSDP vignettes to Python, one per 0.1.x release): cloud-data access, raster wrangling, field-site sampling, and pretty maps.
- Roadmap — JupyterHub / Dask Gateway integration, distributed extraction, benchmarks.