Troubleshooting & FAQ¶
Common issues and their resolutions.
Install¶
pip install pysdp pulls a huge wheel stack on first run¶
Normal. The core dependency set (rasterio, rioxarray, xarray, geopandas, pystac, scipy, etc.) is ~150 MB of wheels. All of it installs from PyPI without a system GDAL, because the GDAL binary ships inside rasterio's wheel.
If install is slow on a laptop or CI, try uv pip install pysdp (Astral's resolver is substantially faster than pip's).
ImportError: Cannot import name X from pysdp¶
The public API is the seven functions + five constants re-exported from pysdp/__init__.py:
- get_catalog, get_metadata
- open_raster, open_stack
- extract_points, extract_polygons
- download
- DOMAINS, TYPES, RELEASES, TIMESERIES_TYPES, SDP_CRS
Anything else is internal and subject to change.
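When an import fails, it can help to check which of the documented names your installed copy actually exposes (an older install may lag behind the docs). The sketch below uses only the names listed above; `missing_public_names` is a hypothetical helper written for this page, not part of pysdp.

```python
import importlib

# Names documented as the public API in this section.
PUBLIC_NAMES = (
    "get_catalog", "get_metadata", "open_raster", "open_stack",
    "extract_points", "extract_polygons", "download",
    "DOMAINS", "TYPES", "RELEASES", "TIMESERIES_TYPES", "SDP_CRS",
)


def missing_public_names(module, names=PUBLIC_NAMES):
    """Return the documented names that `module` does not expose."""
    return [name for name in names if not hasattr(module, name)]


if __name__ == "__main__":
    try:
        pysdp = importlib.import_module("pysdp")
    except ImportError:
        print("pysdp is not installed in this environment")
    else:
        print("missing:", missing_public_names(pysdp))
```

An empty list means every documented name is present; anything else suggests the installed version and the docs disagree.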
Lazy reads / chunking¶
chunks='auto' emits a warning about installing pysdp[dask]¶
pysdp.open_raster() defaults to chunks="auto" for lazy Dask-backed reads. If dask isn't installed, pysdp falls back to eager loads with a UserWarning. For lazy reads on large or time-series rasters, install the extra:
pip install "pysdp[dask]"
Eager mode still works fine for small rasters or small cropped regions.
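If you would rather decide explicitly than rely on the fallback warning, you can check for dask yourself before opening a raster. This is a minimal sketch using only the standard library; `pick_chunks` is a hypothetical helper, and passing `chunks=None` to request an eager load is an assumption about `open_raster`, not confirmed behaviour.

```python
import importlib.util


def pick_chunks():
    """Return "auto" when dask is importable, else None (eager load)."""
    has_dask = importlib.util.find_spec("dask") is not None
    return "auto" if has_dask else None


chunks = pick_chunks()
print(f"dask available: {chunks is not None}; would pass chunks={chunks!r}")

# Assumed usage (not verified against pysdp):
#   raster = pysdp.open_raster("R3D009", chunks=chunks)
```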
CRS and reprojection¶
"Re-projecting locations to coordinate system of the raster." — is this bad?¶
No. extract_points and extract_polygons auto-reproject your input locations to the raster's CRS (EPSG:32613 / UTM 13N) when they differ. The message is informational and goes to stderr; silence it with verbose=False.
For large point sets, pre-projecting once outside the extract call is slightly more efficient:
import pysdp
sites_utm = sites.to_crs("EPSG:32613")  # sites: a GeoDataFrame; reproject once
results = {}
for dem_id in ["R3D009", "R5D009"]:
    results[dem_id] = pysdp.extract_points(pysdp.open_raster(dem_id), sites_utm)
Reprojecting a full raster is slow¶
Reprojection touches every cell, so it's inherently expensive on cloud-hosted rasters. Two strategies:
- Crop first, reproject the small subset. If you only need the raster in a small AOI, clip_box() before reproject().
- Download locally first. pysdp.download(catalog_ids="R3D009", output_dir=...), then open and reproject from disk.
Extraction¶
Point extraction is slow on large cloud rasters¶
Known gap. xvec.extract_points (our method="nearest" path) and xarray.interp (our method="linear" path) both pull more cells than strictly necessary when backed by a remote COG — usually the full chunk containing each point rather than just the point itself. For widely-spaced points on a >1 GB raster, this can take minutes.
Workarounds today:
- method="nearest" is usually 2–5× faster than "linear".
- Crop first: the bounding box of your sites, with a small buffer, often reduces the raster by orders of magnitude.
- Download locally: for repeat extractions against the same raster, pysdp.download() + rioxarray.open_rasterio is the fastest path.
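The "crop first" workaround needs a bounding box around your sites plus a small buffer. A minimal sketch of that calculation follows; `buffered_bounds` and the coordinates are hypothetical, and the buffer is in the raster's CRS units (metres for EPSG:32613).

```python
def buffered_bounds(points, buffer):
    """Return (minx, miny, maxx, maxy) around (x, y) pairs, padded by buffer."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - buffer, min(ys) - buffer, max(xs) + buffer, max(ys) + buffer)


# Hypothetical site coordinates, already in EPSG:32613 (metres).
sites_xy = [(327000.0, 4311000.0), (328500.0, 4312250.0), (327750.0, 4310500.0)]
bounds = buffered_bounds(sites_xy, buffer=100.0)
print(bounds)

# With a rioxarray-backed raster, the crop would then be:
#   cropped = raster.rio.clip_box(*bounds)
```

If your sites are in a GeoDataFrame, `sites_utm.total_bounds` gives the unbuffered box directly.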
Dask-aware partition-and-reduce extraction is tracked in ROADMAP §Phase 8a. If you're running at 10k+ points, this is the phase to watch.
extract_polygons(exact=True) raises NotImplementedError¶
Tracked in ROADMAP §Phase 8a. exact=False (centroid inclusion, matching rSDP / terra::extract) is the default and works today. The fractional-coverage path via exactextract needs a custom xarray→exactextract bridge that composes with Dask — non-trivial, budgeted for 0.2.
Extraction output has one row per (point, time) — I want wide format¶
That's long-form. Pivot to wide:
long = pysdp.extract_points(tmax, sites)
wide = long.pivot_table(
    index="site",
    columns="time",
    values="bayes_tmax_est",  # the raster's variable name
)
Catalog¶
UserWarning: Packaged SDP catalog is 8 months old¶
pySDP ships with a snapshot of the SDP product catalog. When the snapshot gets old, the warning nudges you to either upgrade pysdp or use source="live".
The threshold is configurable via the SDP_STALENESS_MONTHS environment variable (default: 6).
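To make the threshold concrete, here is an illustrative sketch of how a snapshot's age in months could be compared against SDP_STALENESS_MONTHS. It is not pysdp's actual implementation: `months_between` and `is_stale` are hypothetical helpers, and the "strictly greater than" comparison is an assumption.

```python
import os
from datetime import date


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def is_stale(snapshot_date, today, default_months=6):
    """Compare snapshot age against SDP_STALENESS_MONTHS (default 6)."""
    threshold = int(os.environ.get("SDP_STALENESS_MONTHS", default_months))
    return months_between(snapshot_date, today) > threshold


snapshot = date(2026, 4, 14)
today = date(2026, 12, 20)
print(months_between(snapshot, today), is_stale(snapshot, today))
```

Setting the variable before starting Python (for example `SDP_STALENESS_MONTHS=12`) raises the age at which the warning would appear.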
KeyError: CatalogID 'R5D042' not found in packaged catalog (dated 2026-04-14)¶
Your packaged catalog doesn't know about that ID, most likely because the product is newer than the snapshot. Try looking it up in the live catalog (source="live").
If it's there, pip install -U pysdp gets you a fresh packaged snapshot.
Download¶
Cannot expand Daily catalog_id 'R4D004' via catalog_ids=¶
Daily products often span years of data — thousands of files. pySDP refuses to expand them implicitly to avoid surprise data-hoard downloads. Two ways forward:
Open and process without downloading:
Or hand-pick specific days:
# Construct the Data.URLs yourself from the catalog template
row = pysdp.get_catalog(deprecated=None).query("CatalogID == 'R4D004'").iloc[0]
urls = [row["Data.URL"].format(year=2021, day=f"{i:03d}") for i in range(1, 8)]
pysdp.download(urls=urls, output_dir="~/tmax-week")
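If you think in calendar dates rather than day-of-year numbers, the standard library can do the conversion. The sketch below assumes the same {year} and {day} placeholders as the snippet above; `daily_urls` is a hypothetical helper and the template URL is a stand-in, not a real SDP path.

```python
from datetime import date, timedelta


def daily_urls(template, start, n_days):
    """Fill a {year}/{day} template for n_days consecutive dates from start."""
    urls = []
    for offset in range(n_days):
        d = start + timedelta(days=offset)
        # %j is the zero-padded day of year, e.g. "001" for 1 January.
        urls.append(template.format(year=d.year, day=d.strftime("%j")))
    return urls


# Hypothetical template; real ones come from the catalog's Data.URL column.
template = "https://example.org/tmax/{year}/tmax_{year}_{day}.tif"
urls = daily_urls(template, date(2021, 6, 1), 7)
print(urls[0])
```

Because each date carries its own year, a range that crosses 31 December rolls over to day 001 of the next year without special handling.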
Downloads fail with 403 Forbidden on some files¶
The SDP bucket is public, so this usually indicates a deprecated product URL that's been removed. Check the product's entry in the catalog.
Performance envelope¶
What's a reasonable workload for pysdp 0.1?¶
The v0.1 release targets interactive and moderate-scale workflows — tens to thousands of points or polygons against SDP rasters. Specifically:
- Point extraction: hundreds of points × full SDP rasters → seconds–minutes (with method="nearest" + local download).
- Polygon extraction: hundreds of polygons × time-series → minutes.
- Catalog/metadata: <1 second, offline.
- Lazy raster reads: single-digit seconds to open + crop + plot a ~1 GB COG over same-region S3.
For millions of points or continental-scale workflows, see ROADMAP §Phase 8a — the Dask-aware at-scale path that'll land in 0.2.
Still stuck?¶
Open an issue: https://github.com/rmbl-sdp/pySDP/issues. Include:
- pysdp version (pysdp.__version__)
- Python version (sys.version)
- Platform (Linux/macOS/Windows)
- A minimal reproducer
- Full error traceback if applicable
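The version and platform details can be collected in one go and pasted into the issue. This is a small standard-library sketch; `environment_report` is a hypothetical helper, and it assumes the installed distribution is named "pysdp" (as in pip install pysdp).

```python
import platform
import sys
from importlib import metadata


def environment_report(package="pysdp"):
    """Collect version and platform details for a bug report."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        package: version,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```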