# Changelog

All notable changes to pySDP are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

## [Unreleased]

### Added
- Phase 6b docs expansion: visual assets in the User Guides (PNG plots and folium HTML maps generated by `scripts/build_guide_assets.py` from real SDP data). The guides `cloud-data.md`, `field-sampling.md`, and `pretty-maps.md` now include rendered images and iframes showing domain coverage, field sites on the UG 3 m bare-earth DEM (R3D009, matching the rSDP vignettes), extracted elevations, and multi-panel overlays. All 8 assets live in `docs/guides/assets/`. The asset script aggressively coarsens the 1 GB DEM (by a factor of 60 × 60) before plotting so the docs pages stay lightweight. `mapclassify>=2.6` added to the `[viz]` extra (required by `GeoDataFrame.explore(column=...)`).
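The 60 × 60 coarsening amounts to averaging each 60 × 60 block of DEM cells into one output cell. A minimal NumPy sketch of that block-mean step, assuming a 2-D array and trimming any partial blocks at the edges; `coarsen_mean` is a hypothetical helper, and the real asset script may do this differently (for example with xarray's `coarsen`).

```python
import numpy as np


def coarsen_mean(arr: np.ndarray, factor: int) -> np.ndarray:
    """Block-average a 2-D array by `factor` along both axes (hypothetical helper).

    Trailing rows/columns that do not fill a whole block are trimmed,
    and NaN cells (e.g. nodata) are ignored within each block.
    """
    ny = (arr.shape[0] // factor) * factor
    nx = (arr.shape[1] // factor) * factor
    trimmed = arr[:ny, :nx]
    # Reshape to (block_row, row_in_block, block_col, col_in_block)
    # so each block gets its own pair of axes, then average over them.
    blocks = trimmed.reshape(ny // factor, factor, nx // factor, factor)
    return np.nanmean(blocks, axis=(1, 3))
```

With `factor=60`, every output cell summarizes 3,600 input cells, which is what keeps the rendered figures small.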
### Fixed
- `pysdp.io.vsicurl.gdal_defaults()` no longer sets `CPL_VSIL_CURL_ALLOWED_EXTENSIONS=".tif,.TIF,.tiff"`. That env var leaked process-wide and blocked any VSICURL-backed open of GeoJSON / GeoPackage / Shapefile URLs once a pysdp raster had been opened in the same process. `GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR` alone already achieves the main performance goal (skipping sidecar-file probes) without that side effect. Matches ROADMAP §2 Principle 5 (scoped defaults, never clobber).
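The "scoped defaults, never clobber" principle can be illustrated with a context manager that fills in only the variables the user has not set and removes them again on exit. This is a sketch, not pysdp's code: `scoped_env_defaults` and `DEFAULTS` are hypothetical names, and pysdp itself applies its defaults with `os.environ.setdefault`. For GDAL options specifically, `rasterio.Env(...)` is the usual way to scope configuration.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Mapping

# Hypothetical defaults: only the sidecar-probe setting from the entry above.
# CPL_VSIL_CURL_ALLOWED_EXTENSIONS is deliberately absent.
DEFAULTS = {"GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR"}


@contextmanager
def scoped_env_defaults(defaults: Mapping[str, str]) -> Iterator[None]:
    """Temporarily apply `defaults` for keys the user has not set.

    User-set values are never overwritten, and keys added here are
    removed again on exit so nothing leaks process-wide.
    """
    added = [key for key in defaults if key not in os.environ]
    for key in added:
        os.environ[key] = defaults[key]
    try:
        yield
    finally:
        for key in added:
            os.environ.pop(key, None)
```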
- Phase 5 (Bulk download, SPEC.md §9):
    - `pysdp.download(urls=..., output_dir=..., ...)` fetches SDP COGs to local disk. Accepts `urls=` (a string or list) or `catalog_ids=`; the two are mutually exclusive and exactly one is required.
    - `catalog_ids` expansion: Single → 1 URL; Yearly → every catalog year; Monthly → every catalog month between MinDate and MaxDate. Daily raises a descriptive `ValueError`, since the expansion would be open-ended; pass explicit `urls=`, or open the product via `open_raster(...)` with a date range first.
    - Existing-file pre-check mirrors rSDP: files larger than 1 kB on disk are considered valid and skipped unless `overwrite=True`. Partial files (< 1 kB) are resumed with an HTTP Range request when `resume=True`.
    - Returns a `pandas.DataFrame` status report with columns `[url, dest, success, status, size, error]`, one row per URL (including skipped existing files). With `return_status=False` it returns `None`.
    - Backend: threaded `requests` via `concurrent.futures.ThreadPoolExecutor` (core deps only). Faster backends (`obstore`, `fsspec` + `s3fs`) are deferred to ROADMAP §Phase 7.
    - Tests: 22 unit tests (input validation, `catalog_id` expansion for each TimeSeriesType, pre-check logic, `responses`-mocked end-to-end flow, overwrite + resume, HTTP error propagation) plus 1 `@pytest.mark.network` integration test that downloads R1D001 (~4 MB UER streamlines) from real S3 in ~1.5 s.
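The Monthly `catalog_ids` expansion and the existing-file pre-check can be sketched in plain Python. This is an illustration under stated assumptions, not pysdp's code: `monthly_slices` and `precheck` are hypothetical names, 1 kB is taken as 1,024 bytes, and resumption is modeled as a standard HTTP `Range: bytes=N-` header.

```python
import os
from datetime import date

MIN_VALID_BYTES = 1024  # assumption: "1 kB" interpreted as 1,024 bytes


def monthly_slices(min_date: date, max_date: date) -> list[tuple[int, int]]:
    """Every (year, month) pair between min_date and max_date, inclusive."""
    out = []
    year, month = min_date.year, min_date.month
    while (year, month) <= (max_date.year, max_date.month):
        out.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return out


def precheck(dest: str, overwrite: bool = False, resume: bool = True) -> tuple[str, dict[str, str]]:
    """Decide what to do with `dest` before downloading (hypothetical helper).

    Returns an action ("skip", "resume" or "download") plus any extra
    HTTP headers to send with the GET request.
    """
    if overwrite or not os.path.exists(dest):
        return "download", {}
    size = os.path.getsize(dest)
    if size > MIN_VALID_BYTES:
        return "skip", {}  # treated as a complete file
    if resume and size > 0:
        # Ask the server for the remaining bytes only.
        return "resume", {"Range": f"bytes={size}-"}
    return "download", {}
```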
- The smoke test's `NotImplementedError` placeholder parametrize list is now empty: all public API functions are implemented.
- Phase 4 (Point & polygon extraction, SPEC.md §9):
    - `pysdp.extract_points(raster, locations, ...)` samples raster values at point locations. Accepts a `GeoDataFrame` or a plain `DataFrame` with x/y columns plus an explicit `crs`. Auto-reprojects locations to the raster CRS when they differ (with a verbose message). Supports `method="linear"` (default; bilinear via `xr.interp`) or `method="nearest"` (via `xvec.extract_points`).
    - `pysdp.extract_polygons(raster, locations, stats="mean", ...)` computes zonal stats via `xvec.zonal_stats`. Default `exact=False` (centroid inclusion, parity with rSDP / `terra::extract`). `exact=True` and `all_cells=True` raise `NotImplementedError` pointing at ROADMAP §Phase 8a; `exact=False` covers the common case.
    - Port of rSDP's `.filter_raster_layers_by_time()` as `_filter_by_time`: `years=` or `date_start=` / `date_end=` filters for time-indexed rasters, with error-on-empty-overlap and warn-on-partial-overlap semantics.
    - Output is a long-form GeoDataFrame: one row per point (or polygon) for single-layer rasters; one row per (geometry × time) for time series. `bind=True` (default) merges the input attribute columns onto the output; `bind=False` returns just geometry + extracted values.
    - Tests: 34 new unit tests with synthetic local rasters plus one `@pytest.mark.network` integration test that extracts elevation at three real RMBL field sites from the UG 3 m DEM (R3D009) and verifies that the elevations fall in a sensible 2000–4500 m range for the Gunnison basin. All pass locally.
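The error-on-empty / warn-on-partial overlap rule can be illustrated with a simple date-range filter. A minimal sketch, assuming layers are identified by `datetime.date` values; `filter_by_time` is hypothetical, is not pysdp's `_filter_by_time`, and covers only the `date_start` / `date_end` branch, not `years=`.

```python
import warnings
from datetime import date


def filter_by_time(layer_dates: list[date], date_start: date, date_end: date) -> list[date]:
    """Keep layers within [date_start, date_end] (hypothetical helper).

    Raises if the requested window misses the raster entirely; warns if
    the window only partially overlaps the raster's time span.
    """
    first, last = min(layer_dates), max(layer_dates)
    if date_end < first or date_start > last:
        raise ValueError(
            f"Requested range {date_start}..{date_end} does not overlap "
            f"available layers {first}..{last}."
        )
    if date_start < first or date_end > last:
        warnings.warn(
            f"Requested range only partially overlaps available layers "
            f"({first}..{last}); returning the overlapping layers.",
            UserWarning,
            stacklevel=2,
        )
    return [d for d in layer_dates if date_start <= d <= date_end]
```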
- `scipy>=1.11` added to core deps (required for xarray's `interp(method='linear')`, the bilinear extraction path; a standard scientific Python dependency).
- Phase 3 (Raster access, SPEC.md §9):
    - `pysdp.open_raster(catalog_id, ...)`: lazy cloud COG access via `rioxarray.open_rasterio` over GDAL VSICURL. Returns an `xarray.Dataset` with one data variable named after the product's canonical short name. Dims are `(y, x)` / `(band, y, x)` for Single and `(time, y, x)` for Yearly / Monthly / Daily.
    - `pysdp.open_raster(url=...)`: URL-direct branch with no catalog lookup and no scale/offset application (matches rSDP).
    - `pysdp.open_stack(catalog_ids, align="exact")`: multi-product loader. The default `align="exact"` verifies CRS + transform + shape consistency; mismatches raise a descriptive error listing which products drifted. `align="reproject"` is planned for Phase 7 and raises `NotImplementedError` today.
    - The time coordinate is uniformly a `pandas.DatetimeIndex`: Daily → actual date, Monthly → first of the month, Yearly → Jan 1. This enables `ds.sel(time="2019")`, `.resample()`, and `groupby("time.year")` across all TimeSeriesTypes.
    - CRS set to `EPSG:32613` via `rio.write_crs()`. Scale/offset from the catalog are attached as CF `scale_factor` / `add_offset` attrs on the data variable (scale = 1 / DataScaleFactor, per rSDP's convention); `xarray.decode_cf()` or `mask_and_scale=True` materializes the real values.
    - `pysdp.io.vsicurl.gdal_defaults()` + `ensure_gdal_defaults()`: minimal GDAL VSICURL env for cloud COGs (`GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR`, `CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif,.TIF,.tiff`, `VSI_CACHE=TRUE`, `VSI_CACHE_SIZE=5000000`). Applied via `os.environ.setdefault`, so user-set values are never clobbered. The full cloud-tuned set (HTTP/2 etc.) lands in Phase 7.
    - `chunks="auto"` (default) gracefully falls back to eager reads with a `UserWarning` when `dask` is not installed; install `pysdp[dask]` for lazy Dask-backed reads.
    - `download=True` raises `NotImplementedError` pointing at Phase 5's `pysdp.download()`.
    - Added `pysdp._catalog_data.lookup_catalog_row()` as a shared helper; `get_metadata` and `open_raster` now both use it and emit the same descriptive "unknown catalog_id" error.
    - Tests: 30 new unit tests in `tests/test_raster.py` (canonical-variable naming, time-coord construction, chunks fallback, GDAL-env safety, Dataset assembly via synthetic local COGs, `open_stack` grid-alignment checks) plus 3 network integration tests against real S3 (Single, Daily, and URL-direct branches).
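Two conventions from this entry, the per-type time stamps and CF scale/offset decoding, reduce to a few lines. A sketch with hypothetical helper names (`time_coord`, `decode_cf`), using `datetime.date` in place of pandas timestamps; it is not pysdp's implementation.

```python
from datetime import date


def time_coord(series_type: str, year: int, month: int = 1, day: int = 1) -> date:
    """Time stamp for one layer: Daily -> actual date,
    Monthly -> first of the month, Yearly -> Jan 1 (hypothetical helper)."""
    if series_type == "Daily":
        return date(year, month, day)
    if series_type == "Monthly":
        return date(year, month, 1)
    if series_type == "Yearly":
        return date(year, 1, 1)
    raise ValueError(f"No time axis for series type {series_type!r}")


def decode_cf(raw: float, scale_factor: float, add_offset: float = 0.0) -> float:
    """CF packing convention: physical value = raw * scale_factor + add_offset."""
    return raw * scale_factor + add_offset
```

For example, a hypothetical product with `DataScaleFactor = 100` would get `scale_factor = 1 / 100`, so a stored value of 2550 decodes to 25.5.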
- `dask[array]>=2024.1` added to the `test` dependency group so CI covers the chunked read path.
- Phase 2 (Argument validation + time-slice resolvers, SPEC.md §9):
    - `pysdp.io.template.substitute_template()`: `{year}` / `{month}` / `{day}` URL-template substitution with scalar/vector recycling and length-consistency checks. Port of rSDP's `.substitute_template()`.
    - `pysdp._validate.validate_user_args()`: pre-catalog-lookup validation; zero-pads `months` to two-digit strings; rejects invalid combinations of `catalog_id` / `url` / `date_start` / `date_end` / `download_files` / `download_path`.
    - `pysdp._validate.validate_args_vs_type()`: post-lookup check of whether a time-argument combination is valid for a given `TimeSeriesType` (Single rejects all time args; Yearly rejects months, and rejects years combined with dates; Monthly requires months together with years; Daily accepts dates only).
    - `pysdp._resolve.resolve_time_slices()` and per-type resolvers (`resolve_single`, `resolve_yearly`, `resolve_monthly`, `resolve_daily`) returning a `TimeSlices(paths, names)` named tuple. Pure functions: no network, no raster I/O.
    - Preserved behavior carry-overs from rSDP: anchor-day `seq(by="year"/"month")` semantics for the Yearly/Monthly date-range branches; a 30-layer default clip for Daily datasets with no date bounds; error-on-empty-overlap; warn-on-partial-overlap.
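The recycling rule for `{year}` / `{month}` / `{day}` templates (length-1 inputs repeat; any other length mismatch is an error) can be sketched with `str.format`. This is not pysdp's `substitute_template()`: `fill_template` is a hypothetical name, and the URL in the usage note is a placeholder.

```python
from typing import Sequence, Union


def fill_template(template: str, **fields: Union[str, Sequence[str]]) -> list[str]:
    """Fill {year}/{month}/{day} placeholders, recycling scalars (hypothetical helper).

    Scalars and length-1 vectors are repeated to match the longest vector;
    any other length mismatch raises ValueError.
    """
    vectors = {k: [v] if isinstance(v, str) else list(v) for k, v in fields.items()}
    n = max((len(v) for v in vectors.values()), default=1)
    recycled = {}
    for key, values in vectors.items():
        if len(values) not in (1, n):
            raise ValueError(f"{key!r} has length {len(values)}; expected 1 or {n}")
        recycled[key] = values * n if len(values) == 1 else values
    return [template.format(**{k: v[i] for k, v in recycled.items()}) for i in range(n)]
```

For example, `fill_template("https://example.com/x_{year}_{month}.tif", year="2019", month=["05", "06"])` returns two URLs, one per month, with the single year recycled.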
- 52 new unit tests: `test_template.py` (8 tests), `test_validate.py` (20 tests), `test_resolve.py` (24 tests). Ports rSDP's 32 testthat tests across `test-internal_resolve.R` and `test-internal_validate.R`, plus additional edge-case coverage.
- Phase 1 (Catalog + metadata, SPEC.md §9):
    - `pysdp.get_catalog()` with three sources: `packaged` (default; offline; emits a `UserWarning` when the snapshot is older than `SDP_STALENESS_MONTHS`, default 6 months), `live` (refetches the CSV from S3), and `stac` (returns a `pystac.Catalog` for the SDP static STAC v1 catalog; filter args are ignored with a warning).
    - `pysdp.get_metadata(catalog_id, as_dict=True)`: fetches QGIS-style XML metadata and returns a `dict` via `xmltodict` or an `lxml` element. The descriptive `KeyError` on an unknown catalog_id includes the snapshot date.
    - Packaged catalog CSV snapshot: `SDP_product_table_04_14_2026.csv` (156 products across the UG/UER/GT/GMUG domains), loaded via `importlib.resources` in `pysdp._catalog_data`. Handles both `m/d/y` and `m/d/Y` date formats mixed across rows, and preserves rSDP's `sysdata.rda` baking model.
    - `scripts/update_catalog.py` mirrors rSDP's `data-raw/SDP_catalog.R`: downloads a fresh CSV from S3 and rotates the packaged snapshot.
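Parsing the mixed `m/d/y` / `m/d/Y` catalog dates and the staleness check could look like the sketch below. Assumptions: the helper names are hypothetical, the threshold is passed in as a plain argument (defaulting to the 6 months mentioned above) rather than read from `SDP_STALENESS_MONTHS`, and a month only counts once its day-of-month has been reached.

```python
import warnings
from datetime import date, datetime

DEFAULT_STALENESS_MONTHS = 6  # assumption: mirrors the documented default


def parse_catalog_date(text: str) -> date:
    """Parse m/d/Y (4-digit year) or m/d/y (2-digit year) strings."""
    # %Y only matches four digits, so trying it first never misreads "26".
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized catalog date: {text!r}")


def warn_if_stale(snapshot: date, today: date, max_months: int = DEFAULT_STALENESS_MONTHS) -> bool:
    """Emit a UserWarning and return True if the snapshot is too old."""
    age_months = (today.year - snapshot.year) * 12 + (today.month - snapshot.month)
    if today.day < snapshot.day:
        age_months -= 1  # the current month is not complete yet
    if age_months > max_months:
        warnings.warn(
            f"Packaged catalog snapshot ({snapshot}) is {age_months} months old.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```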
- Test suite: 48 unit tests (filter validation, date parsing, staleness warning, synthetic-DataFrame filter logic, `responses`-mocked HTTP) plus 3 live integration tests under `@pytest.mark.network`.
- Initial Phase 0 scaffolding: `pyproject.toml` (hatchling + hatch-vcs), `src/pysdp/` package skeleton with public-API stubs, `constants.py` with real SDP catalog values (CRS, domains, types, releases, timeseries types), `tests/` smoke tests, CI workflows (lint, type-check, test matrix on Python 3.11/3.12/3.13 × linux/macOS/windows), release workflow with PyPI Trusted Publishing, docs workflow stub, `.pre-commit-config.yaml`, MIT license. See SPEC.md §9 Phase 0.