Changelog

All notable changes to pySDP are documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • Phase 6b docs expansion — visual assets in the User Guides (PNG plots + folium HTML maps generated by scripts/build_guide_assets.py from real SDP data). Guides cloud-data.md, field-sampling.md, and pretty-maps.md now include rendered images / iframes showing domain coverage, field sites on the UG 3 m bare-earth DEM (R3D009 — matching the rSDP vignettes), extracted elevations, and multi-panel overlays. All 8 assets live at docs/guides/assets/. The asset script aggressively coarsens the 1 GB DEM before plotting (factor 60×60) so docs pages stay lightweight.
  • mapclassify>=2.6 added to the [viz] extra (required by GeoDataFrame.explore(column=...)).

Fixed

  • pysdp.io.vsicurl.gdal_defaults() no longer sets CPL_VSIL_CURL_ALLOWED_EXTENSIONS=".tif,.TIF,.tiff". That env var leaked process-globally and blocked any VSICURL-backed open of GeoJSON / GeoPackage / Shapefile URLs after a pysdp raster was opened in the same process. GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR alone already achieves the main performance goal (skipping sidecar probes) without the side-effect. Matches ROADMAP §2 Principle 5 (scoped defaults, never clobber).
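
The scoped-defaults behavior described in this entry can be illustrated with a minimal sketch. It is not pysdp's code: the helper name `apply_scoped_defaults` and the exact post-fix default set are assumptions drawn from the entries in this changelog. Only the "set if absent, never clobber" behavior comes from the text.

```python
import os

# Assumed post-fix defaults: the extension allow-list is deliberately absent,
# so GeoJSON / GeoPackage / Shapefile URLs stay readable over VSICURL.
ASSUMED_DEFAULTS = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "VSI_CACHE": "TRUE",
    "VSI_CACHE_SIZE": "5000000",
}


def apply_scoped_defaults(env=None, defaults=ASSUMED_DEFAULTS):
    """Set each default only if the key is not already present (hypothetical helper)."""
    env = os.environ if env is None else env
    applied = {}
    for key, value in defaults.items():
        # setdefault keeps and returns the existing value when the user set one.
        applied[key] = env.setdefault(key, value)
    return applied


# Usage with a plain dict standing in for os.environ.
fake_env = {"VSI_CACHE": "FALSE"}  # user-set value must survive
result = apply_scoped_defaults(fake_env)
```

Passing a plain dict keeps the example free of side effects. With `env=None` the same call would write to the real process environment, which is exactly where a leaked allow-list caused the bug.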

  • Phase 5 — Bulk download (SPEC.md §9):

  • pysdp.download(urls=..., output_dir=..., ...) — fetch SDP COGs to local disk. Accepts urls= (string or list) or catalog_ids= (mutually exclusive, exactly one required).
  • catalog_ids expansion: Single → 1 URL; Yearly → every catalog year; Monthly → every catalog month between MinDate and MaxDate. Daily raises a descriptive ValueError (expansion would be open-ended; users pass explicit urls= or open via open_raster(...) with a date range first).
  • Existing-file pre-check mirrors rSDP: files > 1 kB on disk are considered valid and skipped unless overwrite=True. Partial files (< 1 kB) use HTTP Range resume when resume=True.
  • Returns a pandas.DataFrame status report with columns [url, dest, success, status, size, error] — one row per URL (including skipped existing files). return_status=False returns None instead.
  • Backend: threaded requests via concurrent.futures.ThreadPoolExecutor (core deps only). Faster backends (obstore, fsspec+s3fs) deferred to ROADMAP §Phase 7.
  • Tests: 22 unit tests (input validation, catalog_id expansion for each TimeSeriesType, pre-check logic, responses-mocked end-to-end flow, overwrite + resume, HTTP error propagation) + 1 @pytest.mark.network integration test that downloads R1D001 (~4 MB UER streamlines) from real S3 in ~1.5 s.
  • Smoke test's NotImplementedError placeholder parametrize list is now empty — all public API functions are implemented.
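
The existing-file pre-check in the Phase 5 entries can be sketched as a small decision function. This is an illustration under stated assumptions, not the pysdp implementation: `plan_download` is a hypothetical helper, "1 kB" is read as 1024 bytes, and a file of exactly that size is treated as partial because the changelog does not cover that case.

```python
import os
import tempfile

MIN_VALID_BYTES = 1024  # assumption: "1 kB" read as 1024 bytes


def plan_download(dest, overwrite=False, resume=True):
    """Return (action, headers) for one destination path (hypothetical helper)."""
    if overwrite or not os.path.exists(dest):
        return "download", {}
    size = os.path.getsize(dest)
    if size > MIN_VALID_BYTES:
        # Large enough to be treated as a complete file: skip it.
        return "skip", {}
    if resume and size > 0:
        # Small partial file: ask the server for the remaining bytes only.
        return "resume", {"Range": f"bytes={size}-"}
    return "download", {}


with tempfile.TemporaryDirectory() as tmp:
    complete = os.path.join(tmp, "complete.tif")
    partial = os.path.join(tmp, "partial.tif")
    missing = os.path.join(tmp, "missing.tif")
    with open(complete, "wb") as fh:
        fh.write(b"\0" * 4096)
    with open(partial, "wb") as fh:
        fh.write(b"\0" * 100)
    plans = {
        "complete": plan_download(complete),
        "partial": plan_download(partial),
        "missing": plan_download(missing),
        "forced": plan_download(complete, overwrite=True),
    }
```

Separating the decision from the HTTP call keeps the logic testable without network access. A threaded backend could map such a function over all destinations before submitting any requests.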

  • Phase 4 — Point & polygon extraction (SPEC.md §9):

  • pysdp.extract_points(raster, locations, ...) — sample raster values at point locations. Accepts a GeoDataFrame or a plain DataFrame with x/y columns + explicit crs. Auto-reprojects locations to raster CRS when they differ (with a verbose message). method="linear" (default, bilinear via xr.interp) or method="nearest" (via xvec.extract_points).
  • pysdp.extract_polygons(raster, locations, stats="mean", ...) — zonal stats via xvec.zonal_stats. Default exact=False (centroid inclusion, parity with rSDP / terra::extract). exact=True and all_cells=True raise NotImplementedError pointing at ROADMAP §Phase 8a; exact=False covers the common case.
  • Port of rSDP's .filter_raster_layers_by_time() as _filter_by_time: years= or date_start/date_end= filters for time-indexed rasters, with error-on-empty-overlap and warn-on-partial-overlap semantics.
  • Output is long-form GeoDataFrame: one row per point (or polygon) for single-layer rasters; one row per (geometry × time) for time-series. bind=True (default) merges input attribute columns onto the output; bind=False returns just geometry + extracted values.
  • Tests: 34 new unit tests with synthetic local rasters + one @pytest.mark.network integration test that extracts elevation at three real RMBL field sites from the UG 3 m DEM (R3D009); verifies elevations fall in the sensible 2000–4500 m range for the Gunnison basin. All pass locally.
  • scipy>=1.11 added to core deps (required for xarray's interp(method='linear'), the bilinear extraction path; standard scientific Python dep).
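
The error-on-empty-overlap and warn-on-partial-overlap semantics mentioned for `_filter_by_time` can be sketched for the `years=` case. The function below is a hypothetical stand-in written for illustration. Its name, signature, and messages are assumptions and do not reflect the internal pysdp code.

```python
import warnings


def filter_years(available, requested):
    """Keep requested years that exist; error if none, warn if only some (sketch)."""
    available_set = set(available)
    kept = [y for y in requested if y in available_set]
    if not kept:
        raise ValueError(
            f"No overlap between requested years {sorted(requested)} "
            f"and available years {sorted(available_set)}."
        )
    missing = [y for y in requested if y not in available_set]
    if missing:
        warnings.warn(
            f"Years {missing} are not available and were dropped.",
            UserWarning,
            stacklevel=2,
        )
    return kept


layer_years = [2018, 2019, 2020]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    partial = filter_years(layer_years, [2019, 2020, 2021])
```

Here the request for 2021 is dropped with a warning, while a request with no matching year at all raises instead of returning an empty result.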

  • Phase 3 — Raster access (SPEC.md §9):

  • pysdp.open_raster(catalog_id, ...) — lazy cloud COG access via rioxarray.open_rasterio over GDAL VSICURL. Returns xarray.Dataset with one data variable named after the product's canonical short name. Dims (y, x) / (band, y, x) for Single; (time, y, x) for Yearly / Monthly / Daily.
  • pysdp.open_raster(url=...) — URL-direct branch, no catalog lookup, no scale/offset application (matches rSDP).
  • pysdp.open_stack(catalog_ids, align="exact") — multi-product loader. align="exact" default verifies CRS + transform + shape consistency; mismatches raise a descriptive error listing which products drifted. align="reproject" is planned for Phase 7 and raises NotImplementedError today.
  • Time coordinate is uniformly pandas.DatetimeIndex: Daily → actual date, Monthly → first-of-month, Yearly → Jan 1. Enables ds.sel(time="2019"), .resample(), groupby("time.year") across all TimeSeriesTypes.
  • CRS set to EPSG:32613 via rio.write_crs(). Scale/offset from the catalog attached as CF scale_factor / add_offset attrs on the data variable (scale = 1 / DataScaleFactor per rSDP's convention; xarray.decode_cf() or mask_and_scale=True materializes the real values).
  • pysdp.io.vsicurl.gdal_defaults() + ensure_gdal_defaults() — minimal GDAL VSICURL env for cloud COGs (GDAL_DISABLE_READDIR_ON_OPEN=EMPTY_DIR, CPL_VSIL_CURL_ALLOWED_EXTENSIONS=.tif,.TIF,.tiff, VSI_CACHE=TRUE, VSI_CACHE_SIZE=5000000). Applied via os.environ.setdefault — never clobbers user-set values. Full cloud-tuned set (HTTP/2 etc.) lands in Phase 7.
  • chunks="auto" (default) gracefully falls back to eager reads with a UserWarning when dask is not installed; install pysdp[dask] for lazy Dask-backed reads.
  • download=True raises NotImplementedError pointing at Phase 5's pysdp.download().
  • Added pysdp._catalog_data.lookup_catalog_row() as a shared helper; get_metadata and open_raster now use it, emitting the same descriptive "unknown catalog_id" error.
  • Tests: 30 new unit tests in tests/test_raster.py (canonical-variable naming, time-coord construction, chunks fallback, GDAL-env safety, Dataset assembly via synthetic local COGs, open_stack grid-alignment checks) + 3 network integration tests against real S3 (Single + Daily + URL-direct branches).
  • dask[array]>=2024.1 added to the test dependency group so CI covers the chunked read path.
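
The time-coordinate convention in the Phase 3 entries (Daily uses the actual date, Monthly the first of the month, Yearly Jan 1) can be sketched with the standard library. `time_coordinate` is a hypothetical helper for illustration. The changelog says pysdp itself builds a pandas.DatetimeIndex, which this sketch replaces with plain `datetime.date` values.

```python
from datetime import date


def time_coordinate(ts_type, year, month=1, day=1):
    """Map a layer's nominal date to its time coordinate (hypothetical helper)."""
    if ts_type == "Yearly":
        return date(year, 1, 1)  # Jan 1 of the layer's year
    if ts_type == "Monthly":
        return date(year, month, 1)  # first day of the layer's month
    if ts_type == "Daily":
        return date(year, month, day)  # the actual date
    raise ValueError(f"No time coordinate for TimeSeriesType {ts_type!r}")


# The same nominal date under each TimeSeriesType.
coords = [
    time_coordinate("Yearly", 2019, 7, 15),
    time_coordinate("Monthly", 2019, 7, 15),
    time_coordinate("Daily", 2019, 7, 15),
]
```

Anchoring every type to a real calendar date is what lets one selection idiom, such as selecting by year, work across all time-series products.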

  • Phase 2 — Argument validation + time-slice resolvers (SPEC.md §9):

  • pysdp.io.template.substitute_template() — {year}/{month}/{day} URL-template substitution with scalar/vector recycling and length-consistency checks. Port of rSDP's .substitute_template().
  • pysdp._validate.validate_user_args() — pre-catalog-lookup validation; zero-pads months to two-digit strings; rejects invalid combinations of catalog_id/url/date_start/date_end/download_files/download_path.
  • pysdp._validate.validate_args_vs_type() — post-lookup check for whether a time-arg combo is valid for a given TimeSeriesType (Single rejects all time args; Yearly rejects months + years∧dates; Monthly requires months with years; Daily requires dates only).
  • pysdp._resolve.resolve_time_slices() and per-type resolvers (resolve_single, resolve_yearly, resolve_monthly, resolve_daily) returning a TimeSlices(paths, names) named tuple. Pure functions, no network, no raster I/O.
  • Preserved behavior carry-overs from rSDP: anchor-day seq(by="year"/"month") semantics for Yearly/Monthly date-range branches; 30-layer default clip for Daily datasets with no date bounds; error-on-empty-overlap; warn-on-partial-overlap.
  • 52 new unit tests: test_template.py (8 tests), test_validate.py (20 tests), test_resolve.py (24 tests). Ports rSDP's 32 testthat tests across test-internal_resolve.R and test-internal_validate.R, plus additional edge-case coverage.
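
Template substitution with scalar recycling and a length-consistency check can be sketched as follows. This is a rough illustration, not the pysdp function. The name `fill_template`, the example URL, the zero-padding of days, and the rule that scalars recycle to the length of the longest vector are all assumptions.

```python
def fill_template(template, year=None, month=None, day=None):
    """Fill {year}/{month}/{day} placeholders, recycling scalars (sketch)."""
    fields = {"year": year, "month": month, "day": day}
    # Keep only supplied fields; wrap scalars in one-element lists.
    values = {
        name: list(v) if isinstance(v, (list, tuple)) else [v]
        for name, v in fields.items()
        if v is not None
    }
    lengths = {len(v) for v in values.values() if len(v) > 1}
    if len(lengths) > 1:
        raise ValueError(
            f"Vector arguments have inconsistent lengths: {sorted(lengths)}"
        )
    n = lengths.pop() if lengths else 1
    urls = []
    for i in range(n):
        row = {}
        for name, v in values.items():
            item = v[i] if len(v) > 1 else v[0]  # recycle scalars
            # Zero-pad month and day to two digits; year is used as given.
            row[name] = str(item) if name == "year" else f"{int(item):02d}"
        urls.append(template.format(**row))
    return urls


# Hypothetical template URL; one scalar year recycled across three months.
urls = fill_template(
    "https://example.org/product_{year}_{month}.tif",
    year=2020,
    month=[1, 2, 3],
)
```

Because the function is pure string manipulation, it can be unit-tested without any network or raster I/O.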

  • Phase 1 — Catalog + metadata (SPEC.md §9):

  • pysdp.get_catalog() with three sources: packaged (default; offline, emits a UserWarning when the snapshot is older than SDP_STALENESS_MONTHS / default 6 months), live (refetches the CSV from S3), stac (returns a pystac.Catalog for the SDP static STAC v1 catalog; filter args ignored with a warning).
  • pysdp.get_metadata(catalog_id, as_dict=True) — fetches QGIS-style XML metadata; returns a dict via xmltodict or an lxml element. Descriptive KeyError on unknown catalog_id includes the snapshot date.
  • Packaged catalog CSV snapshot: SDP_product_table_04_14_2026.csv (156 products across UG/UER/GT/GMUG domains). Loaded via importlib.resources in pysdp._catalog_data. Handles both m/d/y and m/d/Y date formats mixed across rows, and preserves rSDP's sysdata.rda baking model.
  • scripts/update_catalog.py — mirrors rSDP's data-raw/SDP_catalog.R; downloads a fresh CSV from S3 and rotates the packaged snapshot.
  • Test suite: 48 unit tests (filter validation, date parsing, staleness warning, synthetic-DataFrame filter logic, responses-mocked HTTP) + 3 live integration tests under @pytest.mark.network.
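
The staleness warning for the packaged snapshot can be sketched like this. It assumes that age is counted in whole calendar months and that "older than" means strictly greater than the limit. The changelog states neither detail. `months_between` and `check_staleness` are hypothetical names.

```python
import warnings
from datetime import date

STALENESS_MONTHS = 6  # default quoted in the changelog


def months_between(start, end):
    """Whole calendar months elapsed from start to end."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return months


def check_staleness(snapshot_date, today=None, limit=STALENESS_MONTHS):
    """Warn when the snapshot is older than `limit` months; return its age."""
    today = date.today() if today is None else today
    age = months_between(snapshot_date, today)
    if age > limit:
        warnings.warn(
            f"Packaged catalog snapshot ({snapshot_date}) is {age} months old; "
            "consider refreshing it.",
            UserWarning,
            stacklevel=2,
        )
    return age


snapshot = date(2026, 4, 14)  # date taken from the packaged CSV's file name
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fresh_age = check_staleness(snapshot, today=date(2026, 6, 1))
    stale_age = check_staleness(snapshot, today=date(2027, 1, 20))
```

Accepting `today` as a parameter makes the check deterministic in tests. Only the second call, nine months after the snapshot, triggers the warning.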

  • Initial Phase 0 scaffolding: pyproject.toml (hatchling + hatch-vcs), src/pysdp/ package skeleton with public-API stubs, constants.py with real SDP catalog values (CRS, domains, types, releases, timeseries types), tests/ smoke tests, CI workflows (lint, type-check, test matrix on Python 3.11/3.12/3.13 × linux/macOS/windows), release workflow with PyPI Trusted Publishing, docs workflow stub, .pre-commit-config.yaml, MIT license. See SPEC.md §9 Phase 0.