
API reference

All of pySDP's public surface. Import everything from the top-level package:

import pysdp

pysdp.get_catalog(...)
pysdp.open_raster(...)
# ...

Catalog discovery

get_catalog

get_catalog(
    domains: Sequence[str] | None = None,
    types: Sequence[str] | None = None,
    releases: Sequence[str] | None = None,
    timeseries_types: Sequence[str] | None = None,
    deprecated: bool | None = False,
    *,
    source: Literal[
        "packaged", "live", "stac"
    ] = "packaged",
) -> DataFrame | Catalog

Discover SDP datasets by filtering the product catalog.

pySDP ships with a snapshot of the SDP product catalog baked in; filtering is instantaneous and works offline. source="live" refetches the canonical CSV from S3 (useful when the packaged snapshot lags a recent catalog update). source="stac" returns the SDP's static STAC v1 catalog as a pystac.Catalog, which composes with the broader STAC ecosystem.

Parameters:

domains : sequence of str, optional
    Spatial domains to include ("UG", "UER", "GT", "GMUG"). See
    pysdp.DOMAINS for the canonical list. None (default) returns every
    domain.
types : sequence of str, optional
    Dataset type categories (e.g., "Vegetation", "Topo", "Climate",
    "Snow"). See pysdp.TYPES. None returns all types.
releases : sequence of str, optional
    Dataset release cohorts ("Release1".."Release5", "Basemaps"). See
    pysdp.RELEASES. None returns all.
timeseries_types : sequence of str, optional
    One or more of "Single", "Yearly", "Monthly", "Daily", "Seasonal".
    See pysdp.TIMESERIES_TYPES. None returns all.
deprecated : bool or None, default False
    False returns only current datasets. True returns only deprecated
    ones. None returns both.
source : {"packaged", "live", "stac"}, default "packaged"
    Where to pull the catalog from. See Notes.

Returns:

pandas.DataFrame or pystac.Catalog
    For CSV-backed sources, a DataFrame with one row per dataset and
    columns matching the SDP product-table schema (CatalogID, Release,
    Type, Product, Domain, Resolution, Deprecated, MinDate, MaxDate,
    MinYear, MaxYear, TimeSeriesType, DataType, DataUnit, DataScaleFactor,
    DataOffset, Data.URL, Metadata.URL). For source="stac", a
    pystac.Catalog rooted at the SDP's static STAC v1 catalog.

Raises:

ValueError
    If any filter argument contains a value outside its canonical
    vocabulary, or if source isn't one of the three valid options.

Warns:

UserWarning
    When source="packaged" and the packaged snapshot is older than
    SDP_STALENESS_MONTHS months (default 6; env-configurable). The
    warning suggests source="live" or a pysdp upgrade.
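The staleness check amounts to a month-difference comparison against the env-configurable threshold. The sketch below is an assumption about the internals (only the warning behavior is documented); the helper name warn_if_stale is invented for illustration:

```python
import os
import warnings
from datetime import date

def warn_if_stale(snapshot_date: date, today: date) -> bool:
    """Warn when the packaged snapshot is older than the configured threshold."""
    # SDP_STALENESS_MONTHS defaults to 6 when unset, per the documented behavior.
    threshold = int(os.environ.get("SDP_STALENESS_MONTHS", "6"))
    age_months = (today.year - snapshot_date.year) * 12 + (today.month - snapshot_date.month)
    if age_months > threshold:
        warnings.warn(
            f"Packaged catalog snapshot is {age_months} months old; "
            "consider source='live' or upgrading pysdp.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False

# An 8-month-old snapshot trips the default 6-month threshold.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stale = warn_if_stale(date(2024, 1, 15), date(2024, 9, 20))
```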

Notes

The packaged CSV is refreshed on each pysdp release. source="live" hits the S3-hosted canonical CSV directly, so it's always as fresh as upstream. source="stac" ignores filter arguments — use pystac traversal to filter the returned catalog. The catalog is browsable in radiantearth's STAC Browser: https://radiantearth.github.io/stac-browser/#/external/rmbl-sdp.s3.us-east-2.amazonaws.com/stac/v1/catalog.json
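Since filters are ignored for source="stac", you filter by walking the returned catalog tree instead. The traversal idea, sketched here over a plain nested-dict stand-in so the example is self-contained (the real object is a pystac.Catalog, whose get_children()/get_items() methods provide the equivalent walk; the item IDs and the "sdp:type" property below are invented for illustration):

```python
# Stand-in for a STAC catalog tree: each node has child catalogs and items.
catalog = {
    "id": "sdp-v1",
    "children": [
        {"id": "UG", "children": [], "items": [
            {"id": "R3D009", "properties": {"sdp:type": "Topo"}},
            {"id": "R3D021", "properties": {"sdp:type": "Vegetation"}},
        ]},
        {"id": "GT", "children": [], "items": [
            {"id": "R4D001", "properties": {"sdp:type": "Snow"}},
        ]},
    ],
    "items": [],
}

def iter_items(node):
    """Depth-first walk yielding every item in the tree."""
    yield from node["items"]
    for child in node["children"]:
        yield from iter_items(child)

# Keep only vegetation products, analogous to types=["Vegetation"].
veg = [it["id"] for it in iter_items(catalog)
       if it["properties"]["sdp:type"] == "Vegetation"]
```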

Examples:

Get every current dataset:

>>> import pysdp
>>> cat = pysdp.get_catalog()
>>> cat.shape
(140, 18)

Filter to Upper Gunnison vegetation products:

>>> veg = pysdp.get_catalog(domains=["UG"], types=["Vegetation"])

Find all yearly time-series products across every domain:

>>> yearly = pysdp.get_catalog(timeseries_types=["Yearly"])

Return both current and deprecated entries:

>>> all_rows = pysdp.get_catalog(deprecated=None)

See Also

get_metadata : Fetch detailed XML metadata for one dataset.
open_raster : Open a catalog entry as a lazy xarray.Dataset.

Source code in src/pysdp/catalog.py
def get_catalog(
    domains: Sequence[str] | None = None,
    types: Sequence[str] | None = None,
    releases: Sequence[str] | None = None,
    timeseries_types: Sequence[str] | None = None,
    deprecated: bool | None = False,
    *,
    source: Literal["packaged", "live", "stac"] = "packaged",
) -> pd.DataFrame | pystac.Catalog:
    """Discover SDP datasets by filtering the product catalog.

    pySDP ships with a snapshot of the SDP product catalog baked in; filtering
    is instantaneous and works offline. ``source="live"`` refetches the
    canonical CSV from S3 (useful when the packaged snapshot lags a recent
    catalog update). ``source="stac"`` returns the SDP's static STAC v1
    catalog as a ``pystac.Catalog``, which composes with the broader STAC
    ecosystem.

    Parameters
    ----------
    domains : sequence of str, optional
        Spatial domains to include (``"UG"``, ``"UER"``, ``"GT"``, ``"GMUG"``).
        See ``pysdp.DOMAINS`` for the canonical list. ``None`` (default)
        returns every domain.
    types : sequence of str, optional
        Dataset type categories (e.g., ``"Vegetation"``, ``"Topo"``,
        ``"Climate"``, ``"Snow"``). See ``pysdp.TYPES``. ``None`` returns
        all types.
    releases : sequence of str, optional
        Dataset release cohorts (``"Release1"``..``"Release5"``,
        ``"Basemaps"``). See ``pysdp.RELEASES``. ``None`` returns all.
    timeseries_types : sequence of str, optional
        One or more of ``"Single"``, ``"Yearly"``, ``"Monthly"``,
        ``"Daily"``, ``"Seasonal"``. See ``pysdp.TIMESERIES_TYPES``.
        ``None`` returns all.
    deprecated : bool or None, default False
        ``False`` returns only current datasets. ``True`` returns only
        deprecated ones. ``None`` returns both.
    source : {"packaged", "live", "stac"}, default "packaged"
        Where to pull the catalog from. See Notes.

    Returns
    -------
    pandas.DataFrame or pystac.Catalog
        For CSV-backed sources, a DataFrame with one row per dataset and
        columns matching the SDP product-table schema (``CatalogID``,
        ``Release``, ``Type``, ``Product``, ``Domain``, ``Resolution``,
        ``Deprecated``, ``MinDate``, ``MaxDate``, ``MinYear``, ``MaxYear``,
        ``TimeSeriesType``, ``DataType``, ``DataUnit``,
        ``DataScaleFactor``, ``DataOffset``, ``Data.URL``, ``Metadata.URL``).
        For ``source="stac"``, a ``pystac.Catalog`` rooted at the SDP's
        static STAC v1 catalog.

    Raises
    ------
    ValueError
        If any filter argument contains a value outside its canonical
        vocabulary, or if ``source`` isn't one of the three valid options.

    Warns
    -----
    UserWarning
        When ``source="packaged"`` and the packaged snapshot is older than
        ``SDP_STALENESS_MONTHS`` months (default 6; env-configurable). The
        warning suggests ``source="live"`` or a pysdp upgrade.

    Notes
    -----
    The packaged CSV is refreshed on each pysdp release. ``source="live"``
    hits the S3-hosted canonical CSV directly, so it's always as fresh as
    upstream. ``source="stac"`` ignores filter arguments — use pystac
    traversal to filter the returned catalog. The catalog is browsable at
    `radiantearth's STAC Browser
    <https://radiantearth.github.io/stac-browser/#/external/rmbl-sdp.s3.us-east-2.amazonaws.com/stac/v1/catalog.json>`_.

    Examples
    --------
    Get every current dataset:

    >>> import pysdp
    >>> cat = pysdp.get_catalog()  # doctest: +SKIP
    >>> cat.shape  # doctest: +SKIP
    (140, 18)

    Filter to Upper Gunnison vegetation products:

    >>> veg = pysdp.get_catalog(domains=["UG"], types=["Vegetation"])  # doctest: +SKIP

    Find all yearly time-series products across every domain:

    >>> yearly = pysdp.get_catalog(timeseries_types=["Yearly"])  # doctest: +SKIP

    Return both current and deprecated entries:

    >>> all_rows = pysdp.get_catalog(deprecated=None)  # doctest: +SKIP

    See Also
    --------
    get_metadata : Fetch detailed XML metadata for one dataset.
    open_raster : Open a catalog entry as a lazy ``xarray.Dataset``.
    """
    if source == "stac":
        from pysdp.stac import get_stac_catalog

        if any(v is not None for v in (domains, types, releases, timeseries_types)):
            warnings.warn(
                "Filter arguments are ignored when source='stac'. "
                "Use pystac traversal to filter the returned catalog.",
                UserWarning,
                stacklevel=2,
            )
        return get_stac_catalog()

    _validate_filter(domains, DOMAINS, "domains")
    _validate_filter(types, TYPES, "types")
    _validate_filter(releases, RELEASES, "releases")
    _validate_filter(timeseries_types, TIMESERIES_TYPES, "timeseries_types")

    if source == "packaged":
        df = load_packaged_catalog()
    elif source == "live":
        df = load_live_catalog()
    else:
        raise ValueError(f"Unknown source: {source!r}. Must be 'packaged', 'live', or 'stac'.")

    return _apply_filters(
        df,
        domains=domains,
        types=types,
        releases=releases,
        timeseries_types=timeseries_types,
        deprecated=deprecated,
    )

get_metadata

get_metadata(
    catalog_id: str, *, as_dict: bool = True
) -> dict[str, Any] | Any

Fetch the QGIS-style XML metadata for one SDP dataset.

Each SDP product has a companion metadata XML document on S3 that describes provenance, sensor details, processing history, and other long-form context. This function fetches that XML over HTTP and parses it.

Parameters:

catalog_id : str, required
    Six-character SDP catalog ID (e.g., "R3D009", "BM012").
as_dict : bool, default True
    If True, return a nested dict parsed via xmltodict — convenient for
    scripting. If False, return the parsed lxml.etree._Element — better
    for XPath queries.

Returns:

dict or lxml.etree._Element
    Parsed metadata. For dict output, the top-level key is typically
    "qgis" (reflecting the document's QGIS metadata schema).

Raises:

ValueError
    If catalog_id isn't exactly six characters.
KeyError
    If catalog_id isn't in the packaged catalog. The message includes the
    snapshot date and suggests source="live" or an upgrade.
requests.HTTPError
    If the XML URL returns a non-2xx status (rare; implies an upstream
    data-hosting issue).

Examples:

Get the metadata for the UG 3 m bare-earth DEM as a dict:

>>> import pysdp
>>> meta = pysdp.get_metadata("R3D009")
>>> meta["qgis"]["abstract"]
'This 3 m resolution digital elevation model...'
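The as_dict=False path suits element-tree queries. A self-contained illustration on a stand-in document (stdlib ElementTree here for portability; pysdp actually returns an lxml element, which supports the same find/findall calls plus full XPath — the element names below are a plausible guess at the QGIS schema, not a verbatim sample):

```python
import xml.etree.ElementTree as ET

# Stand-in for a (much larger) QGIS-style metadata document.
xml_doc = b"""<qgis>
  <identifier>R3D009</identifier>
  <abstract>This 3 m resolution digital elevation model...</abstract>
</qgis>"""

root = ET.fromstring(xml_doc)
abstract = root.findtext("abstract")                 # direct child lookup
ids = [el.text for el in root.findall(".//identifier")]  # XPath-style search
```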

See Also

get_catalog : Discover catalog IDs by filtering.
open_raster : Open a catalog entry as a raster.

Source code in src/pysdp/catalog.py
def get_metadata(
    catalog_id: str,
    *,
    as_dict: bool = True,
) -> dict[str, Any] | Any:
    """Fetch the QGIS-style XML metadata for one SDP dataset.

    Each SDP product has a companion metadata XML document on S3 that
    describes provenance, sensor details, processing history, and other
    long-form context. This function fetches that XML over HTTP and parses
    it.

    Parameters
    ----------
    catalog_id : str
        Six-character SDP catalog ID (e.g., ``"R3D009"``, ``"BM012"``).
    as_dict : bool, default True
        If ``True``, return a nested ``dict`` parsed via ``xmltodict`` —
        convenient for scripting. If ``False``, return the parsed
        ``lxml.etree._Element`` — better for XPath queries.

    Returns
    -------
    dict or lxml.etree._Element
        Parsed metadata. For dict output, the top-level key is typically
        ``"qgis"`` (reflecting the document's QGIS metadata schema).

    Raises
    ------
    ValueError
        If ``catalog_id`` isn't exactly six characters.
    KeyError
        If ``catalog_id`` isn't in the packaged catalog. The message
        includes the snapshot date and suggests ``source="live"`` or an upgrade.
    requests.HTTPError
        If the XML URL returns a non-2xx status (rare; implies an
        upstream data-hosting issue).

    Examples
    --------
    Get the metadata for the UG 3 m bare-earth DEM as a dict:

    >>> import pysdp
    >>> meta = pysdp.get_metadata("R3D009")  # doctest: +SKIP
    >>> meta["qgis"]["abstract"]  # doctest: +SKIP
    'This 3 m resolution digital elevation model...'

    See Also
    --------
    get_catalog : Discover catalog IDs by filtering.
    open_raster : Open a catalog entry as a raster.
    """
    import requests

    row = lookup_catalog_row(catalog_id)
    resp = requests.get(row["Metadata.URL"], timeout=30)
    resp.raise_for_status()

    if as_dict:
        import xmltodict

        return xmltodict.parse(resp.content)
    from lxml import etree

    return etree.fromstring(resp.content)

Raster access

open_raster

open_raster(
    catalog_id: str | None = None,
    url: str | None = None,
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    chunks: dict[str, int]
    | Literal["auto"]
    | None = "auto",
    download: bool = False,
    download_path: str | PathLike[str] | None = None,
    overwrite: bool = False,
    verbose: bool = True,
) -> Dataset

Open an SDP raster as a lazy xarray.Dataset.

Reads cloud-optimized GeoTIFFs from S3 via GDAL's VSICURL, without downloading. Returns a Dataset with one data variable named after the product's canonical short name (e.g. "UG_dem_3m_v1"). CRS is always set to EPSG:32613 (UTM 13N). For time-series products, the Dataset gains a uniform pandas.DatetimeIndex on the time coordinate — Daily → actual date, Monthly → first-of-month, Yearly → Jan 1.
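The time-coordinate rule per TimeSeriesType is simple enough to state as code. A sketch of the mapping (the helper name normalize_time is invented; only the resulting dates are documented behavior):

```python
from datetime import date

def normalize_time(ts_type: str, d: date) -> date:
    """Map a layer's nominal date onto the uniform time coordinate."""
    if ts_type == "Daily":
        return d                      # actual date, unchanged
    if ts_type == "Monthly":
        return d.replace(day=1)       # first of the month
    if ts_type == "Yearly":
        return date(d.year, 1, 1)     # Jan 1 of that year
    raise ValueError(f"Not a time-series type: {ts_type!r}")

ticks = [
    normalize_time("Daily", date(2021, 11, 2)),
    normalize_time("Monthly", date(2021, 11, 15)),
    normalize_time("Yearly", date(2021, 6, 30)),
]
```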

Parameters:

catalog_id : str, optional
    Six-character SDP catalog ID (e.g., "R3D009"). Mutually exclusive
    with url. When given, scale/offset metadata from the catalog are
    attached as CF attrs on the data variable.
url : str, optional
    Direct HTTPS URL to an SDP COG. Mutually exclusive with catalog_id.
    No catalog lookup, so scale/offset attrs come from the COG header only.
years : sequence of int, optional
    For Yearly products, which years to load. Alternative to
    date_start/date_end.
months : sequence of int, optional
    For Monthly products, which months (1–12) to load. Must be combined
    with years or used alone (all years × months requested).
date_start, date_end : str or datetime.date, optional
    Date range to load (inclusive). For Daily, defines the time slice;
    for Monthly/Yearly, uses rSDP's anchor-day stepping semantics. When
    neither is given on a Daily product, the first 30 days from MinDate
    are loaded to avoid accidental 10-year VSICURL handle explosions
    (matches rSDP).
chunks : dict, "auto", or None, default "auto"
    Dask chunking. "auto" uses xarray's chunk inference (requires
    pysdp[dask]; falls back to eager reads with a warning if dask isn't
    installed). None eager-loads. Pass a dict for manual control (e.g.,
    {"x": 1024, "y": 1024}).
download : bool, default False
    Not yet implemented (Phase 5). For now, raises NotImplementedError.
    Use pysdp.download() to bulk-fetch COGs to disk, then open them with
    rioxarray.open_rasterio.
download_path : str or PathLike, optional
    Directory for downloaded files (only used when download=True).
overwrite : bool, default False
    Reserved for the download path (not yet implemented).
verbose : bool, default True
    If True, print progress messages (layer count etc.) to stderr.

Returns:

xarray.Dataset
    Dataset with one data variable. Dimensions depend on the product:

    - (y, x) for single-band Single products
    - (band, y, x) for multi-band Single products
    - (time, y, x) for Yearly/Monthly/Daily time-series (where time is a
      pandas.DatetimeIndex)

    CRS is EPSG:32613 written via rio.write_crs. Catalog-derived
    scale/offset metadata is attached to the variable as CF scale_factor
    and add_offset attrs; call ds.decode_cf() or open with
    mask_and_scale=True to materialize real values.
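The CF packing behind those attrs is plain arithmetic: real = raw * scale_factor + add_offset, which is what decoding materializes. A minimal sketch (the factor, offset, and raw values are invented for illustration; real products carry their own in the catalog):

```python
# CF unpacking: real = raw * scale_factor + add_offset.
scale_factor = 0.1
add_offset = -40.0
raw = [512, 527, 540]                              # packed integers as stored
real = [r * scale_factor + add_offset for r in raw]  # decoded physical values
```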

Raises:

ValueError
    On invalid catalog_id / url combinations, time-arg combinations
    inconsistent with the product's TimeSeriesType, or a url that doesn't
    start with https://.
KeyError
    If catalog_id isn't in the packaged catalog.
NotImplementedError
    If download=True (Phase 5 stub).

Examples:

Open the UG 3 m bare-earth DEM (Single product):

>>> import pysdp
>>> dem = pysdp.open_raster("R3D009")
>>> dem.rio.crs.to_epsg()
32613

Open three days of daily Tmax:

>>> tmax = pysdp.open_raster(
...     "R4D004",
...     date_start="2021-11-02",
...     date_end="2021-11-04",
... )
>>> tmax.sizes["time"]
3

Open a single year of annual snow persistence:

>>> snow = pysdp.open_raster("R4D001", years=[2019])

See Also

open_stack : Load multiple products as variables in one Dataset.
extract_points : Sample an opened raster at point locations.
extract_polygons : Summarize an opened raster over polygons.

Source code in src/pysdp/raster.py
def open_raster(
    catalog_id: str | None = None,
    url: str | None = None,
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    chunks: dict[str, int] | Literal["auto"] | None = "auto",
    download: bool = False,
    download_path: str | os.PathLike[str] | None = None,
    overwrite: bool = False,
    verbose: bool = True,
) -> xr.Dataset:
    """Open an SDP raster as a lazy ``xarray.Dataset``.

    Reads cloud-optimized GeoTIFFs from S3 via GDAL's VSICURL, without
    downloading. Returns a Dataset with one data variable named after the
    product's canonical short name (e.g. ``"UG_dem_3m_v1"``). CRS is always
    set to ``EPSG:32613`` (UTM 13N). For time-series products, the Dataset
    gains a uniform ``pandas.DatetimeIndex`` on the ``time`` coordinate —
    Daily → actual date, Monthly → first-of-month, Yearly → Jan 1.

    Parameters
    ----------
    catalog_id : str, optional
        Six-character SDP catalog ID (e.g., ``"R3D009"``). Mutually
        exclusive with ``url``. When given, scale/offset metadata from the
        catalog are attached as CF attrs on the data variable.
    url : str, optional
        Direct HTTPS URL to an SDP COG. Mutually exclusive with
        ``catalog_id``. No catalog lookup, so scale/offset attrs come from
        the COG header only.
    years : sequence of int, optional
        For Yearly products, which years to load. Alternative to
        ``date_start``/``date_end``.
    months : sequence of int, optional
        For Monthly products, which months (1–12) to load. Must be combined
        with ``years`` or used alone (all years × months requested).
    date_start, date_end : str or datetime.date, optional
        Date range to load (inclusive). For Daily, defines the time slice;
        for Monthly/Yearly, uses rSDP's anchor-day stepping semantics. When
        neither is given on a Daily product, the first 30 days from
        ``MinDate`` are loaded to avoid accidental 10-year VSICURL handle
        explosions (matches rSDP).
    chunks : dict, "auto", or None, default "auto"
        Dask chunking. ``"auto"`` uses xarray's chunk inference (requires
        ``pysdp[dask]``; falls back to eager reads with a warning if dask
        isn't installed). ``None`` eager-loads. Pass a dict for manual
        control (e.g., ``{"x": 1024, "y": 1024}``).
    download : bool, default False
        **Not yet implemented (Phase 5).** For now, raises
        ``NotImplementedError``. Use ``pysdp.download()`` to bulk-fetch
        COGs to disk, then open them with ``rioxarray.open_rasterio``.
    download_path : str or PathLike, optional
        Directory for downloaded files (only used when ``download=True``).
    overwrite : bool, default False
        Reserved for the download path (not yet implemented).
    verbose : bool, default True
        If ``True``, print progress messages (layer count etc.) to stderr.

    Returns
    -------
    xarray.Dataset
        Dataset with one data variable. Dimensions depend on the product:

        - ``(y, x)`` for single-band ``Single`` products
        - ``(band, y, x)`` for multi-band ``Single`` products
        - ``(time, y, x)`` for ``Yearly``/``Monthly``/``Daily`` time-series
          (where ``time`` is a ``pandas.DatetimeIndex``)

        CRS is ``EPSG:32613`` written via ``rio.write_crs``. Catalog-derived
        scale/offset metadata is attached to the variable as CF
        ``scale_factor`` and ``add_offset`` attrs; call
        ``ds.decode_cf()`` or open with ``mask_and_scale=True`` to
        materialize real values.

    Raises
    ------
    ValueError
        On invalid ``catalog_id`` / ``url`` combinations, time-arg
        combinations inconsistent with the product's ``TimeSeriesType``, or
        a ``url`` that doesn't start with ``https://``.
    KeyError
        If ``catalog_id`` isn't in the packaged catalog.
    NotImplementedError
        If ``download=True`` (Phase 5 stub).

    Examples
    --------
    Open the UG 3 m bare-earth DEM (``Single`` product):

    >>> import pysdp
    >>> dem = pysdp.open_raster("R3D009")  # doctest: +SKIP
    >>> dem.rio.crs.to_epsg()  # doctest: +SKIP
    32613

    Open three days of daily Tmax:

    >>> tmax = pysdp.open_raster(
    ...     "R4D004",
    ...     date_start="2021-11-02",
    ...     date_end="2021-11-04",
    ... )  # doctest: +SKIP
    >>> tmax.sizes["time"]  # doctest: +SKIP
    3

    Open a single year of annual snow persistence:

    >>> snow = pysdp.open_raster("R4D001", years=[2019])  # doctest: +SKIP

    See Also
    --------
    open_stack : Load multiple products as variables in one Dataset.
    extract_points : Sample an opened raster at point locations.
    extract_polygons : Summarize an opened raster over polygons.
    """
    ensure_gdal_defaults()
    normalized = validate_user_args(
        catalog_id=catalog_id,
        url=url,
        years=years,
        months=months,
        date_start=date_start,
        date_end=date_end,
        download_files=download,
        download_path=download_path,
    )
    months_pad = normalized["months_pad"]

    if download:
        raise NotImplementedError(
            "download=True is implemented in Phase 5; see `pysdp.download()` "
            "for the bulk-download path. Phase 3 supports lazy cloud reads only."
        )

    if catalog_id is not None:
        # `pd.Series` quacks as `Mapping[str, Any]` for our purposes; convert
        # so the resolver + dataset builder can be typed against a simple
        # Mapping and also accept plain-dict fixtures in unit tests.
        cat_line: dict[str, Any] = dict(lookup_catalog_row(catalog_id))
        ts_type = str(cat_line["TimeSeriesType"])
        validate_args_vs_type(
            ts_type,
            years=years,
            months=months,
            date_start=date_start,
            date_end=date_end,
        )
        slices = resolve_time_slices(
            cat_line,
            years=years,
            months_pad=months_pad,
            date_start=date_start,
            date_end=date_end,
            verbose=verbose,
        )
        return _build_dataset(slices, cat_line=cat_line, url=None, chunks=chunks)

    # url= branch: single-layer only. Scale/offset are skipped (no catalog row).
    assert url is not None
    if not url.startswith("https://"):
        raise ValueError("A valid URL must start with 'https://'.")
    slices = TimeSlices(paths=[VSICURL_PREFIX + url], names=[])
    return _build_dataset(slices, cat_line=None, url=url, chunks=chunks)

open_stack

open_stack(
    catalog_ids: Sequence[str],
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    chunks: dict[str, int]
    | Literal["auto"]
    | None = "auto",
    align: Literal["exact", "reproject"] = "exact",
    verbose: bool = True,
) -> Dataset

Load multiple SDP products into a single xarray.Dataset.

Each product becomes one data variable. x/y (and time where applicable) coordinates are shared across variables, so downstream analysis can treat the stack as a single object (ds["dem"] - ds["snow_persistence"].mean("time") etc.). Use this when you want to compose products that are already on the same grid — for example an elevation model and a slope raster both derived from the same LiDAR campaign.
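The "exact" alignment contract amounts to comparing each product's (CRS, transform, shape) triple against the first product's. A sketch of that check (an assumption about the internals; IDs and grid descriptors below are invented, and the real check raises with a descriptive list rather than returning one):

```python
def check_exact_alignment(grids: dict[str, tuple]) -> list[str]:
    """Return IDs whose (crs, transform, shape) differs from the first product's."""
    ref = next(iter(grids.values()))
    return [cid for cid, g in grids.items() if g != ref]

# Invented grid descriptors: (crs, affine-transform coefficients, (rows, cols)).
grids = {
    "A": ("EPSG:32613", (3.0, 0, 315000, 0, -3.0, 4330000), (12000, 9000)),
    "B": ("EPSG:32613", (3.0, 0, 315000, 0, -3.0, 4330000), (12000, 9000)),
    "C": ("EPSG:32613", (1.0, 0, 315000, 0, -1.0, 4330000), (36000, 27000)),
}
mismatched = check_exact_alignment(grids)  # "C" is on a finer 1 m grid
```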

Parameters:

catalog_ids : sequence of str, required
    Non-empty sequence of SDP catalog IDs.
years, months, date_start, date_end : optional
    Shared time-slicing args. Applied to every time-series product in the
    stack; ignored for Single products.
chunks : dict, "auto", or None, default "auto"
    Dask chunking, passed through to each open_raster call.
align : {"exact", "reproject"}, default "exact"
    "exact" requires all products to share CRS + transform + shape;
    raises ValueError on mismatch with a descriptive list of which
    products diverged. "reproject" reprojects to the first product's grid
    via odc-stac (planned for Phase 7 of the ROADMAP; currently raises
    NotImplementedError).
verbose : bool, default True
    Forwarded to open_raster for per-product progress messages.

Returns:

xarray.Dataset
    One data variable per catalog_id. See open_raster for per-variable
    shape and CRS.

Raises:

ValueError
    If catalog_ids is empty, if align isn't one of the two valid values,
    or (with align="exact") if the products don't share a common grid.
NotImplementedError
    If align="reproject" (Phase 7 future work).

Examples:

Stack the UG 3 m DEM with the matching slope and aspect rasters:

>>> import pysdp
>>> topo = pysdp.open_stack(["R3D009", "R3D012", "R3D010"])
>>> sorted(topo.data_vars)
['UG_dem_3m_v1', 'UG_dem_slope_1m_v1', 'UG_topographic_aspect_southness_1m_v1']

See Also

open_raster : Single-product load.

Source code in src/pysdp/raster.py
def open_stack(
    catalog_ids: Sequence[str],
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    chunks: dict[str, int] | Literal["auto"] | None = "auto",
    align: Literal["exact", "reproject"] = "exact",
    verbose: bool = True,
) -> xr.Dataset:
    """Load multiple SDP products into a single ``xarray.Dataset``.

    Each product becomes one data variable. ``x``/``y`` (and ``time`` where
    applicable) coordinates are shared across variables, so downstream
    analysis can treat the stack as a single object (``ds["dem"] -
    ds["snow_persistence"].mean("time")`` etc.). Use this when you want to
    compose products that are already on the same grid — for example an
    elevation model and a slope raster both derived from the same LiDAR
    campaign.

    Parameters
    ----------
    catalog_ids : sequence of str
        Non-empty sequence of SDP catalog IDs.
    years, months, date_start, date_end : optional
        Shared time-slicing args. Applied to every time-series product in
        the stack; ignored for ``Single`` products.
    chunks : dict, "auto", or None, default "auto"
        Dask chunking, passed through to each ``open_raster`` call.
    align : {"exact", "reproject"}, default "exact"
        ``"exact"`` requires all products to share CRS + transform +
        shape; raises ``ValueError`` on mismatch with a descriptive list of
        which products diverged. ``"reproject"`` reprojects to the first
        product's grid via ``odc-stac`` (planned for Phase 7 of the
        ROADMAP; currently raises ``NotImplementedError``).
    verbose : bool, default True
        Forwarded to ``open_raster`` for per-product progress messages.

    Returns
    -------
    xarray.Dataset
        One data variable per catalog_id. See
        :func:`open_raster` for per-variable shape and CRS.

    Raises
    ------
    ValueError
        If ``catalog_ids`` is empty, if ``align`` isn't one of the two
        valid values, or (with ``align="exact"``) if the products don't
        share a common grid.
    NotImplementedError
        If ``align="reproject"`` (Phase 7 future work).

    Examples
    --------
    Stack the UG 3 m DEM with the matching slope and aspect rasters:

    >>> import pysdp
    >>> topo = pysdp.open_stack(["R3D009", "R3D012", "R3D010"])  # doctest: +SKIP
    >>> sorted(topo.data_vars)  # doctest: +SKIP
    ['UG_dem_3m_v1', 'UG_dem_slope_1m_v1', 'UG_topographic_aspect_southness_1m_v1']

    See Also
    --------
    open_raster : Single-product load.
    """
    if not catalog_ids:
        raise ValueError("catalog_ids must be a non-empty sequence.")
    if align == "reproject":
        raise NotImplementedError(
            "align='reproject' is implemented in Phase 7 (requires `pip install "
            "pysdp[stac]`). For now, pass `align='exact'` and load products "
            "that share a grid, or reproject explicitly with rioxarray."
        )
    if align != "exact":
        raise ValueError(f"Unknown align: {align!r}. Must be 'exact' or 'reproject'.")

    datasets = [
        open_raster(
            cid,
            years=years,
            months=months,
            date_start=date_start,
            date_end=date_end,
            chunks=chunks,
            verbose=verbose,
        )
        for cid in catalog_ids
    ]
    _verify_exact_alignment(datasets, catalog_ids=list(catalog_ids))
    return xr.merge(datasets, compat="equals", join="exact")

Extraction

extract_points

extract_points(
    raster: Dataset | DataArray,
    locations: GeoDataFrame | DataFrame,
    *,
    x: str = "x",
    y: str = "y",
    crs: str | None = None,
    method: Literal["nearest", "linear"] = "linear",
    years: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> GeoDataFrame

Extract raster values at point locations.

Accepts an xarray.Dataset or DataArray (typically from :func:open_raster / :func:open_stack) and a GeoDataFrame or plain DataFrame with x/y columns. Reprojects the input locations to the raster CRS if they differ.

Parameters:

raster : Dataset or DataArray, required
    The raster to sample from. Must have x and y spatial coordinates and
    a CRS set (via rio.write_crs). Time-series rasters (with a time dim)
    produce long-form output.
locations : GeoDataFrame or DataFrame, required
    Points to sample. If a plain DataFrame, pass the column names via
    x=/y= and an explicit crs=. If a GeoDataFrame, its geometry column is
    used and its CRS must be set.
x, y : str, default "x", "y"
    Column names holding longitude/x and latitude/y for DataFrame inputs.
    Ignored for GeoDataFrame inputs.
crs : str, optional
    CRS of the input locations (e.g., "EPSG:4326"). Required when
    locations is a plain DataFrame; inferred from locations.crs for
    GeoDataFrame inputs.
method : {"nearest", "linear"}, default "linear"
    Interpolation method. "linear" is bilinear via xarray.interp
    (requires scipy, a core pysdp dep). "nearest" snaps to the nearest
    cell via xvec.extract_points and is substantially faster for large
    cloud rasters.
years : sequence of int, optional
    Time filter applied before extraction. Only valid for time-series
    rasters.
date_start, date_end : str or date, optional
    Date-range filter applied before extraction.
bind : bool, default True
    If True, merge the input location's non-geometry columns onto each
    output row. If False, return only geometry + extracted values.
verbose : bool, default True
    Print per-extraction progress messages to stderr.

Returns:

Type Description
GeoDataFrame

Output GeoDataFrame with the raster's data variables as columns. For time-series rasters, output is long-form (one row per geometry × time) with time as a column; pivot to wide if needed via df.pivot_table(index=..., columns="time", values=...).

Raises:

Type Description
ValueError

If the raster has no CRS, if location CRS/columns are missing, if method isn't one of the two valid values, or if time filter args are passed for a non-time-indexed raster.

Examples:

Extract elevation at three RMBL-area field sites:

>>> import pysdp, geopandas as gpd
>>> from shapely.geometry import Point
>>> dem = pysdp.open_raster("R3D009")
>>> sites = gpd.GeoDataFrame(
...     {"site": ["Roaring Judy", "Gothic", "Galena Lake"]},
...     geometry=[
...         Point(-106.853186, 38.716995),
...         Point(-106.988934, 38.958446),
...         Point(-107.072569, 39.021644),
...     ],
...     crs="EPSG:4326",
... )
>>> samples = pysdp.extract_points(dem, sites)

Sample daily Tmax at the same sites and pivot to wide format:

>>> tmax = pysdp.open_raster("R4D004", date_start="2021-11-02", date_end="2021-11-04")
>>> long = pysdp.extract_points(tmax, sites)
>>> wide = long.pivot_table(index="site", columns="time", values="bayes_tmax_est")

Extract from a plain DataFrame (no GeoPandas needed upfront):

>>> import pandas as pd
>>> df = pd.DataFrame({"site": ["A"], "lon": [-106.85], "lat": [38.95]})
>>> out = pysdp.extract_points(dem, df, x="lon", y="lat", crs="EPSG:4326")
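The difference between the two method options can be illustrated without pysdp: "linear" blends the four surrounding cells, while "nearest" snaps to one. A minimal numpy sketch with a made-up 2×2 grid (not pysdp code; xarray.interp and xvec.extract_points do the real work):

```python
import numpy as np

# Toy 2x2 grid: rows are y in {0, 1}, columns are x in {0, 1}.
grid = np.array([[0.0, 10.0],
                 [20.0, 30.0]])

def bilinear(x, y):
    """Blend the four surrounding cells, as xarray.interp does."""
    top = grid[0, 0] * (1 - x) + grid[0, 1] * x
    bottom = grid[1, 0] * (1 - x) + grid[1, 1] * x
    return top * (1 - y) + bottom * y

def nearest(x, y):
    """Snap to the closest cell center, as xvec.extract_points does."""
    return grid[round(y), round(x)]

print(bilinear(0.5, 0.5))   # 15.0: average of all four cells
print(nearest(0.25, 0.25))  # 0.0: snaps to the (0, 0) cell
```

For smooth fields like temperature, the bilinear blend is usually what you want; for categorical rasters (masks, vegetation classes), "nearest" avoids blending class codes into meaningless intermediate values.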
See Also

extract_polygons : Summarize values over polygon geometries.
open_raster : Load a raster to extract from.

Source code in src/pysdp/extract.py
def extract_points(
    raster: xr.Dataset | xr.DataArray,
    locations: gpd.GeoDataFrame | pd.DataFrame,
    *,
    x: str = "x",
    y: str = "y",
    crs: str | None = None,
    method: Literal["nearest", "linear"] = "linear",
    years: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> gpd.GeoDataFrame:
    """Extract raster values at point locations.

    Accepts an ``xarray.Dataset`` or ``DataArray`` (typically from
    :func:`open_raster` / :func:`open_stack`) and a ``GeoDataFrame`` or
    plain ``DataFrame`` with ``x``/``y`` columns. Reprojects the input
    locations to the raster CRS if they differ.

    Parameters
    ----------
    raster : xarray.Dataset or xarray.DataArray
        The raster to sample from. Must have ``x`` and ``y`` spatial
        coordinates and a CRS set (via ``rio.write_crs``). Time-series
        rasters (with a ``time`` dim) produce long-form output.
    locations : GeoDataFrame or DataFrame
        Points to sample. If a plain ``DataFrame``, pass the column names
        via ``x=``/``y=`` and an explicit ``crs=``. If a ``GeoDataFrame``,
        its geometry column is used and its CRS must be set.
    x, y : str, default "x", "y"
        Column names holding longitude/x and latitude/y for
        ``DataFrame`` inputs. Ignored for ``GeoDataFrame`` inputs.
    crs : str, optional
        CRS of the input locations (e.g., ``"EPSG:4326"``). Required when
        ``locations`` is a plain ``DataFrame``; inferred from
        ``locations.crs`` for ``GeoDataFrame`` inputs.
    method : {"nearest", "linear"}, default "linear"
        Interpolation method. ``"linear"`` is bilinear via
        ``xarray.interp`` (requires ``scipy``, a core pysdp dependency).
        ``"nearest"`` snaps to the nearest cell via ``xvec.extract_points``
        and is substantially faster for large cloud rasters.
    years : sequence of int, optional
        Time filter applied before extraction. Only valid for time-series
        rasters.
    date_start, date_end : str or datetime.date, optional
        Date-range filter applied before extraction.
    bind : bool, default True
        If ``True``, merge the input locations' non-geometry columns onto
        each output row. If ``False``, return only geometry + extracted
        values.
    verbose : bool, default True
        Print per-extraction progress messages to stderr.

    Returns
    -------
    geopandas.GeoDataFrame
        Output GeoDataFrame with the raster's data variables as columns.
        For time-series rasters, output is **long-form** (one row per
        ``geometry × time``) with ``time`` as a column; pivot to wide if
        needed via ``df.pivot_table(index=..., columns="time", values=...)``.

    Raises
    ------
    ValueError
        If the raster has no CRS, if location CRS/columns are missing, if
        ``method`` isn't one of the two valid values, or if time filter
        args are passed for a non-time-indexed raster.

    Examples
    --------
    Extract elevation at three RMBL-area field sites:

    >>> import pysdp, geopandas as gpd
    >>> from shapely.geometry import Point
    >>> dem = pysdp.open_raster("R3D009")  # doctest: +SKIP
    >>> sites = gpd.GeoDataFrame(
    ...     {"site": ["Roaring Judy", "Gothic", "Galena Lake"]},
    ...     geometry=[
    ...         Point(-106.853186, 38.716995),
    ...         Point(-106.988934, 38.958446),
    ...         Point(-107.072569, 39.021644),
    ...     ],
    ...     crs="EPSG:4326",
    ... )
    >>> samples = pysdp.extract_points(dem, sites)  # doctest: +SKIP

    Sample daily Tmax at the same sites and pivot to wide format:

    >>> tmax = pysdp.open_raster("R4D004", date_start="2021-11-02", date_end="2021-11-04")  # doctest: +SKIP
    >>> long = pysdp.extract_points(tmax, sites)  # doctest: +SKIP
    >>> wide = long.pivot_table(index="site", columns="time", values="bayes_tmax_est")  # doctest: +SKIP

    Extract from a plain ``DataFrame`` (no GeoPandas needed upfront):

    >>> import pandas as pd
    >>> df = pd.DataFrame({"site": ["A"], "lon": [-106.85], "lat": [38.95]})
    >>> out = pysdp.extract_points(dem, df, x="lon", y="lat", crs="EPSG:4326")  # doctest: +SKIP

    See Also
    --------
    extract_polygons : Summarize values over polygon geometries.
    open_raster : Load a raster to extract from.
    """
    raster = _filter_by_time(
        raster,
        years=years,
        date_start=date_start,
        date_end=date_end,
        verbose=verbose,
    )
    gdf = _to_geodataframe(locations, x=x, y=y, crs=crs)
    gdf = _align_to_raster_crs(gdf, raster, verbose=verbose)

    n_points = len(gdf)
    n_time = int(raster.sizes.get("time", 1))
    _emit(f"Extracting values at {n_points} location(s) × {n_time} layer(s).", verbose)

    if method == "nearest":
        extracted = _point_extract_nearest(raster, gdf)
    elif method == "linear":
        extracted = _point_extract_linear(raster, gdf)
    else:
        raise ValueError(f"method must be 'nearest' or 'linear', got {method!r}.")

    _emit("Extraction complete.", verbose)
    return _extracted_to_geodataframe(extracted, gdf, bind=bind)

extract_polygons

extract_polygons(
    raster: Dataset | DataArray,
    locations: GeoDataFrame,
    *,
    stats: Sequence[str] | str = "mean",
    exact: bool = False,
    all_cells: bool = False,
    years: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> GeoDataFrame | DataFrame

Summarize raster values over polygon locations.

Computes per-polygon summary statistics (mean by default). For time-series rasters, produces one summary per (polygon × time) pair in long-form output.

Parameters:

Name Type Description Default
raster Dataset or DataArray

Raster to summarize. Must have a CRS set.

required
locations GeoDataFrame

Polygon geometries. Must be a GeoDataFrame (not a plain DataFrame) with CRS set.

required
stats str or sequence of str

Summary statistic(s) to compute. Accepts any xvec.zonal_stats string ("mean", "sum", "std", "min", "max", "median", "count", "nunique") or a callable. Pass a list for multiple stats.

"mean"
exact bool

False (default) uses centroid-based cell inclusion via xvec.zonal_stats, matching rSDP / terra::extract behavior. True uses fractional-coverage weighting via exactextract (requires pysdp[exact]); recommended for small polygons relative to cell size. The True path is a Phase 8a roadmap item and currently raises NotImplementedError.

False
all_cells bool

If True, return a long-form DataFrame of per-cell values and coverage fractions instead of per-polygon summary statistics. Phase 8a roadmap item; currently raises NotImplementedError.

False
years optional

Time-series filters applied before summarization. Same semantics as in extract_points.

None
date_start optional

Time-series filters applied before summarization. Same semantics as in extract_points.

None
date_end optional

Time-series filters applied before summarization. Same semantics as in extract_points.

None
bind bool

Merge input attribute columns onto output rows when True.

True
verbose bool

Print progress messages.

True

Returns:

Type Description
GeoDataFrame or DataFrame

GeoDataFrame when bind=True; DataFrame when bind=False. Columns include the raster's data variables (one per summary stat).

Raises:

Type Description
TypeError

If locations isn't a GeoDataFrame.

ValueError

On missing CRS or other location validation failures.

NotImplementedError

For exact=True or all_cells=True (roadmap items).

Examples:

Compute mean snow duration over watersheds for 2019:

>>> import pysdp, geopandas as gpd
>>> snow = pysdp.open_raster("R4D001", years=[2019])
>>> watersheds = gpd.read_file("watersheds.gpkg")
>>> out = pysdp.extract_polygons(snow, watersheds, stats="mean")

Compute multiple statistics in one call:

>>> stats = pysdp.extract_polygons(
...     snow, watersheds, stats=["mean", "std", "min", "max"]
... )
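The default centroid rule can be pictured without pysdp: a cell contributes to a polygon's statistic only if its center falls inside the polygon, regardless of partial overlap. A toy numpy sketch (values and mask are invented):

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])

# True where a cell *center* lies inside the polygon. A half-covered
# cell whose center falls outside contributes nothing to the statistic.
inside = np.array([[False, True,  False],
                   [True,  True,  True ],
                   [False, True,  False]])

zonal_mean = values[inside].mean()
print(zonal_mean)  # 5.0
```

This all-or-nothing inclusion is why the docs recommend fractional-coverage weighting (the planned exact=True path) for polygons that are small relative to the cell size: with few cells, whether each center lands inside or outside dominates the result.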
See Also

extract_points : Extract at point geometries.
open_raster : Load a raster.

Source code in src/pysdp/extract.py
def extract_polygons(
    raster: xr.Dataset | xr.DataArray,
    locations: gpd.GeoDataFrame,
    *,
    stats: Sequence[str] | str = "mean",
    exact: bool = False,
    all_cells: bool = False,
    years: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> gpd.GeoDataFrame | pd.DataFrame:
    """Summarize raster values over polygon locations.

    Computes per-polygon summary statistics (mean by default). For
    time-series rasters, produces one summary per ``(polygon × time)``
    pair in long-form output.

    Parameters
    ----------
    raster : xarray.Dataset or xarray.DataArray
        Raster to summarize. Must have a CRS set.
    locations : GeoDataFrame
        Polygon geometries. Must be a ``GeoDataFrame`` (not a plain
        ``DataFrame``) with CRS set.
    stats : str or sequence of str, default "mean"
        Summary statistic(s) to compute. Accepts any ``xvec.zonal_stats``
        string (``"mean"``, ``"sum"``, ``"std"``, ``"min"``, ``"max"``,
        ``"median"``, ``"count"``, ``"nunique"``) or a callable. Pass a
        list for multiple stats.
    exact : bool, default False
        ``False`` (default) uses centroid-based cell inclusion via
        ``xvec.zonal_stats``, matching rSDP / ``terra::extract`` behavior.
        ``True`` uses fractional-coverage weighting via ``exactextract``
        (requires ``pysdp[exact]``); recommended for small polygons
        relative to cell size. The ``True`` path is a Phase 8a roadmap
        item and currently raises ``NotImplementedError``.
    all_cells : bool, default False
        If ``True``, return a long-form DataFrame of per-cell values and
        coverage fractions instead of per-polygon summary statistics. Phase
        8a roadmap item; currently raises ``NotImplementedError``.
    years, date_start, date_end : optional
        Time-series filters applied before summarization. Same semantics as
        in :func:`extract_points`.
    bind : bool, default True
        Merge input attribute columns onto output rows when ``True``.
    verbose : bool, default True
        Print progress messages.

    Returns
    -------
    geopandas.GeoDataFrame or pandas.DataFrame
        GeoDataFrame when ``bind=True``; DataFrame when ``bind=False``.
        Columns include the raster's data variables (one per summary stat).

    Raises
    ------
    TypeError
        If ``locations`` isn't a ``GeoDataFrame``.
    ValueError
        On missing CRS or other location validation failures.
    NotImplementedError
        For ``exact=True`` or ``all_cells=True`` (roadmap items).

    Examples
    --------
    Compute mean snow duration over watersheds for 2019:

    >>> import pysdp, geopandas as gpd
    >>> snow = pysdp.open_raster("R4D001", years=[2019])  # doctest: +SKIP
    >>> watersheds = gpd.read_file("watersheds.gpkg")  # doctest: +SKIP
    >>> out = pysdp.extract_polygons(snow, watersheds, stats="mean")  # doctest: +SKIP

    Compute multiple statistics in one call:

    >>> stats = pysdp.extract_polygons(
    ...     snow, watersheds, stats=["mean", "std", "min", "max"]
    ... )  # doctest: +SKIP

    See Also
    --------
    extract_points : Extract at point geometries.
    open_raster : Load a raster.
    """
    import geopandas as gpd

    if not isinstance(locations, gpd.GeoDataFrame):
        raise TypeError(
            f"extract_polygons requires a GeoDataFrame (got {type(locations).__name__}). "
            "For point locations, use `pysdp.extract_points`."
        )

    raster = _filter_by_time(
        raster,
        years=years,
        date_start=date_start,
        date_end=date_end,
        verbose=verbose,
    )
    gdf = _align_to_raster_crs(locations, raster, verbose=verbose)

    _emit(
        f"Zonal extract at {len(gdf)} polygon(s) × {int(raster.sizes.get('time', 1))} layer(s).",
        verbose,
    )

    if all_cells:
        raise NotImplementedError(
            "all_cells=True (per-cell long-form output with fractions) is not yet "
            "implemented; tracked in ROADMAP §Phase 8a. Use `sum_fun='mean'` or "
            "another summary for now."
        )

    if exact:
        extracted = _zonal_stats_exact(raster, gdf, stats=stats)
    else:
        extracted = _zonal_stats_xvec(raster, gdf, stats=stats, all_touched=False)

    _emit("Extraction complete.", verbose)
    return _extracted_to_geodataframe(extracted, gdf, bind=bind)

Download

download

Bulk download of SDP datasets to local disk.

Ports rSDP's download_data(). See SPEC.md §4.4.

Primary backend: requests + concurrent.futures.ThreadPoolExecutor, using only core pysdp dependencies. Higher-throughput backends (obstore, fsspec + s3fs) are planned for Phase 7 when at-scale download performance becomes a hot path (ROADMAP §Phase 7); for the v0.1 use case — researchers pulling a handful of SDP products to local disk — the threaded-requests path is plenty fast.
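The pattern behind the threaded backend is a plain ThreadPoolExecutor mapped over URLs. A stubbed sketch of the shape (the fetch body here is a placeholder; the real worker streams requests responses to disk and records size and HTTP status):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> dict:
    # Placeholder worker: the real implementation issues a streaming
    # requests.get and writes chunks to the destination file.
    return {"url": url, "success": True}

# Invented example URLs.
urls = [f"https://example.com/file_{i}.tif" for i in range(4)]

# Fan the fetches out over a small thread pool, one result dict per URL.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))

print(sum(r["success"] for r in results))  # 4
```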

download

download(
    urls: str | Sequence[str] | None = None,
    output_dir: str | PathLike[str] | None = None,
    *,
    catalog_ids: str | Sequence[str] | None = None,
    overwrite: bool = False,
    resume: bool = True,
    max_workers: int = 8,
    return_status: bool = True,
    verbose: bool = True,
) -> DataFrame | None

Download SDP COGs to a local directory.

Use this when you need the raster on disk (for tools that don't support cloud reads, for offline workflows, or for bulk mirroring). For interactive analysis, open_raster lazy-reads from cloud without a download step.

Parameters:

Name Type Description Default
urls str or sequence of str

Direct HTTPS URL(s) to the COGs to download. Mutually exclusive with catalog_ids; exactly one is required.

None
output_dir str or PathLike

Destination directory. Created if it doesn't exist. Files are named from the URL basename.

None
catalog_ids str or sequence of str

Alternative to urls. pySDP expands each catalog_id via the packaged catalog:

  • Single → one URL
  • Yearly → one URL per catalog year
  • Monthly → one URL per catalog month between MinDate and MaxDate
  • Daily → raises ValueError (would be thousands of files; pass explicit urls= for selective daily slices)
None
overwrite bool

If False (default), skip destination files that already exist and are > 1 kB (matches rSDP's valid-file heuristic). If True, re-download even if a valid file is present.

False
resume bool

When a small partial file exists (< 1 kB), attempt an HTTP Range resume instead of re-downloading from scratch.

True
max_workers int

Number of concurrent HTTP fetches (via a ThreadPoolExecutor).

8
return_status bool

If True, return a DataFrame with one row per URL. If False, return None.

True
verbose bool

Print skip/download progress messages to stderr.

True

Returns:

Type Description
DataFrame or None

Status report with columns url, dest, success, status (HTTP code or "exists"), size (bytes), error (str or None). One row per URL, including pre-existing files that were skipped.

Raises:

Type Description
ValueError

If both urls and catalog_ids are given, or neither, or if output_dir is missing, or if a Daily catalog_id is passed.

KeyError

If a catalog_id isn't in the packaged catalog.

Warns:

Type Description
UserWarning

If any downloads failed; details are in the returned DataFrame's error column.

Examples:

Download two real SDP products by catalog ID:

>>> import pysdp
>>> status = pysdp.download(
...     catalog_ids=["R1D001", "R3D009"],
...     output_dir="~/sdp-data",
... )
>>> status[["dest", "success", "size"]]

Hand-pick a subset of daily temperature slices:

>>> urls = [
...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0305_est.tif",
...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0306_est.tif",
... ]
>>> pysdp.download(urls=urls, output_dir="~/tmax-samples")
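The returned status frame makes failure triage a one-liner. With a mocked-up frame using the documented columns (all values invented):

```python
import pandas as pd

# Shape of the status report returned by download(return_status=True).
status = pd.DataFrame(
    {
        "url": ["https://example.com/a.tif", "https://example.com/b.tif"],
        "dest": ["/tmp/a.tif", "/tmp/b.tif"],
        "success": [True, False],
        "status": [200, 403],
        "size": [2048, 0],
        "error": [None, "HTTP 403 Forbidden"],
    }
)

# Pull out just the failed rows for retry or inspection.
failed = status[~status["success"]]
print(failed["url"].tolist())  # ['https://example.com/b.tif']
```

Re-running download() on the failed URLs is safe: files that completed are skipped by the > 1 kB existing-file check unless overwrite=True.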
See Also

open_raster : Open an SDP raster lazily, without downloading.

Source code in src/pysdp/download.py
def download(
    urls: str | Sequence[str] | None = None,
    output_dir: str | os.PathLike[str] | None = None,
    *,
    catalog_ids: str | Sequence[str] | None = None,
    overwrite: bool = False,
    resume: bool = True,
    max_workers: int = 8,
    return_status: bool = True,
    verbose: bool = True,
) -> pd.DataFrame | None:
    """Download SDP COGs to a local directory.

    Use this when you need the raster on disk (for tools that don't support
    cloud reads, for offline workflows, or for bulk mirroring). For
    interactive analysis, :func:`open_raster` lazy-reads from cloud without
    a download step.

    Parameters
    ----------
    urls : str or sequence of str, optional
        Direct HTTPS URL(s) to the COGs to download. Mutually exclusive with
        ``catalog_ids``; exactly one is required.
    output_dir : str or PathLike
        Destination directory. Created if it doesn't exist. Files are named
        from the URL basename.
    catalog_ids : str or sequence of str, optional
        Alternative to ``urls``. pySDP expands each catalog_id via the
        packaged catalog:

        - ``Single`` → one URL
        - ``Yearly`` → one URL per catalog year
        - ``Monthly`` → one URL per catalog month between MinDate and MaxDate
        - ``Daily`` → raises ``ValueError`` (would be thousands of files;
          pass explicit ``urls=`` for selective daily slices)
    overwrite : bool, default False
        If ``False`` (default), skip destination files that already exist
        and are > 1 kB (matches rSDP's valid-file heuristic). If ``True``,
        re-download even if a valid file is present.
    resume : bool, default True
        When a small partial file exists (< 1 kB), attempt an HTTP Range
        resume instead of re-downloading from scratch.
    max_workers : int, default 8
        Number of concurrent HTTP fetches (via a ``ThreadPoolExecutor``).
    return_status : bool, default True
        If ``True``, return a DataFrame with one row per URL. If ``False``,
        return ``None``.
    verbose : bool, default True
        Print skip/download progress messages to stderr.

    Returns
    -------
    pandas.DataFrame or None
        Status report with columns ``url``, ``dest``, ``success``,
        ``status`` (HTTP code or ``"exists"``), ``size`` (bytes),
        ``error`` (str or None). One row per URL, including pre-existing
        files that were skipped.

    Raises
    ------
    ValueError
        If both ``urls`` and ``catalog_ids`` are given, or neither, or if
        ``output_dir`` is missing, or if a ``Daily`` catalog_id is passed.
    KeyError
        If a catalog_id isn't in the packaged catalog.

    Warns
    -----
    UserWarning
        If any downloads failed; details are in the returned DataFrame's
        ``error`` column.

    Examples
    --------
    Download two real SDP products by catalog ID:

    >>> import pysdp
    >>> status = pysdp.download(
    ...     catalog_ids=["R1D001", "R3D009"],
    ...     output_dir="~/sdp-data",
    ... )  # doctest: +SKIP
    >>> status[["dest", "success", "size"]]  # doctest: +SKIP

    Hand-pick a subset of daily temperature slices:

    >>> urls = [
    ...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0305_est.tif",
    ...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0306_est.tif",
    ... ]
    >>> pysdp.download(urls=urls, output_dir="~/tmax-samples")  # doctest: +SKIP

    See Also
    --------
    open_raster : Open an SDP raster lazily, without downloading.
    """
    import pandas as pd

    if urls is not None and catalog_ids is not None:
        raise ValueError("Specify `urls` OR `catalog_ids`, not both.")
    if urls is None and catalog_ids is None:
        raise ValueError("You must specify `urls` or `catalog_ids`.")
    if output_dir is None:
        raise ValueError("`output_dir` is required.")

    output_path = Path(output_dir).expanduser()
    output_path.mkdir(parents=True, exist_ok=True)

    if catalog_ids is not None:
        url_list = _expand_catalog_ids(catalog_ids)
    elif isinstance(urls, str):
        url_list = [urls]
    else:
        url_list = list(urls) if urls is not None else []

    dest_paths = [output_path / Path(u).name for u in url_list]

    existing: list[dict[str, Any]] = []
    to_download_urls: list[str] = []
    to_download_dests: list[Path] = []
    for url, dest in zip(url_list, dest_paths, strict=True):
        if _is_valid_existing(dest) and not overwrite:
            existing.append(
                {
                    "url": url,
                    "dest": str(dest),
                    "success": True,
                    "status": "exists",
                    "size": dest.stat().st_size,
                    "error": None,
                }
            )
        else:
            to_download_urls.append(url)
            to_download_dests.append(dest)

    if existing:
        _emit(
            f"Skipping {len(existing)} existing file(s). Specify `overwrite=True` to re-download.",
            verbose,
        )
    if to_download_urls:
        _emit(
            f"Downloading {len(to_download_urls)} file(s) to {output_path}...",
            verbose,
        )

    download_results = _download_parallel(
        to_download_urls,
        to_download_dests,
        max_workers=max_workers,
        resume=resume,
    )

    failures = [r for r in download_results if not r["success"]]
    if failures:
        warnings.warn(
            f"Downloaded {len(download_results) - len(failures)} / "
            f"{len(download_results)} file(s) successfully; "
            f"{len(failures)} failed (see returned DataFrame for details).",
            UserWarning,
            stacklevel=2,
        )

    _emit("Download complete.", verbose)

    all_results = existing + download_results
    if return_status:
        return pd.DataFrame(all_results)
    return None

Constants

constants

Package constants for pysdp.

Values mirror the internal constants in rSDP's R/constants.R.

SDP_CRS module-attribute

SDP_CRS: Final[str] = 'EPSG:32613'

Coordinate reference system for all SDP raster products (UTM zone 13N).

DOMAINS module-attribute

DOMAINS: Final[tuple[str, ...]] = (
    "UG",
    "UER",
    "GT",
    "GMUG",
)

Spatial domains available in the SDP.

TYPES module-attribute

TYPES: Final[tuple[str, ...]] = (
    "Mask",
    "Topo",
    "Vegetation",
    "Hydro",
    "Planning",
    "Radiation",
    "Snow",
    "Climate",
    "Imagery",
    "Supplemental",
)

Dataset type categories.

RELEASES module-attribute

RELEASES: Final[tuple[str, ...]] = (
    "Basemaps",
    "Release1",
    "Release2",
    "Release3",
    "Release4",
    "Release5",
)

Dataset release cohorts.

TIMESERIES_TYPES module-attribute

TIMESERIES_TYPES: Final[tuple[str, ...]] = (
    "Single",
    "Yearly",
    "Seasonal",
    "Monthly",
    "Daily",
)

Time-series structure types for SDP datasets.
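These tuples are handy for validating filter arguments before querying the catalog. A sketch of that pattern with a hypothetical helper (not part of pysdp's API):

```python
# Mirrors pysdp.DOMAINS; the helper itself is illustrative only.
DOMAINS = ("UG", "UER", "GT", "GMUG")

def validate_domains(domains):
    """Fail fast on typos instead of silently matching zero catalog rows."""
    unknown = [d for d in domains if d not in DOMAINS]
    if unknown:
        raise ValueError(f"Unknown domain(s) {unknown}; valid: {DOMAINS}")
    return list(domains)

print(validate_domains(["UG", "GT"]))  # ['UG', 'GT']
```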