
API reference

All of pySDP's public surface. Import everything from the top-level package:

import pysdp

pysdp.get_catalog(...)
pysdp.open_raster(...)
# ...

Catalog discovery

get_catalog

get_catalog(
    domains: Sequence[str] | None = None,
    types: Sequence[str] | None = None,
    releases: Sequence[str] | None = None,
    timeseries_types: Sequence[str] | None = None,
    deprecated: bool | None = False,
    *,
    source: Literal[
        "packaged", "live", "stac"
    ] = "packaged",
) -> DataFrame | Catalog

Discover SDP datasets by filtering the product catalog.

pySDP ships with a snapshot of the SDP product catalog baked in; filtering is instantaneous and works offline. source="live" refetches the canonical CSV from S3 (useful when the packaged snapshot lags a recent catalog update). source="stac" returns the SDP's static STAC v1 catalog as a pystac.Catalog, which composes with the broader STAC ecosystem.

Parameters:

domains : sequence of str, optional
    Spatial domains to include ("UG", "UER", "GT", "GMUG"). See
    pysdp.DOMAINS for the canonical list. None (default) returns every
    domain.
types : sequence of str, optional
    Dataset type categories (e.g., "Vegetation", "Topo", "Climate",
    "Snow"). See pysdp.TYPES. None returns all types.
releases : sequence of str, optional
    Dataset release cohorts ("Release1".."Release5", "Basemaps"). See
    pysdp.RELEASES. None returns all.
timeseries_types : sequence of str, optional
    One or more of "Single", "Yearly", "Monthly", "Daily", "Seasonal".
    See pysdp.TIMESERIES_TYPES. None returns all.
deprecated : bool or None, default False
    False returns only current datasets. True returns only deprecated
    ones. None returns both.
source : {"packaged", "live", "stac"}, default "packaged"
    Where to pull the catalog from. See Notes.

Returns:

pandas.DataFrame or pystac.Catalog
    For CSV-backed sources, a DataFrame with one row per dataset and
    columns matching the SDP product-table schema (CatalogID, Release,
    Type, Product, Domain, Resolution, Deprecated, MinDate, MaxDate,
    MinYear, MaxYear, TimeSeriesType, DataType, DataUnit, DataScaleFactor,
    DataOffset, Data.URL, Metadata.URL). For source="stac", a
    pystac.Catalog rooted at the SDP's static STAC v1 catalog.

Raises:

ValueError
    If any filter argument contains a value outside its canonical
    vocabulary, or if source isn't one of the three valid options.

Warns:

UserWarning
    When source="packaged" and the packaged snapshot is older than
    SDP_STALENESS_MONTHS months (default 6; env-configurable). The
    warning suggests source="live" or a pysdp upgrade.
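The staleness check amounts to a month-difference comparison against the env-configurable threshold. The sketch below is an assumption about the internals (only the warning behavior is documented); the helper name warn_if_stale is invented for illustration:

```python
import os
import warnings
from datetime import date

def warn_if_stale(snapshot_date: date, today: date) -> bool:
    """Warn when the packaged snapshot is older than the configured threshold."""
    # SDP_STALENESS_MONTHS defaults to 6 when unset, per the documented behavior.
    threshold = int(os.environ.get("SDP_STALENESS_MONTHS", "6"))
    age_months = (today.year - snapshot_date.year) * 12 + (today.month - snapshot_date.month)
    if age_months > threshold:
        warnings.warn(
            f"Packaged catalog snapshot is {age_months} months old; "
            "consider source='live' or upgrading pysdp.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False

# An 8-month-old snapshot trips the default 6-month threshold.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    stale = warn_if_stale(date(2024, 1, 15), date(2024, 9, 20))
```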

Notes

The packaged CSV is refreshed on each pysdp release. source="live" hits the S3-hosted canonical CSV directly, so it's always as fresh as upstream. source="stac" ignores filter arguments — use pystac traversal to filter the returned catalog. The catalog is browsable in radiantearth's STAC Browser: https://radiantearth.github.io/stac-browser/#/external/rmbl-sdp.s3.us-east-2.amazonaws.com/stac/v1/catalog.json
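Since filters are ignored for source="stac", you filter by walking the returned catalog tree instead. The traversal idea, sketched here over a plain nested-dict stand-in so the example is self-contained (the real object is a pystac.Catalog, whose get_children()/get_items() methods provide the equivalent walk; the item IDs and the "sdp:type" property below are invented for illustration):

```python
# Stand-in for a STAC catalog tree: each node has child catalogs and items.
catalog = {
    "id": "sdp-v1",
    "children": [
        {"id": "UG", "children": [], "items": [
            {"id": "R3D009", "properties": {"sdp:type": "Topo"}},
            {"id": "R3D021", "properties": {"sdp:type": "Vegetation"}},
        ]},
        {"id": "GT", "children": [], "items": [
            {"id": "R4D001", "properties": {"sdp:type": "Snow"}},
        ]},
    ],
    "items": [],
}

def iter_items(node):
    """Depth-first walk yielding every item in the tree."""
    yield from node["items"]
    for child in node["children"]:
        yield from iter_items(child)

# Keep only vegetation products, analogous to types=["Vegetation"].
veg = [it["id"] for it in iter_items(catalog)
       if it["properties"]["sdp:type"] == "Vegetation"]
```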

Examples:

Get every current dataset:

>>> import pysdp
>>> cat = pysdp.get_catalog()
>>> cat.shape
(140, 18)

Filter to Upper Gunnison vegetation products:

>>> veg = pysdp.get_catalog(domains=["UG"], types=["Vegetation"])

Find all yearly time-series products across every domain:

>>> yearly = pysdp.get_catalog(timeseries_types=["Yearly"])

Return both current and deprecated entries:

>>> all_rows = pysdp.get_catalog(deprecated=None)

See Also

get_metadata : Fetch detailed XML metadata for one dataset.
open_raster : Open a catalog entry as a lazy xarray.Dataset.

Source code in src/pysdp/catalog.py
def get_catalog(
    domains: Sequence[str] | None = None,
    types: Sequence[str] | None = None,
    releases: Sequence[str] | None = None,
    timeseries_types: Sequence[str] | None = None,
    deprecated: bool | None = False,
    *,
    source: Literal["packaged", "live", "stac"] = "packaged",
) -> pd.DataFrame | pystac.Catalog:
    """Discover SDP datasets by filtering the product catalog.

    pySDP ships with a snapshot of the SDP product catalog baked in; filtering
    is instantaneous and works offline. ``source="live"`` refetches the
    canonical CSV from S3 (useful when the packaged snapshot lags a recent
    catalog update). ``source="stac"`` returns the SDP's static STAC v1
    catalog as a ``pystac.Catalog``, which composes with the broader STAC
    ecosystem.

    Parameters
    ----------
    domains : sequence of str, optional
        Spatial domains to include (``"UG"``, ``"UER"``, ``"GT"``, ``"GMUG"``).
        See ``pysdp.DOMAINS`` for the canonical list. ``None`` (default)
        returns every domain.
    types : sequence of str, optional
        Dataset type categories (e.g., ``"Vegetation"``, ``"Topo"``,
        ``"Climate"``, ``"Snow"``). See ``pysdp.TYPES``. ``None`` returns
        all types.
    releases : sequence of str, optional
        Dataset release cohorts (``"Release1"``..``"Release5"``,
        ``"Basemaps"``). See ``pysdp.RELEASES``. ``None`` returns all.
    timeseries_types : sequence of str, optional
        One or more of ``"Single"``, ``"Yearly"``, ``"Monthly"``,
        ``"Daily"``, ``"Seasonal"``. See ``pysdp.TIMESERIES_TYPES``.
        ``None`` returns all.
    deprecated : bool or None, default False
        ``False`` returns only current datasets. ``True`` returns only
        deprecated ones. ``None`` returns both.
    source : {"packaged", "live", "stac"}, default "packaged"
        Where to pull the catalog from. See Notes.

    Returns
    -------
    pandas.DataFrame or pystac.Catalog
        For CSV-backed sources, a DataFrame with one row per dataset and
        columns matching the SDP product-table schema (``CatalogID``,
        ``Release``, ``Type``, ``Product``, ``Domain``, ``Resolution``,
        ``Deprecated``, ``MinDate``, ``MaxDate``, ``MinYear``, ``MaxYear``,
        ``TimeSeriesType``, ``DataType``, ``DataUnit``,
        ``DataScaleFactor``, ``DataOffset``, ``Data.URL``, ``Metadata.URL``).
        For ``source="stac"``, a ``pystac.Catalog`` rooted at the SDP's
        static STAC v1 catalog.

    Raises
    ------
    ValueError
        If any filter argument contains a value outside its canonical
        vocabulary, or if ``source`` isn't one of the three valid options.

    Warns
    -----
    UserWarning
        When ``source="packaged"`` and the packaged snapshot is older than
        ``SDP_STALENESS_MONTHS`` months (default 6; env-configurable). The
        warning suggests ``source="live"`` or a pysdp upgrade.

    Notes
    -----
    The packaged CSV is refreshed on each pysdp release. ``source="live"``
    hits the S3-hosted canonical CSV directly, so it's always as fresh as
    upstream. ``source="stac"`` ignores filter arguments — use pystac
    traversal to filter the returned catalog. The catalog is browsable at
    `radiantearth's STAC Browser
    <https://radiantearth.github.io/stac-browser/#/external/rmbl-sdp.s3.us-east-2.amazonaws.com/stac/v1/catalog.json>`_.

    Examples
    --------
    Get every current dataset:

    >>> import pysdp
    >>> cat = pysdp.get_catalog()  # doctest: +SKIP
    >>> cat.shape  # doctest: +SKIP
    (140, 18)

    Filter to Upper Gunnison vegetation products:

    >>> veg = pysdp.get_catalog(domains=["UG"], types=["Vegetation"])  # doctest: +SKIP

    Find all yearly time-series products across every domain:

    >>> yearly = pysdp.get_catalog(timeseries_types=["Yearly"])  # doctest: +SKIP

    Return both current and deprecated entries:

    >>> all_rows = pysdp.get_catalog(deprecated=None)  # doctest: +SKIP

    See Also
    --------
    get_metadata : Fetch detailed XML metadata for one dataset.
    open_raster : Open a catalog entry as a lazy ``xarray.Dataset``.
    """
    if source == "stac":
        from pysdp.stac import get_stac_catalog

        if any(v is not None for v in (domains, types, releases, timeseries_types)):
            warnings.warn(
                "Filter arguments are ignored when source='stac'. "
                "Use pystac traversal to filter the returned catalog.",
                UserWarning,
                stacklevel=2,
            )
        return get_stac_catalog()

    _validate_filter(domains, DOMAINS, "domains")
    _validate_filter(types, TYPES, "types")
    _validate_filter(releases, RELEASES, "releases")
    _validate_filter(timeseries_types, TIMESERIES_TYPES, "timeseries_types")

    if source == "packaged":
        df = load_packaged_catalog()
    elif source == "live":
        df = load_live_catalog()
    else:
        raise ValueError(f"Unknown source: {source!r}. Must be 'packaged', 'live', or 'stac'.")

    return _apply_filters(
        df,
        domains=domains,
        types=types,
        releases=releases,
        timeseries_types=timeseries_types,
        deprecated=deprecated,
    )

get_metadata

get_metadata(
    catalog_id: str, *, as_dict: bool = True
) -> dict[str, Any] | Any

Fetch the QGIS-style XML metadata for one SDP dataset.

Each SDP product has a companion metadata XML document on S3 that describes provenance, sensor details, processing history, and other long-form context. This function fetches that XML over HTTP and parses it.

Parameters:

catalog_id : str, required
    Six-character SDP catalog ID (e.g., "R3D009", "BM012").
as_dict : bool, default True
    If True, return a nested dict parsed via xmltodict — convenient for
    scripting. If False, return the parsed lxml.etree._Element — better
    for XPath queries.

Returns:

dict or lxml.etree._Element
    Parsed metadata. For dict output, the top-level key is typically
    "qgis" (reflecting the document's QGIS metadata schema).

Raises:

ValueError
    If catalog_id isn't exactly six characters.
KeyError
    If catalog_id isn't in the packaged catalog. The message includes the
    snapshot date and suggests source="live" or an upgrade.
requests.HTTPError
    If the XML URL returns a non-2xx status (rare; implies an upstream
    data-hosting issue).

Examples:

Get the metadata for the UG 3 m bare-earth DEM as a dict:

>>> import pysdp
>>> meta = pysdp.get_metadata("R3D009")
>>> meta["qgis"]["abstract"]
'This 3 m resolution digital elevation model...'
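The as_dict=False path suits element-tree queries. A self-contained illustration on a stand-in document (stdlib ElementTree here for portability; pysdp actually returns an lxml element, which supports the same find/findall calls plus full XPath — the element names below are a plausible guess at the QGIS schema, not a verbatim sample):

```python
import xml.etree.ElementTree as ET

# Stand-in for a (much larger) QGIS-style metadata document.
xml_doc = b"""<qgis>
  <identifier>R3D009</identifier>
  <abstract>This 3 m resolution digital elevation model...</abstract>
</qgis>"""

root = ET.fromstring(xml_doc)
abstract = root.findtext("abstract")                 # direct child lookup
ids = [el.text for el in root.findall(".//identifier")]  # XPath-style search
```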

See Also

get_catalog : Discover catalog IDs by filtering.
open_raster : Open a catalog entry as a raster.

Source code in src/pysdp/catalog.py
def get_metadata(
    catalog_id: str,
    *,
    as_dict: bool = True,
) -> dict[str, Any] | Any:
    """Fetch the QGIS-style XML metadata for one SDP dataset.

    Each SDP product has a companion metadata XML document on S3 that
    describes provenance, sensor details, processing history, and other
    long-form context. This function fetches that XML over HTTP and parses
    it.

    Parameters
    ----------
    catalog_id : str
        Six-character SDP catalog ID (e.g., ``"R3D009"``, ``"BM012"``).
    as_dict : bool, default True
        If ``True``, return a nested ``dict`` parsed via ``xmltodict`` —
        convenient for scripting. If ``False``, return the parsed
        ``lxml.etree._Element`` — better for XPath queries.

    Returns
    -------
    dict or lxml.etree._Element
        Parsed metadata. For dict output, the top-level key is typically
        ``"qgis"`` (reflecting the document's QGIS metadata schema).

    Raises
    ------
    ValueError
        If ``catalog_id`` isn't exactly six characters.
    KeyError
        If ``catalog_id`` isn't in the packaged catalog. The message
        includes the snapshot date and suggests ``source="live"`` or an upgrade.
    requests.HTTPError
        If the XML URL returns a non-2xx status (rare; implies an
        upstream data-hosting issue).

    Examples
    --------
    Get the metadata for the UG 3 m bare-earth DEM as a dict:

    >>> import pysdp
    >>> meta = pysdp.get_metadata("R3D009")  # doctest: +SKIP
    >>> meta["qgis"]["abstract"]  # doctest: +SKIP
    'This 3 m resolution digital elevation model...'

    See Also
    --------
    get_catalog : Discover catalog IDs by filtering.
    open_raster : Open a catalog entry as a raster.
    """
    import requests

    row = lookup_catalog_row(catalog_id)
    resp = requests.get(row["Metadata.URL"], timeout=30)
    resp.raise_for_status()

    if as_dict:
        import xmltodict

        return xmltodict.parse(resp.content)
    from lxml import etree

    return etree.fromstring(resp.content)

Raster access

open_raster

open_raster(
    catalog_id: str | None = None,
    url: str | None = None,
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    chunks: dict[str, int]
    | Literal["auto"]
    | None = "auto",
    download: bool = False,
    download_path: str | PathLike[str] | None = None,
    overwrite: bool = False,
    verbose: bool = True,
) -> Dataset

Open an SDP raster as a lazy xarray.Dataset.

Reads cloud-optimized GeoTIFFs from S3 via GDAL's VSICURL, without downloading. Returns a Dataset with one data variable named after the product's canonical short name (e.g. "UG_dem_3m_v1"). CRS is always set to EPSG:32613 (UTM 13N). For time-series products, the Dataset gains a uniform pandas.DatetimeIndex on the time coordinate — Daily → actual date, Monthly → first-of-month, Yearly → Jan 1.
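The time-coordinate rule per TimeSeriesType is simple enough to state as code. A sketch of the mapping (the helper name normalize_time is invented; only the resulting dates are documented behavior):

```python
from datetime import date

def normalize_time(ts_type: str, d: date) -> date:
    """Map a layer's nominal date onto the uniform time coordinate."""
    if ts_type == "Daily":
        return d                      # actual date, unchanged
    if ts_type == "Monthly":
        return d.replace(day=1)       # first of the month
    if ts_type == "Yearly":
        return date(d.year, 1, 1)     # Jan 1 of that year
    raise ValueError(f"Not a time-series type: {ts_type!r}")

ticks = [
    normalize_time("Daily", date(2021, 11, 2)),
    normalize_time("Monthly", date(2021, 11, 15)),
    normalize_time("Yearly", date(2021, 6, 30)),
]
```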

Parameters:

catalog_id : str, optional
    Six-character SDP catalog ID (e.g., "R3D009"). Mutually exclusive
    with url. When given, scale/offset metadata from the catalog are
    attached as CF attrs on the data variable.
url : str, optional
    Direct HTTPS URL to an SDP COG. Mutually exclusive with catalog_id.
    No catalog lookup, so scale/offset attrs come from the COG header only.
years : sequence of int, optional
    For Yearly products, which years to load. Alternative to
    date_start/date_end.
months : sequence of int, optional
    For Monthly products, which months (1–12) to load. Must be combined
    with years or used alone (all years × months requested).
date_start, date_end : str or datetime.date, optional
    Date range to load (inclusive). For Daily, defines the time slice;
    for Monthly/Yearly, uses rSDP's anchor-day stepping semantics. When
    neither is given on a Daily product, the first 30 days from MinDate
    are loaded to avoid accidental 10-year VSICURL handle explosions
    (matches rSDP).
chunks : dict, "auto", or None, default "auto"
    Dask chunking. "auto" uses xarray's chunk inference (requires
    pysdp[dask]; falls back to eager reads with a warning if dask isn't
    installed). None eager-loads. Pass a dict for manual control (e.g.,
    {"x": 1024, "y": 1024}).
download : bool, default False
    Not yet implemented (Phase 5). For now, raises NotImplementedError.
    Use pysdp.download() to bulk-fetch COGs to disk, then open them with
    rioxarray.open_rasterio.
download_path : str or PathLike, optional
    Directory for downloaded files (only used when download=True).
overwrite : bool, default False
    Reserved for the download path (not yet implemented).
verbose : bool, default True
    If True, print progress messages (layer count etc.) to stderr.

Returns:

xarray.Dataset
    Dataset with one data variable. Dimensions depend on the product:

    - (y, x) for single-band Single products
    - (band, y, x) for multi-band Single products
    - (time, y, x) for Yearly/Monthly/Daily time-series (where time is a
      pandas.DatetimeIndex)

    CRS is EPSG:32613 written via rio.write_crs. Catalog-derived
    scale/offset metadata is attached to the variable as CF scale_factor
    and add_offset attrs; call ds.decode_cf() or open with
    mask_and_scale=True to materialize real values.
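The CF packing behind those attrs is plain arithmetic: real = raw * scale_factor + add_offset, which is what decoding materializes. A minimal sketch (the factor, offset, and raw values are invented for illustration; real products carry their own in the catalog):

```python
# CF unpacking: real = raw * scale_factor + add_offset.
scale_factor = 0.1
add_offset = -40.0
raw = [512, 527, 540]                              # packed integers as stored
real = [r * scale_factor + add_offset for r in raw]  # decoded physical values
```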

Raises:

ValueError
    On invalid catalog_id / url combinations, time-arg combinations
    inconsistent with the product's TimeSeriesType, or a url that doesn't
    start with https://.
KeyError
    If catalog_id isn't in the packaged catalog.
NotImplementedError
    If download=True (Phase 5 stub).

Examples:

Open the UG 3 m bare-earth DEM (Single product):

>>> import pysdp
>>> dem = pysdp.open_raster("R3D009")
>>> dem.rio.crs.to_epsg()
32613

Open three days of daily Tmax:

>>> tmax = pysdp.open_raster(
...     "R4D004",
...     date_start="2021-11-02",
...     date_end="2021-11-04",
... )
>>> tmax.sizes["time"]
3

Open a single year of annual snow persistence:

>>> snow = pysdp.open_raster("R4D001", years=[2019])

See Also

open_stack : Load multiple products as variables in one Dataset.
extract_points : Sample an opened raster at point locations.
extract_polygons : Summarize an opened raster over polygons.

Source code in src/pysdp/raster.py
def open_raster(
    catalog_id: str | None = None,
    url: str | None = None,
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    chunks: dict[str, int] | Literal["auto"] | None = "auto",
    download: bool = False,
    download_path: str | os.PathLike[str] | None = None,
    overwrite: bool = False,
    verbose: bool = True,
) -> xr.Dataset:
    """Open an SDP raster as a lazy ``xarray.Dataset``.

    Reads cloud-optimized GeoTIFFs from S3 via GDAL's VSICURL, without
    downloading. Returns a Dataset with one data variable named after the
    product's canonical short name (e.g. ``"UG_dem_3m_v1"``). CRS is always
    set to ``EPSG:32613`` (UTM 13N). For time-series products, the Dataset
    gains a uniform ``pandas.DatetimeIndex`` on the ``time`` coordinate —
    Daily → actual date, Monthly → first-of-month, Yearly → Jan 1.

    Parameters
    ----------
    catalog_id : str, optional
        Six-character SDP catalog ID (e.g., ``"R3D009"``). Mutually
        exclusive with ``url``. When given, scale/offset metadata from the
        catalog are attached as CF attrs on the data variable.
    url : str, optional
        Direct HTTPS URL to an SDP COG. Mutually exclusive with
        ``catalog_id``. No catalog lookup, so scale/offset attrs come from
        the COG header only.
    years : sequence of int, optional
        For Yearly products, which years to load. Alternative to
        ``date_start``/``date_end``.
    months : sequence of int, optional
        For Monthly products, which months (1–12) to load. Must be combined
        with ``years`` or used alone (all years × months requested).
    date_start, date_end : str or datetime.date, optional
        Date range to load (inclusive). For Daily, defines the time slice;
        for Monthly/Yearly, uses rSDP's anchor-day stepping semantics. When
        neither is given on a Daily product, the first 30 days from
        ``MinDate`` are loaded to avoid accidental 10-year VSICURL handle
        explosions (matches rSDP).
    chunks : dict, "auto", or None, default "auto"
        Dask chunking. ``"auto"`` uses xarray's chunk inference (requires
        ``pysdp[dask]``; falls back to eager reads with a warning if dask
        isn't installed). ``None`` eager-loads. Pass a dict for manual
        control (e.g., ``{"x": 1024, "y": 1024}``).
    download : bool, default False
        **Not yet implemented (Phase 5).** For now, raises
        ``NotImplementedError``. Use ``pysdp.download()`` to bulk-fetch
        COGs to disk, then open them with ``rioxarray.open_rasterio``.
    download_path : str or PathLike, optional
        Directory for downloaded files (only used when ``download=True``).
    overwrite : bool, default False
        Reserved for the download path (not yet implemented).
    verbose : bool, default True
        If ``True``, print progress messages (layer count etc.) to stderr.

    Returns
    -------
    xarray.Dataset
        Dataset with one data variable. Dimensions depend on the product:

        - ``(y, x)`` for single-band ``Single`` products
        - ``(band, y, x)`` for multi-band ``Single`` products
        - ``(time, y, x)`` for ``Yearly``/``Monthly``/``Daily`` time-series
          (where ``time`` is a ``pandas.DatetimeIndex``)

        CRS is ``EPSG:32613`` written via ``rio.write_crs``. Catalog-derived
        scale/offset metadata is attached to the variable as CF
        ``scale_factor`` and ``add_offset`` attrs; call
        ``ds.decode_cf()`` or open with ``mask_and_scale=True`` to
        materialize real values.

    Raises
    ------
    ValueError
        On invalid ``catalog_id`` / ``url`` combinations, time-arg
        combinations inconsistent with the product's ``TimeSeriesType``, or
        a ``url`` that doesn't start with ``https://``.
    KeyError
        If ``catalog_id`` isn't in the packaged catalog.
    NotImplementedError
        If ``download=True`` (Phase 5 stub).

    Examples
    --------
    Open the UG 3 m bare-earth DEM (``Single`` product):

    >>> import pysdp
    >>> dem = pysdp.open_raster("R3D009")  # doctest: +SKIP
    >>> dem.rio.crs.to_epsg()  # doctest: +SKIP
    32613

    Open three days of daily Tmax:

    >>> tmax = pysdp.open_raster(
    ...     "R4D004",
    ...     date_start="2021-11-02",
    ...     date_end="2021-11-04",
    ... )  # doctest: +SKIP
    >>> tmax.sizes["time"]  # doctest: +SKIP
    3

    Open a single year of annual snow persistence:

    >>> snow = pysdp.open_raster("R4D001", years=[2019])  # doctest: +SKIP

    See Also
    --------
    open_stack : Load multiple products as variables in one Dataset.
    extract_points : Sample an opened raster at point locations.
    extract_polygons : Summarize an opened raster over polygons.
    """
    ensure_gdal_defaults()
    normalized = validate_user_args(
        catalog_id=catalog_id,
        url=url,
        years=years,
        months=months,
        date_start=date_start,
        date_end=date_end,
        download_files=download,
        download_path=download_path,
    )
    months_pad = normalized["months_pad"]

    if download:
        raise NotImplementedError(
            "download=True is implemented in Phase 5; see `pysdp.download()` "
            "for the bulk-download path. Phase 3 supports lazy cloud reads only."
        )

    if catalog_id is not None:
        # `pd.Series` quacks as `Mapping[str, Any]` for our purposes; convert
        # so the resolver + dataset builder can be typed against a simple
        # Mapping and also accept plain-dict fixtures in unit tests.
        cat_line: dict[str, Any] = dict(lookup_catalog_row(catalog_id))
        ts_type = str(cat_line["TimeSeriesType"])
        validate_args_vs_type(
            ts_type,
            years=years,
            months=months,
            date_start=date_start,
            date_end=date_end,
        )
        slices = resolve_time_slices(
            cat_line,
            years=years,
            months_pad=months_pad,
            date_start=date_start,
            date_end=date_end,
            verbose=verbose,
        )
        return _build_dataset(slices, cat_line=cat_line, url=None, chunks=chunks)

    # url= branch: single-layer only. Scale/offset are skipped (no catalog row).
    assert url is not None
    if not url.startswith("https://"):
        raise ValueError("A valid URL must start with 'https://'.")
    slices = TimeSlices(paths=[VSICURL_PREFIX + url], names=[])
    return _build_dataset(slices, cat_line=None, url=url, chunks=chunks)

open_stack

open_stack(
    catalog_ids: Sequence[str],
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    chunks: dict[str, int]
    | Literal["auto"]
    | None = "auto",
    align: Literal["exact", "reproject"] = "exact",
    verbose: bool = True,
) -> Dataset

Load multiple SDP products into a single xarray.Dataset.

Each product becomes one data variable. x/y (and time where applicable) coordinates are shared across variables, so downstream analysis can treat the stack as a single object (ds["dem"] - ds["snow_persistence"].mean("time") etc.). Use this when you want to compose products that are already on the same grid — for example an elevation model and a slope raster both derived from the same LiDAR campaign.
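The "exact" alignment contract amounts to comparing each product's (CRS, transform, shape) triple against the first product's. A sketch of that check (an assumption about the internals; IDs and grid descriptors below are invented, and the real check raises with a descriptive list rather than returning one):

```python
def check_exact_alignment(grids: dict[str, tuple]) -> list[str]:
    """Return IDs whose (crs, transform, shape) differs from the first product's."""
    ref = next(iter(grids.values()))
    return [cid for cid, g in grids.items() if g != ref]

# Invented grid descriptors: (crs, affine-transform coefficients, (rows, cols)).
grids = {
    "A": ("EPSG:32613", (3.0, 0, 315000, 0, -3.0, 4330000), (12000, 9000)),
    "B": ("EPSG:32613", (3.0, 0, 315000, 0, -3.0, 4330000), (12000, 9000)),
    "C": ("EPSG:32613", (1.0, 0, 315000, 0, -1.0, 4330000), (36000, 27000)),
}
mismatched = check_exact_alignment(grids)  # "C" is on a finer 1 m grid
```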

Parameters:

catalog_ids : sequence of str, required
    Non-empty sequence of SDP catalog IDs.
years, months, date_start, date_end : optional
    Shared time-slicing args. Applied to every time-series product in the
    stack; ignored for Single products.
chunks : dict, "auto", or None, default "auto"
    Dask chunking, passed through to each open_raster call.
align : {"exact", "reproject"}, default "exact"
    "exact" requires all products to share CRS + transform + shape;
    raises ValueError on mismatch with a descriptive list of which
    products diverged. "reproject" reprojects to the first product's grid
    via odc-stac (planned for Phase 7 of the ROADMAP; currently raises
    NotImplementedError).
verbose : bool, default True
    Forwarded to open_raster for per-product progress messages.

Returns:

xarray.Dataset
    One data variable per catalog_id. See open_raster for per-variable
    shape and CRS.

Raises:

ValueError
    If catalog_ids is empty, if align isn't one of the two valid values,
    or (with align="exact") if the products don't share a common grid.
NotImplementedError
    If align="reproject" (Phase 7 future work).

Examples:

Stack the UG 3 m DEM with the matching slope and aspect rasters:

>>> import pysdp
>>> topo = pysdp.open_stack(["R3D009", "R3D012", "R3D010"])
>>> sorted(topo.data_vars)
['UG_dem_3m_v1', 'UG_dem_slope_1m_v1', 'UG_topographic_aspect_southness_1m_v1']

See Also

open_raster : Single-product load.

Source code in src/pysdp/raster.py
def open_stack(
    catalog_ids: Sequence[str],
    *,
    years: Sequence[int] | None = None,
    months: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    chunks: dict[str, int] | Literal["auto"] | None = "auto",
    align: Literal["exact", "reproject"] = "exact",
    verbose: bool = True,
) -> xr.Dataset:
    """Load multiple SDP products into a single ``xarray.Dataset``.

    Each product becomes one data variable. ``x``/``y`` (and ``time`` where
    applicable) coordinates are shared across variables, so downstream
    analysis can treat the stack as a single object (``ds["dem"] -
    ds["snow_persistence"].mean("time")`` etc.). Use this when you want to
    compose products that are already on the same grid — for example an
    elevation model and a slope raster both derived from the same LiDAR
    campaign.

    Parameters
    ----------
    catalog_ids : sequence of str
        Non-empty sequence of SDP catalog IDs.
    years, months, date_start, date_end : optional
        Shared time-slicing args. Applied to every time-series product in
        the stack; ignored for ``Single`` products.
    chunks : dict, "auto", or None, default "auto"
        Dask chunking, passed through to each ``open_raster`` call.
    align : {"exact", "reproject"}, default "exact"
        ``"exact"`` requires all products to share CRS + transform +
        shape; raises ``ValueError`` on mismatch with a descriptive list of
        which products diverged. ``"reproject"`` reprojects to the first
        product's grid via ``odc-stac`` (planned for Phase 7 of the
        ROADMAP; currently raises ``NotImplementedError``).
    verbose : bool, default True
        Forwarded to ``open_raster`` for per-product progress messages.

    Returns
    -------
    xarray.Dataset
        One data variable per catalog_id. See
        :func:`open_raster` for per-variable shape and CRS.

    Raises
    ------
    ValueError
        If ``catalog_ids`` is empty, if ``align`` isn't one of the two
        valid values, or (with ``align="exact"``) if the products don't
        share a common grid.
    NotImplementedError
        If ``align="reproject"`` (Phase 7 future work).

    Examples
    --------
    Stack the UG 3 m DEM with the matching slope and aspect rasters:

    >>> import pysdp
    >>> topo = pysdp.open_stack(["R3D009", "R3D012", "R3D010"])  # doctest: +SKIP
    >>> sorted(topo.data_vars)  # doctest: +SKIP
    ['UG_dem_3m_v1', 'UG_dem_slope_1m_v1', 'UG_topographic_aspect_southness_1m_v1']

    See Also
    --------
    open_raster : Single-product load.
    """
    if not catalog_ids:
        raise ValueError("catalog_ids must be a non-empty sequence.")
    if align == "reproject":
        raise NotImplementedError(
            "align='reproject' is implemented in Phase 7 (requires `pip install "
            "pysdp[stac]`). For now, pass `align='exact'` and load products "
            "that share a grid, or reproject explicitly with rioxarray."
        )
    if align != "exact":
        raise ValueError(f"Unknown align: {align!r}. Must be 'exact' or 'reproject'.")

    datasets = [
        open_raster(
            cid,
            years=years,
            months=months,
            date_start=date_start,
            date_end=date_end,
            chunks=chunks,
            verbose=verbose,
        )
        for cid in catalog_ids
    ]
    _verify_exact_alignment(datasets, catalog_ids=list(catalog_ids))
    return xr.merge(datasets, compat="equals", join="exact")

Extraction

extract_points

extract_points(
    raster: Dataset | DataArray,
    locations: GeoDataFrame | DataFrame,
    *,
    x: str = "x",
    y: str = "y",
    crs: str | None = None,
    method: Literal["nearest", "linear"] = "linear",
    years: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> GeoDataFrame

Extract raster values at point locations.

Accepts an xarray.Dataset or DataArray (typically from :func:open_raster / :func:open_stack) and a GeoDataFrame or plain DataFrame with x/y columns. Reprojects the input locations to the raster CRS if they differ.

Parameters:

raster : Dataset or DataArray, required
    The raster to sample from. Must have x and y spatial coordinates and
    a CRS set (via rio.write_crs). Time-series rasters (with a time dim)
    produce long-form output.
locations : GeoDataFrame or DataFrame, required
    Points to sample. If a plain DataFrame, pass the column names via
    x=/y= and an explicit crs=. If a GeoDataFrame, its geometry column is
    used and its CRS must be set.
x, y : str, default "x", "y"
    Column names holding longitude/x and latitude/y for DataFrame inputs.
    Ignored for GeoDataFrame inputs.
crs : str, optional
    CRS of the input locations (e.g., "EPSG:4326"). Required when
    locations is a plain DataFrame; inferred from locations.crs for
    GeoDataFrame inputs.
method : {"nearest", "linear"}, default "linear"
    Interpolation method. "linear" is bilinear via xarray.interp
    (requires scipy, a core pysdp dep). "nearest" snaps to the nearest
    cell via xvec.extract_points and is substantially faster for large
    cloud rasters.
years : sequence of int, optional
    Time filter applied before extraction. Only valid for time-series
    rasters.
date_start, date_end : str or date, optional
    Date-range filter applied before extraction.
bind : bool, default True
    If True, merge the input location's non-geometry columns onto each
    output row. If False, return only geometry + extracted values.
verbose : bool, default True
    Print per-extraction progress messages to stderr.

Returns:

Type Description
GeoDataFrame

Output GeoDataFrame with the raster's data variables as columns. For time-series rasters, output is long-form (one row per geometry × time) with time as a column; pivot to wide if needed via df.pivot_table(index=..., columns="time", values=...).

Raises:

Type Description
ValueError

If the raster has no CRS, if location CRS/columns are missing, if method isn't one of the two valid values, or if time filter args are passed for a non-time-indexed raster.

Examples:

Extract elevation at three RMBL-area field sites:

>>> import pysdp, geopandas as gpd
>>> from shapely.geometry import Point
>>> dem = pysdp.open_raster("R3D009")
>>> sites = gpd.GeoDataFrame(
...     {"site": ["Roaring Judy", "Gothic", "Galena Lake"]},
...     geometry=[
...         Point(-106.853186, 38.716995),
...         Point(-106.988934, 38.958446),
...         Point(-107.072569, 39.021644),
...     ],
...     crs="EPSG:4326",
... )
>>> samples = pysdp.extract_points(dem, sites)

Sample daily Tmax at the same sites and pivot to wide format:

>>> tmax = pysdp.open_raster("R4D004", date_start="2021-11-02", date_end="2021-11-04")
>>> long = pysdp.extract_points(tmax, sites)
>>> wide = long.pivot_table(index="site", columns="time", values="bayes_tmax_est")

Extract from a plain DataFrame (no GeoPandas needed upfront):

>>> import pandas as pd
>>> df = pd.DataFrame({"site": ["A"], "lon": [-106.85], "lat": [38.95]})
>>> out = pysdp.extract_points(dem, df, x="lon", y="lat", crs="EPSG:4326")
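The difference between the two method options can be illustrated without pysdp: "linear" blends the four surrounding cells, while "nearest" snaps to one. A minimal numpy sketch with a made-up 2×2 grid (not pysdp code; xarray.interp and xvec.extract_points do the real work):

```python
import numpy as np

# Toy 2x2 grid: rows are y in {0, 1}, columns are x in {0, 1}.
grid = np.array([[0.0, 10.0],
                 [20.0, 30.0]])

def bilinear(x, y):
    """Blend the four surrounding cells, as xarray.interp does."""
    top = grid[0, 0] * (1 - x) + grid[0, 1] * x
    bottom = grid[1, 0] * (1 - x) + grid[1, 1] * x
    return top * (1 - y) + bottom * y

def nearest(x, y):
    """Snap to the closest cell center, as xvec.extract_points does."""
    return grid[round(y), round(x)]

print(bilinear(0.5, 0.5))   # 15.0: average of all four cells
print(nearest(0.25, 0.25))  # 0.0: snaps to the (0, 0) cell
```

For smooth fields like temperature, the bilinear blend is usually what you want; for categorical rasters (masks, vegetation classes), "nearest" avoids blending class codes into meaningless intermediate values.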
See Also

extract_polygons : Summarize values over polygon geometries.
open_raster : Load a raster to extract from.

Source code in src/pysdp/extract.py
def extract_points(
    raster: xr.Dataset | xr.DataArray,
    locations: gpd.GeoDataFrame | pd.DataFrame,
    *,
    x: str = "x",
    y: str = "y",
    crs: str | None = None,
    method: Literal["nearest", "linear"] = "linear",
    years: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> gpd.GeoDataFrame:
    """Extract raster values at point locations.

    Accepts an ``xarray.Dataset`` or ``DataArray`` (typically from
    :func:`open_raster` / :func:`open_stack`) and a ``GeoDataFrame`` or
    plain ``DataFrame`` with ``x``/``y`` columns. Reprojects the input
    locations to the raster CRS if they differ.

    Parameters
    ----------
    raster : xarray.Dataset or xarray.DataArray
        The raster to sample from. Must have ``x`` and ``y`` spatial
        coordinates and a CRS set (via ``rio.write_crs``). Time-series
        rasters (with a ``time`` dim) produce long-form output.
    locations : GeoDataFrame or DataFrame
        Points to sample. If a plain ``DataFrame``, pass the column names
        via ``x=``/``y=`` and an explicit ``crs=``. If a ``GeoDataFrame``,
        its geometry column is used and its CRS must be set.
    x, y : str, default "x", "y"
        Column names holding longitude/x and latitude/y for
        ``DataFrame`` inputs. Ignored for ``GeoDataFrame`` inputs.
    crs : str, optional
        CRS of the input locations (e.g., ``"EPSG:4326"``). Required when
        ``locations`` is a plain ``DataFrame``; inferred from
        ``locations.crs`` for ``GeoDataFrame`` inputs.
    method : {"nearest", "linear"}, default "linear"
        Interpolation method. ``"linear"`` is bilinear via
        ``xarray.interp`` (requires ``scipy``, a core pysdp dependency).
        ``"nearest"`` snaps to the nearest cell via ``xvec.extract_points``
        and is substantially faster for large cloud rasters.
    years : sequence of int, optional
        Time filter applied before extraction. Only valid for time-series
        rasters.
    date_start, date_end : str or datetime.date, optional
        Date-range filter applied before extraction.
    bind : bool, default True
        If ``True``, merge the input locations' non-geometry columns onto
        each output row. If ``False``, return only geometry + extracted
        values.
    verbose : bool, default True
        Print per-extraction progress messages to stderr.

    Returns
    -------
    geopandas.GeoDataFrame
        Output GeoDataFrame with the raster's data variables as columns.
        For time-series rasters, output is **long-form** (one row per
        ``geometry × time``) with ``time`` as a column; pivot to wide if
        needed via ``df.pivot_table(index=..., columns="time", values=...)``.

    Raises
    ------
    ValueError
        If the raster has no CRS, if location CRS/columns are missing, if
        ``method`` isn't one of the two valid values, or if time filter
        args are passed for a non-time-indexed raster.

    Examples
    --------
    Extract elevation at three RMBL-area field sites:

    >>> import pysdp, geopandas as gpd
    >>> from shapely.geometry import Point
    >>> dem = pysdp.open_raster("R3D009")  # doctest: +SKIP
    >>> sites = gpd.GeoDataFrame(
    ...     {"site": ["Roaring Judy", "Gothic", "Galena Lake"]},
    ...     geometry=[
    ...         Point(-106.853186, 38.716995),
    ...         Point(-106.988934, 38.958446),
    ...         Point(-107.072569, 39.021644),
    ...     ],
    ...     crs="EPSG:4326",
    ... )
    >>> samples = pysdp.extract_points(dem, sites)  # doctest: +SKIP

    Sample daily Tmax at the same sites and pivot to wide format:

    >>> tmax = pysdp.open_raster("R4D004", date_start="2021-11-02", date_end="2021-11-04")  # doctest: +SKIP
    >>> long = pysdp.extract_points(tmax, sites)  # doctest: +SKIP
    >>> wide = long.pivot_table(index="site", columns="time", values="bayes_tmax_est")  # doctest: +SKIP

    Extract from a plain ``DataFrame`` (no GeoPandas needed upfront):

    >>> import pandas as pd
    >>> df = pd.DataFrame({"site": ["A"], "lon": [-106.85], "lat": [38.95]})
    >>> out = pysdp.extract_points(dem, df, x="lon", y="lat", crs="EPSG:4326")  # doctest: +SKIP

    See Also
    --------
    extract_polygons : Summarize values over polygon geometries.
    open_raster : Load a raster to extract from.
    """
    raster = _filter_by_time(
        raster,
        years=years,
        date_start=date_start,
        date_end=date_end,
        verbose=verbose,
    )
    gdf = _to_geodataframe(locations, x=x, y=y, crs=crs)
    gdf = _align_to_raster_crs(gdf, raster, verbose=verbose)

    n_points = len(gdf)
    n_time = int(raster.sizes.get("time", 1))
    _emit(f"Extracting values at {n_points} location(s) × {n_time} layer(s).", verbose)

    if method == "nearest":
        extracted = _point_extract_nearest(raster, gdf)
    elif method == "linear":
        extracted = _point_extract_linear(raster, gdf)
    else:
        raise ValueError(f"method must be 'nearest' or 'linear', got {method!r}.")

    _emit("Extraction complete.", verbose)
    return _extracted_to_geodataframe(extracted, gdf, bind=bind)

extract_polygons

extract_polygons(
    raster: Dataset | DataArray,
    locations: GeoDataFrame,
    *,
    stats: Sequence[str] | str = "mean",
    exact: bool = False,
    all_cells: bool = False,
    years: Sequence[int] | None = None,
    date_start: str | date | None = None,
    date_end: str | date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> GeoDataFrame | DataFrame

Summarize raster values over polygon locations.

Computes per-polygon summary statistics (mean by default). For time-series rasters, produces one summary per (polygon × time) pair in long-form output.

Parameters:

Name Type Description Default
raster Dataset or DataArray

Raster to summarize. Must have a CRS set.

required
locations GeoDataFrame

Polygon geometries. Must be a GeoDataFrame (not a plain DataFrame) with CRS set.

required
stats str or sequence of str

Summary statistic(s) to compute. Accepts any xvec.zonal_stats string ("mean", "sum", "std", "min", "max", "median", "count", "nunique") or a callable. Pass a list for multiple stats.

"mean"
exact bool

False (default) uses centroid-based cell inclusion via xvec.zonal_stats, matching rSDP / terra::extract behavior. True uses fractional-coverage weighting via exactextract (requires pysdp[exact]); recommended for small polygons relative to cell size. The True path is a Phase 8a roadmap item and currently raises NotImplementedError.

False
all_cells bool

If True, return a long-form DataFrame of per-cell values and coverage fractions instead of per-polygon summary statistics. Phase 8a roadmap item; currently raises NotImplementedError.

False
years optional

Time-series filters applied before summarization. Same semantics as in extract_points.

None
date_start optional

Time-series filters applied before summarization. Same semantics as in extract_points.

None
date_end optional

Time-series filters applied before summarization. Same semantics as in extract_points.

None
bind bool

Merge input attribute columns onto output rows when True.

True
verbose bool

Print progress messages.

True

Returns:

Type Description
GeoDataFrame or DataFrame

GeoDataFrame when bind=True; DataFrame when bind=False. Columns include the raster's data variables (one per summary stat).

Raises:

Type Description
TypeError

If locations isn't a GeoDataFrame.

ValueError

On missing CRS or other location validation failures.

NotImplementedError

For exact=True or all_cells=True (roadmap items).

Examples:

Compute mean snow duration over watersheds for 2019:

>>> import pysdp, geopandas as gpd
>>> snow = pysdp.open_raster("R4D001", years=[2019])
>>> watersheds = gpd.read_file("watersheds.gpkg")
>>> out = pysdp.extract_polygons(snow, watersheds, stats="mean")

Compute multiple statistics in one call:

>>> stats = pysdp.extract_polygons(
...     snow, watersheds, stats=["mean", "std", "min", "max"]
... )
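The default centroid rule can be pictured without pysdp: a cell contributes to a polygon's statistic only if its center falls inside the polygon, regardless of partial overlap. A toy numpy sketch (values and mask are invented):

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0],
                   [7.0, 8.0, 9.0]])

# True where a cell *center* lies inside the polygon. A half-covered
# cell whose center falls outside contributes nothing to the statistic.
inside = np.array([[False, True,  False],
                   [True,  True,  True ],
                   [False, True,  False]])

zonal_mean = values[inside].mean()
print(zonal_mean)  # 5.0
```

This all-or-nothing inclusion is why the docs recommend fractional-coverage weighting (the planned exact=True path) for polygons that are small relative to the cell size: with few cells, whether each center lands inside or outside dominates the result.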
See Also

extract_points : Extract at point geometries.
open_raster : Load a raster.

Source code in src/pysdp/extract.py
def extract_polygons(
    raster: xr.Dataset | xr.DataArray,
    locations: gpd.GeoDataFrame,
    *,
    stats: Sequence[str] | str = "mean",
    exact: bool = False,
    all_cells: bool = False,
    years: Sequence[int] | None = None,
    date_start: str | datetime.date | None = None,
    date_end: str | datetime.date | None = None,
    bind: bool = True,
    verbose: bool = True,
) -> gpd.GeoDataFrame | pd.DataFrame:
    """Summarize raster values over polygon locations.

    Computes per-polygon summary statistics (mean by default). For
    time-series rasters, produces one summary per ``(polygon × time)``
    pair in long-form output.

    Parameters
    ----------
    raster : xarray.Dataset or xarray.DataArray
        Raster to summarize. Must have a CRS set.
    locations : GeoDataFrame
        Polygon geometries. Must be a ``GeoDataFrame`` (not a plain
        ``DataFrame``) with CRS set.
    stats : str or sequence of str, default "mean"
        Summary statistic(s) to compute. Accepts any ``xvec.zonal_stats``
        string (``"mean"``, ``"sum"``, ``"std"``, ``"min"``, ``"max"``,
        ``"median"``, ``"count"``, ``"nunique"``) or a callable. Pass a
        list for multiple stats.
    exact : bool, default False
        ``False`` (default) uses centroid-based cell inclusion via
        ``xvec.zonal_stats``, matching rSDP / ``terra::extract`` behavior.
        ``True`` uses fractional-coverage weighting via ``exactextract``
        (requires ``pysdp[exact]``); recommended for small polygons
        relative to cell size. The ``True`` path is a Phase 8a roadmap
        item and currently raises ``NotImplementedError``.
    all_cells : bool, default False
        If ``True``, return a long-form DataFrame of per-cell values and
        coverage fractions instead of per-polygon summary statistics. Phase
        8a roadmap item; currently raises ``NotImplementedError``.
    years, date_start, date_end : optional
        Time-series filters applied before summarization. Same semantics as
        in :func:`extract_points`.
    bind : bool, default True
        Merge input attribute columns onto output rows when ``True``.
    verbose : bool, default True
        Print progress messages.

    Returns
    -------
    geopandas.GeoDataFrame or pandas.DataFrame
        GeoDataFrame when ``bind=True``; DataFrame when ``bind=False``.
        Columns include the raster's data variables (one per summary stat).

    Raises
    ------
    TypeError
        If ``locations`` isn't a ``GeoDataFrame``.
    ValueError
        On missing CRS or other location validation failures.
    NotImplementedError
        For ``exact=True`` or ``all_cells=True`` (roadmap items).

    Examples
    --------
    Compute mean snow duration over watersheds for 2019:

    >>> import pysdp, geopandas as gpd
    >>> snow = pysdp.open_raster("R4D001", years=[2019])  # doctest: +SKIP
    >>> watersheds = gpd.read_file("watersheds.gpkg")  # doctest: +SKIP
    >>> out = pysdp.extract_polygons(snow, watersheds, stats="mean")  # doctest: +SKIP

    Compute multiple statistics in one call:

    >>> stats = pysdp.extract_polygons(
    ...     snow, watersheds, stats=["mean", "std", "min", "max"]
    ... )  # doctest: +SKIP

    See Also
    --------
    extract_points : Extract at point geometries.
    open_raster : Load a raster.
    """
    import geopandas as gpd

    if not isinstance(locations, gpd.GeoDataFrame):
        raise TypeError(
            f"extract_polygons requires a GeoDataFrame (got {type(locations).__name__}). "
            "For point locations, use `pysdp.extract_points`."
        )

    raster = _filter_by_time(
        raster,
        years=years,
        date_start=date_start,
        date_end=date_end,
        verbose=verbose,
    )
    gdf = _align_to_raster_crs(locations, raster, verbose=verbose)

    _emit(
        f"Zonal extract at {len(gdf)} polygon(s) × {int(raster.sizes.get('time', 1))} layer(s).",
        verbose,
    )

    if all_cells:
        raise NotImplementedError(
            "all_cells=True (per-cell long-form output with fractions) is not yet "
            "implemented; tracked in ROADMAP §Phase 8a. Use `sum_fun='mean'` or "
            "another summary for now."
        )

    if exact:
        extracted = _zonal_stats_exact(raster, gdf, stats=stats)
    else:
        extracted = _zonal_stats_xvec(raster, gdf, stats=stats, all_touched=False)

    _emit("Extraction complete.", verbose)
    return _extracted_to_geodataframe(extracted, gdf, bind=bind)

Download

download

Bulk download of SDP datasets to local disk.

Ports rSDP's download_data(). See SPEC.md §4.4.

Primary backend: requests + concurrent.futures.ThreadPoolExecutor, using only core pysdp dependencies. Higher-throughput backends (obstore, fsspec + s3fs) are planned for Phase 7 when at-scale download performance becomes a hot path (ROADMAP §Phase 7); for the v0.1 use case — researchers pulling a handful of SDP products to local disk — the threaded-requests path is plenty fast.
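The pattern behind the threaded backend is a plain ThreadPoolExecutor mapped over URLs. A stubbed sketch of the shape (the fetch body here is a placeholder; the real worker streams requests responses to disk and records size and HTTP status):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> dict:
    # Placeholder worker: the real implementation issues a streaming
    # requests.get and writes chunks to the destination file.
    return {"url": url, "success": True}

# Invented example URLs.
urls = [f"https://example.com/file_{i}.tif" for i in range(4)]

# Fan the fetches out over a small thread pool, one result dict per URL.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))

print(sum(r["success"] for r in results))  # 4
```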

download

download(
    urls: str | Sequence[str] | None = None,
    output_dir: str | PathLike[str] | None = None,
    *,
    catalog_ids: str | Sequence[str] | None = None,
    overwrite: bool = False,
    resume: bool = True,
    max_workers: int = 8,
    return_status: bool = True,
    verbose: bool = True,
) -> DataFrame | None

Download SDP COGs to a local directory.

Use this when you need the raster on disk (for tools that don't support cloud reads, for offline workflows, or for bulk mirroring). For interactive analysis, open_raster lazy-reads from cloud without a download step.

Parameters:

Name Type Description Default
urls str or sequence of str

Direct HTTPS URL(s) to the COGs to download. Mutually exclusive with catalog_ids; exactly one is required.

None
output_dir str or PathLike

Destination directory. Created if it doesn't exist. Files are named from the URL basename.

None
catalog_ids str or sequence of str

Alternative to urls. pySDP expands each catalog_id via the packaged catalog:

  • Single → one URL
  • Yearly → one URL per catalog year
  • Monthly → one URL per catalog month between MinDate and MaxDate
  • Daily → raises ValueError (would be thousands of files; pass explicit urls= for selective daily slices)
None
overwrite bool

If False (default), skip destination files that already exist and are > 1 kB (matches rSDP's valid-file heuristic). If True, re-download even if a valid file is present.

False
resume bool

When a small partial file exists (< 1 kB), attempt an HTTP Range resume instead of re-downloading from scratch.

True
max_workers int

Number of concurrent HTTP fetches (via a ThreadPoolExecutor).

8
return_status bool

If True, return a DataFrame with one row per URL. If False, return None.

True
verbose bool

Print skip/download progress messages to stderr.

True

Returns:

Type Description
DataFrame or None

Status report with columns url, dest, success, status (HTTP code or "exists"), size (bytes), error (str or None). One row per URL, including pre-existing files that were skipped.

Raises:

Type Description
ValueError

If both urls and catalog_ids are given, or neither, or if output_dir is missing, or if a Daily catalog_id is passed.

KeyError

If a catalog_id isn't in the packaged catalog.

Warns:

Type Description
UserWarning

If any downloads failed; details are in the returned DataFrame's error column.

Examples:

Download two real SDP products by catalog ID:

>>> import pysdp
>>> status = pysdp.download(
...     catalog_ids=["R1D001", "R3D009"],
...     output_dir="~/sdp-data",
... )
>>> status[["dest", "success", "size"]]

Hand-pick a subset of daily temperature slices:

>>> urls = [
...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0305_est.tif",
...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0306_est.tif",
... ]
>>> pysdp.download(urls=urls, output_dir="~/tmax-samples")
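The returned status frame makes failure triage a one-liner. With a mocked-up frame using the documented columns (all values invented):

```python
import pandas as pd

# Shape of the status report returned by download(return_status=True).
status = pd.DataFrame(
    {
        "url": ["https://example.com/a.tif", "https://example.com/b.tif"],
        "dest": ["/tmp/a.tif", "/tmp/b.tif"],
        "success": [True, False],
        "status": [200, 403],
        "size": [2048, 0],
        "error": [None, "HTTP 403 Forbidden"],
    }
)

# Pull out just the failed rows for retry or inspection.
failed = status[~status["success"]]
print(failed["url"].tolist())  # ['https://example.com/b.tif']
```

Re-running download() on the failed URLs is safe: files that completed are skipped by the > 1 kB existing-file check unless overwrite=True.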
See Also

open_raster : Open an SDP raster lazily, without downloading.

Source code in src/pysdp/download.py
def download(
    urls: str | Sequence[str] | None = None,
    output_dir: str | os.PathLike[str] | None = None,
    *,
    catalog_ids: str | Sequence[str] | None = None,
    overwrite: bool = False,
    resume: bool = True,
    max_workers: int = 8,
    return_status: bool = True,
    verbose: bool = True,
) -> pd.DataFrame | None:
    """Download SDP COGs to a local directory.

    Use this when you need the raster on disk (for tools that don't support
    cloud reads, for offline workflows, or for bulk mirroring). For
    interactive analysis, :func:`open_raster` lazy-reads from cloud without
    a download step.

    Parameters
    ----------
    urls : str or sequence of str, optional
        Direct HTTPS URL(s) to the COGs to download. Mutually exclusive with
        ``catalog_ids``; exactly one is required.
    output_dir : str or PathLike
        Destination directory. Created if it doesn't exist. Files are named
        from the URL basename.
    catalog_ids : str or sequence of str, optional
        Alternative to ``urls``. pySDP expands each catalog_id via the
        packaged catalog:

        - ``Single`` → one URL
        - ``Yearly`` → one URL per catalog year
        - ``Monthly`` → one URL per catalog month between MinDate and MaxDate
        - ``Daily`` → raises ``ValueError`` (would be thousands of files;
          pass explicit ``urls=`` for selective daily slices)
    overwrite : bool, default False
        If ``False`` (default), skip destination files that already exist
        and are > 1 kB (matches rSDP's valid-file heuristic). If ``True``,
        re-download even if a valid file is present.
    resume : bool, default True
        When a small partial file exists (< 1 kB), attempt an HTTP Range
        resume instead of re-downloading from scratch.
    max_workers : int, default 8
        Number of concurrent HTTP fetches (via a ``ThreadPoolExecutor``).
    return_status : bool, default True
        If ``True``, return a DataFrame with one row per URL. If ``False``,
        return ``None``.
    verbose : bool, default True
        Print skip/download progress messages to stderr.

    Returns
    -------
    pandas.DataFrame or None
        Status report with columns ``url``, ``dest``, ``success``,
        ``status`` (HTTP code or ``"exists"``), ``size`` (bytes),
        ``error`` (str or None). One row per URL, including pre-existing
        files that were skipped.

    Raises
    ------
    ValueError
        If both ``urls`` and ``catalog_ids`` are given, or neither, or if
        ``output_dir`` is missing, or if a ``Daily`` catalog_id is passed.
    KeyError
        If a catalog_id isn't in the packaged catalog.

    Warns
    -----
    UserWarning
        If any downloads failed; details are in the returned DataFrame's
        ``error`` column.

    Examples
    --------
    Download two real SDP products by catalog ID:

    >>> import pysdp
    >>> status = pysdp.download(
    ...     catalog_ids=["R1D001", "R3D009"],
    ...     output_dir="~/sdp-data",
    ... )  # doctest: +SKIP
    >>> status[["dest", "success", "size"]]  # doctest: +SKIP

    Hand-pick a subset of daily temperature slices:

    >>> urls = [
    ...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0305_est.tif",
    ...     "https://rmbl-sdp.s3.us-east-2.amazonaws.com/data_products/released/release4/bayes_tmax_year_2021_day_0306_est.tif",
    ... ]
    >>> pysdp.download(urls=urls, output_dir="~/tmax-samples")  # doctest: +SKIP

    See Also
    --------
    open_raster : Open an SDP raster lazily, without downloading.
    """
    import pandas as pd

    if urls is not None and catalog_ids is not None:
        raise ValueError("Specify `urls` OR `catalog_ids`, not both.")
    if urls is None and catalog_ids is None:
        raise ValueError("You must specify `urls` or `catalog_ids`.")
    if output_dir is None:
        raise ValueError("`output_dir` is required.")

    output_path = Path(output_dir).expanduser()
    output_path.mkdir(parents=True, exist_ok=True)

    if catalog_ids is not None:
        url_list = _expand_catalog_ids(catalog_ids)
    elif isinstance(urls, str):
        url_list = [urls]
    else:
        url_list = list(urls) if urls is not None else []

    dest_paths = [output_path / Path(u).name for u in url_list]

    existing: list[dict[str, Any]] = []
    to_download_urls: list[str] = []
    to_download_dests: list[Path] = []
    for url, dest in zip(url_list, dest_paths, strict=True):
        if _is_valid_existing(dest) and not overwrite:
            existing.append(
                {
                    "url": url,
                    "dest": str(dest),
                    "success": True,
                    "status": "exists",
                    "size": dest.stat().st_size,
                    "error": None,
                }
            )
        else:
            to_download_urls.append(url)
            to_download_dests.append(dest)

    if existing:
        _emit(
            f"Skipping {len(existing)} existing file(s). Specify `overwrite=True` to re-download.",
            verbose,
        )
    if to_download_urls:
        _emit(
            f"Downloading {len(to_download_urls)} file(s) to {output_path}...",
            verbose,
        )

    download_results = _download_parallel(
        to_download_urls,
        to_download_dests,
        max_workers=max_workers,
        resume=resume,
    )

    failures = [r for r in download_results if not r["success"]]
    if failures:
        warnings.warn(
            f"Downloaded {len(download_results) - len(failures)} / "
            f"{len(download_results)} file(s) successfully; "
            f"{len(failures)} failed (see returned DataFrame for details).",
            UserWarning,
            stacklevel=2,
        )

    _emit("Download complete.", verbose)

    all_results = existing + download_results
    if return_status:
        return pd.DataFrame(all_results)
    return None

Constants

constants

Package constants for pysdp.

Values mirror the internal constants in rSDP's R/constants.R.

SDP_CRS module-attribute

SDP_CRS: Final[str] = 'EPSG:32613'

Coordinate reference system for all SDP raster products (UTM zone 13N).

DOMAINS module-attribute

DOMAINS: Final[tuple[str, ...]] = (
    "UG",
    "UER",
    "GT",
    "GMUG",
)

Spatial domains available in the SDP.

TYPES module-attribute

TYPES: Final[tuple[str, ...]] = (
    "Mask",
    "Topo",
    "Vegetation",
    "Hydro",
    "Planning",
    "Radiation",
    "Snow",
    "Climate",
    "Imagery",
    "Supplemental",
)

Dataset type categories.

RELEASES module-attribute

RELEASES: Final[tuple[str, ...]] = (
    "Basemaps",
    "Release1",
    "Release2",
    "Release3",
    "Release4",
    "Release5",
)

Dataset release cohorts.

TIMESERIES_TYPES module-attribute

TIMESERIES_TYPES: Final[tuple[str, ...]] = (
    "Single",
    "Yearly",
    "Seasonal",
    "Monthly",
    "Daily",
)

Time-series structure types for SDP datasets.
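These tuples are handy for validating filter arguments before querying the catalog. A sketch of that pattern with a hypothetical helper (not part of pysdp's API):

```python
# Mirrors pysdp.DOMAINS; the helper itself is illustrative only.
DOMAINS = ("UG", "UER", "GT", "GMUG")

def validate_domains(domains):
    """Fail fast on typos instead of silently matching zero catalog rows."""
    unknown = [d for d in domains if d not in DOMAINS]
    if unknown:
        raise ValueError(f"Unknown domain(s) {unknown}; valid: {DOMAINS}")
    return list(domains)

print(validate_domains(["UG", "GT"]))  # ['UG', 'GT']
```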