How to run this tutorial

In order to run the code in this tutorial, you can either download the notebook to run it on your local computer, or click the button below to run the tutorial in a GitHub Codespace.

Reading Data from the STAC API

The Planetary Computer catalogs the datasets we host using the STAC (SpatioTemporal Asset Catalog) specification. We provide a STAC API endpoint that can be used to search our datasets by space, time, and more. This quickstart will show you how to search for data using our STAC API and open-source Python libraries. For more on how to use our STAC API from R, see Reading data from the STAC API with R.

First we’ll use pystac-client to open up our STAC API:

from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

Searching

We can use the STAC API to search for assets meeting some criteria. This might include the date and time the asset covers, its spatial extent, or any other property captured in the STAC item’s metadata.

In this example we’ll search for Landsat Collection 2 Level-2 imagery covering the area around Microsoft’s main campus in December of 2020.

time_range = "2020-12-01/2020-12-31"
bbox = [-122.2751, 47.5469, -121.9613, 47.7458]

search = catalog.search(collections=["landsat-8-c2-l2"], bbox=bbox, datetime=time_range)
items = search.get_all_items()
len(items)

In that example, our spatial query used a bounding box passed as bbox. Alternatively, you can pass a GeoJSON geometry as intersects:

area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-122.2751, 47.5469],
            [-121.9613, 47.5469],
            [-121.9613, 47.7458],
            [-122.2751, 47.7458],
            [-122.2751, 47.5469],
        ]
    ],
}

time_range = "2020-12-01/2020-12-31"

search = catalog.search(
    collections=["landsat-8-c2-l2"], intersects=area_of_interest, datetime=time_range
)
items = search.get_all_items()

items is a pystac.ItemCollection. We can see that 4 items matched our search criteria.

len(items)

Each pystac.Item in this ItemCollection includes all the metadata for that scene. STAC Items are GeoJSON features, and so can be loaded by libraries like geopandas.

import geopandas

df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
df

We can use the eo extension to sort the items by cloudiness. We’ll grab an item with low cloudiness:

selected_item = min(items, key=lambda item: item.properties["eo:cloud_cover"])
selected_item

Each STAC item has one or more Assets, which include links to the actual files.

import rich.table

table = rich.table.Table("Asset Key", "Description")
for asset_key, asset in selected_item.assets.items():
    table.add_row(asset_key, asset.title)

table

Here, we’ll inspect the rendered_preview asset.

selected_item.assets["rendered_preview"].to_dict()

from IPython.display import Image

Image(url=selected_item.assets["rendered_preview"].href, width=500)

That rendered_preview asset is generated dynamically from the raw data using the Planetary Computer’s data API. We can access the raw data, stored as Cloud Optimized GeoTIFFs in Azure Blob Storage, using one of the other assets. That said, we do need to do one more thing before accessing the data. If we simply made a request to the file in blob storage we’d get a 404:

import requests

url = selected_item.assets["SR_B2"].href
print("Accessing", url)
response = requests.get(url)
response

That’s because the Planetary Computer uses Azure Blob Storage SAS Tokens to enable access to our data, which allows us to provide the data for free to anyone, anywhere, while maintaining some control over the amount of egress for datasets.

To get a token for access, you can use the Planetary Computer’s Data Authentication API. You can access that anonymously, or you can provide an API Key for higher rate limits and longer-lived tokens.
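
If you want to see what that looks like at the HTTP level, here is a minimal sketch. It assumes the token endpoint lives at /api/sas/v1/token/<collection-id> and returns JSON with a token field; check the Data Authentication API documentation for the exact contract.

import requests

# Sketch: anonymously request a SAS token for the collection. The endpoint
# path and the "token" response field are assumptions; see the docs above.
token_response = requests.get(
    "https://planetarycomputer.microsoft.com/api/sas/v1/token/landsat-8-c2-l2"
)
token = token_response.json()["token"]

# The token is a query string that gets appended to the asset URL.
signed_url = f"{selected_item.assets['SR_B2'].href}?{token}"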

You can also use the planetary-computer package to generate tokens and sign asset HREFs for access. You can install it via pip:

> pip install planetary-computer

import planetary_computer

# Optionally set the PC_SDK_SUBSCRIPTION_KEY environment variable to provide
# an API key for higher rate limits and longer-lived tokens.
signed_href = planetary_computer.sign(selected_item).assets["SR_B2"].href
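
The sign function also accepts a plain URL string or a whole ItemCollection (we’ll sign an ItemCollection below when building a datacube), so you can sign whichever object you happen to be holding. For example:

# Signing a bare asset HREF returns a signed URL string.
signed_url = planetary_computer.sign(selected_item.assets["SR_B2"].href)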

We can load up that single COG using libraries like rioxarray or rasterio.

import rioxarray

ds = rioxarray.open_rasterio(signed_href, overview_level=4).squeeze()
img = ds.plot(cmap="Blues", add_colorbar=False)
img.axes.set_axis_off();

If you wish to work with multiple STAC items as a datacube, you can use libraries like stackstac or odc-stac.

import stackstac

ds = stackstac.stack(
    planetary_computer.sign(items), epsg=4326
)
ds
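
odc-stac offers a similar interface. The snippet below is a sketch rather than part of the original walkthrough: the band subset, CRS, and resolution are illustrative choices, and odc.stac.load needs signed items just like stackstac does.

import odc.stac

ds = odc.stac.load(
    planetary_computer.sign(items),
    bands=["SR_B2", "SR_B3", "SR_B4"],  # illustrative subset of bands
    crs="EPSG:4326",
    resolution=0.0003,  # roughly 30 m expressed in degrees
    chunks={},  # return lazy, Dask-backed arrays
)
ds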

Searching on additional properties

Previously, we searched for items by space and time. Because the Planetary Computer’s STAC API supports the query parameter, you can search on additional properties on the STAC item.

For example, collections like sentinel-2-l2a and landsat-8-c2-l2 both implement the eo STAC extension and include an eo:cloud_cover property. Use query={"eo:cloud_cover": {"lt": 20}} to return only items that are less than 20% cloudy.

time_range = "2020-12-01/2020-12-31"
bbox = [-122.2751, 47.5469, -121.9613, 47.7458]

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=bbox,
    datetime=time_range,
    query={"eo:cloud_cover": {"lt": 20}},
)
items = search.get_all_items()
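
As before, we can check how many items made it through the cloud-cover filter:

len(items)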

Another common use of the query parameter is to filter a collection down to items of a specific type. For example, the GOES-CMI collection includes images taken while the satellite is in various scan modes, producing images of either the full disk of the Earth, the continental United States, or a mesoscale region. You can use the goes:image-type property to filter down to just the ones you want.

search = catalog.search(
    collections=["goes-cmi"],
    bbox=[-67.2729, 25.6000, -61.7999, 27.5423],
    datetime=["2018-09-11T13:00:00Z", "2018-09-11T15:40:00Z"],
    query={"goes:image-type": {"eq": "MESOSCALE"}},
)

Analyzing STAC Metadata

STAC items are proper GeoJSON Features, and so can be treated as a kind of data on their own.

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-124.2751, 45.5469, -110.9613, 47.7458],
    datetime="2020-12-26/2020-12-31",
)
items = search.get_all_items()

df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")

df[["geometry", "datetime", "s2:mgrs_tile", "eo:cloud_cover"]].explore(
    column="eo:cloud_cover", style_kwds={"fillOpacity": 0.1}
)

Or we can plot cloudiness of a region over time.

import pandas as pd

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-124.2751, 45.5469, -123.9613, 45.7458],
    datetime="2020-01-01/2020-12-31",
)
items = search.get_all_items()
df = geopandas.GeoDataFrame.from_features(items.to_dict())
df["datetime"] = pd.to_datetime(df["datetime"])
ts = df.set_index("datetime").sort_index()["eo:cloud_cover"].rolling(7).mean()
ts.plot(title="eo:cloud_cover (7-scene rolling average)");

Working with STAC Catalogs and Collections

Our catalog is a STAC Catalog that we can crawl or search. The Catalog contains STAC Collections for each dataset we have indexed (which is not yet the entirety of the data hosted by the Planetary Computer).
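
For example, we can crawl the catalog with pystac-client to list its collections. This is a small sketch; the slice just keeps the output short:

# List the collections in the catalog and peek at the first few.
collections = list(catalog.get_collections())
print(f"{len(collections)} collections")
for collection in collections[:5]:
    print(collection.id, "-", collection.title)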

Collections have information about the STAC Items they contain. For instance, here we look at the Bands available for Landsat Collection 2 Level 2 data:

import pandas as pd

landsat = catalog.get_collection("landsat-c2-l2")

bands = [
    k
    for k, v in landsat.extra_fields["item_assets"].items()
    if "data" in v["roles"]
]
pd.DataFrame(bands)

We can see what assets are available on items in this collection with:

pd.DataFrame.from_dict(landsat.extra_fields["item_assets"], orient="index")[
    ["title", "description", "gsd"]
]

Some collections, like Daymet, include collection-level assets. You can use the .assets property to access those assets.

collection = catalog.get_collection("daymet-daily-na")
collection

Just like assets on items, these assets include links to data in Azure Blob Storage.

asset = collection.assets["zarr-https"]
asset

import fsspec
import xarray as xr

store = fsspec.get_mapper(asset.href)
ds = xr.open_zarr(store, **asset.extra_fields["xarray:open_kwargs"])
ds
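
From here it is ordinary xarray. As a small, illustrative follow-up (the tmax variable name and the index values are assumptions about the Daymet dataset, not part of the original tutorial), you might pull a time series at a single grid cell:

# Sketch: daily maximum temperature at one grid cell for a single year.
tmax = ds["tmax"].sel(time="2019").isel(y=5000, x=5000)
tmax.plot();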