GeoGrapher tutorial - the basics

This tutorial shows how to use GeoGrapher to create a remote sensing dataset from vector data. If you are reading the HTML version in the documentation and would prefer the actual .ipynb file, you can find it here. As vector data, we will use bounding boxes for sports stadiums. You can download the file stadiums.geojson containing the vector data from here.

Contents:

  1. Creating an empty dataset

  2. Adding vector data

  3. Downloading rasters for the vector data

  4. Loading and saving a connector

1. Creating an empty dataset

First, we import geographer along with a few other modules we will need.

[1]:
from pathlib import Path

from datetime import date, timedelta

import geographer as gg
import geopandas as gpd


The GeoGrapher library is built around the Connector class. A connector organizes a dataset of raster and vector data. It keeps track of the containment and intersection relations between raster and vector data in a bipartite graph. See our blogpost for a detailed explanation of why we want to keep track of this information.

To create an empty dataset, we use the from_scratch factory method:

[2]:
from geographer import Connector

DATA_DIR = Path("gg_example_dataset")

connector = Connector.from_scratch(
    data_dir=DATA_DIR,
    task_vector_classes=["football", "baseball"],
)

This creates a connector with a dataset in DATA_DIR.

The task_vector_classes argument defines the classes that objects can belong to for multi-class segmentation. It is used when creating labels (see this tutorial notebook). It is optional and not important at this point.

The most important attributes of a connector are its rasters and vectors attributes. These are geopandas GeoDataFrames. The vectors GeoDataFrame contains the vector geometries of the stadiums as well as tabular information about the stadiums (name, country, etc.). It also contains a "raster_count" column, which we will explain later. The rasters GeoDataFrame contains as geometries the bounding boxes of the rasters in our dataset, as well as tabular information about the rasters (e.g. raster name, date, etc.).

[4]:
connector.vectors
[4]:
geometry raster_count
vector_name
[5]:
connector.rasters
[5]:
geometry
raster_name

As you can see, both GeoDataFrames are empty.

2. Adding vector data

Let’s try adding our stadiums to the vectors.

First, we read a GeoDataFrame containing the vector data from disk. You can download the example geojson file here.

[6]:
stadiums = gpd.read_file("stadiums.geojson")
stadiums
[6]:
vector_name location type geometry
0 Munich Olympiastadion Munich, Germany football POLYGON Z ((11.54677 48.17472 0.00000, 11.5446...
1 Munich Track and Field Stadium1 Munich, Germany football POLYGON Z ((11.54382 48.17279 0.00000, 11.5438...
2 Munich Olympia Track and Field2 Munich, Germany football POLYGON Z ((11.54686 48.17892 0.00000, 11.5468...
3 Munich Staedtisches Stadion Dantestr Munich, Germany football POLYGON Z ((11.52913 48.16874 0.00000, 11.5291...
4 Vasil Levski National Stadium Sofia, Bulgaria football POLYGON Z ((23.33410 42.68813 0.00000, 23.3340...
5 Bulgarian Army Stadium Sofia, Bulgaria football POLYGON Z ((23.34065 42.68492 0.00000, 23.3406...
6 Arena Sofia Sofia, Bulgaria football POLYGON Z ((23.34018 42.68318 0.00000, 23.3401...
7 Jingu Baseball Stadium Tokyo, Japan baseball POLYGON Z ((139.71597 35.67490 0.00000, 139.71...
8 Japan National Stadium Tokyo, Japan football POLYGON Z ((139.71482 35.67644 0.00000, 139.71...

It will be convenient to set the index to the vector_name column, since the connector's vectors GeoDataFrame is indexed by vector_name as well:

[7]:
stadiums = stadiums.set_index("vector_name")
stadiums
[7]:
location geometry
vector_name
Munich Olympiastadion Munich, Germany POLYGON Z ((11.54677 48.17472 0.00000, 11.5446...
Munich Track and Field Stadium1 Munich, Germany POLYGON Z ((11.54382 48.17279 0.00000, 11.5438...
Munich Olympia Track and Field2 Munich, Germany POLYGON Z ((11.54686 48.17892 0.00000, 11.5468...
Munich Staedtisches Stadion Dantestr Munich, Germany POLYGON Z ((11.52913 48.16874 0.00000, 11.5291...
Vasil Levski National Stadium Sofia, Bulgaria POLYGON Z ((23.33410 42.68813 0.00000, 23.3340...
Bulgarian Army Stadium Sofia, Bulgaria POLYGON Z ((23.34065 42.68492 0.00000, 23.3406...
Arena Sofia Sofia, Bulgaria POLYGON Z ((23.34018 42.68318 0.00000, 23.3401...
Jingu Baseball Stadium Tokyo, Japan POLYGON Z ((139.71597 35.67490 0.00000, 139.71...
Japan National Stadium Tokyo, Japan POLYGON Z ((139.71482 35.67644 0.00000, 139.71...
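Since vector_name becomes the index, duplicate names would make label-based lookups ambiguous. The following is a minimal sketch of a uniqueness check using only the standard library. The helper find_duplicate_names and the inline sample features are hypothetical; only the vector_name property is taken from the file above. Whether GeoGrapher itself requires unique names is an assumption here.

```python
import json
from collections import Counter

# Tiny inline stand-in for stadiums.geojson (geometries omitted for brevity).
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "properties": {"vector_name": "Arena Sofia"}, "geometry": null},
    {"type": "Feature", "properties": {"vector_name": "Jingu Baseball Stadium"}, "geometry": null},
    {"type": "Feature", "properties": {"vector_name": "Arena Sofia"}, "geometry": null}
  ]
}
"""


def find_duplicate_names(geojson_text: str, key: str = "vector_name") -> list[str]:
    """Return the values of `key` that occur more than once, sorted."""
    features = json.loads(geojson_text)["features"]
    counts = Counter(feature["properties"][key] for feature in features)
    return sorted(name for name, count in counts.items() if count > 1)


duplicates = find_duplicate_names(SAMPLE_GEOJSON)
print(duplicates)
```

An empty list means every name is unique and can safely serve as the index.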

Now, we can integrate the vector features into the dataset, i.e. into the connector:

[12]:
connector.add_to_vectors(stadiums)

The stadiums have now been added to the connector’s vectors GeoDataFrame:

[13]:
connector.vectors
[13]:
geometry raster_count location
vector_name
Munich Olympiastadion POLYGON Z ((11.54677 48.17472 0.00000, 11.5446... 0 Munich, Germany
Munich Track and Field Stadium1 POLYGON Z ((11.54382 48.17279 0.00000, 11.5438... 0 Munich, Germany
Munich Olympia Track and Field2 POLYGON Z ((11.54686 48.17892 0.00000, 11.5468... 0 Munich, Germany
Munich Staedtisches Stadion Dantestr POLYGON Z ((11.52913 48.16874 0.00000, 11.5291... 0 Munich, Germany
Vasil Levski National Stadium POLYGON Z ((23.33410 42.68813 0.00000, 23.3340... 0 Sofia, Bulgaria
Bulgarian Army Stadium POLYGON Z ((23.34065 42.68492 0.00000, 23.3406... 0 Sofia, Bulgaria
Arena Sofia POLYGON Z ((23.34018 42.68318 0.00000, 23.3401... 0 Sofia, Bulgaria
Jingu Baseball Stadium POLYGON Z ((139.71597 35.67490 0.00000, 139.71... 0 Tokyo, Japan
Japan National Stadium POLYGON Z ((139.71482 35.67644 0.00000, 139.71... 0 Tokyo, Japan

3. Downloading rasters for the vector data

To download rasters for the stadiums, we use the RasterDownloaderForVectors. This class needs to be passed a DownloaderForSingleVector to interface with the data provider for our rasters, and a RasterDownloadProcessor to process the downloaded files. In this example, we will use the EodagDownloaderForSingleVector, which uses eodag as a backend, giving easy access to more than 10 providers and more than 50 different product types. We will use it to download Sentinel-2 data from the Copernicus Dataspace. To process the downloaded SAFE files (see here for an explanation of the Sentinel-2 data format) into GeoTiffs, we use the Sentinel2SAFEProcessor. GeoTiff is a georeferenced variant of the Tiff image format that is commonly used for remote sensing rasters.

Here, we define the downloader:

[14]:
from geographer.downloaders import (
    RasterDownloaderForVectors,
    EodagDownloaderForSingleVector,
    Sentinel2SAFEProcessor,
)

download_processor = Sentinel2SAFEProcessor()
downloader_for_single_vector = EodagDownloaderForSingleVector()
downloader = RasterDownloaderForVectors(
    downloader_for_single_vector=downloader_for_single_vector,
    download_processor=download_processor,
)
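The split into a provider-specific downloader and a format-specific processor can be illustrated with a small stand-alone sketch. All class names below (SingleVectorDownloader, DownloadProcessor, FakeProviderDownloader, FakeSafeProcessor, DownloadPipeline) are hypothetical and are not GeoGrapher's classes; the sketch only shows the composition idea under that assumption.

```python
from typing import Protocol


class SingleVectorDownloader(Protocol):
    """Hypothetical interface: fetch one raw product for one vector feature."""

    def download(self, vector_name: str) -> str: ...


class DownloadProcessor(Protocol):
    """Hypothetical interface: turn a raw product into a usable raster."""

    def process(self, raw_product: str) -> str: ...


class FakeProviderDownloader:
    """Stand-in for a provider backend; returns a made-up product name."""

    def download(self, vector_name: str) -> str:
        return f"product_for_{vector_name.replace(' ', '_')}.SAFE"


class FakeSafeProcessor:
    """Stand-in for a processor; pretends to convert a SAFE file to a GeoTiff."""

    def process(self, raw_product: str) -> str:
        return raw_product.removesuffix(".SAFE") + ".tif"


class DownloadPipeline:
    """Combines a downloader with a processor, one vector feature at a time."""

    def __init__(self, downloader: SingleVectorDownloader, processor: DownloadProcessor):
        self.downloader = downloader
        self.processor = processor

    def run(self, vector_names: list[str]) -> list[str]:
        return [
            self.processor.process(self.downloader.download(name))
            for name in vector_names
        ]


pipeline = DownloadPipeline(FakeProviderDownloader(), FakeSafeProcessor())
rasters = pipeline.run(["Arena Sofia"])
print(rasters)
```

Keeping the two roles separate means that switching to another provider or another file format only requires replacing one of the two components.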

If you do not yet have a Copernicus Dataspace account, you can create one here. eodag needs the username and password of your Copernicus Dataspace account. These can be set in a config file, but here we will use environment variables:

[ ]:
import os

# Uncomment the two lines below and fill in your Copernicus Dataspace credentials.
# os.environ["EODAG__COP_DATASPACE__AUTH__CREDENTIALS__USERNAME"] = "PLEASE_CHANGE_ME"
# os.environ["EODAG__COP_DATASPACE__AUTH__CREDENTIALS__PASSWORD"] = "PLEASE_CHANGE_ME"
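Before starting a long download, it can be useful to confirm that both variables are actually set. The following is a minimal sketch using only the standard library; the helper missing_credentials is hypothetical, and the variable names are the ones shown above.

```python
import os

# Variable names as used above for the cop_dataspace provider.
REQUIRED_VARS = (
    "EODAG__COP_DATASPACE__AUTH__CREDENTIALS__USERNAME",
    "EODAG__COP_DATASPACE__AUTH__CREDENTIALS__PASSWORD",
)


def missing_credentials(env=None) -> list[str]:
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"EODAG__COP_DATASPACE__AUTH__CREDENTIALS__USERNAME": "alice"}
missing = missing_credentials(example_env)
print(missing)
```

Calling missing_credentials() without arguments checks the real environment; an empty list means both variables are present.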

The downloader_for_single_vector has an eodag attribute, which is an EODataAccessGateway. We can use it, for example, to check that it is configured to access the Copernicus Dataspace provider:

[ ]:
assert "cop_dataspace" in downloader_for_single_vector.eodag.available_providers(
    product_type="S2_MSI_L2A"
)

To download rasters and add them to our dataset we then run the following command.

[14]:
# Here, we define the parameters needed by the EodagDownloaderForSingleVector.download method
downloader_params = {
    "search_kwargs": {  # Keyword arguments for the eodag search_all method
        "provider": "cop_dataspace",  # Download from copernicus dataspace
        "productType": "S2_MSI_L2A",  # Search for Sentinel-2 L2A products
        "start": (date.today() - timedelta(days=364)).strftime("%Y-%m-%d"),  # one year ago
        "end": date.today().strftime("%Y-%m-%d"),  # today
    },
    "filter_online": True,  # Filter out products that are not online
    "sort_by": ("cloudCover", "ASC"),  # Sort products by percentage of cloud cover in ascending order
    "suffix_to_remove": ".SAFE"  # Will strip .SAFE from the stem of the tif file names
}
# Here, we define the parameters needed by the Sentinel2SAFEProcessor
processor_params = {
    "resolution": 10,  # Extract all 10m resolution bands
    "delete_safe": True,  # Delete the SAFE file after extracting a .tif file
}

downloader.download(
    connector=connector,
    target_raster_count=2,  # optional, defaults to 1. Aim for 2 rasters covering each stadium. See below for further explanation.
    downloader_params=downloader_params,
    processor_params=processor_params,
)
2022-09-21 22:50:49,613 - geographer.downloaders.sentinel2_safe_unpacking - INFO - Using all zero band for gml mask ('CLOUDS', 'B00') for S2A_MSIL2A_20220722T092041_N0400_R093_T34TFN_20220722T134859.SAFE
2022-09-21 22:57:35,719 - geographer.downloaders.sentinel2_safe_unpacking - INFO - Using all zero band for gml mask ('CLOUDS', 'B00') for S2A_MSIL2A_20220413T092031_N0400_R093_T34TFN_20220413T123632.SAFE
2022-09-21 23:03:51,514 - geographer.downloaders.sentinel2_safe_unpacking - INFO - Using all zero band for gml mask ('CLOUDS', 'B00') for S2A_MSIL2A_20220627T100611_N0400_R022_T32UPU_20220627T162810.SAFE
2022-09-21 23:10:43,443 - geographer.downloaders.sentinel2_safe_unpacking - INFO - Using all zero band for gml mask ('CLOUDS', 'B00') for S2B_MSIL2A_20220804T101559_N0400_R065_T32UPU_20220804T130854.SAFE
2022-09-21 23:17:38,499 - geographer.downloaders.sentinel2_safe_unpacking - INFO - Using all zero band for gml mask ('CLOUDS', 'B00') for S2A_MSIL2A_20220412T012701_N0400_R074_T54SUE_20220412T042315.SAFE
2022-09-21 23:24:31,570 - geographer.downloaders.sentinel2_safe_unpacking - INFO - Using all zero band for gml mask ('CLOUDS', 'B00') for S2A_MSIL2A_20220701T012711_N0400_R074_T54SUE_20220701T043318.SAFE

Notice that we set the optional target_raster_count argument to 2. It defines the number of distinct rasters that each stadium should be contained in. The rasters attribute now contains information about the downloaded rasters:

[12]:
connector.rasters
[12]:
raster_processed? timestamp orig_crs_epsg_code geometry
raster_name
S2A_MSIL2A_20220722T092041_N0400_R093_T34TFN_20220722T134859.tif True 2022-07-22-09:20:41 32634 POLYGON ((23.54663 42.33578, 23.58754 43.32358...
S2A_MSIL2A_20220413T092031_N0400_R093_T34TFN_20220413T123632.tif True 2022-04-13-09:20:31 32634 POLYGON ((23.54663 42.33578, 23.58754 43.32358...
S2A_MSIL2A_20220627T100611_N0400_R022_T32UPU_20220627T162810.tif True 2022-06-27-10:06:11 32632 POLYGON ((11.79809 47.73104, 11.85244 48.71769...
S2B_MSIL2A_20220804T101559_N0400_R065_T32UPU_20220804T130854.tif True 2022-08-04-10:15:59 32632 POLYGON ((11.79809 47.73104, 11.85244 48.71769...
S2A_MSIL2A_20220412T012701_N0400_R074_T54SUE_20220412T042315.tif True 2022-04-12-01:27:01 32654 POLYGON ((140.00972 35.15084, 139.99743 36.140...
S2A_MSIL2A_20220701T012711_N0400_R074_T54SUE_20220701T043318.tif True 2022-07-01-01:27:11 32654 POLYGON ((140.00972 35.15084, 139.99743 36.140...
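The raster names carry the same information as the timestamp and orig_crs_epsg_code columns, because they follow the Sentinel-2 product naming convention (mission, product level, sensing time, processing baseline, relative orbit, tile, discriminator). The sketch below parses such a name with the standard library. The helper parse_s2_name is hypothetical, and deriving the EPSG code from the tile identifier assumes the usual UTM numbering (326xx north, 327xx south of the equator).

```python
from datetime import datetime


def parse_s2_name(raster_name: str) -> dict:
    """Split a Sentinel-2 product-style file name into its fields."""
    stem = raster_name.rsplit(".", 1)[0]  # drop the .tif (or .SAFE) extension
    mission, level, sensing, _baseline, _orbit, tile, _discriminator = stem.split("_")
    zone = int(tile[1:3])    # UTM zone number, e.g. 34 in T34TFN
    band = tile[3]           # MGRS latitude band letter, e.g. T
    northern = band >= "N"   # bands N..X lie north of the equator
    return {
        "mission": mission,
        "level": level,
        "sensing_time": datetime.strptime(sensing, "%Y%m%dT%H%M%S"),
        "tile": tile,
        "utm_epsg": (32600 if northern else 32700) + zone,
    }


info = parse_s2_name("S2A_MSIL2A_20220722T092041_N0400_R093_T34TFN_20220722T134859.tif")
print(info["sensing_time"], info["utm_epsg"])
```

For the first raster in the table, this yields the sensing time 2022-07-22 09:20:41 and the EPSG code 32634, matching the corresponding columns.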

Now, let’s take another look at the vectors GeoDataFrame. It contains a "raster_count" column. This column tells us how many rasters each vector feature (in our case, each stadium) is fully contained in. Previously, these values were all 0, but now they are all 2. This reflects the value of 2 we passed to the optional target_raster_count argument above.

[29]:
connector.vectors
[29]:
raster_count location download_exception type geometry
vector_name
Munich Olympiastadion 2 Munich, Germany NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((11.54677 48.17472 0.00000, 11.5446...
Munich Track and Field Stadium1 2 Munich, Germany NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((11.54382 48.17279 0.00000, 11.5438...
Munich Olympia Track and Field2 2 Munich, Germany NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((11.54686 48.17892 0.00000, 11.5468...
Munich Staedtisches Stadion Dantestr 2 Munich, Germany NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((11.52913 48.16874 0.00000, 11.5291...
Vasil Levski National Stadium 2 Sofia, Bulgaria NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((23.33410 42.68813 0.00000, 23.3340...
Bulgarian Army Stadium 2 Sofia, Bulgaria NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((23.34065 42.68492 0.00000, 23.3406...
Arena Sofia 2 Sofia, Bulgaria NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((23.34018 42.68318 0.00000, 23.3401...
Jingu Baseball Stadium 2 Tokyo, Japan NoImgsForVectorFeatureFoundError('No images fo... baseball POLYGON Z ((139.71597 35.67490 0.00000, 139.71...
Japan National Stadium 2 Tokyo, Japan NoImgsForVectorFeatureFoundError('No images fo... football POLYGON Z ((139.71482 35.67644 0.00000, 139.71...
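Only six rasters were downloaded for nine stadiums because one raster can contain several stadiums at once. The following is a simplified model of that idea, not GeoGrapher's implementation: the function plan_downloads and the assignment of stadiums to tiles are hypothetical, and each simulated download is assumed to produce a raster covering every stadium on the same tile.

```python
def plan_downloads(vector_tiles: dict[str, str], target_raster_count: int) -> list[str]:
    """Simulate downloads until every vector is contained in enough rasters.

    `vector_tiles` maps each vector name to the (hypothetical) tile that
    contains it; every download for a tile yields a new raster of that tile.
    """
    raster_count = {name: 0 for name in vector_tiles}
    downloaded = []
    for name, tile in vector_tiles.items():
        while raster_count[name] < target_raster_count:
            raster = f"{tile}_{len(downloaded)}"
            downloaded.append(raster)
            # The new raster covers every vector on the same tile.
            for other, other_tile in vector_tiles.items():
                if other_tile == tile:
                    raster_count[other] += 1
    return downloaded


stadium_tiles = {
    "Munich Olympiastadion": "T32UPU",
    "Munich Track and Field Stadium1": "T32UPU",
    "Arena Sofia": "T34TFN",
    "Jingu Baseball Stadium": "T54SUE",
    "Japan National Stadium": "T54SUE",
}
rasters = plan_downloads(stadium_tiles, target_raster_count=2)
print(len(rasters), rasters)
```

In this model, stadiums that are already covered by earlier downloads are skipped, so three tiles with a target of 2 lead to six rasters in total.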

The connector keeps track of the containment and intersection relations between vector features and rasters in the form of an internal bipartite graph. We can ask questions about this graph, such as which rasters contain (or intersect) a given vector feature (stadium):

[14]:
# rasters containing a vector feature
vector_name = "Munich Olympiastadion"
containing_rasters = connector.rasters_containing_vector(vector_name)
print(f"rasters containing {vector_name}:\n{containing_rasters} \n")

# vector features intersecting a raster
raster_name = containing_rasters[0]
intersecting_vectors = connector.vectors_intersecting_raster(raster_name)
print(f"vector features (stadiums) intersecting {raster_name}:\n{intersecting_vectors}")
rasters containing Munich Olympiastadion:
['S2A_MSIL2A_20220627T100611_N0400_R022_T32UPU_20220627T162810.tif', 'S2B_MSIL2A_20220804T101559_N0400_R065_T32UPU_20220804T130854.tif']

vector features (stadiums) intersecting S2A_MSIL2A_20220627T100611_N0400_R022_T32UPU_20220627T162810.tif:
['Munich Staedtisches Stadion Dantestr', 'Munich Olympia Track and Field2', 'Munich Olympiastadion', 'Munich Track and Field Stadium1']
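To make the idea of the bipartite graph more concrete, here is a toy version written with plain dictionaries. It is only a sketch under stated assumptions: the class RelationGraph and the raster names are hypothetical, the method names merely mirror the connector methods used above, and the real connector may store and query these relations differently.

```python
class RelationGraph:
    """Toy bipartite graph: each edge joins a vector name and a raster name."""

    RELATIONS = ("contains", "intersects")

    def __init__(self):
        self._by_vector = {}  # vector name -> {raster name: relation}
        self._by_raster = {}  # raster name -> {vector name: relation}

    def add_edge(self, vector: str, raster: str, relation: str) -> None:
        if relation not in self.RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self._by_vector.setdefault(vector, {})[raster] = relation
        self._by_raster.setdefault(raster, {})[vector] = relation

    def rasters_containing_vector(self, vector: str) -> list[str]:
        edges = self._by_vector.get(vector, {})
        return [raster for raster, rel in edges.items() if rel == "contains"]

    def vectors_intersecting_raster(self, raster: str) -> list[str]:
        # In this toy model containment implies intersection,
        # so every edge of the raster counts.
        return list(self._by_raster.get(raster, {}))


graph = RelationGraph()
graph.add_edge("Munich Olympiastadion", "raster_a.tif", "contains")
graph.add_edge("Munich Olympiastadion", "raster_b.tif", "intersects")
graph.add_edge("Arena Sofia", "raster_c.tif", "contains")

containing = graph.rasters_containing_vector("Munich Olympiastadion")
print(containing)
```

Storing each edge in both directions keeps lookups from either side, vector or raster, a simple dictionary access.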

4. Loading and saving a connector

To save the connector, we use the save method. This will save the connector to the connector subdirectory of the connector’s data_dir:

[16]:
connector.save()

In our case, saving the connector wasn’t actually necessary, since the downloader’s download method automatically saves the connector.

To load an existing connector, we use the from_data_dir method:

[ ]:
connector = Connector.from_data_dir(DATA_DIR)

To see how to cut this dataset and create labels for it so that we can do ML with it, read through the Creating a ML dataset tutorial notebook.