Cutting datasets: basics

The DSCutter classes are used for cutting a dataset. GeoGrapher provides two general, customizable DSCutter classes in geographer.cutters. Two helper functions return DSCutters customized for the following common use cases:

Cutting every raster to a grid of rasters

To create a new dataset in target_data_dir from a source dataset in source_data_dir by cutting every raster in the source dataset to a grid of rasters, use the geographer.cutters.get_cutter_every_raster_to_grid() function:

from geographer.cutters import get_cutter_every_raster_to_grid
cutter = get_cutter_every_raster_to_grid(
    new_raster_size=512,
    source_data_dir=<SOURCE_DATA_DIR>,
    target_data_dir=<TARGET_DATA_DIR>,
    name=<OPTIONAL_NAME_FOR_SAVING>)
cutter.cut()
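
To make the idea of cutting a raster to a grid concrete, here is a minimal sketch in plain Python that only computes the pixel windows of such a grid. It is not GeoGrapher's implementation: the Window type and the grid_windows function are hypothetical names, and the sketch assumes that partial tiles at the right and bottom edges are dropped, which may differ from what the library does.

```python
from typing import Iterator, NamedTuple


class Window(NamedTuple):
    """Pixel window given by its top-left corner and its size (hypothetical helper type)."""

    row_off: int
    col_off: int
    height: int
    width: int


def grid_windows(
    raster_height: int, raster_width: int, new_raster_size: int
) -> Iterator[Window]:
    """Yield non-overlapping square windows that tile a raster.

    Assumption: tiles that would extend past the bottom or right edge are skipped.
    """
    if new_raster_size <= 0:
        raise ValueError("new_raster_size must be positive")
    # The last valid offset is the one where a full tile still fits in the raster.
    for row_off in range(0, raster_height - new_raster_size + 1, new_raster_size):
        for col_off in range(0, raster_width - new_raster_size + 1, new_raster_size):
            yield Window(row_off, col_off, new_raster_size, new_raster_size)


# A 10980 x 10980 raster holds 21 full tiles of 512 pixels per axis.
windows = list(grid_windows(10980, 10980, 512))
print(len(windows))  # 441
```

Under this assumption, a strip of 10980 - 21 * 512 = 228 pixels along the right and bottom edges is not covered by any tile.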

The geographer.cutters.get_cutter_every_raster_to_grid() function returns a geographer.cutters.DSCutterIterOverRasters instance. The cut() method will save the cutter to a JSON file in connector.connector_dir. To update the target dataset after the source dataset has grown, first read the JSON file and then run update():

from geographer.cutters import DSCutterIterOverRasters
dataset_cutter = DSCutterIterOverRasters.from_json_file(<path/to/saved.json>)
dataset_cutter.update()

Warning

The update method assumes that no vectors or rasters that remain in the target dataset have been removed from the source dataset.
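
The following sketch illustrates why this assumption matters. It models an incremental update as a set difference over raster names. This is an illustration only, not GeoGrapher's update logic: the function names and the file names are hypothetical.

```python
def rasters_to_cut_on_update(source_rasters: set[str], already_cut: set[str]) -> list[str]:
    """Return the source rasters that have not been cut yet, in sorted order."""
    return sorted(source_rasters - already_cut)


def stale_rasters(source_rasters: set[str], already_cut: set[str]) -> list[str]:
    """Return rasters that were cut earlier but no longer exist in the source."""
    return sorted(already_cut - source_rasters)


# The source dataset has grown by one raster since the last cut.
source = {"tile_a.tif", "tile_b.tif", "tile_c.tif"}
cut_so_far = {"tile_a.tif", "tile_b.tif"}
print(rasters_to_cut_on_update(source, cut_so_far))  # ['tile_c.tif']

# If tile_a.tif is deleted from the source, a purely additive update never
# looks at it again, so its cutouts would silently stay in the target dataset.
source_after_removal = {"tile_b.tif", "tile_c.tif"}
print(stale_rasters(source_after_removal, cut_so_far))  # ['tile_a.tif']
```

An update that only adds new material has no step that detects the second case, which is why removals from the source dataset need to be handled separately.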

Cutting rasters around vectors

To cut rasters around vector features (e.g. to create 512 × 512 pixel cutouts around vector features from 10980 × 10980 Sentinel-2 tiles), use the geographer.cutters.get_cutter_rasters_around_every_vector() function:

from geographer.cutters import get_cutter_rasters_around_every_vector
cutter = get_cutter_rasters_around_every_vector(
    source_data_dir=<SOURCE_DATA_DIR>,
    target_data_dir=<TARGET_DATA_DIR>,
    name=<OPTIONAL_NAME_FOR_SAVING>,
    new_raster_size=512,
    target_raster_count=2,
    mode="random")
cutter.cut()
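
As a rough illustration of what a cutout "around" a feature can mean, the sketch below picks a random square window that contains a given pixel (for example, the pixel under a feature's centroid) and stays inside the raster. This is one plausible reading of a random placement, written in plain Python. It is not GeoGrapher's implementation, and random_window_around_point is a hypothetical name.

```python
import random


def random_window_around_point(
    row: int,
    col: int,
    raster_height: int,
    raster_width: int,
    new_raster_size: int,
    rng: random.Random,
) -> tuple[int, int]:
    """Return the (row_off, col_off) of a square window containing pixel (row, col).

    Assumption: the window must lie entirely inside the raster.
    """
    if new_raster_size <= 0 or new_raster_size > min(raster_height, raster_width):
        raise ValueError("window does not fit inside the raster")
    if not (0 <= row < raster_height and 0 <= col < raster_width):
        raise ValueError("pixel lies outside the raster")
    # Offsets for which the window both contains the pixel and stays in bounds.
    min_row = max(0, row - new_raster_size + 1)
    max_row = min(row, raster_height - new_raster_size)
    min_col = max(0, col - new_raster_size + 1)
    max_col = min(col, raster_width - new_raster_size)
    # randint is inclusive on both ends.
    return rng.randint(min_row, max_row), rng.randint(min_col, max_col)


rng = random.Random(0)  # seeded so that repeated runs give the same window
row_off, col_off = random_window_around_point(5000, 30, 10980, 10980, 512, rng)
print(row_off, col_off)
```

Because the pixel in this example is only 30 columns from the left edge, the column offset is restricted to the range 0 to 30, while the row offset can vary over a full window height.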

The geographer.cutters.get_cutter_rasters_around_every_vector() function returns a geographer.cutters.DSCutterIterOverVectors instance. The cut() method will save the cutter to a JSON file in connector.connector_dir. To update the target dataset after the source dataset has grown, first read the JSON file and then run update():

from geographer.cutters import DSCutterIterOverVectors
dataset_cutter = DSCutterIterOverVectors.from_json_file(<path/to/saved.json>)
dataset_cutter.update()

Warning

The update method assumes that no vectors or rasters that remain in the target dataset have been removed from the source dataset.