Cluster rastersΒΆ

To get a list of the raster clusters that need to be respected in the train/validation split to avoid data leakage use the geographer.utils.cluster_rasters.get_raster_clusters() function.

Note

If you just naively split your rasters into a train and a validation set there might be data leakage. Some vector features might intersect several rasters. Also, rasters can overlap and there might be vector features in the overlaps. The clusters returned by geographer.utils.cluster_rasters.get_raster_clusters() are the minimal clusters of rasters that need to be consistently assigned to the train or validation splits to avoid data leakage.

from geographer.utils.cluster_rasters import get_raster_clusters
clusters : list[Set[str]] = get_raster_clusters(
    connector=connector,
    clusters_defined_by='rasters_that_share_vectors',
    preclustering_method='y then x-axis'
)

The clusters_defined_by argument defines how clusters are defined. It must be one of "rasters_that_share_vectors" or "rasters_that_share_vectors_or_overlap". Setting the optional preclustering_method argument speeds up clustering and is recommended.