geographer.downloaders package

Submodules

geographer.downloaders.base_download_processor module

Base class for processing a downloaded file.

class geographer.downloaders.base_download_processor.RasterDownloadProcessor(**data)[source]

Bases: ABC, BaseModel

Base class for download processors.

Return type:

None

abstract process(raster_name, download_dir, rasters_dir, return_bounds_in_crs_epsg_code, **params)[source]

Process a single download.

Return type:

dict[Union[Literal['raster_name', 'geometry', 'orig_crs_epsg_code'], str], Any]

Parameters:
  • raster_name (str) – Name of raster

  • download_dir (Path) – Directory containing download

  • rasters_dir (Path) – Directory to place processed raster in

  • return_bounds_in_crs_epsg_code (int) – EPSG code of the CRS in which the raster bounds should be returned

  • params (Any) – Additional keyword arguments. Corresponds to the processor_params argument of the RasterDownloaderForVectors.download method.

Returns:

return_dict, a dict containing information about the downloaded product. Keys should include: ‘raster_name’, ‘geometry’, ‘orig_crs_epsg_code’.
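As a rough illustration of this contract, the sketch below mimics process with a plain function instead of a real RasterDownloadProcessor subclass. The function name process_stub, the placeholder bounds, and the EPSG code 32632 are invented for the example; only the argument names and the required return keys come from the description above.

```python
from pathlib import Path
from typing import Any

# Keys the return dict should include, according to the description above.
REQUIRED_KEYS = {"raster_name", "geometry", "orig_crs_epsg_code"}


def process_stub(
    raster_name: str,
    download_dir: Path,
    rasters_dir: Path,
    return_bounds_in_crs_epsg_code: int,
    **params: Any,
) -> dict[str, Any]:
    """Hypothetical stand-in for a RasterDownloadProcessor.process implementation."""
    # A real processor would turn the download in download_dir into a raster
    # inside rasters_dir here. This stub only builds the return dict.
    bounds = (0.0, 0.0, 1.0, 1.0)  # placeholder (minx, miny, maxx, maxy)
    return {
        "raster_name": raster_name,
        "geometry": bounds,  # a real implementation would return a geometry object
        "orig_crs_epsg_code": 32632,  # placeholder EPSG code of the source raster
    }


info = process_stub("example.tif", Path("downloads"), Path("rasters"), 4326)
missing_keys = REQUIRED_KEYS - info.keys()
```

Checking missing_keys in a unit test is one cheap way to catch a processor that forgets one of the expected keys.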

geographer.downloaders.base_downloader_for_single_vector module

Base class for downloaders for a single vector feature.

class geographer.downloaders.base_downloader_for_single_vector.RasterDownloaderForSingleVector(**data)[source]

Bases: ABC, BaseModel

Base class for downloaders for a single vector feature.

Return type:

None

abstract download(vector_name, vector_geom, download_dir, previously_downloaded_rasters_set, **params)[source]

Download (a series of) raster(s) for a single vector feature.

Return type:

dict[Union[Literal['raster_name', 'raster_processed?'], str], Any]

Parameters:
  • vector_name (str | int) – Name of vector feature

  • vector_geom (Polygon) – Geometry of vector feature

  • download_dir (Path) – Directory in which raw downloads are placed

  • previously_downloaded_rasters_set (set[str | int]) – Set of (names of) previously downloaded rasters

  • params (Any) – Additional keyword arguments. Corresponds to the downloader_params argument of the RasterDownloaderForVectors.download method.

Returns:

Dict with a key ‘list_raster_info_dicts’. The corresponding value is a list of dicts containing (at least) the keys ‘raster_name’ and ‘raster_processed?’, each dict holding the entries of the row in rasters that belongs to that raster.
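The shape of this return value can be sketched with plain dicts. The helper names make_download_result and new_raster_names are hypothetical, and setting ‘raster_processed?’ to False for fresh downloads is an assumption made for the example.

```python
from typing import Any


def make_download_result(raster_names: list[str]) -> dict[str, Any]:
    """Build a return value in the documented shape (hypothetical helper)."""
    return {
        "list_raster_info_dicts": [
            # Assumption: a freshly downloaded raster is not yet processed.
            {"raster_name": name, "raster_processed?": False}
            for name in raster_names
        ]
    }


def new_raster_names(result: dict[str, Any], previously_downloaded: set[str]) -> set[str]:
    """Return the raster names in result that were not downloaded earlier."""
    names = {info["raster_name"] for info in result["list_raster_info_dicts"]}
    return names - previously_downloaded


result = make_download_result(["tile_a.tif", "tile_b.tif"])
fresh = new_raster_names(result, previously_downloaded={"tile_a.tif"})
```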

geographer.downloaders.downloader_for_vectors module

Download a targeted number of rasters per vector feature.

class geographer.downloaders.downloader_for_vectors.RasterDownloaderForVectors(**data)[source]

Bases: BaseModel, SaveAndLoadBaseModelMixIn

Class that downloads a targeted number of rasters per vector feature.

Return type:

None

field download_processor: RasterDownloadProcessor [Required]
field downloader_for_single_vector: RasterDownloaderForSingleVector [Required]
field temp_dir_relative_path: Union[Path, str] = 'temp_download_dir'
download(connector, vector_names=None, target_raster_count=1, filter_out_vectors_contained_in_union_of_intersecting_rasters=False, shuffle=True, downloader_params=None, processor_params=None)[source]

Download a targeted number of rasters per vector feature.

For each vector feature with fewer than target_raster_count rasters fully containing it, this function attempts to download additional rasters to meet the target. The new rasters are integrated into the dataset/connector immediately after downloading, updating the raster count for the vector feature before proceeding to the next feature.

Warning

The target number of downloads depends on target_raster_count and the current raster_count (number of rasters fully containing the vector feature). For vector features (e.g., polygons) too large to be fully contained in any raster, the raster_count will remain zero, and every call to this method will attempt to download target_raster_count rasters (or raster series). To avoid this, use the filter_out_vectors_contained_in_union_of_intersecting_rasters argument.
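The effect described in this warning can be illustrated with a little arithmetic. The sketch assumes that the number of download attempts per vector feature is simply the shortfall between target_raster_count and the current raster count; the function name and the feature names are made up for the example.

```python
def downloads_to_attempt(current_raster_count: int, target_raster_count: int) -> int:
    """Shortfall between the target and the current raster count (never negative)."""
    return max(0, target_raster_count - current_raster_count)


# Number of rasters that currently fully contain each (hypothetical) feature.
raster_counts = {"small_field": 1, "medium_lake": 0, "huge_forest": 0}
target_raster_count = 2

plan = {
    name: downloads_to_attempt(count, target_raster_count)
    for name, count in raster_counts.items()
}

# A feature too large to fit into any single raster keeps a count of zero,
# so a later call would plan the same number of downloads for it again.
plan_for_huge_forest_on_next_call = downloads_to_attempt(0, target_raster_count)
```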

Parameters:
  • vector_names (str | int | list[int] | list[str] | None) – Optional vector_name or list of vector_names to download rasters for. Defaults to None, i.e. consider all vector features in connector.vectors.

  • downloader – One of ‘sentinel2’ or ‘jaxa’. Defaults, if possible, to previously used downloader.

  • target_raster_count (int) – Target for the number of rasters per vector feature in the dataset after downloading. The actual number of rasters fully containing a vector feature P could be lower if not enough rasters are available, or higher if, after downloading target_raster_count rasters for P, P is also contained in rasters downloaded for other vector features.

  • filter_out_vectors_contained_in_union_of_intersecting_rasters (bool) – Useful when dealing with ‘large’ vector features. Defaults to False.

  • shuffle (bool) – Whether to shuffle order of vector features for which rasters will be downloaded. Might in practice prevent an uneven distribution of the raster count for repeated downloads. Defaults to True.

  • downloader_params (dict[str, Any] | None) – (Optional) keyword arguments to pass to downloader_for_single_vector.download. Corresponds to **params of the download method of the abstract base class RasterDownloaderForSingleVector. In particular, the keywords vector_name, vector_geom, download_dir, and previously_downloaded_rasters_set, which correspond to the other arguments, are not allowed.

  • processor_params (dict[str, Any] | None) – Optional additional keyword arguments passed to download_processor.process as **params. In particular, the keywords raster_name, download_dir, rasters_dir, and return_bounds_in_crs_epsg_code are not allowed.

  • connector (Path | str | Connector)

Returns:

None
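The restriction on downloader_params and processor_params can be checked before calling download. The following is a minimal sketch: the helper check_params and the choice of ValueError are assumptions for illustration, not geographer's own validation. The example keys are taken from the EodagDownloaderForSingleVector.download and Sentinel2SAFEProcessor.process signatures.

```python
from typing import Any, Optional

# Keywords that are filled in by the downloader itself and therefore not allowed.
RESERVED_DOWNLOADER_KEYS = {
    "vector_name",
    "vector_geom",
    "download_dir",
    "previously_downloaded_rasters_set",
}
RESERVED_PROCESSOR_KEYS = {
    "raster_name",
    "download_dir",
    "rasters_dir",
    "return_bounds_in_crs_epsg_code",
}


def check_params(params: Optional[dict[str, Any]], reserved: set[str]) -> dict[str, Any]:
    """Return a copy of params, raising ValueError if a reserved keyword is used."""
    params = dict(params or {})
    clashes = reserved & set(params)
    if clashes:
        raise ValueError(f"Reserved keywords not allowed: {sorted(clashes)}")
    return params


downloader_params = check_params(
    {"suffix_to_remove": ".SAFE", "filter_online": True}, RESERVED_DOWNLOADER_KEYS
)
processor_params = check_params(
    {"resolution": 10, "delete_safe": True}, RESERVED_PROCESSOR_KEYS
)
```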

Warning

If the vector features are polygons, it is easy to come up with examples where the raster count distribution (i.e. the distribution of rasters per polygon) becomes unbalanced, particularly if target_raster_count is large. These scenarios are not necessarily very likely, but they are possible. For example, suppose one wants to download 5 rasters for a polygon that is not fully contained in any raster in the dataset. If no downloadable raster fully contains the polygon, but there are 20 disjoint sets of downloadable rasters that each jointly cover it, then all 20 sets will be downloaded.

save(file_path)[source]

Save downloader.

By convention, the downloader should be saved to the connector subdirectory of the data directory it is supposed to operate on.

Parameters:

file_path (Path | str)

geographer.downloaders.eodag_downloader_for_single_vector module

SingleRasterDownloader for all providers supported by eodag.

In particular, this downloader can be used to obtain Sentinel-2 L2A data.

class geographer.downloaders.eodag_downloader_for_single_vector.DownloadParams[source]

Bases: dict

Parameters for the download method of an EOProduct.

Refer to the EOProduct documentation for more details: https://eodag.readthedocs.io/en/stable/api_reference/eoproduct.html

Some parameters of the EOProduct.download method should not be used:
  • product: Omitted because the value is determined by geographer.

  • progress_callback: Omitted because its values cannot easily be JSON serialized.

  • extract: Omitted because geographer requires the value of this kwarg to be True.

  • output_dir: Omitted because the value is determined by geographer.

  • asset: Omitted because it does not make sense for a downloader for a single vector.

  • output_extension: Omitted for simplicity’s sake.

This dictionary may include any of the following keys:
  • wait (int): The wait time in minutes between two download attempts.

  • timeout (int): The max time in minutes to retry downloading before stopping.

  • dl_url_params (dict[str, str]): Additional URL parameters to pass to the download URL.

  • delete_archive (bool): Whether to delete the downloaded archives after extraction.
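A download_kwargs value could look like the sketch below. A plain dict stands in for DownloadParams (which derives from dict), the concrete values are illustrative only, and the two key sets simply restate the lists above.

```python
# Keys documented as usable, and keys documented as omitted.
ALLOWED_KEYS = {"wait", "timeout", "dl_url_params", "delete_archive"}
OMITTED_KEYS = {
    "product",
    "progress_callback",
    "extract",
    "output_dir",
    "asset",
    "output_extension",
}

# Illustrative values: retry every 2 minutes, give up after 20 minutes,
# and delete the archive once it has been extracted.
download_kwargs = {"wait": 2, "timeout": 20, "delete_archive": True}

unexpected = set(download_kwargs) - ALLOWED_KEYS
forbidden = set(download_kwargs) & OMITTED_KEYS
```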

clear() → None.  Remove all items from D.
copy() → a shallow copy of D
fromkeys(value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values
class geographer.downloaders.eodag_downloader_for_single_vector.EodagDownloaderForSingleVector(**data)[source]

Bases: RasterDownloaderForSingleVector

Downloader for providers supported by eodag.

Refer to the eodag documentation at https://eodag.readthedocs.io/en/stable/ for more details on eodag.

Parameters:
  • eodag_kwargs (dict[str, Any])

  • eodag_setup_logging_kwargs (dict[str, Any])

Return type:

None

field eodag_kwargs: dict[str, Any] [Optional]

Optional kwargs defining an EODataAccessGateway instance. Possible keys are ‘user_conf_file_path’ to define a Path to the user configuration file and ‘locations_conf_path’ to define a Path to the locations configuration file. See https://eodag.readthedocs.io/en/stable/api_reference/core.html#eodag.api.core.EODataAccessGateway.

field eodag_setup_logging_kwargs: dict[str, Any] [Optional]

Kwargs to be passed to eodag.utils.logging.setup_logging to set up eodag logging. See https://eodag.readthedocs.io/en/stable/api_reference/utils.html#eodag.utils.logging.setup_logging

download(vector_name, vector_geom, download_dir, previously_downloaded_rasters_set, *, search_kwargs=None, download_kwargs=None, properties_to_save=None, filter_property=None, filter_online=True, sort_by=None, suffix_to_remove=None)[source]

Download a raster for a vector feature using eodag.

Download a raster fully containing the vector feature, returns a dict in the format needed by the associator.

Return type:

dict

Note

The start, end, provider, items_per_page, and locations arguments correspond to kwargs of EODataAccessGateway.search_all (though the provider kwarg is only documented for the EODataAccessGateway.search). The descriptions are adapted from the official eodag documentation at https://eodag.readthedocs.io/en/latest/api_reference/core.html#eodag.api.core.EODataAccessGateway.

Parameters:
  • vector_name (str | int) – name of vector feature

  • vector_geom (Polygon) – Geometry of vector feature

  • download_dir (Path) – Directory Sentinel-2 products will be downloaded to.

  • previously_downloaded_rasters_set (set[str]) – Set of already downloaded products.

  • search_kwargs (SearchParams | None) – Keyword arguments for the search_all method of an EODataAccessGateway, excluding “geom”. Refer to the docstring of SearchParams for more details.

  • download_kwargs (DownloadParams | None) – Keyword arguments for the download method of an EOProduct, excluding certain keys. Refer to the docstring of DownloadParams for more details.

  • properties_to_save (list[str] | None) – List of property keys to extract and save from an EOProduct’s properties dictionary. Values that cannot be stored in a GeoDataFrame will be replaced with the string “__DUMMY_VALUE__”.

  • filter_property (dict[str, Any] | list[dict[str, Any]] | None) – Kwargs or list of kwargs defining criteria according to which products should be filtered. These correspond exactly to kwargs for the EODataAccessGateway.filter_property method. Refer to https://eodag.readthedocs.io/en/stable/plugins_reference/generated/eodag.plugins.crunch.filter_property.FilterProperty.html#eodag.plugins.crunch.filter_property.FilterProperty for more details.

  • filter_online (bool) – Whether to filter the results to include only products that are online.

  • sort_by (str | tuple[str, Literal['ASC', 'DESC']] | None) – (Optional) A string or tuple like (“key”, “ASC”|”DESC”) by which to sort the results. If a string is provided, it will be interpreted as (“key”, “ASC”).

  • suffix_to_remove (str | None) – (Optional) A suffix to strip from the downloaded EOProduct’s file name. The resulting .tif raster will use the modified file name (if applicable) with “.tif” appended.

Returns:

A dictionary containing information about the rasters. ({‘list_raster_info_dicts’: [raster_info_dict]})

Raises:
  • ValueError – Raised if an unknown product type is given.

  • NoRastersForPolygonFoundError – Raised if no downloadable rasters could be found for the vector feature.

Return type:

dict
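The documented interpretation of sort_by (a bare string is treated as (“key”, “ASC”)) can be sketched as a small normalization helper. The function name normalize_sort_by is hypothetical, and "cloudCover" is used only as an example property key.

```python
from typing import Optional, Union

SortBy = Union[str, tuple[str, str]]


def normalize_sort_by(sort_by: Optional[SortBy]) -> Optional[tuple[str, str]]:
    """Turn a sort_by argument into a (key, order) tuple, or None if not given."""
    if sort_by is None:
        return None
    if isinstance(sort_by, str):
        # A bare string is interpreted as ascending order.
        return (sort_by, "ASC")
    key, order = sort_by
    if order not in ("ASC", "DESC"):
        raise ValueError(f"order must be 'ASC' or 'DESC', got {order!r}")
    return (key, order)


ascending = normalize_sort_by("cloudCover")
descending = normalize_sort_by(("cloudCover", "DESC"))
```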

property eodag: EODataAccessGateway

Get eodag.

class geographer.downloaders.eodag_downloader_for_single_vector.SearchParams[source]

Bases: dict

Parameters for the search_all method of an EODataAccessGateway.

Note

The geom parameter of the EODataAccessGateway.search_all method is omitted, because its value is determined as a geographer argument.

See https://eodag.readthedocs.io/en/latest/api_reference/core.html#eodag.api.core.EODataAccessGateway.search_all for more details on most of the arguments below.

This dictionary may include the following keys:
  • start (str | None): Start sensing time in ISO 8601 format (e.g. “1990-11-26”, “1990-11-26T14:30:10.153Z”, “1990-11-26T14:30:10+02:00”, …). If no time offset is given, the time is assumed to be given in UTC.

  • end (str | None): End sensing time in ISO 8601 format (e.g. “1990-11-26”, “1990-11-26T14:30:10.153Z”, “1990-11-26T14:30:10+02:00”, …). If no time offset is given, the time is assumed to be given in UTC.

  • provider (str | None): The provider to be used. If set, search fallback will be disabled. If not set, the configured preferred provider will be used at first before trying others until finding results. See https://eodag.readthedocs.io/en/stable/_modules/eodag/api/core.html#EODataAccessGateway.search

  • items_per_page (int | None): Number of items to retrieve per page.

  • locations (dict[str, str] | None): Location filtering by name using locations configuration {“<location_name>”=“<attr_regex>”}. For example, {“country”=“PA.”} will use the geometry of the features having the property ISO3 starting with ‘PA’ such as Panama and Pakistan in the shapefile configured with name=country and attr=ISO3.

In addition, the dictionary may contain any other keys (except geom) compatible with the provider.
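A search_kwargs value built from these keys might look like the following sketch. A plain dict stands in for SearchParams, the dates and page size are arbitrary, and "cop_dataspace" is used because it is the provider named elsewhere in this package's documentation.

```python
from datetime import date

# ISO 8601 date strings without a time offset are assumed to be UTC.
search_kwargs = {
    "start": date(2023, 12, 1).isoformat(),
    "end": date(2023, 12, 31).isoformat(),
    "provider": "cop_dataspace",  # setting a provider disables search fallback
    "items_per_page": 20,
}

# geom must not be passed: geographer derives it from the vector feature.
contains_geom = "geom" in search_kwargs
```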

clear() → None.  Remove all items from D.
fromkeys(value=None, /)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]. If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v. In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values

geographer.downloaders.jaxa_download_processor module

RasterDownloadProcessor for JAXA downloads.

class geographer.downloaders.jaxa_download_processor.JAXADownloadProcessor(**data)[source]

Bases: RasterDownloadProcessor

RasterDownloadProcessor for JAXA downloads.

Return type:

None

process(raster_name, download_dir, rasters_dir, return_bounds_in_crs_epsg_code)[source]

Process a downloaded JAXA file.

Return type:

dict

Parameters:
  • raster_name (str) – raster name

  • download_dir (Path) – download directory

  • rasters_dir (Path) – rasters directory

  • return_bounds_in_crs_epsg_code (int) – EPSG code of crs to return raster bounds in

Returns:

raster_info_dict containing information about the raster

geographer.downloaders.jaxa_downloader_for_single_vector module

RasterDownloaderForSingleVector for JAXA DEM data.

Downloads digital elevation model (DEM) data from jaxa.jp’s ALOS data-source.

See https://www.eorc.jaxa.jp/ALOS/en/index.htm for an overview of the ALOS data. A detailed product description for ALOS (file format, etc.) can be found at https://www.eorc.jaxa.jp/ALOS/en/aw3d30/aw3d30v3.2_product_e_e1.0.pdf (PDF). The data is assumed to be stored on the FTP server ftp://ftp.eorc.jaxa.jp/pub/ALOS/ext1/AW3D30/release_vXXXX/ (port: 46287).

There are different versions of the ALOS data: 1804, 1903, 2003, 2012. Only the 1804 version has been tested.

class geographer.downloaders.jaxa_downloader_for_single_vector.JAXADownloaderForSingleVector(**data)[source]

Bases: RasterDownloaderForSingleVector

Download JAXA DEM (digital elevation) data.

Return type:

None

download(vector_name, vector_geom, download_dir, previously_downloaded_rasters_set, *, data_version=None, download_mode=None)[source]

Download JAXA DEM data for a vector feature.

Downloads DEM data from jaxa.jp’s FTP server for a given vector feature and returns a dict structure compatible with the connector.

Return type:

dict[Union[Literal['raster_name', 'raster_processed?'], str], Any]

Warning

The downloader has only been tested for data_version ‘1804’.

Explanation:

The ‘bboxvertices’ download_mode will download rasters for the vertices of the bbox of the (vector) geometry. This is preferred for small (vector) geometries, but will miss regions in between if a (vector) geometry spans more than two rasters in each axis. The ‘bboxgrid’ mode will download rasters for each point on a grid defined by the bbox. This overshoots for small geometries, but works for large geometries.
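The difference between the two modes can be sketched with a simplified tile model. The sketch assumes 1° × 1° tiles identified by the integer coordinates of their lower-left corner, and the function names are hypothetical; the actual downloader may choose tiles and grid spacing differently, and the overshoot of ‘bboxgrid’ for small geometries is not reproduced here.

```python
import math
from itertools import product

Bounds = tuple[float, float, float, float]  # (minx, miny, maxx, maxy) in degrees
Tile = tuple[int, int]  # integer lower-left corner of an assumed 1-degree tile


def tiles_bboxvertices(bounds: Bounds) -> set[Tile]:
    """Tiles that contain one of the four corners of the bounding box."""
    minx, miny, maxx, maxy = bounds
    corners = product((minx, maxx), (miny, maxy))
    return {(math.floor(x), math.floor(y)) for x, y in corners}


def tiles_bboxgrid(bounds: Bounds) -> set[Tile]:
    """All tiles on the integer grid spanned by the bounding box."""
    minx, miny, maxx, maxy = bounds
    xs = range(math.floor(minx), math.floor(maxx) + 1)
    ys = range(math.floor(miny), math.floor(maxy) + 1)
    return set(product(xs, ys))


# A geometry whose bbox spans three tiles along each axis.
large_bounds = (10.5, 45.5, 12.5, 47.5)
vertex_tiles = tiles_bboxvertices(large_bounds)
grid_tiles = tiles_bboxgrid(large_bounds)
missed_by_vertices = grid_tiles - vertex_tiles
```

In this model the vertex mode only fetches the four corner tiles, so the central tile and the edge tiles between the corners end up in missed_by_vertices.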

Parameters:
  • vector_name (str | int) – the name of the vector geometry

  • vector_geom (BaseGeometry) – Geometry of vector feature

  • download_dir (Path) – directory that the raster file should be downloaded to

  • previously_downloaded_rasters_set (set[str | int]) – Set of (names of) previously downloaded rasters

  • data_version (str | None) – One of ‘1804’, ‘1903’, ‘2003’, or ‘2012’. 1804 is the only version that has been tested. Defaults if possible to whichever choice you made last time.

  • download_mode (str | None) – One of ‘bboxvertices’, ‘bboxgrid’. Defaults if possible to whichever choice you made last time.

Returns:

dict of dicts according to the connector convention (containing ‘list_raster_info_dicts’).

Raises:
  • log.warning – when a file cannot be found or opened on JAXA’s FTP server (download_exception = 'file_not_available_on_JAXA_ftp')


geographer.downloaders.sentinel2_download_processor module

RasterDownloadProcessor for Sentinel-2 data from Copernicus Sci-hub.

Should be easily extendable to Sentinel-1.

class geographer.downloaders.sentinel2_download_processor.Sentinel2SAFEProcessor(**data)[source]

Bases: RasterDownloadProcessor

Processes downloads of L2A Sentinel-2 SAFE files.

Return type:

None

process(raster_name, download_dir, rasters_dir, return_bounds_in_crs_epsg_code, *, resolution, delete_safe, file_suffix='.SAFE', nodata_val=0)[source]

Process Sentinel-2 download.

Extracts the downloaded Sentinel-2 zip file to a .SAFE directory, then processes/converts it to a GeoTiff raster, deletes the zip file, puts the GeoTiff raster in the right directory, and returns information about the raster in a dict.

Return type:

dict

Warning

Tested with the cop_dataspace eodag provider. It should also work with ‘creodias’, ‘onda’, and ‘sara’, which have an archive_depth of 2. For providers with a different archive_depth, the processor may need adjustments to locate the SAFE file correctly based on the raster name.

Parameters:
  • raster_name (str) – The name of the raster.

  • download_dir (Path) – The dir containing the SAFE file to be processed.

  • rasters_dir (Path) – The dir in which the .tif output file should be placed.

  • return_bounds_in_crs_epsg_code (int) – The EPSG of the CRS in which the bounds of the raster should be returned.

  • resolution (int) – The desired resolution of the output tif file.

  • delete_safe (bool) – Whether to delete the SAFE file after extracting the tif file.

  • file_suffix (str) – Possible suffix by which the stem of the raster_name and the name of the downloaded SAFE file to be processed differ. Defaults to “.SAFE”. This default gives nicer tif names when the processor is used together with the RasterDownloaderForVectors and the EodagDownloaderForSingleVector for the ‘cop_dataspace’ provider, and the downloader_params dict of the RasterDownloaderForVectors.download method contains the pair “suffix_to_remove”: “.SAFE”. The result is then, e.g., S2B_MSIL2A_20231208T013039_N0509_R074_T54SUE_20231208T031743.tif instead of S2B_MSIL2A_20231208T013039_N0509_R074_T54SUE_20231208T031743.SAFE.tif.

  • nodata_val (int) – The nodata value to fill. Defaults to 0.

Returns:

return_dict, a dict containing information about the downloaded product.
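The naming effect of the “.SAFE” suffix can be sketched with plain string handling. The helper tif_name is hypothetical and only mirrors the documented behaviour (strip the suffix if present, then append “.tif”); it is not the library's own code.

```python
from typing import Optional


def tif_name(product_name: str, suffix_to_remove: Optional[str] = None) -> str:
    """Derive the .tif file name from a downloaded product's file name."""
    if suffix_to_remove and product_name.endswith(suffix_to_remove):
        # Strip the suffix before appending ".tif".
        product_name = product_name[: -len(suffix_to_remove)]
    return product_name + ".tif"


product = "S2B_MSIL2A_20231208T013039_N0509_R074_T54SUE_20231208T031743.SAFE"
with_suffix_removed = tif_name(product, suffix_to_remove=".SAFE")
without_removal = tif_name(product)
```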

geographer.downloaders.sentinel2_safe_unpacking module

Unpack/convert sentinel-2 SAFE files to GeoTiffs.

geographer.downloaders.sentinel2_safe_unpacking.safe_to_geotif_L2A(safe_root, resolution, upsample_lower_resolution=True, outdir=None, TCI=True, requested_jp2_masks=['CLDPRB', 'SNWPRB'], requested_gml_mask=[('CLOUDS', 'B00')], nodata_val=0)[source]

Convert a L2A-level Sentinel-2 .SAFE file to a GeoTIFF.

The GeoTIFF contains raster bands derived from the .SAFE file, including:
  • True color composite (TCI) bands if requested.

  • JP2 masks (e.g., cloud or snow masks) at the desired resolution.

  • Additional GML masks if available.

Return type:

dict

Warning

Sentinel-2 L2A products dated later than October 2021 no longer include GML masks.

Note

  • The GeoTIFF bands are ordered as follows:

    1. True Color Composite (TCI) (optional): Red, Green, Blue (if TCI=True).

    2. Spectral Bands: JP2 data bands at the target resolution, optionally including upsampled lower-resolution bands if upsample_lower_resolution=True.

    3. JP2 Masks: Added in the order specified by requested_jp2_masks (e.g., "CLDPRB", "SNWPRB"). Masks are limited to a maximum resolution of 20m.

    4. GML Masks: Rasterized from requested_gml_mask, with empty bands added for missing masks.

  • jp2_masks are only available up to a resolution of 20 m, so for 10 m the 20 m mask is taken.

  • "SNWPRB" is the mask name for snow masks.
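The band order and the mask resolution rule from this note can be sketched as follows. The helper names, the band labels such as "TCI_R" or "CLOUDS_B00", and the example spectral band list are hypothetical and chosen only to make the order visible; the real function writes raster bands, not labels.

```python
SUPPORTED_RESOLUTIONS = (10, 20, 60)


def mask_resolution(resolution: int) -> int:
    """JP2 masks exist only at 20 m or coarser, so 10 m falls back to 20 m."""
    if resolution not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {SUPPORTED_RESOLUTIONS}")
    return max(resolution, 20)


def band_layout(
    tci: bool,
    spectral_bands: list[str],
    requested_jp2_masks: list[str],
    requested_gml_mask: list[tuple[str, str]],
) -> list[str]:
    """Labels of the output bands in the documented order."""
    layout: list[str] = []
    if tci:
        layout += ["TCI_R", "TCI_G", "TCI_B"]  # 1. true color composite
    layout += spectral_bands  # 2. spectral bands
    layout += requested_jp2_masks  # 3. JP2 masks, in the requested order
    # 4. GML masks (an empty band would stand in for a missing mask)
    layout += [f"{name}_{band}" for name, band in requested_gml_mask]
    return layout


layout = band_layout(
    tci=True,
    spectral_bands=["B02", "B03", "B04", "B08"],  # example band list
    requested_jp2_masks=["CLDPRB", "SNWPRB"],
    requested_gml_mask=[("CLOUDS", "B00")],
)
```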

Parameters:
  • safe_root (Path) – Path to the root directory of the .SAFE file.

  • resolution (str | int) – Desired resolution for the GeoTIFF (10, 20, or 60 meters).

  • upsample_lower_resolution (bool) – If True, includes lower-resolution bands and upsamples them to match the target resolution. Defaults to True.

  • outdir (Path | None) – Directory where the GeoTIFF will be saved. If None, saves the file in the parent directory of safe_root. Defaults to None.

  • TCI (bool) – Whether to include true color raster bands (TCI). Defaults to True.

  • requested_jp2_masks (list[str]) – List of JP2 masks to include in the output. Defaults to [“CLDPRB”, “SNWPRB”].

  • requested_gml_mask (list[tuple[str, str]]) – List of GML masks to include. Each tuple contains the mask name (e.g., “CLOUDS”) and the associated band (e.g., “B00”). Defaults to [(“CLOUDS”, “B00”)].

  • nodata_val (int) – Value to use for no-data areas in the GeoTIFF. Defaults to 0.

Returns:

A dictionary containing:
  • crs_epsg_code (int): The EPSG code of the CRS.

  • raster_bounding_rectangle (shapely.geometry.Polygon): The bounding rectangle of the output GeoTIFF.

Return type:

dict

Raises:
  • AssertionError – If resolution is not one of the supported values (10, 20, 60).

  • RasterioIOError – If there are issues reading or processing the JP2/GML files.

Module contents