Prepare Dataset for Training

Before training a model, the input data must be packaged into .h5 (HDF5) files.

Dataset Preparation

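Before the .h5 files can be created, the source data must be available in a Geoscience ANALYST workspace: import the grid files containing the data into a workspace (.geoh5 file), where each grid is stored as a Grid2D object with its associated data.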

The dataset is then created from a Python console by importing the prepare_dataset function from the geo_unsup_mapper.utils.prepare_images_training module and calling it as shown in the following snippet.

from geo_unsup_mapper.utils.prepare_images_training import prepare_dataset

prepare_dataset(
    workspace="path/to/GA/Workspace.geoh5",  # path to the GA workspace containing the data
    grid_names_data=[  # list of (Grid2D name, data name) tuples
        ("ontario_mag_200m_res", "mag_res"),
        ("ontario_grav_2km_bou", "grav_bouguer"),
        ("ontario_dem", "dem"),
        ("ontario_em", "em"),
        ("ontario_rad_250m_Th", "rad_th")
    ],
    tile=256,  # size of the square tiles to generate
    path="path/to/saved/dataset.h5",  # path to save the dataset
    name="my_dataset"  # name to give to the dataset
)

The prepare_dataset function creates a .h5 file containing the dataset. For every data array associated with the listed Grid2D objects, it finds every position in the image where a square tile of the specified size can be extracted without including any no-data values. The images and their valid tile coordinates are then saved together in the .h5 file at the specified path.
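The tile search described above can be illustrated with NumPy using a summed-area table, which counts the no-data cells in any window in constant time. This is a minimal sketch, not the implementation used by prepare_dataset: it assumes no-data cells are stored as NaN and that tiles are considered at every pixel offset, and the helper names find_valid_tile_origins and extract_tile are hypothetical.

```python
import numpy as np


def find_valid_tile_origins(image: np.ndarray, tile: int) -> np.ndarray:
    """Return an (N, 2) array of (row, col) top-left corners of every
    tile x tile window that contains no NaN (no-data) cells."""
    if tile < 1:
        raise ValueError("tile must be a positive integer")
    # 1 where the cell is no-data, 0 elsewhere
    nodata = np.isnan(image).astype(np.int64)
    # Summed-area table with a leading row and column of zeros, so that
    # sat[i, j] == nodata[:i, :j].sum()
    sat = np.zeros((nodata.shape[0] + 1, nodata.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = nodata.cumsum(axis=0).cumsum(axis=1)
    # Number of no-data cells in the window whose top-left corner is (i, j)
    counts = (
        sat[tile:, tile:]
        - sat[:-tile, tile:]
        - sat[tile:, :-tile]
        + sat[:-tile, :-tile]
    )
    # Keep only windows with zero no-data cells
    return np.argwhere(counts == 0)


def extract_tile(image: np.ndarray, origin, tile: int) -> np.ndarray:
    """Slice the tile x tile window starting at origin = (row, col)."""
    row, col = origin
    return image[row:row + tile, col:col + tile]


# Small example: a 4 x 4 grid with one no-data cell in the top-right corner
image = np.arange(16, dtype=float).reshape(4, 4)
image[0, 3] = np.nan
origins = find_valid_tile_origins(image, tile=2)
print(len(origins))  # 8 of the 9 possible 2 x 2 windows are valid
```

In this sketch the search would be run once per data array, matching the description above. A brute-force loop that checks each window with np.isnan gives the same result, but its cost grows with the tile area for every window, whereas the summed-area table needs only four lookups per window.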

Training and Validation Datasets

To train a model, users must create both a training dataset and a validation dataset. It is recommended to use data from different datasets, different geological contexts, and slightly different resolutions to improve the model's ability to generalize.
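One possible way to follow this recommendation is to hold out whole surveys or regions for validation, so that validation tiles come from areas the model has not seen during training. The sketch below assumes the (Grid2D name, data name) pairs are grouped by survey in a dictionary; split_by_survey and the quebec_* grid names are hypothetical, and the resulting lists are meant to be passed as grid_names_data to two separate prepare_dataset calls.

```python
from typing import Dict, List, Tuple

# (Grid2D name, data name), the same pair format used by grid_names_data
GridDataPair = Tuple[str, str]


def split_by_survey(
    surveys: Dict[str, List[GridDataPair]],
    validation_surveys: List[str],
) -> Tuple[List[GridDataPair], List[GridDataPair]]:
    """Hypothetical helper: assign every (Grid2D, data) pair of a survey to
    either the training or the validation list, never both."""
    unknown = set(validation_surveys) - set(surveys)
    if unknown:
        raise KeyError(f"Unknown survey(s): {sorted(unknown)}")
    training: List[GridDataPair] = []
    validation: List[GridDataPair] = []
    for survey, pairs in surveys.items():
        target = validation if survey in validation_surveys else training
        target.extend(pairs)
    return training, validation


# Example grouping; the "quebec_*" grids are hypothetical placeholders.
surveys = {
    "ontario": [("ontario_mag_200m_res", "mag_res"), ("ontario_dem", "dem")],
    "quebec": [("quebec_mag_250m_res", "mag_res"), ("quebec_dem", "dem")],
}
train_pairs, val_pairs = split_by_survey(surveys, validation_surveys=["quebec"])

# Each list would then be passed to its own prepare_dataset call, e.g.:
# prepare_dataset(workspace="path/to/GA/Workspace.geoh5",
#                 grid_names_data=train_pairs, tile=256,
#                 path="path/to/train.h5", name="train")
# prepare_dataset(workspace="path/to/GA/Workspace.geoh5",
#                 grid_names_data=val_pairs, tile=256,
#                 path="path/to/validation.h5", name="validation")
```

Splitting by survey rather than by tile avoids overlapping tiles from the same grid ending up in both datasets, which would make validation scores look better than the model's true performance on new data.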