Prepare Dataset for Training
============================

Before training a model, the dataset must be written to ``.h5`` files.

Dataset Preparation
-------------------

Before creating the ``.h5`` files, the dataset must be prepared. First, import the grid files containing the data into a Geoscience ANALYST workspace. The dataset can then be created from a Python console: import the ``prepare_dataset`` function from the ``geo_unsup_mapper.utils.prepare_images_training`` module and run it as shown in the following code snippet.

.. code-block:: python

    from geo_unsup_mapper.utils.prepare_images_training import prepare_dataset

    prepare_dataset(
        workspace="path/to/GA/Workspace.geoh5",  # path to the GA workspace containing the data
        grid_names_data=[  # list of tuples containing the Grid2D/data pairs
            ("ontario_mag_200m_res", "mag_res"),
            ("ontario_grav_2km_bou", "grav_bouguer"),
            ("ontario_dem", "dem"),
            ("ontario_em", "em"),
            ("ontario_rad_250m_Th", "rad_th"),
        ],
        tile=256,  # size of the square tiles to generate
        path="path/to/saved/dataset.h5",  # path to save the dataset
        name="my_dataset",  # name to give to the dataset
    )

The ``prepare_dataset`` function creates a ``.h5`` file containing the dataset. For every data object associated with the listed Grid2D objects, it finds every coordinate in the image where a tile of the specified size can be extracted without containing no-data values. The images and their valid coordinates are then saved in a ``.h5`` file at the specified location.

Training and Validation Datasets
--------------------------------

To train a model, users must create both a training and a validation dataset. It is recommended to use data from different datasets, different geological contexts, and slightly different resolutions to improve the model's ability to generalize.
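
To make the tile search performed by ``prepare_dataset`` more concrete, the following minimal sketch shows one way to find every top-left coordinate from which a square tile free of no-data values can be extracted. It is an illustration only, not the package's implementation: the function name ``find_valid_tile_origins`` is hypothetical, no-data values are assumed to be encoded as ``NaN``, and the grid is assumed to be a plain list of rows rather than a Grid2D object.

.. code-block:: python

    import math


    def find_valid_tile_origins(grid, tile):
        """Return (row, col) top-left corners of tile x tile windows without NaN.

        Hypothetical helper: ``grid`` is a list of equal-length rows of floats,
        and NaN is assumed to mark no-data cells.
        """
        n_rows = len(grid)
        n_cols = len(grid[0]) if n_rows else 0

        # Summed-area table of no-data counts, padded with a zero row and
        # column so that the count in any window needs only four lookups.
        counts = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
        for i in range(n_rows):
            for j in range(n_cols):
                missing = 1 if math.isnan(grid[i][j]) else 0
                counts[i + 1][j + 1] = (
                    missing + counts[i][j + 1] + counts[i + 1][j] - counts[i][j]
                )

        origins = []
        for i in range(n_rows - tile + 1):
            for j in range(n_cols - tile + 1):
                in_window = (
                    counts[i + tile][j + tile]
                    - counts[i][j + tile]
                    - counts[i + tile][j]
                    + counts[i][j]
                )
                # Keep the origin only if the window holds no no-data cell.
                if in_window == 0:
                    origins.append((i, j))
        return origins


    nan = float("nan")
    demo_grid = [
        [1.0, 2.0, 3.0, 4.0],
        [5.0, nan, 7.0, 8.0],
        [9.0, 1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0, 7.0],
    ]
    demo_origins = find_valid_tile_origins(demo_grid, tile=2)
    print(demo_origins)  # [(0, 2), (1, 2), (2, 0), (2, 1), (2, 2)]

In this toy grid, the single no-data cell at row 1, column 1 rules out the four 2 x 2 tiles that overlap it, leaving five valid origins. On real grids an array library would be more practical than nested lists; the standard-library version is used here only to keep the sketch self-contained.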