Prepare Dataset for Training
Before training a model, .h5 files must be created.
Dataset Preparation
Before creating the .h5 files, the dataset must be prepared. Users need to import the grid files containing the data into a Geoscience ANALYST workspace.
Then, the dataset can be created from a Python console. Import the prepare_dataset function from the geo_unsup_mapper.utils.prepare_images_training module and run it as shown in the following code snippet.
from geo_unsup_mapper.utils.prepare_images_training import prepare_dataset

prepare_dataset(
    workspace="path/to/GA/Workspace.geoh5",  # path to the GA workspace containing the data
    grid_names_data=[  # list of tuples containing the Grid2D/data pairs
        ("ontario_mag_200m_res", "mag_res"),
        ("ontario_grav_2km_bou", "grav_bouguer"),
        ("ontario_dem", "dem"),
        ("ontario_em", "em"),
        ("ontario_rad_250m_Th", "rad_th"),
    ],
    tile=256,  # size of the square tiles to generate
    path="path/to/saved/dataset.h5",  # path to save the dataset
    name="my_dataset",  # name to give to the dataset
)
The prepare_dataset function creates a .h5 file containing the dataset. For every data channel associated with the Grid2D objects, it finds every coordinate in the images from which a square tile of the size given by tile can be extracted without including any no-data values. The images and these valid coordinates are then saved in a .h5 file at the location given by path.
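The tile search described above can be sketched with NumPy. This is a minimal illustration, not the package's own implementation: it assumes no-data cells are stored as NaN, that each tile is identified by its top-left (row, column) corner, and that the window slides one cell at a time. The helper name valid_tile_corners is hypothetical. A summed-area table gives the number of no-data cells in every tile x tile window with four lookups, and only windows with a count of zero are kept.

```python
import numpy as np


def valid_tile_corners(image: np.ndarray, tile: int) -> np.ndarray:
    """Return the (row, col) top-left corners of all tile x tile windows free of NaNs.

    Hypothetical helper: assumes no-data is encoded as NaN and a stride of 1.
    """
    if tile < 1:
        raise ValueError("tile must be a positive integer")
    # 1 where the cell is no-data, 0 elsewhere
    nodata = np.isnan(image).astype(np.int64)
    # Summed-area table with a leading row and column of zeros
    sat = np.zeros((nodata.shape[0] + 1, nodata.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = nodata.cumsum(axis=0).cumsum(axis=1)
    # counts[i, j] = number of no-data cells in image[i:i + tile, j:j + tile]
    counts = (
        sat[tile:, tile:]
        - sat[:-tile, tile:]
        - sat[tile:, :-tile]
        + sat[:-tile, :-tile]
    )
    # Keep only windows that contain no no-data cells
    return np.argwhere(counts == 0)


# Small example: a 6 x 6 grid with a single no-data cell
img = np.arange(36, dtype=float).reshape(6, 6)
img[2, 3] = np.nan
corners = valid_tile_corners(img, 3)
```

Because each window check costs a constant number of lookups, scanning every position of a large grid stays linear in the number of cells. The actual prepare_dataset function may differ in stride, corner convention, or how no-data values are encoded.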
Training and Validation Datasets
To train a model, users must create both a training and a validation dataset. It is recommended to draw them from different datasets, different geological contexts, and slightly different resolutions to improve the model's ability to generalize.
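Each dataset is produced by its own prepare_dataset call, with its own grid_names_data list, path, and name. As a minimal sketch of the "different datasets" recommendation, the hypothetical helper below checks that no Grid2D object appears in both the training and validation lists before the two calls are made. The validation grid names (quebec_...) are invented for illustration; replace them with the Grid2D names in your own workspace.

```python
from typing import Iterable


def shared_grids(
    train_pairs: Iterable[tuple[str, str]],
    val_pairs: Iterable[tuple[str, str]],
) -> set[str]:
    """Return the Grid2D names that appear in both the training and validation lists."""
    train_grids = {grid for grid, _ in train_pairs}
    val_grids = {grid for grid, _ in val_pairs}
    return train_grids & val_grids


# (Grid2D name, data name) pairs, in the same form as grid_names_data.
train_pairs = [
    ("ontario_mag_200m_res", "mag_res"),
    ("ontario_grav_2km_bou", "grav_bouguer"),
]
# Hypothetical validation grids from a different area.
val_pairs = [
    ("quebec_mag_250m_res", "mag_res"),
    ("quebec_grav_2km_bou", "grav_bouguer"),
]

overlap = shared_grids(train_pairs, val_pairs)
if overlap:
    raise ValueError(f"Grids used in both datasets: {sorted(overlap)}")
```

If the check passes, each list can be passed as grid_names_data to a separate prepare_dataset call that writes to its own .h5 path.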