Training the Model¶

Before training a model, the dataset must be created as described in the dataset section.

The model can then be trained using a ui.json interface in Geoscience Analyst. The interface is divided into two tabs: General and Optional parameters. The General tab is shown in figure_train_uijson below.

train_uijson — *General tab of the training interface.*¶

Three options are available in this tab:

Training dataset: The *.h5 dataset created in the dataset section.

Validation dataset: The *.h5 dataset created in the dataset section.

Output folder: The folder where the model will be saved.

The Optional tab is shown in figure_train_optional_uijson below.

The tab is divided into different sections, containing several options to set the training parameters.

Training options

Name: The name of the model to train.

Batch size: The number of samples to use in each batch.

Epochs: The number of epochs to train the model.

Number of samples per epoch: The number of samples to use in each epoch.

Samples for training (optional): Randomly select a given number of samples from each dataset. This avoids data imbalance if one layer contains many more tiles than another.

Learning rate options

Learning rate: The initial learning rate of the optimizer (the Adam optimizer).

Factor regularizer: The regularizer factor for the input block (see the Model section).

Factor N-cut: The factor applied to the N-cut loss (see the Model section).

Factor variance: The factor applied to the variance loss (see the Model section).

AMSGrad: If checked, the AMSGrad optimizer is used.

Scheduler factor: The factor applied to the learning rate scheduler. The learning rate scheduler used is the ReduceLROnPlateau method.

Scheduler patience: The patience of the learning rate scheduler.

N-cut loss options

Radius: The radius for the N-Cut loss (see soft n-cut loss paper).

Sigma 1: The intensity sigma for the N-Cut loss (see soft n-cut loss paper).

Sigma 2: The spatial sigma for the N-Cut loss (see soft n-cut loss paper).

Network options

Pretrained model (optional): The path to a pretrained model to use.

Dropout: The dropout rate to use in the network.

Output channels: The number of output channels of the network generator.

Validation options

Validation step: The number of epochs between each validation.

Validation patience: The patience of the validation.

Samples for validation (optional): Randomly select a given number of samples from each dataset. This avoids data imbalance if one layer contains many more tiles than another.

Once you have set all the parameters, you can start the training by clicking on the OK button. The training will begin, and the model will be saved in the folder specified in the General tab. We highly recommend using a GPU to train the model. During training, at each validation epoch, the loss scores, results of the first validation batch, and the state of the model’s weights (as .pt files) are saved in the Output folder.

Training a model is a lengthy process, taking several hours or days depending on the dataset size, the number of epochs, and the computing capabilities of the machine it’s trained on. It also requires an iterative trial and error to find the best parameters for the model.

Training the Model¶

Geophysical Domains Clustering

Navigation

Related Topics