Algorithms
==========
The Geophysical Domains Mapper leverages convolutional neural networks, as explained in the `theory`_ section. It adopts the `W-net`_ architecture, which is detailed below. This model should first be trained, and then it can be used to segment new data.
.. _theory: global.rst
.. _wnet_model:
Model
~~~~~
The `W-net`_ architecture is presented in :numref:`adapted_wnet`. The model consists of two main parts: the encoder and the decoder, both utilizing the `U-net`_ architecture. The first U-net encodes the input image into a latent representation, which is an image of the same size but with multiple channels. The decoder then reconstructs the input image using this latent representation.
During training, the network weights are adjusted to minimize the difference between the reconstructed image and the input image. Additionally, a statistical constraint called `soft N-cut loss`_ is applied to the embedding space. This loss penalizes the network if the latent representations are not spatially coherent.
An additional **variance** constraint has been added to the loss function. During the training, the variance of the input image within each domain is calculated, and the mean of this variance is added to the loss function. This forces the network to create clusters with similar variance, which is essential for the segmentation of geophysical data.
Finally, an additional **regularizer** term is included in the loss function to ensure the representations generated by the first convolutional block are diverse. This forces the network to decompose all the information available in the input image.
.. _adapted_wnet:
.. figure:: ../../images/algorithm/W_net_adapted.png
:align: center
:scale: 25%
*W-Net architecture adapted for Geophysical Domains Mapper.*
Some changes were made to the original `W-net`_ architecture to adapt it to the Geophysical Domains Mapper. The main changes are:
- *Input Block*: An input convolutional block is added to the encoder. This block allows the network to learn more complex features from the input images. During inference, the input block is used to concatenate the input images if several data types are passed.
- *Attention Mechanism*: The original `W-net`_ architecture uses a `U-net`_ architecture with skip connections. In the Geophysical Domains Mapper, an attention mechanism is added to the skip connections. This mechanism allows the network to focus on the most relevant information at each scale.
- *Residual Blocks*: The original `W-net`_ architecture uses a `U-net`_ architecture with convolutional blocks. In the Geophysical Domains Mapper, residual blocks are used instead of convolutional blocks.
- *Leaky ReLU*: The original `W-net`_ architecture uses ReLU activation functions. In the Geophysical Domains Mapper, Leaky ReLU activation functions are used instead.
.. _W-net: https://arxiv.org/abs/1711.08506
.. _U-net: https://link.springer.com/chapter/10.1007/978-3-319-24574-4_28
.. _soft N-cut loss: https://arxiv.org/abs/1804.01346
.. _training_procedure:
Training Procedure
~~~~~~~~~~~~~~~~~~
The performance achieved by a model depends on the training process. The goal of training is to find optimal weights for the network that enable it to generalize well to new data (see `theory`_). These weights are then saved in a file and can be loaded to use the model for inference. This section explains how the default weights, provided in a ``.pt`` file, were obtained.
Training Dataset
----------------
To train the model, we use a dataset composed of geophysical data from different regions in Canada. The dataset is divided into two parts: the training dataset and the validation dataset. The training dataset is used to adjust the model's weights, while the validation dataset is used to evaluate the model's performance. This dataset is presented in :numref:`figure_dataset`. The details of the different used layers are presented in the `dataset table`_.
The training dataset is composed of airborne magnetic, gravity, radiometric, and elevation data compilations from Quebec and Western Canada (area overlapping British Colombia, Alberta, Saskatchewan and Northwest Territories). It also contains electromagnetic conductance data from Quebec's Abitibi area and New Brunswick. The validation dataset consists of airborne magnetic, gravity, radiometric, elevation, and electromagnetic conductance data compilations from Central Canada (area overlapping Ontario and Manitoba).
.. _figure_dataset:
.. figure:: ../../images/algorithm/training_validation_dataset.png
:align: center
:scale: 20%
*The training and validation dataset locations and the magnetic data compilations used.*
.. _dataset table:
.. list-table::
:header-rows: 1
* - Name
- Usage
- Provenance
- Resolution
- Source
* - Airborne Magnetic
- Training
- Québec
- 75 m
- `SIGEOM `_
* - Radiometric Thorium
- Training
- Québec
- 75 m
- `SIGEOM `_
* - Bouguer Gravity
- Training
- Québec
- 2000 m
- `NRCAN `_
* - Elevation
- Training
- Québec
- 30 m
- `SRTMGL1 `_
* - Electromagnetic Conductance
- Training
- Québec
- 50 m
- `SIGEOM `_
* - Airborne Magnetic
- Training
- Western Canada
- 200 m
- `NRCAN `_
* - Bouguer Gravity
- Training
- Western Canada
- 2000 m
- `NRCAN `_
* - Elevation
- Training
- Western Canada
- 30 m
- `SRTMGL1 `_
* - Radiometric Thorium
- Training
- Western Canada
- 250 m
- `NRCAN `_
* - Electromagnetic Conductance
- Training
- New Brunswick
- 50 m
- `NRCan `_
* - Airborne Magnetic
- Validation
- Central Canada
- 200 m
- `NRCAN `_
* - Radiometric Thorium
- Validation
- Central Canada
- 200 m
- `NRCAN `_
* - Bouguer Gravity
- Validation
- Central Canada
- 2000 m
- `NRCAN `_
* - Elevation
- Validation
- Central Canada
- 30 m
- `SRTMGL1 `_
* - Electromagnetic Conductance
- Validation
- Ontario
- 50 m
- `NRCan `_
Data preprocessing
------------------
To train the model, the data presented above has been split into patches of 256x256 pixels. All possible tiles containing no no-data values are potentially used. During training, new tiles (1000) are randomly selected at each epoch.
For validation, a few hundred tiles have been randomly selected from each domain. These tiles are evaluated every 10 epochs to assess the model's performance.
A data augmentation process is randomly applied to the training tiles. This process includes the following transformations, as referenced from torchvision (`torchvision.transforms `_):
- Randomly flipping the input images horizontally and vertically (``RandomHorizontalFlip``, ``RandomVerticalFlip``)
- Randomly cropping the images (``RandomResizedCrop``)
- Applying a random Gaussian filter (``GaussianBlur``)
- Adjusting the sharpness (``RandomAdjustSharpness``)
- Inverting the values of the images (``RandomInvert``)
- Applying elastic deformation (``ElasticTransform``)
These transformations aim to increase the diversity of the training dataset and make the model more robust to different types of noise, as discussed in the `theory`_ **Inference** section.
To normalize the data and ensure extreme values are not too high, each image is normalized at the entrance of the model by the mean and standard deviation. Values above or below the standard deviation (-1 and 1) are passed through a *tanh* function. Finally, the values are rescaled between 0 and 1.
Training the network
--------------------
The `model`_ is trained using a loss function composed of three terms: the reconstruction loss, the soft N-cut loss, and the diversity loss, as described in the `model`_ section.
As the process is unsupervised, having a validation metric to stop the training and avoid overfitting is a challenge. Every 10 epochs, the mean variance within the found domains is computed on the validation dataset. If this value worsens, the training is stopped, and the model with the best validation loss is saved. This process ensures the model has good generalization capacity and does not overfit the specificities of the training dataset.
The parameters used to obtain the model are presented in the `parameters`_ table. These parameters are explained in the `training`_ chapter, which details the training process.
.. _training: ../uijson/training.rst
.. _parameters:
.. list-table::
:header-rows: 1
* - Parameter
- Value
* - Batch size
- 2
* - Number of samples per epoch
- 1000
* - Samples for training
- 100000
* - Learning rate
- 0.0001
* - Factor regularizer
- 0.00001
* - AMSGrad
- ``False``
* - Factor N-cut
- 100
* - Factor variance
- 1.0
* - Scheduler factor
- 0.5
* - Scheduler patience
- 10
* - N-cut radius
- 5
* - N-cut intensity weight
- 10
* - N-cut spatial weight
- 4
* - Dropout
- 0.6
* - Output channels
- 40
* - Validation step
- 10
* - Validation patience
- 20
* - Samples for validation
- 200
.. _inference_mechanisms:
Inference Mechanisms
~~~~~~~~~~~~~~~~~~~~
.. warning::
Due to the stochastic nature of the model, running the inference several times on the same data can yield different results.
Once the model is trained, it can be used to obtain the segmentation of new data. The process to obtain a segmentation from the model is presented in the :numref:`figure_inference`. The different steps are described below.
.. _figure_inference:
.. figure:: ../../images/algorithm/inference.png
:align: center
:scale: 40%
*Inference process: from input data, extract the latent representation, apply NMF, CRF, and obtain the segmentation.*
*W-net* - Generating Latent Representation
------------------------------------------
Once the model is trained, it can generate latent representations of new data. The latent representation is a multi-channel image that can be used to classify the pixels of the input image into different domains. As described in `model`_, the latent representations used for segmentation are obtained by passing the input data through the encoder part of the network.
Multiple-data Input
^^^^^^^^^^^^^^^^^^^
Even though the model processes one data layer at a time during training, it can receive several layers simultaneously during inference. The input data is normalized and passed through the input block of the encoder. The input block concatenates the input data along the channel dimension into a 64-dimensional latent space.
*NMF* - Dimensionality Reduction
--------------------------------
The latent representations obtained by the network are multi-channel images. However, many of these channels can be redundant. To reduce the dimensionality of the latent representation, `Non-negative Matrix Factorization `_ (NMF) is applied, as presented in :numref:`figure_inference`. This process reduces the number of channels until more than 98% of the variance is explained.
NMF is a technique used to break down a large matrix into smaller, meaningful parts by factoring it into two smaller matrices. These matrices represent underlying patterns or components within the data, with all values being non-negative. This process helps reveal hidden structures, making it useful for applications like text analysis, image processing, and more.
*CRF* - Conditional Random Fields
---------------------------------
As in the original `W-net`_ implementation, a Conditional Random Field (CRF) is applied to the NMF representations using the `pydensecrf `_ library. This fully connected CRF model helps smooth the classification results and improve spatial coherence, as presented in :numref:`figure_inference`.
In the context of segmentation, a **DenseCRF 2D** model refines the network's classification output by improving accuracy and coherence. It smooths the segmentation map by encouraging nearby pixels with similar colors to belong to the same class while maintaining sharp boundaries where there are significant color differences. This results in more accurate and visually coherent segmentation maps.
Segmentation
------------
To obtain the final segmentation, users can choose between two options:
*1* - No Disjoint Segments
^^^^^^^^^^^^^^^^^^^^^^^^^^
Non-adjacent segments of a same class are separated into different classes.
This is done in two steps: first, the output of the CRF is used to find the different classes.
Then, the output is segmented by separating non-continuous areas into different classes.
*2* - Clustering
^^^^^^^^^^^^^^^^
Use a given number of clusters to group the data. The clustering is applied to the CRF output using the `k-means `_ algorithm.
Users can set the desired number of clusters, or rely on an optimal number of clusters.
Optimal Number of Clusters
""""""""""""""""""""""""""
This option finds the optimal number of clusters that best separate the data,
using the `Silhouette score `_.
It computes the average silhouette score for different cluster numbers. The cluster number with the highest silhouette score is considered the best.
However, this process is time-consuming because it involves running the k-means algorithm for a range of cluster numbers until the best solution is found.
Remove Small Areas
------------------
An option allows users to remove areas smaller than a given threshold. All areas smaller than the threshold are removed from the final output. If the "*remove small area threshold*" is set to 0, no area is removed.