Abstract
Automatic land cover mapping from remote sensing data plays an increasingly significant role in several Earth observation (EO) applications, such as sustainable development, autonomous agriculture, and urban planning. Because real ground surfaces and environments are complex, accurate classification of land cover types faces many challenges. This thesis provides novel deep learning-based solutions to land cover mapping challenges, such as how to handle intricate objects and imbalanced classes in multi-spectral, high-spatial-resolution remote sensing data.
The first work presents a novel model, the dense dilated convolutions' merging (DDCM) network, that learns richer multi-scale and global contextual representations in very high-resolution remote sensing images. The proposed method is lightweight, flexible and extendable, so it can be used as a simple yet effective encoder and decoder module to address different classification and semantic mapping challenges. Extensive experiments on different benchmark remote sensing datasets demonstrate that the proposed method achieves better performance while consuming far fewer computational resources than other published methods.
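As a rough illustration of why dilated convolutions help capture multi-scale context, the sketch below implements a one-dimensional dilated convolution and computes how the receptive field grows when layers with increasing dilation rates are stacked. It is not the DDCM architecture itself; the function names, the 1D setting and the zero padding are assumptions made purely for illustration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1D cross-correlation with an odd-length kernel.

    Hypothetical helper: the kernel taps are spaced `dilation` samples
    apart, so the output has the same length as the input.
    """
    k = len(kernel)
    half = (k // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(k):
            acc += kernel[j] * padded[i + j * dilation]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of stacked dilated layers with stride 1."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

With three stacked 3-tap layers at dilation rates 1, 2 and 4, `receptive_field(3, [1, 2, 4])` evaluates to 15, whereas three undilated layers would cover only 7 samples with the same number of weights.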
Next, a novel graph model is developed for capturing long-range pixel dependencies in remote sensing images to improve land cover mapping. One key component in the method is the self-constructing graph (SCG) module that can effectively construct global context relations (latent graph structure) without requiring prior knowledge graphs. The proposed SCG-based models achieved competitive performance on different representative remote sensing datasets with faster training and lower computational cost compared to strong baseline models. 
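The idea of deriving a graph from the data rather than from a prior knowledge graph can be sketched as follows. This is a minimal sketch under stated assumptions, not the SCG module described in the thesis: it assumes each node already has an embedding vector, builds edge weights as the rectified inner product of embeddings, and then performs one row-normalised propagation step with self-loops. All names are hypothetical.

```python
def inner_product_adjacency(embeddings):
    """Edge weight between nodes i and j: max(0, <z_i, z_j>) (assumed form)."""
    n = len(embeddings)
    return [
        [
            max(0.0, sum(a * b for a, b in zip(embeddings[i], embeddings[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def propagate(adjacency, features):
    """One message-passing step: weighted average over neighbours plus a self-loop."""
    n = len(adjacency)
    channels = len(features[0])
    out = []
    for i in range(n):
        # Add a unit self-loop so every node keeps part of its own feature.
        weights = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(weights)
        out.append(
            [sum(weights[j] * features[j][c] for j in range(n)) / total
             for c in range(channels)]
        )
    return out
```

Nodes with similar embeddings exchange information regardless of how far apart they are in the image grid, which is the sense in which such a graph can model long-range dependencies.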
The third work introduces a new framework, the multi-view self-constructing graph (MSCG) network, which extends the vanilla SCG model to capture multi-view context representations with rotation invariance and thereby improve segmentation performance. In addition, a novel adaptive class weighting loss function is developed to alleviate the class imbalance commonly found in EO datasets for semantic segmentation. Experiments on benchmark data demonstrate that the proposed framework is computationally efficient and robust, and that it produces improved segmentation results for imbalanced classes.
The last work addresses the key challenges in multi-modal land cover mapping of remote sensing data, namely 'what', 'how' and 'where' to fuse multi-source features effectively and how to learn optimal joint representations of different modalities efficiently. It presents a compact and scalable multi-modal deep learning framework (MultiModNet) based on two novel modules: the pyramid attention fusion module and the gated fusion unit. The proposed MultiModNet outperforms strong baselines on two representative remote sensing datasets with fewer parameters and at a lower computational cost. Extensive ablation studies also validate the effectiveness and flexibility of the framework.
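The general idea behind gating when fusing two modalities can be sketched as a learned, per-element blend. The example below is a generic gated fusion under stated assumptions, not the gated fusion unit of MultiModNet: the scalar weights `w_p`, `w_a` and `bias` are hypothetical stand-ins for learned parameters, and features are flat lists rather than image tensors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(primary, auxiliary, w_p, w_a, bias):
    """Blend two feature vectors element-wise with a sigmoid gate.

    gate -> 1 keeps the primary modality, gate -> 0 keeps the auxiliary one.
    """
    fused = []
    for p, a in zip(primary, auxiliary):
        gate = sigmoid(w_p * p + w_a * a + bias)
        fused.append(gate * p + (1.0 - gate) * a)
    return fused
```

When all gate parameters are zero the gate is 0.5 everywhere and the fusion reduces to a plain average; training such parameters lets a model decide where each modality should dominate.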
 
Has part(s)
Paper I: Liu, Q., Kampffmeyer, M., Jenssen, R. & Salberg, A.B. (2020). Dense dilated convolutions’ merging network for land cover classification. IEEE Transactions on Geoscience and Remote Sensing, 58(9), 6309-6320. Published version not available in Munin due to publisher’s restrictions. Published version available at https://doi.org/10.1109/TGRS.2020.2976658. Accepted manuscript version available in Munin at https://hdl.handle.net/10037/20943.
Paper II: Liu, Q., Kampffmeyer, M., Jenssen, R. & Salberg, A.B. (2021). Self-constructing graph neural networks to model long-range pixel dependencies for semantic segmentation of remote sensing images. International Journal of Remote Sensing, 42(16), 6184-6208. Published version not available in Munin due to publisher’s restrictions. Published version available at https://doi.org/10.1080/01431161.2021.1936267.
Paper III: Liu, Q., Kampffmeyer, M., Jenssen, R. & Salberg, A.B. (2020). Multi-View self-constructing graph convolutional networks with adaptive class weighting loss for semantic segmentation. IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 199-205. (Accepted manuscript). Also available in Munin at https://hdl.handle.net/10037/23229. Published version available at https://doi.org/10.1109/CVPRW50498.2020.00030.
Paper IV: Liu, Q., Kampffmeyer, M., Jenssen, R. & Salberg, A.B. Multi-modal land cover mapping of remote sensing images using pyramid attention and gated fusion networks. (Submitted manuscript).