Addressing Sample Inconsistency for Semisupervised Object Detection in Remote Sensing Images
Yuhao Wang, Lifan Yao, Gang Meng, Xinye Zhang, Jiayun Song, Haopeng Zhang*
IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS), 2024
The emergence of semisupervised object detection (SSOD) techniques has greatly enhanced object detection performance. SSOD leverages a limited amount of labeled data along with a large quantity of unlabeled data. However, there exists a problem of sample inconsistency in remote sensing images, which manifests in two ways. First, remote sensing images are diverse and complex. Conventional random initialization methods for labeled data are insufficient for training teacher networks to generate high-quality pseudolabels. Second, remote sensing images typically exhibit a long-tailed distribution, where some categories have a significant number of instances, while others have very few. This distribution poses significant challenges during model training. In this article, we propose the utilization of SSOD networks for remote sensing images characterized by a long-tailed distribution. To address the issue of sample inconsistency between labeled and unlabeled data, we employ a labeled data iterative selection strategy based on the active learning approach. We iteratively filter out high-value samples through the designed selection criteria. The selected samples are labeled and used as data for supervised training. This method filters out valuable labeled data, thereby improving the quality of pseudolabels. Inspired by transfer learning, we decouple the model training into the training of the backbone and the detector. We tackle the problem of sample inconsistency in long-tailed distribution data by training the detector using balanced data across categories. Our approach exhibits an approximate 1% improvement over the current state-of-the-art models on both the DOTAv1.0 and DIOR datasets.
@ARTICLE{10463140,
author={Wang, Yuhao and Yao, Lifan and Meng, Gang and Zhang, Xinye and Song, Jiayun and Zhang, Haopeng},
journal={IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing},
title={Addressing Sample Inconsistency for Semisupervised Object Detection in Remote Sensing Images},
year={2024},
volume={17},
number={},
pages={6933-6944},
keywords={Training;Remote sensing;Object detection;Measurement;Detectors;Tail;Labeling;Active learning;long-tailed distribution;remote sensing;semisupervised object detection (SSOD)},
doi={10.1109/JSTARS.2024.3374820}
}
Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval
Dingyi Hu, Zhiguo Jiang, Jun Shi, Fengying Xie, Kun Wu, Kunming Tang, Ming Cao, Jianguo Huai and Yushan Zheng*
Medical Image Analysis, 2024
The analysis of large-scale digital whole slide image (WSI) datasets has gained significant attention in computer-aided cancer diagnosis. Content-based histopathological image retrieval (CBHIR) is a technique that searches a large database for data samples matching input objects in both details and semantics, offering relevant diagnostic information to pathologists. However, the current methods are limited by the difficulty of processing gigapixel images, the variable size of WSIs, and the dependence on manual annotations. In this work, we propose a novel histopathology language-image representation learning framework for fine-grained digital pathology cross-modal retrieval, which utilizes paired diagnosis reports to learn fine-grained semantics from the WSI. An anchor-based WSI encoder is built to extract hierarchical region features and a prompt-based text encoder is introduced to learn fine-grained semantics from the diagnosis reports. The proposed framework is trained with a multivariate cross-modal loss function to learn semantic information from the diagnosis report at both the instance level and region level. After training, it can perform four types of retrieval tasks based on the multi-modal database to support diagnostic requirements. We conducted experiments on an in-house dataset and a public dataset to evaluate the proposed method. Extensive experiments have demonstrated the effectiveness of the proposed method and its advantages over existing histopathology retrieval methods. The code is available at https://github.com/hudingyi/FGCR.
@article{HU2024103163,
title = {Histopathology language-image representation learning for fine-grained digital pathology cross-modal retrieval},
journal = {Medical Image Analysis},
volume = {95},
pages = {103163},
year = {2024},
issn = {1361-8415},
doi = {10.1016/j.media.2024.103163},
url = {https://www.sciencedirect.com/science/article/pii/S1361841524000884},
author = {Dingyi Hu and Zhiguo Jiang and Jun Shi and Fengying Xie and Kun Wu and Kunming Tang and Ming Cao and Jianguo Huai and Yushan Zheng},
keywords = {CBHIR, Cross-modal, Diagnosis reports, Digital pathology},
abstract = {Large-scale digital whole slide image (WSI) datasets analysis have gained significant attention in computer-aided cancer diagnosis. Content-based histopathological image retrieval (CBHIR) is a technique that searches a large database for data samples matching input objects in both details and semantics, offering relevant diagnostic information to pathologists. However, the current methods are limited by the difficulty of gigapixels, the variable size of WSIs, and the dependence on manual annotations. In this work, we propose a novel histopathology language-image representation learning framework for fine-grained digital pathology cross-modal retrieval, which utilizes paired diagnosis reports to learn fine-grained semantics from the WSI. An anchor-based WSI encoder is built to extract hierarchical region features and a prompt-based text encoder is introduced to learn fine-grained semantics from the diagnosis reports. The proposed framework is trained with a multivariate cross-modal loss function to learn semantic information from the diagnosis report at both the instance level and region level. After training, it can perform four types of retrieval tasks based on the multi-modal database to support diagnostic requirements. We conducted experiments on an in-house dataset and a public dataset to evaluate the proposed method. Extensive experiments have demonstrated the effectiveness of the proposed method and its advantages to the present histopathology retrieval methods. The code is available at https://github.com/hudingyi/FGCR.}
}
Satellite Video Super-Resolution via Unidirectional Recurrent Network and Various Degradation Modeling
Xiaoyuan Wei, Haopeng Zhang*, Zhiguo Jiang
IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2024
A Closed-Loop Network for Single Infrared Remote Sensing Image Super-Resolution in Real World
Haopeng Zhang*, Cong Zhang, Fengying Xie, Zhiguo Jiang
Remote Sensing, 2023
Single image super-resolution (SISR) aims to reconstruct a high-resolution (HR) image from a corresponding low-resolution (LR) input. It is an effective way to address the low resolution that infrared remote sensing images usually suffer from due to hardware limitations. Most previous learning-based SISR methods just use synthetic HR-LR image pairs (obtained by bicubic kernels) to learn the mapping from LR images to HR images. However, the underlying degradation in the real world is often different from the synthetic method, i.e., the real LR images are obtained through a more complex degradation kernel, which leads to the adaptation problem and poor SR performance. To handle this problem, we propose a novel closed-loop framework that can not only make full use of the learning ability of the channel attention module but also introduce the information of real images as much as possible through a closed-loop structure. Our network includes two independent generative networks for down-sampling and super-resolution, respectively, and they are connected to each other to get more information from real images. We make a comprehensive analysis of the training data, resolution level and imaging spectrum to validate the performance of our network for infrared remote sensing image super-resolution. Experiments on real infrared remote sensing images show that our method achieves superior performance in various training strategies of supervised learning, weakly supervised learning and unsupervised learning. In particular, our peak signal-to-noise ratio (PSNR) is 0.9 dB better than that of the second-best unsupervised super-resolution model on the PROBA-V dataset.
@Article{rs15040882,
AUTHOR = {Zhang, Haopeng and Zhang, Cong and Xie, Fengying and Jiang, Zhiguo},
TITLE = {A Closed-Loop Network for Single Infrared Remote Sensing Image Super-Resolution in Real World},
JOURNAL = {Remote Sensing},
VOLUME = {15},
YEAR = {2023},
NUMBER = {4},
ARTICLE-NUMBER = {882},
URL = {https://www.mdpi.com/2072-4292/15/4/882},
ISSN = {2072-4292},
DOI = {10.3390/rs15040882}
}
Kernel Attention Transformer for Histopathology Whole Slide Image Analysis and Assistant Cancer Diagnosis
Yushan Zheng, Jun Li, Jun Shi, Fengying Xie, Jianguo Huai, Ming Cao and Zhiguo Jiang*
IEEE Transactions on Medical Imaging (TMI), 2023
The Transformer has been widely used in histopathology whole slide image (WSI) analysis. However, the design of token-wise self-attention and positional embedding strategy in the common Transformer limits its effectiveness and efficiency when applied to gigapixel histopathology images. In this paper, we propose a novel kernel attention Transformer (KAT) for histopathology WSI analysis and assistant cancer diagnosis. The information transmission in KAT is achieved by cross-attention between the patch features and a set of kernels related to the spatial relationship of the patches on the whole slide images. Compared to the common Transformer structure, KAT can extract the hierarchical context information of the local regions of the WSI and provide diversified diagnosis information. Meanwhile, the kernel-based cross-attention paradigm significantly reduces the computational cost. The proposed method was evaluated on three large-scale datasets and was compared with eight state-of-the-art methods. The experimental results have demonstrated that the proposed KAT is effective and efficient in the task of histopathology WSI analysis and is superior to the state-of-the-art methods.
@ARTICLE{10093771,
author={Zheng, Yushan and Li, Jun and Shi, Jun and Xie, Fengying and Huai, Jianguo and Cao, Ming and Jiang, Zhiguo},
journal={IEEE Transactions on Medical Imaging},
title={Kernel Attention Transformer for Histopathology Whole Slide Image Analysis and Assistant Cancer Diagnosis},
year={2023},
volume={42},
number={9},
pages={2726-2739},
keywords={Transformers;Histopathology;Feature extraction;Kernel;Cancer;Task analysis;Training;WSI;transformer;cross-attention;gastric cancer;endometrial cancer},
doi={10.1109/TMI.2023.3264781}
}