Single Image Dehazing: An Analysis on Generative Adversarial Network
1. Introduction
Haze is the reduction in visibility caused by atmospheric aerosols such as fog, dust, fumes and other particles. It blurs images, most commonly those of outdoor scenes, and degrades several computer vision applications, such as object detection, video surveillance, object tracking, remote sensing and autonomous driving. In bad weather conditions, such failures can sometimes lead to serious accidents. To overcome these complications, it is necessary to dehaze the degraded images. Image dehazing is a preprocessing technique that produces a haze-free image from the corresponding hazy image captured in bad weather. It extracts the major scene content from hazy images using computer vision algorithms trained on clear images.
Image dehazing techniques can be broadly divided into three categories: multiple-image dehazing, polarizing filter-based dehazing and single image dehazing. The first two are poorly suited to real-world and real-time applications because several filters are required to simulate changes in weather conditions. They are also not efficient at obtaining additional information about a hazy scene from a single image. For these reasons, researchers have attempted different approaches to single image dehazing, some of which use additional geometrical or depth information.
Single image dehazing is a challenging task because a single image contains insufficient information. Because of this limitation, most earlier solutions depended on handcrafted priors. Recently, convolutional neural networks (CNNs), along with advanced image filters, have been used to learn haze-related priors. Generative adversarial networks (GANs), introduced by Goodfellow et al. [1], have also shown strong performance for image dehazing via image generation and manipulation. A GAN can generate an output distribution from a given input noise distribution, so it can be used to generate diverse haze scenarios. Several GAN models have been developed for this purpose. However, how these models perform on hazy images in real situations still needs to be established. Therefore, the main objective of this paper is to analyze how well these GAN models perform in hazy situations.
The main contributions of this work are as follows:
i) Analysis of the working of three state-of-the-art GAN models, namely AOD-Net, cGAN, and DHSGAN.
ii) Evaluation of the accuracy and effectiveness by using benchmark datasets consisting of both synthetic and real-world hazy images.
iii) Recommendations for future research.
The remainder of this paper is structured as follows: Section 2 presents a brief survey of the related work. Section 3 highlights GAN-based methods for dehazing. Section 4 describes the datasets, experimental results, and discussions. Finally, conclusions are drawn in Section 5.
2. Related Work
Earlier methods for removing the effect of haze from images were mainly based on either image enhancement algorithms or model-based haze removal algorithms. More recently, attention has shifted to deep learning, especially GANs, to explore how well it performs haze removal, inspired by the outstanding results of CNNs and GANs in high-level vision tasks such as image classification, image understanding and deblurring [2] - [28]. Deep learning-based approaches often outperform classical approaches because they use deep features rather than superficial ones. Therefore, a variety of deep learning-based approaches have been proposed to overcome the degradation caused by haze, for both single image dehazing and video or multiple-frame dehazing.
Cheng et al. [4] presented a CNN-based dehazing method that infers color priors from semantic features extracted from a single image. Their model was evaluated on both synthetic and real-world hazy images and performed better at recovering clean images from challenging scenarios with strong ambiguity. However, the model has not yet been trained on a wider range of natural outdoor scenes.
Li et al. [5] introduced a flexible cascaded CNN that jointly estimates the transmission map and the atmospheric light. Their model outperformed other state-of-the-art models on synthetic and real-world hazy images. However, they did not investigate end-to-end networks for image dehazing.
Rashid et al. [6] presented a CNN-based encoder-decoder architecture that eliminates multiple dehazing obstacles using high-intensity pixel values for single image dehazing. Their model provided more efficient results than previous work, but it should be extended so that it can dehaze images without scattered shades and run in all cases.
Ren et al. [7] worked on a multi-scale deep neural network that estimates medium transmission maps from hazy images. Their algorithm was applied to the NYU Depth dataset and showed better efficacy, in terms of quality and speed, than the state-of-the-art results for both synthetic and real-world hazy images.
Yeh et al. [8] proposed a deep CNN architecture for dehazing images through image restoration without mapping each pair of hazy images and its corresponding ground truth. The method outperformed other state-of-the-art dehazing algorithms; however, it is a time-consuming process to decompose an input hazy image and to extract detail components.
Song et al. [9] presented a new ranking-CNN model, which is capable of learning haze-relevant features automatically. The proposed method obtained more effective results for both synthetic and real-world data against classical CNN, but its efficiency should be improved further by reducing redundant computations.
Goncalves et al. [10] illustrated an end-to-end CNN model, resulting in a more generic method without requiring any additional parameters. It introduced novel guided layers that adjusted the network weights using the guided filter and restored dehazed images by reducing structural information loss. This method showed outstanding performance by reducing spatial information loss, compared to other machine learning models from a qualitative and quantitative perspective.
DehazeNet [11] and AOD-Net (All-in-One Dehazing Network) [12] show promising performance in single image dehazing using higher priors and assumptions. However, the atmospheric scattering model should be learned with a deep neural network so that the mapping from hazy images to the corresponding dehazed images is optimized directly, end to end, without estimating the medium transmission map.
Valeriano et al. [13] compared the DehazeNet, dark-channel prior (DCP), FAST and CLAHE methods [2] [16] [17] using the CHIC database [14] [15]. DCP estimates the transmission map using the dark channel to invert the Koschmieder model, FAST estimates an atmospheric veil responsible for the variation in image intensity, and CLAHE applies contrast-limited adaptive histogram equalization.
De-haze and smoke GAN (DHSGAN) [18] is a robust end-to-end convolutional model for dehazing and desmoking. It is trained under a GAN architecture to recover indoor as well as outdoor haze-free scenes from different image degradation scenarios, such as fog, smoke, mist, fumes and haze.
Suarez et al. [19] presented a stacked conditional GAN model to remove haze degradations in RGB images including fast training convergence and a homogeneous model for generalization. It obtained high-quality dehazed images but requires ground truth dehazed images for training.
Dudhane and Murala [20] introduced a cycle-consistent GAN architecture known as CDNet, which was evaluated on four datasets, namely D-HAZY [21], ImageNet [22], SOTS [23] and real-world images, and obtained superior results.
Li et al. [24] proposed a conditional GAN (cGAN) algorithm to recover clear images from hazy images directly by an end-to-end architecture including a trainable encoder and a decoder. For better results, they modified the basic cGAN by including the VGG features with an L1-regularized gradient prior. It outperformed other state-of-the-art models for synthetic and real hazy images.
Raj and Venkateswaran [25] proposed a conditional GAN for dehazing without explicitly estimating the transmission map or haze relevant features and replaced the classic U-Net [26] with the Tiramisu model [27]. It obtained better efficiency and performance for both synthetic and real-world hazy images.
Dudhane et al. [28] proposed an end-to-end GAN that outperformed other existing algorithms through conducting experiments on NTIRE 2019 dehazing challenge dataset [29], D-Hazy [30] and indoor SOTS [23] datasets for single image dehazing.
From the above survey, it is clear that many GAN-based models have already been developed, each with its own merits and demerits. However, no comprehensive analysis or evaluation of them has yet been performed. This paper tries to fill this gap.
3. GAN-Based Dehazing
Several methods exist for image dehazing, but conventional approaches mostly work by estimating the transmission map and the corresponding air light component of the hazy scene using an atmospheric scattering model, and then use these estimates to reduce the effect of haze and recover the haze-free scene. These methods are based on one or more key assumptions, which exploit haze-relevant features. Some of these assumptions do not hold in all possible cases. A way to circumvent this issue is to use deep learning techniques and let the algorithm decide the relevant features. Recently, different types of generative adversarial networks (GANs), introduced by Goodfellow et al. [1], have proved to be highly effective in image dehazing. This paper aims to systematically evaluate three state-of-the-art single image dehazing methods: AOD-Net, cGAN, and DHSGAN.
3.1. Generative Adversarial Network
A generic schematic flow diagram of a GAN is shown in Figure 1. The architecture comprises two components, one of which is a discriminator (D) distinguishing between real images and generated images while the other one is a generator (G) creating images to fool the discriminator.
Given a noise distribution $z \sim p_z$, G defines a probability distribution $p_g$ as the distribution of the samples $G(z)$. The objective of a GAN is to learn a generator distribution $p_g$ that approximates the real data distribution $p_r$. A GAN [1] is optimized with respect to a joint loss function for D and G:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_r}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \qquad (1)$$
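As a minimal sketch of how this joint loss is evaluated in practice, the Python snippet below estimates the standard GAN value function of [1], V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))], from discriminator outputs on a batch. The function name `gan_value` and the sample probabilities are illustrative assumptions and are not taken from the evaluated models.

```python
import math


def gan_value(d_real, d_fake):
    """Batch estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on real samples, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# Hypothetical outputs of a discriminator that separates real from generated well.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
# A discriminator that cannot tell the two apart outputs 0.5 for every sample.
confused = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident, confused)
```

The discriminator is trained to increase this value and the generator to decrease it. When the discriminator outputs 0.5 for every sample, the estimate equals -log 4.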
3.2. AOD-Net
All-in-One Dehazing Network (AOD-Net) [12] is a lightweight CNN architecture based on a reformulated atmospheric scattering model. AOD-Net can generate the clean image J(x) directly from the hazy image I(x) via the joint estimation of the transmission matrix t(x) and the atmospheric light A.
Thus the haze formation model [29] can be reformulated as
$$J(x) = K(x)\,I(x) - K(x) + b \qquad (2)$$
where
$$K(x) = \frac{\frac{1}{t(x)}\,(I(x) - A) + (A - b)}{I(x) - 1} \qquad (3)$$
Here, $1/t(x)$ and A are compacted into one variable K(x), and b is a constant bias.
![]()
Figure 1. Generic architecture of a GAN. Two deep neural networks, a discriminator (D) and a generator (G), are trained simultaneously during the learning stage: the discriminator is optimized to distinguish between real images and generated images, while the generator is trained to generate images that fool the discriminator.
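The reformulation can be checked numerically for a single pixel. The sketch below assumes the standard atmospheric scattering model I(x) = J(x)t(x) + A(1 - t(x)) and the AOD-Net form of K(x) from [12], with b = 1 as an assumed default. The function names and pixel values are hypothetical.

```python
def k_from_physics(i, t, a, b=1.0):
    """K(x) = ((I(x) - A) / t(x) + (A - b)) / (I(x) - 1) for one pixel."""
    if i == 1.0:
        raise ValueError("K(x) is undefined where I(x) = 1")
    return ((i - a) / t + (a - b)) / (i - 1.0)


def dehaze_with_k(i, k, b=1.0):
    """Reformulated model: J(x) = K(x) * I(x) - K(x) + b."""
    return k * i - k + b


def dehaze_direct(i, t, a):
    """Direct inversion of I(x) = J(x) * t(x) + A * (1 - t(x))."""
    return (i - a) / t + a


# Hypothetical hazy intensity, transmission and atmospheric light for one pixel.
i, t, a = 0.6, 0.5, 0.8
k = k_from_physics(i, t, a)
print(k, dehaze_with_k(i, k), dehaze_direct(i, t, a))
```

Both routes give the same clean intensity, which is why a network that predicts only K(x) can stand in for separate estimates of t(x) and A.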
3.3. cGAN
Conditional Generative Adversarial Network (cGAN) [24] is a conditional model in which both the generator module and the discriminator module are conditioned on some additional information, e.g. class labels or data from several modalities. Image generation can be made conditional by feeding this information into both the discriminator and the generator. A cGAN algorithm is capable of generating clear images through optimization of a loss function that includes an adversarial loss, a perceptual loss, and an L1-regularized gradient prior [23]. Following Equation (1), its adversarial objective can be expressed as
$$\min_G \max_D \; \mathbb{E}_{I,J}[\log D(I, J)] + \mathbb{E}_{I,z}[\log(1 - D(I, G(I, z)))] \qquad (4)$$
Here, I is the input hazy image, J is the clean image and z is random noise.
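To make the three loss terms concrete, the following sketch combines an adversarial term, a perceptual term and an L1 gradient prior into one generator loss. It is an illustration under assumptions rather than the exact formulation of [24]: the weights `lam_p` and `lam_g`, the squared-error feature distance, and the plain feature vectors standing in for VGG features are all hypothetical.

```python
import math


def l1_gradient_prior(img):
    """Mean absolute horizontal and vertical finite difference of a 2-D image."""
    h, w = len(img), len(img[0])
    total, count = 0.0, 0
    for r in range(h):
        for c in range(w):
            if c + 1 < w:
                total += abs(img[r][c + 1] - img[r][c])
                count += 1
            if r + 1 < h:
                total += abs(img[r + 1][c] - img[r][c])
                count += 1
    return total / count if count else 0.0


def perceptual_loss(feat_fake, feat_real):
    """Mean squared distance between two feature vectors (stand-ins for VGG features)."""
    return sum((f - g) ** 2 for f, g in zip(feat_fake, feat_real)) / len(feat_fake)


def generator_loss(d_fake, feat_fake, feat_real, fake_img, lam_p=1.0, lam_g=1.0):
    """Adversarial term log(1 - D(I, G(I, z))) plus weighted perceptual and gradient terms."""
    adversarial = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return (adversarial
            + lam_p * perceptual_loss(feat_fake, feat_real)
            + lam_g * l1_gradient_prior(fake_img))


# Hypothetical 2x2 generated image with one vertical edge.
edge_img = [[0.0, 1.0], [0.0, 1.0]]
print(l1_gradient_prior(edge_img))
print(generator_loss([0.5], [0.2, 0.4], [0.2, 0.4], edge_img))
```

The gradient term penalizes spurious edges in the generated image, while the perceptual term compares images in feature space instead of pixel space.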
3.4. DHSGAN
De-Haze and Smoke GAN (DHSGAN) [18] is a dehazing network that requires neither the inversion of an atmospheric model nor any kind of post-processing. It directly generates a haze-free image using the final layer of a fully convolutional network. This network works robustly under different scene degradation conditions caused by fog, smoke, mist, haze and so on. DHSGAN consists of two sub-modules, 1) a Transmission Module (T) and 2) a GAN Module (G), followed by a loss function. The working of DHSGAN can be represented as
(5)
Here, a fully convolutional recurrent architecture is initialized with convolution layers of VGG19 [30] and pre-trained on the ImageNet [31] dataset for the estimation of the transmission map of hazy input images.
4. Experimental Results and Discussions
4.1. Dataset Description
In this work, the REalistic Single Image DEhazing (RESIDE) [23] dataset is used for investigation. RESIDE is a large-scale single image dehazing benchmark that comes with an extension called RESIDE-β. It comprises five subsets: a synthetic large-scale Indoor Training Set (ITS), a Synthetic Objective Testing Set (SOTS), a Hybrid Subjective Testing Set (HSTS), an Outdoor Training Set (OTS) and a Real-world Task-driven Testing Set (RTTS). In this work, however, only the SOTS subset is used.
4.2. Experimentation
For experimentation, we worked only on the SOTS subset, which contains hazy images and the corresponding ground-truth images for indoor as well as outdoor scenes, as described in Table 1. It contains approximately 550 indoor images and 992 outdoor images. The data are split into training and testing sets at a ratio of 8:2. Some sample hazy and ground-truth images from both the indoor and outdoor sets are shown in Figure 2 and Figure 3.
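An 8:2 split of this kind can be sketched as follows. The file names, the fixed seed and the helper `split_train_test` are hypothetical; the text does not state how the images were assigned to the two sets.

```python
import random


def split_train_test(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of the items reproducibly and cut it at the given fraction."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the indoor SOTS images.
indoor = [f"indoor_{n:04d}.png" for n in range(550)]
train, test = split_train_test(indoor)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that every method is trained and tested on the same images.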
For quantitative evaluation, we used the PSNR and SSIM values of the dehazed images. Table 2 and Table 3 list the average PSNR and SSIM values of the dehazed images for the three GAN-based techniques: AOD-Net, cGAN, and DHSGAN. The tables show that DHSGAN performs better than the other methods. The visual results for the three GAN-based techniques are shown in Figure 4 and Figure 5, which also confirm the superiority of DHSGAN over the other methods, as it is more robust. This is because DHSGAN does not use the inverse atmospheric model to recover haze-free images; rather, its generator learns from training images.
![]()
Table 1. Statistics of the experimental SOTS data subset from the RESIDE dataset.
![]()
Table 2. Average PSNR and SSIM results of the three investigated GAN-based methods on indoor images of the SOTS data subset.
![]()
Figure 2. (a)-(e) Sample indoor hazy images from the SOTS data subset of the RESIDE dataset; (f)-(j) the corresponding ground-truth images.
![]()
Table 3. Average PSNR and SSIM results of the three investigated GAN-based methods on outdoor images of the SOTS data subset.
![]()
Figure 3. (a)-(e) Sample outdoor hazy images from the SOTS data subset of the RESIDE dataset; (f)-(j) the corresponding ground-truth images.
![]()
Figure 4. Qualitative comparison of the different methods applied to the image in Figure 2(b).
![]()
Figure 5. Qualitative comparison of the different methods applied to the image in Figure 3(a).
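For reference, the two metrics used in the quantitative evaluation can be sketched in a few lines. `psnr` follows the usual definition 10·log10(MAX²/MSE). `global_ssim` is a simplification that treats the whole image as a single window with the commonly used constants (0.01·MAX)² and (0.03·MAX)², whereas standard SSIM averages over local windows. Both function names are illustrative, and this is not the evaluation code behind Tables 2 and 3.

```python
import math


def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat pixel lists."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(x, y, max_value=1.0):
    """SSIM computed over the whole image as one window (a simplification)."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / n
    var_y = sum((v - mu_y) ** 2 for v in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4-pixel ground-truth and dehazed images with values in [0, 1].
ground_truth = [0.2, 0.4, 0.6, 0.8]
dehazed = [0.25, 0.35, 0.65, 0.75]
print(psnr(ground_truth, dehazed), global_ssim(ground_truth, dehazed))
```

Higher values are better for both metrics: PSNR is unbounded above for identical images, and SSIM reaches 1 when the two images match.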
5. Conclusions
Removing haze from images for clear vision is one of the most challenging tasks in computer vision. This research reported a comprehensive study of three state-of-the-art GAN-based image dehazing methods, namely AOD-Net, cGAN, and DHSGAN. We evaluated the outputs of these methods both objectively (based on PSNR and SSIM) and subjectively (based on visual inspection) using the SOTS data subset of the benchmark RESIDE dataset. We found that, among the three methods, DHSGAN generated the best haze-free images from the corresponding hazy images.
However, the size of the input image is restricted to 256 × 256 pixels, so future research can concentrate on developing datasets containing larger images. In addition, we expect to present a detailed haze model so that we can explore optimum dehazing with a customized GAN.