Quality Assessment from Grayscale to Color Images

Color images convey more meaningful information to human observers than grayscale ones. Despite the advantages of the existing well-known objective image quality metrics, a common and major limitation is that they evaluate the quality of grayscale images only and do not make use of color information. In this paper we propose an improved method for image quality assessment that adds a color comparison to the criteria of the well-known Structural Similarity (SSIM) index. We validated the proposed metric on the TID2013 image database, and the experimental results show that the quality scores given by the proposed metric are more consistent with the human visual system (HVS) than those of other state-of-the-art metrics.


Introduction
Image quality assessment is an important tool in image processing systems, such as signal acquisition, synthesis, enhancement, watermarking, compression, transmission, storage, retrieval, reconstruction, authentication, display, and printing. Image quality assessment methods can be classified into two categories: subjective and objective. Subjective methods are accurate in estimating the visual quality of an image because they are carried out by human subjects, but they involve a costly process that requires a large number of observers and takes significant time. Objective methods, on the other hand, are computer-based methods that can automatically predict perceived image quality. Hence the objective methods have gained more popularity, although they do not necessarily correlate well with the quality as perceived by humans [1,2].
The objective image quality assessment methods can be employed for the following functions: monitoring image quality in real-time applications, benchmarking image-processing algorithms, and embedding into image-processing and transmission systems for optimization purposes [3].
Objective image quality assessment methods may also be classified into full-reference, reduced-reference, and no-reference methods, based on the availability of the reference image. Full-reference image quality assessment requires complete information about the reference image; reduced-reference assessment requires only partial information about the reference image; and no-reference assessment needs no information about the reference image. This paper focuses on full-reference image quality assessment methods for color images, where both the original and the test images are available.
Over the years a large number of objective metrics have been proposed to assess image quality. These include many algorithms that incorporate perceptual models [4,5,6,7,8,9,10] and distortion models [11,12]. Miyahara [10] proposed a Picture Quality Scale (PQS) based on three distortion factors: the amount, location, and structure of error. Wang and Bovik [13] proposed a new universal image quality index (UQI) and its improved form, the Structural Similarity (SSIM) index [14], by modeling the image distortion as a combination of loss of luminance, contrast, and correlation. More recent algorithms build on the successful SSIM index by incorporating saliency [15], compression specificity [16], multiple scales [17], amount and local information [18], wavelet-domain processing [19], and color [20]. The information fidelity criterion (IFC) [3] and visual information fidelity (VIF) [21] are both based on information theory: the distorted image is modeled as the output of passing the reference image through a distortion channel, and visual quality is quantified as the mutual information between the test image and the reference image. Shnayderman [22] explored the feasibility of Singular Value Decomposition (SVD) for quality measurement. In [23] a two-stage wavelet-based visual signal-to-noise ratio (VSNR) was proposed based on the low-level and mid-level properties of human vision.
Consider the original image in Fig. 1, distorted by different noise types from the TID2013 image database [24]. The images in Fig. 1b and Fig. 1d are distorted by two color-related noises, quantization noise and additive noise in color components respectively. The image in Fig. 1c is distorted by additive Gaussian noise, which is not color-related. It is clear that the Structural Similarity (SSIM) index fails to assess the perceptual quality of these color-related noises, as it is a grayscale index. The images in Fig. 1c and Fig. 1d are of almost the same perceptual quality but have drastically different SSIM scores of 0.579 and 0.830 respectively. The image in Fig. 1b has the worst perceptual quality among all the distorted images but is given the highest SSIM score, 0.850. This necessitates considering color information when assessing the perceptual quality of color images. Based on these considerations, a novel image quality assessment method, CSSIM, has been designed and published [25]. Due to the limited space in the conference proceedings, the results of comparisons between CSSIM and the existing image quality assessment methods were not addressed and analyzed in [25].
The remainder of this paper is organized as follows. Section 2 briefly reviews the grayscale SSIM index. The proposed method is described in Section 3. Section 4 presents the experimental results and discussion. Finally, Section 5 draws the conclusion.

The Grayscale Index
The Structural Similarity (SSIM) index [26] was proposed based on the assumption that the human visual system is highly adapted to extract structural information from visual scenes. The structural similarity index consists of three local comparison functions, namely luminance comparison, contrast comparison, and structure comparison, between two images x and y:

    l(x, y) = (2 μx μy + C1) / (μx² + μy² + C1)    (1)
    c(x, y) = (2 σx σy + C2) / (σx² + σy² + C2)    (2)
    s(x, y) = (σxy + C3) / (σx σy + C3)            (3)

where μx and μy are the sample means of the images x and y respectively, σx and σy are the sample standard deviations of x and y, and σxy is the sample covariance between x and y. The constants C1, C2, and C3 are used to stabilize the algorithm when the denominators approach zero. These statistics are calculated within a local window.
The general form of the SSIM index, as shown in Figure 2, is given by combining the three comparison functions:

    SSIM(x, y) = [l(x, y)]^α [c(x, y)]^β [s(x, y)]^γ

where α, β, and γ are parameters that define the relative importance of the three components. Usually α = β = γ = 1 and C2 = 2C3, yielding:

    SSIM(x, y) = (2 μx μy + C1)(2 σxy + C2) / ((μx² + μy² + C1)(σx² + σy² + C2))    (4)
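As a concrete illustration, the simplified single-window form of SSIM can be sketched in pure Python for one pair of grayscale patches. This is a minimal sketch: the full metric evaluates this quantity inside a sliding local window (typically 11×11, Gaussian-weighted in the reference implementation) and averages the results.

```python
def ssim_patch(x, y, C1=(0.01 * 255) ** 2, C2=(0.03 * 255) ** 2):
    """Simplified SSIM for one pair of grayscale patches (flat lists of
    pixel values in [0, 255]), with alpha = beta = gamma = 1 and C2 = 2*C3
    so the contrast and structure terms merge into a single factor."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mu_x) ** 2 for v in x) / (n - 1)   # sample variance of x
    var_y = sum((v - mu_y) ** 2 for v in y) / (n - 1)   # sample variance of y
    cov = sum((a - mu_x) * (b - mu_y)                   # sample covariance
              for a, b in zip(x, y)) / (n - 1)
    num = (2 * mu_x * mu_y + C1) * (2 * cov + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den
```

Identical patches score 1, and any luminance, contrast, or structure change pulls the score below 1.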

The Color Image Quality Metric
The common RGB color space used for representing images is not consistent with the human visual system (HVS): Euclidean distances between colors in RGB space do not correspond, even approximately, to the color differences perceived by the HVS. The CIELAB color space proposed by the Commission Internationale de l'Eclairage (CIE) [27], on the other hand, is consistent with the human visual system and is considered perceptually uniform, and hence is referred to as a uniform color space. This color space was created based on data from color matching and discrimination experiments using large uniform colored areas. With the development of color image technology, many applications have been developed to process real images. However, most real images are made up of many small regions and texture structures rather than large uniform areas, which makes the CIELAB color space unsatisfactory for calculating color differences in images. Zhang and Wandell [28] proposed S-CIELAB, an improvement and spatial extension of CIELAB that adds a color separation and spatial filtering procedure in order to simulate the spatial blurring performed by the human visual system. The detailed steps and parameter settings are as follows. The implementation of S-CIELAB first transforms the input color image data, specified in terms of the CIE XYZ tristimulus values, into an opponent-colors space by the following linear transformation:

    Q1 =  0.279 X + 0.720 Y - 0.107 Z
    Q2 = -0.449 X + 0.290 Y - 0.077 Z
    Q3 =  0.086 X - 0.590 Y + 0.501 Z

where Q1, Q2, and Q3 represent the three opponent-colors planes: luminance, red-green, and blue-yellow respectively.
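The opponent-colors transformation is a single 3×3 linear map per pixel and can be sketched as follows. The coefficients below are the values commonly quoted for Zhang and Wandell's S-CIELAB implementation; the authoritative values should be taken from [28].

```python
# XYZ -> opponent-colors transform used by S-CIELAB. The matrix entries
# are the commonly quoted values; see [28] for the authoritative ones.
M_OPP = (
    ( 0.279,  0.720, -0.107),   # Q1: luminance
    (-0.449,  0.290, -0.077),   # Q2: red-green
    ( 0.086, -0.590,  0.501),   # Q3: blue-yellow
)

def xyz_to_opponent(x, y, z):
    """Map one CIE XYZ triple to the (Q1, Q2, Q3) opponent planes."""
    return tuple(m[0] * x + m[1] * y + m[2] * z for m in M_OPP)
```

In the full pipeline this is applied pixel-wise to the whole image, producing three planes that are then filtered independently.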
Each of the opponent-colors planes Q1, Q2, and Q3 is then filtered with a specific spatial kernel, determined by the spatial response of the human visual system; the parameter values were obtained from vision physiological and perception experiments. The kernels are calculated as follows:

    f = k Σi wi Ei,    Ei = ki exp(-(x² + y²) / σi²)

where k and ki are normalization factors: ki normalizes Ei such that Ei sums to one, and k normalizes each color plane such that the kernel f sums to one. The parameters wi and σi are the weight and spread of the filtering kernels, and differ for the three color planes.
x and y represent the 2D kernel's width and height in pixels. The index i varies from 1 to 3 for the luminance plane (Q1) and from 1 to 2 for the red-green (Q2) and blue-yellow (Q3) planes, as shown in Table 1. More details of the spatial filtering can be found in [28,29,30,31,32].
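The two-step normalization (each Gaussian Ei to unit sum, then the weighted sum f to unit sum) can be sketched as below. The weights and spreads passed in are illustrative placeholders, not the calibrated per-plane values of Table 1.

```python
import math

def gaussian_sum_kernel(size, weights, spreads):
    """Build f = k * sum_i w_i * E_i with E_i = k_i * exp(-(x^2+y^2)/sigma_i^2).

    Each E_i is normalized to sum to one (the factor k_i), and the final
    kernel is normalized to sum to one (the factor k). 'weights' and
    'spreads' are illustrative values, not the calibrated S-CIELAB ones.
    """
    half = size // 2
    kernel = [[0.0] * size for _ in range(size)]
    for w, s in zip(weights, spreads):
        e = [[math.exp(-((x - half) ** 2 + (y - half) ** 2) / s ** 2)
              for x in range(size)] for y in range(size)]
        total = sum(map(sum, e))                  # k_i: E_i sums to one
        for y in range(size):
            for x in range(size):
                kernel[y][x] += w * e[y][x] / total
    norm = sum(map(sum, kernel))                  # k: f sums to one
    return [[v / norm for v in row] for row in kernel]
```

Because the final kernel sums to one, filtering with it preserves the mean level of each opponent plane while blurring fine detail.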
Then, the filtered data in each plane is transformed back to CIE XYZ tristimulus values by inverting the opponent-colors transformation, and finally converted to CIELAB coordinates:

    L* = 116 f(Y/Y0) - 16
    a* = 500 [f(X/X0) - f(Y/Y0)]
    b* = 200 [f(Y/Y0) - f(Z/Z0)]

where f(t) = t^(1/3) for t > 0.008856 and f(t) = 7.787 t + 16/116 otherwise, and X0, Y0, and Z0 are the tristimulus values of the reference white (D65).
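The final XYZ-to-CIELAB conversion against a D65 reference white can be sketched as below; the D65 tristimulus values used are the standard 2° observer ones.

```python
def xyz_to_lab(x, y, z, white=(95.047, 100.0, 108.883)):
    """Convert one CIE XYZ triple to CIELAB (L*, a*, b*).

    'white' defaults to the standard D65 (2-degree observer) tristimulus
    values X0, Y0, Z0. Uses the usual piecewise cube-root compression.
    """
    def f(t):
        return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0
    fx, fy, fz = (f(v / w) for v, w in zip((x, y, z), white))
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b
```

The reference white itself maps to (100, 0, 0), as expected for a perceptually uniform lightness axis.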
The new index is proposed by extending the grayscale metric (4) to include color information. This is done by modeling any image distortion as a combination of four local comparison functions, namely luminance comparison, contrast comparison, structure comparison, and color comparison. The proposed quality metric is defined as:

    CSSIM(x, y) = l(x, y) · c(x, y) · s(x, y) · Cr(x, y)

where l(x, y), c(x, y), and s(x, y) are the luminance, contrast, and structure comparisons between the original and distorted images, and Cr(x, y) is the color comparison, defined as a function of ΔE(x, y), the perceptual color difference between the original and the distorted images in the S-CIELAB color space, and a weighting constant k.
To calculate the perceptual difference between the original image and the distorted one, the ΔE value is computed on the spatially filtered CIELAB values as:

    ΔE(x, y) = sqrt((ΔL*)² + (Δa*)² + (Δb*)²)

In order to choose the best value of the weighting constant k, we randomly selected one of the distortion types from the TID2013 image database. Among various values of k, the one that maximized the Pearson Linear Correlation Coefficient (PLCC) was chosen. Fig. 3 shows the PLCC as a function of k for CSSIM(x, y); consequently, 45 was chosen as the best value for k. The CSSIM metric is computed by sliding an 11×11 window over the entire image, as in [26]. At each image coordinate the CSSIM metric is calculated, yielding a CSSIM map that can be spatially pooled into a single descriptor of the objective image quality.
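A sketch of the per-pixel color difference and the color comparison term follows. ΔE is the Euclidean (CIE76) distance between filtered CIELAB triples; the functional form used for Cr below (a value of 1 at ΔE = 0, decaying with ΔE and controlled by k) is one plausible choice for illustration only, since the exact form used by CSSIM is the one given in [25].

```python
import math

def delta_e(lab1, lab2):
    """CIE76 color difference between two (L*, a*, b*) triples. In the full
    pipeline these are the spatially filtered S-CIELAB values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def color_comparison(de, k=45.0):
    """HYPOTHETICAL color comparison term for illustration: equals 1 when
    the color difference is zero and decreases monotonically with de,
    with the weighting constant k setting the decay scale. The exact
    definition of Cr used by CSSIM is given in [25]."""
    return 1.0 / (1.0 + de / k)
```

With k = 45, a color difference equal to k halves the comparison score, which is the kind of gradual penalty a weighting constant of that magnitude suggests.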

Results and Discussion
In this section, the performance of the proposed image quality metric, CSSIM, is analyzed in terms of its ability to predict subjective ratings. The proposed quality metric and the other image quality metrics used in this analysis were applied to the set of images from the Tampere Image Database (TID2013) [24]. This is the most recent and largest database so far available, including more images and more distortion types for the verification of full-reference quality metrics. The TID2013 database contains 3000 test images obtained from 25 reference images, with 24 types of distortion for each reference image and 5 levels for each type of distortion. Mean opinion scores (MOS) for the database were collected by performing 985 subjective experiments with observers from five countries (Finland, France, Italy, Ukraine, and USA). Figure 4 shows the scatter plots of the scores given by the proposed CSSIM metric and the original SSIM against the MOS from the TID2013 image database for different distortion types. To accurately reflect the MOS, the scatter plot should lie close to the fitted line, and the CSSIM scatter is clearly closer to the line than that of SSIM. There is thus a clear improvement in the performance of CSSIM compared to SSIM.
For numerical comparison, two criteria were used to validate the performance of the proposed CSSIM metric along with several well-known objective image quality metrics. These criteria characterize two attributes of each image quality metric's predictions [33]: Pearson's linear correlation coefficient (PLCC) as a measure of prediction linearity, and Spearman's rank order correlation coefficient (SROCC) as a measure of prediction monotonicity.
For overall performance comparison of the image quality metrics, Table 2 and Table 3 show the Pearson's linear correlation coefficients (PLCC) and the Spearman's rank order correlation coefficients (SROCC), respectively, of the nine quality metrics for each distortion type in the TID2013 image database. It is clear from these tables that the proposed CSSIM metric greatly improves on the performance of the classical SSIM for almost all distortion types. This improvement reaches more than 100% for the change-of-color-saturation distortion type and around 40% for the contrast-change distortion. It can also be noticed that the proposed CSSIM metric outperforms the other metrics for many types of distortion and is comparable for the rest.
In practice, researchers have shown that it is reasonable to study MOS estimated for all types of distortions as well as for particular subsets [36,37]. A subset is usually formed depending upon an application and may include one or several types of distortions. Table 4 shows the subsets of distortions from the TID2013 image database used for the verification of image quality metrics (distortions that belong to a given subset are marked by +).
The subset "Noise" contains different types of noise and distortions arising in conventional image processing; the subset "Actual" relates to the types of distortions most common in image processing practice, including compression; the subset "Simple" includes only three standard types of distortions; the subset "Exotic" corresponds to distortions that do not occur often but are among the "most difficult" for visual quality metrics; the subset "New" covers the new types of distortions introduced in TID2013 relative to the old version of the image database [36]; and the subset "Color" covers distortion types that are in one way or another connected with changes of color content [24].

Table 4. Distortion types and considered subsets of TID2013 [24]. (The table marks each of the 24 distortion types with + for the subsets it belongs to; for example, additive Gaussian noise is marked + for the Noise, Actual, and Simple subsets.)

Table 5 shows the performance comparison of the Spearman's rank order correlation coefficients (SROCC) of the considered metrics for the subsets of distortion types from the TID2013 image database. As expected, for the subset "Color" the proposed CSSIM metric performs well compared with the other metrics, as it fully considers color information in the process of image quality assessment. The situation is similar for the subset "New". For the subsets "Noise", "Actual", and "Simple" the performance of the proposed CSSIM metric is comparable with the other metrics. In turn, the subset "Exotic" causes problems for most of the metrics; only two metrics (FSIM, MSSIM) have an SROCC with MOS above 0.80.

Conclusions
In this paper, we presented an improvement to the well-known Structural Similarity (SSIM) index that follows the HVS characteristic that color images convey more meaningful information to human observers than grayscale ones. This may be the primary reason that the proposed CSSIM metric greatly improves the performance of SSIM and is also more consistent with human perception of image quality, especially for color-related distortions.