Vertex Component Analysis in Hyperspectral Unmixing
Contents
1 Introduction
2 Image Classification and Statistics
2.1 Image Mean Vector and Covariance Matrix
2.2 Principal Component Transformation
2.3 Minimum Noise Fraction
3 Virtual Dimensionality
3.1 HFC
3.2 NWHFC
3.3 NSP
3.4 OSP
4 Vertex Component Analysis
4.1 VCA Algorithm
1 Introduction
Remote Sensing Analysis of Alteration Mineralogy Associated with Natural Acid Drainage in the Grizzly Peak Caldera, Sawatch Range, Colorado
David W. Coulter
Ph.D. Thesis, Colorado School of Mines, 2006 [3]
The Grizzly Peak Caldera is located in Colorado's Mineral Belt, approximately 15 miles southeast of Aspen. It is the product of a volcanic eruption tens of millions of years ago that ejected some 600 km³ of magma. With a volcanic explosivity index of 7, it was at least four times larger than Tambora. [3, sec 1.5]
Coulter sought to identify acidic conditions by the different weathering states of iron oxide, which forms hematite (red) in acidic conditions, and goethite (light yellow) and jarosite (dark yellow) as pH increases, jarosite being the most immediate oxidation product of iron sulfide (pyrite).
Note on Partial Unmixing:
“Unmixing is critical in imaging spectrometry since virtually every pixel contains some macroscopic mixture of materials. The theory of, and methods for unmixing of spectroscopic signatures are found in a number of sources. Hapke (1993) provides models for linear and non-linear spectral mixing and a discussion of the criteria for using each approach. In Earth remote sensing, a linear mixing model is typically used. Full unmixing, which assumes that the spectra of all possible components are known, is described by van der Meer (2000) and Boardman (1989). Since it is often impossible to identify, a priori, all possible components, partial unmixing is an important tool. Match Filtering (MF) (Harsanyi and Chang 1994), Constrained Energy Minimization (CEM – similar to Match Filtering) (Farrand and Harsanyi 1997; Farrand 2001), and Mixture Tuned Match Filtering (MTMF) (Boardman et al. 1995) are commonly used methods for partial unmixing. MTMF is probably the most popular unmixing method used for geologic remote sensing.”[3, Coulter sec 1.4.4]
(Emphasis Added.)
2 Image Classification and Statistics
2.1 Image Mean Vector and Covariance Matrix
Image Mean Vector and Covariance Matrix [10, White 2005]: If an image has $L$ bands and $N$ pixels, the mean vector is

$$\mu = \frac{1}{N} \sum_{j=1}^{N} x_j \qquad (1)$$

where $x_j$ is the $j$th pixel vector of the image,

$$x_j = (x_{1j}, x_{2j}, \ldots, x_{Lj})^T. \qquad (2)$$

The image covariance matrix is [2, Chang 2013 (6.6)]

$$K = \frac{1}{N} (X - M)(X - M)^T \qquad (3)$$

where $X$ is the $L \times N$ matrix of pixel vectors, each of length $L$,

$$X = [x_1 \; x_2 \; \cdots \; x_N] \qquad (4)$$

and $M$ is the matrix of $N$ identical mean vectors ($L$ rows by $N$ columns):

$$M = [\mu \; \mu \; \cdots \; \mu] = \frac{1}{N} X \mathbf{1}_{N \times N} \qquad (5)$$

where $\mathbf{1}_{N \times N}$ is an $N \times N$ matrix of ones.
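As a concreteness check, here is a minimal C sketch of (1) and (3), assuming the image is stored as an $L \times N$ row-major array (band by band); the function and argument names are ours, not from any of the cited codebases.

    /* Mean vector (1) and covariance matrix (3) of an L-band, N-pixel image.
     * X is L x N, row-major: X[i*N + j] = band i of pixel j.
     * mu is length L; K is L x L, row-major, symmetric. */
    void image_mean_cov(const double *X, int L, int N, double *mu, double *K)
    {
        for (int i = 0; i < L; i++) {            /* mean of each band, eq. (1) */
            double s = 0.0;
            for (int j = 0; j < N; j++) s += X[i*N + j];
            mu[i] = s / N;
        }
        for (int i = 0; i < L; i++)              /* K = (1/N)(X-M)(X-M)^T, eq. (3) */
            for (int k = i; k < L; k++) {
                double s = 0.0;
                for (int j = 0; j < N; j++)
                    s += (X[i*N + j] - mu[i]) * (X[k*N + j] - mu[k]);
                K[i*L + k] = K[k*L + i] = s / N;
            }
    }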
2.2 Principal Component Transformation
Principal Component Transformation [7, Smith 1985]; Karhunen-Loeve Transformation [11, White 2005]; GRASS imagery i.pca. Let

$$P = Q\,(X - M) \qquad (6)$$

$$K q_i = \lambda_i q_i \qquad (7)$$

where

- $X$ – input-image multi-pixel vector ($L$ bands by $N$ pixels)
- $M$ – mean vector matrix, as in (5)
- $P$ – output-image multi-pixel vector
- $Q$ – matrix whose rows are the eigenvectors of the covariance matrix $K$, arranged by decreasing magnitude of eigenvalue, as typically returned by SVD routines
- $K$ – the symmetric positive-definite image covariance matrix
- $q_i$ – its orthonormal eigenvectors, with eigenvalues $\lambda_i$.

Magnitudes of the $\lambda_i$ impose an ordering on the transformed component vectors $P_i$. Those with the largest $\lambda_i$ s.t. $\lambda_i > tol$ are the Principal Components. Tolerance $tol$ should be related to the noise floor.
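A sketch of (6)-(7) in C using LAPACKE, the interface the programming notes below plan to use. LAPACKE_dsyev returns eigenvalues in ascending order, so the principal eigenvector is the last column; the function name and calling convention are ours.

    #include <stdlib.h>
    #include <lapacke.h>

    /* Principal component transform, eqs. (6)-(7).
     * K  : L x L covariance (row-major); overwritten with its eigenvectors.
     * Xc : L x N mean-removed image (X - M), row-major; P : L x N output.
     * Row i of Q is the eigenvector with the (i+1)-th largest eigenvalue
     * (dsyev returns them ascending, so we index columns in reverse). */
    int pca_transform(double *K, int L, const double *Xc, int N, double *P)
    {
        double *w = malloc(L * sizeof *w);
        int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', L, K, L, w);
        if (info) { free(w); return info; }
        for (int i = 0; i < L; i++)          /* P = Q (X - M), eq. (6)         */
            for (int j = 0; j < N; j++) {    /* row i of Q = column L-1-i of K */
                double s = 0.0;
                for (int b = 0; b < L; b++)
                    s += K[b*L + (L-1-i)] * Xc[b*N + j];
                P[i*N + j] = s;
            }
        free(w);
        return 0;
    }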
2.3 Minimum Noise Fraction
Minimum Noise Fraction [9, pg. 38] [4, Green et al.]: We wish to find a particular coefficient matrix $Q$ that in some sense maximizes the image S/N, assuming the image pixel vectors are the sum of uncorrelated signal and noise, so that $K = K_S + K_N$.

Maximize

$$\mathrm{S/N}_i = \frac{q_i^T K_S\, q_i}{q_i^T K_N\, q_i}, \quad \text{solved by} \quad K_S\, q_i = \lambda_i K_N\, q_i \qquad (19)$$

where $\lambda_i$ is the generalized eigenvalue of $K_S$ wrt $K_N$, and the $q_i$ are the corresponding generalized eigenvectors. Compare with PCA:

$$K q_i = \lambda_i q_i \qquad (24)$$

Noise Covariance: Green suggests the noise be of unit variance and band-to-band uncorrelated.

- Sensor error estimate (AVIRIS .rcc file): Call the supplied error vector $e = (e_1, \ldots, e_L)^T$. Then

$$K_N = \mathrm{diag}(e_1^2, \ldots, e_L^2) \qquad (25)$$

is completely uncorrelated. In the ideal case all the $e_i$ are equal, so $K_N = \sigma^2 I$ and the generalized eigenproblem (19) reduces to the PCA eigenproblem (24), since the eigenvectors are orthonormal: $q_i^T K_N\, q_j = \sigma^2 \delta_{ij}$ if the variance $e_i^2 \equiv \sigma^2$.
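The generalized eigenproblem in (19) maps directly onto LAPACKE_dsygv; a minimal sketch, assuming $K$ and $K_N$ have already been estimated and $K_N$ is positive definite:

    #include <lapacke.h>

    /* MNF as the generalized eigenproblem K q = lambda K_N q, per (19).
     * K, Kn: L x L row-major; both are overwritten by dsygv (K with the
     * eigenvectors in its columns, Kn with its Cholesky factor). The
     * eigenvalues land in w in ASCENDING order, so the best-S/N
     * components correspond to the last entries. */
    int mnf_eigs(double *K, double *Kn, int L, double *w)
    {
        /* itype=1: A x = lambda B x, with B = Kn positive definite */
        return LAPACKE_dsygv(LAPACK_ROW_MAJOR, 1, 'V', 'U', L, K, L, Kn, L, w);
    }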
Homogeneous Area Method [8, sec 2.9.1]: If possible, find a homogeneous area of pixels in the image; its local covariance matrix, presumably due to noise alone, estimates $K_N$.
Local Means and Local Variances [8, sec 2.9.2]

1. Divide the image into small pixel blocks (4x4, 5x5, ...).
2. For each (block, band) get the local mean and variance; $K_b$ is the local covariance matrix of block $b$.
3. Bin the local variances into classes between the band min and max values.
4. The bin with the most blocks represents the mean noise of the image. Hope this bin is the same for all bands...
5. Suppose the “most popular” bin is a P-cube, each side ranging over its band's bin interval, containing $n$ points. Then the average value over the bin,

$$K_N = \frac{1}{n} \sum_{b \in \mathrm{bin}} K_b, \qquad (37)$$

is the desired noise covariance matrix.
6. Caveat: Assumes the image is “slowly varying enough” that enough of the blocks are homogeneous, i.e. their covariance is really due to noise, not true features in the image. Blocks and bins must both be “small enough” – but not too small! A per-band sketch in C follows.
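A minimal per-band sketch of steps 1-5; the full method averages entire $L \times L$ block covariance matrices rather than scalar variances, and the bin count NBINS and the function name are our choices.

    #include <stdlib.h>

    /* Block the band, histogram the block variances, and return the mean
     * variance of the most populated bin. band is H x W row-major; B is
     * the block side (4 or 5). */
    #define NBINS 100
    double band_noise_variance(const double *band, int H, int W, int B)
    {
        int nby = H / B, nbx = W / B, nblocks = nby * nbx;
        double *var = malloc(nblocks * sizeof *var);
        double vmin = 1e300, vmax = -1e300;
        for (int by = 0; by < nby; by++)
            for (int bx = 0; bx < nbx; bx++) {      /* step 2: local mean/variance */
                double s = 0, s2 = 0; int n = B * B;
                for (int y = 0; y < B; y++)
                    for (int x = 0; x < B; x++) {
                        double v = band[(by*B + y)*W + bx*B + x];
                        s += v; s2 += v * v;
                    }
                double m = s / n, v = s2 / n - m * m;
                var[by*nbx + bx] = v;
                if (v < vmin) vmin = v;
                if (v > vmax) vmax = v;
            }
        int count[NBINS] = {0};                     /* step 3: bin the variances  */
        double sum[NBINS] = {0}, width = (vmax - vmin) / NBINS;
        for (int i = 0; i < nblocks; i++) {
            int b = width > 0 ? (int)((var[i] - vmin) / width) : 0;
            if (b >= NBINS) b = NBINS - 1;
            count[b]++; sum[b] += var[i];
        }
        int best = 0;                               /* step 4: most populated bin */
        for (int b = 1; b < NBINS; b++)
            if (count[b] > count[best]) best = b;
        double est = sum[best] / count[best];       /* step 5: bin average        */
        free(var);
        return est;
    }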
Other methods: “unsupervised training” derived-endmember classification schemes, e.g. LAS' search ([12, White 2005]) and GRASS' cluster/maxlik, are based upon local covariance minimization.
3 Virtual Dimensionality
We implement the Harsanyi-Farrand-Chang (HFC) and noise-whitened Harsanyi-Farrand-Chang (NWHFC) estimators of the image virtual dimensionality (VD) as described in [1, C.-I Chang and Du 2004] and [2, C.-I Chang 2013] Chapter 5.3.6.
3.1 HFC
Let $X$ be the input image matrix of $N$ pixel (sample) vectors, each of length $L$ spectral bands: $X = [x_1 \; x_2 \; \cdots \; x_N]$. The image spectral correlation matrix $R$ and covariance matrix $K$ are then

$$R = \frac{1}{N} X X^T, \qquad K = \frac{1}{N}(X - M)(X - M)^T = R - \mu \mu^T.$$

Let $\hat\lambda_1 \ge \cdots \ge \hat\lambda_L$ and $\lambda_1 \ge \cdots \ge \lambda_L$ denote the (ordered) correlation and covariance eigenvalues. By assuming that signal sources are nonrandom unknown positive constants and noise is white with zero mean, we may expect

$$\hat\lambda_l > \lambda_l \;\text{ for } l = 1, \ldots, \mathrm{VD}, \qquad \hat\lambda_l = \lambda_l = \sigma_{n_l}^2 \;\text{ for } l = \mathrm{VD}+1, \ldots, L,$$

where $\sigma_{n_l}^2$ is the noise variance in the $l$th spectral channel. Formulate the VD determination as a binary hypothesis problem:

$$H_0 : z_l = \hat\lambda_l - \lambda_l = 0 \quad \text{versus} \quad H_1 : z_l = \hat\lambda_l - \lambda_l > 0, \qquad l = 1, \ldots, L.$$
The null hypothesis $H_0$ and the alternative hypothesis $H_1$ represent the case that the correlation eigenvalue is equal to its corresponding covariance eigenvalue (no signal), and the case that the correlation eigenvalue is greater than its corresponding covariance eigenvalue, respectively. When $H_1$ is true (i.e. $H_0$ fails), it implies there is an endmember contributing to the correlation eigenvalue in addition to noise, since the noise energy represented by the eigenvalue of $K$ in that particular component is the same as the one represented by the eigenvalue of $R$ in its corresponding component.
If we assume the noise in each spectral dimension has zero mean and variance $\sigma_{n_l}^2$, then $R - K = \mu\mu^T$, so the difference $\hat\lambda_l - \lambda_l$ is driven by $\mu_l^2$, where $\mu_l$ is the sample mean in the $l$th spectral dimension and $n_l$ is the channel noise. Despite the fact that $\hat\lambda_l$ and $\lambda_l$ are unknown constants, we can model each pair of eigenvalues under hypotheses $H_0$ and $H_1$ as random variables with asymptotic probability densities given by

$$p_0(z_l) \simeq N(0, \sigma_{z_l}^2) \qquad (46)$$

$$p_1(z_l) \simeq N(\mu_{z_l}, \sigma_{z_l}^2) \qquad (47)$$

respectively, where $\mu_{z_l}$ is an unknown constant for each $l$ and the variance is given by

$$\sigma_{z_l}^2 = \mathrm{Var}[\hat\lambda_l - \lambda_l] = \mathrm{Var}[\hat\lambda_l] + \mathrm{Var}[\lambda_l] - 2\,\mathrm{Cov}(\hat\lambda_l, \lambda_l). \qquad (49)$$
“When the total number of samples is sufficiently large, [$\mathrm{Var}[\hat\lambda_l] \approx 2\hat\lambda_l^2/N$ and $\mathrm{Var}[\lambda_l] \approx 2\lambda_l^2/N$]. Therefore, the noise variance in (42) can be estimated and approximated using (49).” (Although it's not obvious exactly how. Play statistics on collections of subimages, perhaps? In his MATLAB code, Chang approximates $\sigma_{z_l}^2 \approx 2\hat\lambda_l^2/N + 2\lambda_l^2/N$, so we do as well.)
Chang and Du ([1, eq. 8]) use Schwartz' inequality to bound the covariance term in (49): $\mathrm{Cov}(\hat\lambda_l, \lambda_l) \le \sqrt{\mathrm{Var}[\hat\lambda_l]\,\mathrm{Var}[\lambda_l]}$.

From (46), (47), and (49), we define the false alarm probability and detection probability as

$$P_F = \int_{\tau_l}^{\infty} p_0(z)\, dz, \qquad P_D = \int_{\tau_l}^{\infty} p_1(z)\, dz. \qquad (54)$$

We then choose a given false-alarm probability (for example $P_F = 0.001$) and invert (54) to give the per-band threshold

$$\tau_l = \sigma_{z_l}\, Q^{-1}(P_F), \qquad Q(x) = \tfrac{1}{2}\,\mathrm{erfc}\!\left(\tfrac{x}{\sqrt{2}}\right).$$

A case of $z_l = \hat\lambda_l - \lambda_l > \tau_l$ fails the null hypothesis test, and indicates there is signal energy assumed to contribute to the eigenvalue in the $l$th spectral dimension. Do note the threshold $\tau_l$ is different for each spectral dimension $l$.
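A compact C sketch of the whole HFC test under Chang's variance approximation noted above; the eigenvalues are assumed precomputed (e.g. with LAPACKE_dsyev, as in Section 2.2) and sorted descending. Qinv() is a simple bisection inverse of the Gaussian tail, adequate for this use; names are ours.

    #include <math.h>

    /* Inverse of the Gaussian tail Q(x) = 0.5 * erfc(x / sqrt(2)),
     * by bisection; adequate for Pf in (0, 0.5). */
    double Qinv(double pf)
    {
        double lo = 0.0, hi = 40.0;
        for (int i = 0; i < 200; i++) {
            double mid = 0.5 * (lo + hi);
            if (0.5 * erfc(mid / sqrt(2.0)) > pf) lo = mid; else hi = mid;
        }
        return 0.5 * (lo + hi);
    }

    /* HFC estimate of VD. lcorr, lcov: correlation / covariance eigenvalues,
     * both sorted descending; N: number of pixels; pf: false-alarm
     * probability. Variance approximation: sigma^2 ~ 2(lcorr^2 + lcov^2)/N. */
    int hfc_vd(const double *lcorr, const double *lcov, int L, long N, double pf)
    {
        int vd = 0;
        double q = Qinv(pf);
        for (int l = 0; l < L; l++) {
            double sigma = sqrt(2.0 * (lcorr[l]*lcorr[l] + lcov[l]*lcov[l]) / N);
            if (lcorr[l] - lcov[l] > q * sigma)   /* per-band threshold tau_l */
                vd++;
        }
        return vd;
    }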
3.2 NWHFC
As noted by Chang ([2, Section 5.3.6]), the signature variance is generally very small and can be influenced by interband covariances. Interband covariance may be reduced by “noise whitening”: preprocessing the input image with a whitening matrix built from a noise covariance matrix estimated by a technique developed by Roger and Arnold, performed prior to the HFC method. This is discussed far too sparsely in ([1, Chang and Du]) and hardly at all in ([2, Chang 2013]), so we'll merely reference the former and translate MATLAB code from the latter (Section A.1).
The noise covariance matrix $K_n$ is the $L \times L$ diagonal matrix whose elements are the inverses of the diagonal elements of $K^{-1}$. For noise whitening we want the square root of its inverse, denoted $K_n^{-1/2}$, also diagonal:

$$\left(K_n^{-1/2}\right)_{ii} = \sqrt{\left(K^{-1}\right)_{ii}}.$$

$\hat X = K_n^{-1/2} X$ is the noise-whitened image matrix. The NWHFC VD estimate is obtained simply by running the HFC algorithm on $\hat X$ rather than $X$.
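A sketch of the whitening step in C, using LAPACKE's Cholesky inverse for the SPD covariance (dpotrf/dpotri); the helper name is ours.

    #include <math.h>
    #include <lapacke.h>

    /* Roger-Arnold noise estimate + whitening for NWHFC (sketch).
     * K: L x L covariance, row-major, overwritten with K^-1 (upper
     * triangle); X: L x N image, whitened in place. The noise variance of
     * band i is 1/(K^-1)_ii, so each band is scaled by sqrt((K^-1)_ii). */
    int noise_whiten(double *K, int L, double *X, long N)
    {
        int info = LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'U', L, K, L);
        if (!info) info = LAPACKE_dpotri(LAPACK_ROW_MAJOR, 'U', L, K, L);
        if (info) return info;
        for (int i = 0; i < L; i++) {
            double s = sqrt(K[i*L + i]);   /* = 1/sqrt(noise variance of band i) */
            for (long j = 0; j < N; j++) X[i*N + j] *= s;
        }
        return 0;
    }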
3.3 NSP
The Noise Subspace Projection method ([1, Chang and Du]) estimates VD solely from the spectral covariance matrix $K$. Let the matrix $K_n^{-1/2}$ be as above and define the noise-whitened covariance matrix

$$\hat K = K_n^{-1/2}\, K\, K_n^{-1/2}.$$

As a result, the noise variance of each band in the whitened $\hat K$ is reduced to unity. Let $\{v_l\}_{l=1}^{L}$ be a set of eigenvectors generated by $\hat K$, which can then be expressed as

$$\hat K = \sum_{l=1}^{L} \lambda_l\, v_l v_l^T \qquad (67)$$

where $\{v_l\}_{l=1}^{\mathrm{VD}}$ and $\{v_l\}_{l=\mathrm{VD}+1}^{L}$ span the signal subspace and the noise subspace, respectively. The variances of the noise components of (67) have been whitened and normalized to unity, so $\lambda_l > 1$ for $l \le \mathrm{VD}$ and $\lambda_l = 1$ otherwise. The problem of VD estimation can then be formulated as a binary hypothesis testing problem:

$$H_0 : \lambda_l = 1 \quad \text{versus} \quad H_1 : \lambda_l > 1, \qquad l = 1, \ldots, L \qquad (68)$$

where the asymptotic densities are, as in the HFC case,

$$p_0(\lambda_l) \simeq N(1, \sigma_{\lambda_l}^2), \qquad p_1(\lambda_l) \simeq N(\mu_l, \sigma_{\lambda_l}^2),$$

where $\mu_l$ is an unknown constant for each $l$ and the variance is given by

$$\sigma_{\lambda_l}^2 \approx \frac{2 \lambda_l^2}{N} \qquad (73)$$

which can be further reduced under hypothesis $H_0$ ($\lambda_l = 1$) to

$$\sigma_{\lambda_l}^2 \approx \frac{2}{N}. \qquad (74)$$
Finally, find the Neyman-Pearson detector for (68) to determine VD. The false-alarm probability $P_F$ is the probability that we think we've detected a signal when none was actually present; in terms of eigenvalues this is the probability of measuring a value $\lambda_l > \tau$ when the “true” value was its minimum noise-only value $\lambda_l = 1$. For a desired $P_F$ we seek a threshold $\tau$ such that

$$\int_{\tau}^{\infty} p_0(\lambda)\, d\lambda = P_F$$

with solution

$$\tau = 1 + \sqrt{\frac{2}{N}}\; Q^{-1}(P_F).$$

The virtual dimensionality VD is then the number of eigenvalues whose value exceeds the threshold $\tau$: $\mathrm{VD} = \#\{\, l : \lambda_l > \tau \,\}$.
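In C, reusing the Qinv() helper from the HFC sketch above (a sketch under the (74) approximation):

    #include <math.h>

    double Qinv(double pf);   /* from the HFC sketch above */

    /* NSP VD estimate (sketch). lam: eigenvalues of the noise-whitened
     * covariance (67), descending; N: number of pixels; pf: false-alarm
     * probability. Under H0 an eigenvalue is 1 with variance ~ 2/N, so
     * the Neyman-Pearson threshold is tau = 1 + Qinv(pf)*sqrt(2/N). */
    int nsp_vd(const double *lam, int L, long N, double pf)
    {
        double tau = 1.0 + Qinv(pf) * sqrt(2.0 / N);
        int vd = 0;
        for (int l = 0; l < L; l++)
            if (lam[l] > tau) vd++;
        return vd;
    }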
3.4 OSP
Orthogonal Subspace Projection ([2, 5.4.1]). OSP is a general technique that also works well as part of VCA (Section 4).
Assume that there are $p$ signal sources, each with $L$ bands, $m_1, m_2, \ldots, m_p$, present in the data to be processed, and that every data sample vector $r$ can be expressed as a linear mixture of these $p$ signal sources:

$$r = M \alpha + n \qquad (83)$$

where $M = [m_1 \; m_2 \; \cdots \; m_p]$ is an $L \times p$ signal matrix made up of the $p$ signal sources, $\alpha = (\alpha_1, \ldots, \alpha_p)^T$, and $n$ can be interpreted as the noise vector or model error vector.

Let $P_M = M M^\# = M (M^T M)^{-1} M^T$ be the $p$-signal projection matrix formed by the signal matrix $M$. It can then be used to map all data sample vectors into the spectral space linearly spanned by the $p$ signal sources $m_1, \ldots, m_p$. In other words, every data sample vector can be expressed by a linear mixture of the signal sources specified by (83) via the $p$-signal projection matrix $P_M$, where the noise $n$ in (83) is included to account for the linear mixture model error.

Since each sample vector produces a different model error and residual, the sample mean vector $\bar r$ is used to represent an averaged model error and residual. In this case, from (83) the sample mean vector can be expressed by

$$\bar r = M \bar\alpha + \bar n$$

where $\bar\alpha = \frac{1}{N}\sum_j \alpha_j$ and $\bar n = \frac{1}{N}\sum_j n_j$. There are two ways to find the resulting projection measure OSP($p$).
One, denoted OSP($p$) in (89), works without involving noise covariance matrix estimation. Theoretically, the value of OSP($p$) in (89) increases as the value of $p$ increases. For any given error threshold $\varepsilon$, VD can be determined by a stopping rule on the change in OSP($p$); the value of $p$ so determined from the OSP($p$) in (89) or (90) is denoted by $\mathrm{VD}_{\mathrm{OSP}}(\varepsilon)$. Two criteria are developed to detect the abrupt change of the OSP($p$) value. One, based on the gradient of OSP($p$) between consecutive values of $p$, is denoted by “$\nabla$” (91); the other, based on the difference of consecutive values, is denoted by minus “$-$” (92).

For the difference criterion, the sample mean vector $\bar r$ is normalized before orthogonal projection so that the values of the threshold $\varepsilon$ for these two criteria are comparable for analysis. The threshold $\varepsilon$ in (91) and (92) is generally selected according to a sudden drop or a clear gap between two consecutive values in plots of the gradient (91) and difference (92) versus the value of $p$.
It should be noted the above definitions involve two key parameters: one is the error threshold $\varepsilon$, and the other is the algorithm used to produce the $p$-signal matrix $M_p$, which is not specified in (91) and (92). We shall use VCA for that algorithm, at least initially, but there are other choices, including target-specific algorithms such as ATGP, as well as generic SVD. See ([2, 5.4.1]) for further discussion.
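One concrete way to evaluate an OSP progression in C, assuming OSP($p$) is taken as the energy of the sample mean captured by the $p$-signal subspace, $\bar r^T P_{M_p} \bar r = \|\bar r\|^2 - \|P^{\perp}_{M_p} \bar r\|^2$ (so it increases with $p$, as above); the residual falls out of LAPACKE_dgels for free. Function and variable names are ours.

    #include <string.h>
    #include <stdlib.h>
    #include <lapacke.h>

    /* OSP(p) sketch: energy of the sample mean rbar captured by the
     * p-signal subspace, computed via the least-squares problem
     * min_a ||rbar - M a||^2. M: L x p signal matrix, row-major, full
     * rank; dgels leaves the residual in entries p..L-1 of b. */
    double osp_value(const double *M, int L, int p, const double *rbar)
    {
        double *A = malloc((size_t)L * p * sizeof *A);
        double *b = malloc((size_t)L * sizeof *b);
        memcpy(A, M,    (size_t)L * p * sizeof *A);  /* dgels overwrites inputs */
        memcpy(b, rbar, (size_t)L * sizeof *b);
        double osp = -1.0;
        if (!LAPACKE_dgels(LAPACK_ROW_MAJOR, 'N', L, p, 1, A, p, b, 1)) {
            double total = 0.0, rss = 0.0;
            for (int i = 0; i < L; i++) total += rbar[i] * rbar[i];
            for (int i = p; i < L; i++) rss += b[i] * b[i];
            osp = total - rss;
        }
        free(A); free(b);
        return osp;   /* -1 on LAPACK failure */
    }

Scanning $p = 1, 2, \ldots$ and stopping when the increase falls below $\varepsilon$ implements the difference criterion (92).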
4 Vertex Component Analysis
Vertex Component Analysis is an “unsupervised training” derived-endmember classification scheme. We use the notation of [6, Nascimento and Dias], which is slightly different from that used in previous sections.
4.1 VCA Algorithm
Assume linear mixing, and let

- $N$ – number of pixels in the image
- $L$ – number of spectral bands recorded at each pixel
- $p$ – number of endmembers; $p \le L$, and usually $p \ll L$ for hyperspectral images
- $m_i$ – an endmember spectral vector of length $L$, $i = 1, \ldots, p$
- $\gamma_j$ – a scale factor modeling illumination variability due to surface topography at pixel $j$
- $\alpha_j = (\alpha_{1j}, \ldots, \alpha_{pj})^T$ – the abundance vector containing the fractions of each endmember at pixel $j$. Positivity: $\alpha_{ij} \ge 0$ and $\sum_{i=1}^{p} \alpha_{ij} = 1$, that is $\mathbf{1}^T \alpha_j = 1$.
- $x_j$ – the “true” noise-free spectral vector of length $L$ at pixel $j$
- $n_j$ – the noise vector at pixel $j$
- $M = [m_1 \; m_2 \; \cdots \; m_p]$ – an $L \times p$ mixing matrix that maps an abundance vector to a “true” spectral vector, $x_j = \gamma_j M \alpha_j$.

Then the recorded spectral vector at pixel $j$ may be given by

$$r_j = x_j + n_j = \gamma_j M \alpha_j + n_j.$$

Our goal is to find a set of abundance vectors $\{\alpha_j\}$ corresponding to some endmember set $\{m_i\}$. An appropriate endmember set is to be determined as part of the VCA algorithm. Endmember identification – matching one or more of the VCA-generated endmembers to actual specimen spectral samples, such as from the USGS spectral library or field ground-truth sampling – may be done in subsequent processing steps.
Since the set $\{\alpha : \alpha \ge 0,\ \mathbf{1}^T \alpha = 1\}$ is a simplex, the set $S_x = \{x = M\alpha\}$ is also a simplex. However, even assuming $n = 0$, the observed vector set belongs to the convex cone $C_p = \{r = \gamma M \alpha,\ \gamma \ge 0\}$ owing to the different scale factors $\gamma_j$.

But the projective projection of the convex cone $C_p$ onto a properly chosen hyperplane is a simplex $S_p$ with vertices corresponding to the vertices of the simplex $S_x$. The simplex $S_p$ is the projective projection of the convex cone $C_p$ onto the hyperplane $r^T u = 1$,

$$y = \frac{r}{r^T u},$$

where the choice of $u$ assures there are no observed vectors parallel to the hyperplane ($r$ orthogonal to its defining vector $u$): $r^T u > 0$ for all observed $r$.
Geometry Note: I'm going to suggest there's just a shade more going on here. Suppose we neglect noise and normalize all the pixel values such that $\|r_j\| = 1$, which is easy enough to do. Consider a three-band image, such as an rgb color image. The $r_j$ are now neatly mapped onto the surface of the first (all axes non-negative) octant of the rgb unit sphere – a pixel's intensity of any color cannot be negative. For convenience of visualization, assume we may choose $u \propto (1,1,1)^T$, which, if there were pure pixels lying on each unit axis, i.e. (1,0,0), (0,1,0), (0,0,1), assures the simplex of points projected onto the plane is a pure equilateral triangle whose vertices correspond to the “1” values on the rgb unit axes.

But real photos rarely if ever span the complete rgb bandwidth. Their normalizations will usually map to some sub-patch of that first unit octant. Their projections onto the plane will still form a simplex, but will it have more vertices than a triangle? The VCA algorithm asserts “no”, and starts by finding the vertices of that projected simplex on the plane.
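The projective projection itself is one line per pixel; a sketch (names ours):

    /* Projective projection of pixel vectors onto the hyperplane r^T u = 1:
     * y_j = r_j / (r_j^T u). R: L x N row-major; Y: L x N output.
     * Assumes u has been chosen so that r_j^T u > 0 for every pixel. */
    void project_to_hyperplane(const double *R, int L, long N,
                               const double *u, double *Y)
    {
        for (long j = 0; j < N; j++) {
            double d = 0.0;
            for (int i = 0; i < L; i++) d += R[i*N + j] * u[i];
            for (int i = 0; i < L; i++) Y[i*N + j] = R[i*N + j] / d;
        }
    }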
4.1.1 SNR
VCA algorithm accuracy is dependent on image SNR. There are several ways to estimate it:

- Classic definition: SNR = Power(signal)/Power(noise). Select homogeneous blocks from the image, areas where the image is expected to be flat so that variations are due to noise, and calculate the data SNR using a (Mean)/(Standard Deviation) method for a homogeneous target: if $\mu_i$ is the average intensity of the $i$-th spectral band in a block of $n$ pixels and $\sigma_i$ is its standard deviation, then $\mathrm{SNR}_i = \mu_i / \sigma_i$. We may manually select homogeneous regions from the image, such as still water (lakes, ponds) or snowfields, or we may use GRASS i.cluster/i.maxlik to (perhaps) find such regions automatically.
- VCA does an image SVD or PCA as an initial dimensionality reduction step. We might try looking at the ratio of the Power in the SVD (and/or PCA) components with large singular values – the ones of interest – with the Power in the components with small singular values. Compare this with the ratio of the sum of the large eigenvalues with the sum of the small eigenvalues – or sums of squares of each, to see if there’s an easy way to compute an “effective SNR” that is usable to select between SVD and PCA in the first step of the VCA algorithm.
- Compare the above with sensor data, e.g. AVIRIS .rcc files. AVIRIS has very high SNR, over 500 on a good day [5, Sec. 3.3]; satellites, e.g. Hyperion, might have a tenth of this, so SNR is an important consideration and worth some synthetic image modelling. Nascimento and Dias use the threshold $\mathrm{SNR}_{th} = 15 + 10\log_{10}(p)$ dB for their SNR estimator. We note that with an SNR greater than 500 (on a good day), AVIRIS is considered extremely good [5, Kruse JPL 2002]. But with 224 bands it will spectacularly fail this particular test if $p$ approaches the number of bands, as $10\log_{10}(500) \approx 27\,\mathrm{dB} < 15 + 10\log_{10}(224) \approx 38.5\,\mathrm{dB}$. We speculate this test is insufficient for any hyperspectral image, and something better related to the number of significant principal components (usually $d$) might be in order. Nascimento and Dias have quite a bit on the topics of estimating SNR and the virtual dimensionality (VD) of an image; see [6, eqs 10-13] and [1, C.-I Chang and Du].
4.1.2 Algorithm 1: Vertex Component Analysis (VCA)
Inputs:
- $R$ is the image matrix of $N$ pixel vectors, each of length $L$ spectral bands.
- $p$ – desired dimensionality of the problem, the number of endmembers. $p$ may be estimated as the number of “large” eigenvalues of the image covariance matrix (Section 3).
- $\mathrm{SNR}_{th}$ is the fixed threshold value of the SNR measured wrt the signal subspace. Different values might be used for different SNR estimators; see above.
Outputs:
- $\hat M$ is the estimated mixing matrix returned by VCA.
Notations:
- $[A]_{:,j}$ is the $j$th column of $A$
- $[A]_{:,i:k}$ are the $i$th through $k$th columns of $A$
- $[A]_{:,\mathrm{ind}}$ is the matrix formed from the columns of $A$ listed in index vector $\mathrm{ind}$
- $A^\#$ denotes the pseudo-inverse of $A$,
a $p \times p$ matrix as computed by SVD. ($A A^\#$ is what you think it is.) Write the SVD as

$$A = U \Sigma V^T$$

where $\Sigma$ is the $p \times p$ diagonal matrix whose diagonal elements are the (ordered) singular values:

$$\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p), \qquad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0.$$

Then the pseudo-inverse may be written

$$A^\# = V \Sigma^\# U^T$$

where $\Sigma^\# = \mathrm{diag}(\sigma_1^\#, \ldots, \sigma_p^\#)$ and the pseudo-inverse of any scalar is $\sigma^\# = 1/\sigma$ if $\sigma \ne 0$ and $\sigma^\# = 0$ otherwise. Thus $A^\# = A^{-1}$ if $A$ is non-singular. If $\mathrm{rank}(A) = d$ and $\sigma_i = 0$ for $i > d$, then:

- $A A^\# = U \Sigma \Sigma^\# U^T$ is the orthogonal projector onto the range of $A$
- $A^\# A = V \Sigma^\# \Sigma V^T$ is the orthogonal projector onto the range of $A^T$
- $I - A^\# A$ is the orthogonal projector onto the kernel (null-space) of $A$
- $I - A A^\#$ is the orthogonal projector onto the kernel (null-space) of $A^T$, i.e. onto the orthogonal complement of the range of $A$
- $\Sigma \Sigma^\#$ has $d$ non-zero diagonal entries, $\mathrm{diag}(1, \ldots, 1, 0, \ldots, 0)$. $A A^\#$ is a full $p \times p$ symmetric matrix, none of whose components contain values from the lower right $p-d$ block of $\Sigma$. If $p = d$ then $A A^\#$ and $A^\# A$ are the identity matrix.
See the Wikipedia article Moore-Penrose_pseudoinverse. In particular: “the pseudoinverse for matrices related to $A$ can be computed by applying the Sherman-Morrison-Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.” – a fact that might be quite useful here.
- $U_d$ is an $L \times d$ matrix whose columns are the first $d$ eigenvectors of either the $L \times L$ correlation matrix $\frac{1}{N} R R^T$ or the $L \times L$ spectral covariance matrix $K$. $U$ is the left eigenmatrix computed by SVD, whose columns are ordered (left to right) by decreasing singular value. Its first $d$ columns are the most significant (principal) relative to the rest. We usually select $d$ such that the remaining singular values are in some sense “too small to be significant”. A C sketch of the SVD pseudo-inverse follows.
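Since $A^\#$ is recomputed inside the VCA loop, it is worth a small standalone routine. A sketch via LAPACKE_dgesvd, per the SVD notation above (names and tolerance handling ours):

    #include <stdlib.h>
    #include <lapacke.h>

    /* Moore-Penrose pseudo-inverse of a p x p matrix A via SVD (sketch):
     * A = U S V^T  =>  A# = V S# U^T, where S# inverts the singular
     * values above tol and zeroes the rest. All arrays row-major. */
    int pinv(const double *A, int p, double tol, double *Apinv)
    {
        double *a  = malloc((size_t)p*p * sizeof *a);
        double *u  = malloc((size_t)p*p * sizeof *u);
        double *vt = malloc((size_t)p*p * sizeof *vt);
        double *s  = malloc(p * sizeof *s), *sb = malloc(p * sizeof *sb);
        for (int i = 0; i < p*p; i++) a[i] = A[i];
        int info = LAPACKE_dgesvd(LAPACK_ROW_MAJOR, 'A', 'A', p, p,
                                  a, p, s, u, p, vt, p, sb);
        if (!info)
            for (int i = 0; i < p; i++)     /* A#[i][j] = sum_k V[i][k] s_k# U[j][k] */
                for (int j = 0; j < p; j++) {
                    double t = 0.0;
                    for (int k = 0; k < p; k++)
                        if (s[k] > tol) t += vt[k*p + i] * (1.0/s[k]) * u[j*p + k];
                    Apinv[i*p + j] = t;
                }
        free(a); free(u); free(vt); free(s); free(sb);
        return info;
    }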
1: Compute SNR and Virtual Dimensionality $p$ (Section 3). The steps below follow [6, Algorithm 1].
2: if $\mathrm{SNR} > \mathrm{SNR}_{th}$
then
3: $d := p$
4: $X := U_d^T R$ (project onto the first $d$ left singular vectors)
5: $u := \mathrm{mean}(X)$ ($u$ is a $d \times 1$ vector)
6: $[Y]_{:,j} := [X]_{:,j} \,/\, ([X]_{:,j}^T u)$ (projective projection)
else
8: $d := p - 1$
9: $X := U_d^T (R - \bar r)$ (project the mean-removed data onto the first $d$ principal components). Here $\bar r$ is the sample mean of $R$, $\bar r = \frac{1}{N}\sum_{j=1}^{N} r_j$.
10: $c := \max_{j=1,\ldots,N} \| [X]_{:,j} \|$
11: $c := [c, c, \ldots, c]$ (a $1 \times N$ row)
12: $Y := \begin{bmatrix} X \\ c \end{bmatrix}$ (lift onto the hyperplane $y_{d+1} = c$)
13: end if
14: initialize $A := [\, e_u \;|\; 0 \;|\; \cdots \;|\; 0 \,]$, a $p \times p$ auxiliary matrix with $e_u = (0, \ldots, 0, 1)^T$, its pseudo-inverse $A^\#$, and Uxcor, its SVD eigenvectors.
15: for $i := 1$ to $p$ do
16: $w :=$ random vector drawn from $N(0, I_p)$
17: $f := \dfrac{(I - A A^\#)\, w}{\|(I - A A^\#)\, w\|}$ ($f$ is orthogonal to the subspace spanned by the columns of $A$)
18: $v := f^T Y$
19: $k := \arg\max_{j=1,\ldots,N} |v_j|$ (the pixel most extreme along direction $f$)
20: $[A]_{:,i} := [Y]_{:,k}$
21: $[\mathrm{indice}]_i := k$
22: end for
Stopping Criteria: if $p$ in 15: above has been pre-selected, e.g. by HFC as described in Section (3.1), we're done. But if OSP (or similar) is desired, there are other tests that might be applied.
23: if $\mathrm{SNR} > \mathrm{SNR}_{th}$
then
24: $\hat M := U_d\, [X]_{:,\mathrm{indice}}$ is a $L \times p$ estimated mixing matrix.
25: else
26: $\hat M := U_d\, [X]_{:,\mathrm{indice}} + \bar r$ is a $L \times p$ estimated mixing matrix.
27: end if
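A C sketch of the core loop (steps 15-22), using the pinv() routine sketched in the Notations above. The RNG, the tolerance, and all names are our choices; a real implementation would draw $w$ from a proper Gaussian generator.

    #include <stdlib.h>
    #include <math.h>

    int pinv(const double *A, int p, double tol, double *Apinv); /* see above */

    /* Core VCA iteration. Y: p x N projected data, row-major. On return,
     * idx[0..p-1] are the column indices of Y chosen as endmember
     * candidates (step 21). */
    void vca_loop(const double *Y, int p, long N, int *idx)
    {
        double *A  = calloc((size_t)p*p, sizeof *A);   /* auxiliary matrix   */
        double *Ap = malloc((size_t)p*p * sizeof *Ap); /* its pseudo-inverse */
        double *w  = malloc(p * sizeof *w);
        double *f  = malloc(p * sizeof *f);
        A[0] = 1.0;                     /* step 14: one unit column to start */
        for (int i = 0; i < p; i++) {
            pinv(A, p, 1e-8, Ap);
            for (int r = 0; r < p; r++)                /* step 16 (crude RNG) */
                w[r] = (double)rand() / RAND_MAX - 0.5;
            for (int r = 0; r < p; r++) {              /* step 17: f = (I - A A#) w */
                double s = 0.0;
                for (int c = 0; c < p; c++) {
                    double aap = 0.0;                  /* (A A#)[r][c] */
                    for (int k = 0; k < p; k++) aap += A[r*p + k] * Ap[k*p + c];
                    s += ((r == c) - aap) * w[c];
                }
                f[r] = s;
            }
            double nf = 0.0;
            for (int r = 0; r < p; r++) nf += f[r] * f[r];
            nf = sqrt(nf);
            for (int r = 0; r < p; r++) f[r] /= nf;
            long best = 0; double vmax = -1.0;         /* steps 18-19: argmax |f^T y_j| */
            for (long j = 0; j < N; j++) {
                double v = 0.0;
                for (int r = 0; r < p; r++) v += f[r] * Y[r*N + j];
                if (fabs(v) > vmax) { vmax = fabs(v); best = j; }
            }
            idx[i] = (int)best;
            for (int r = 0; r < p; r++)                /* step 20: A[:,i] := y_best */
                A[r*p + i] = Y[r*N + best];
        }
        free(A); free(Ap); free(w); free(f);
    }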
4.1.3 VCA Programming Notes
- There's a fair bit of SVD here; we'll use LAPACK routines LAPACKE_sgesvd(..) and friends. See http://www.netlib.org/lapack/lug/node32.html
- Local implementations are in /usr/lib64/liblapacke.so and /usr/include/lapacke/lapacke.h. The LAPACK “e” variant is the C interface (originally contributed by Intel). We might eventually compile a custom version with local compiler optimization switches, but this one will get us going. Be sure to -DHAVE_LAPACK_CONFIG_H and use the #included primitive data types.
- Image comparison: do “man compare” – mathematically and visually annotate the difference between an image and its reconstruction. From ImageMagick; looks pretty sophisticated.
4.1.4 Icarus Image GUI Notes
Needs:
- Opts.write(output_configfile) Need to be able to save Options configuration at any time with a “save options” button. Likewise will need a “read options” button that is active whenever user wants to read in previous saved values. A separate “Options Editor” window might be appropriate. (Done)
- direct hyperspectral image display in false colors. See ColorDisplayforHyperspectralImagery-TGRS-VIS2.pdf in RemoteSensing/doc. (TBD. current hyperspectral display is true-color only from visible bands)
- overlay on separate high-res USGS aerial photo if available. (Done)
- overlay Various virt_endmembers (virtual endmembers) in false color (TBD. Got to get VCA working for images without pure endmembers first.)
- virt_endmember to chem_endmember (chemical endmembers) conversion, e.g. by constrained least-squares from selected spectral libraries (TBD)
- overlay Various chem_endmembers in false color on original false-color image and/or aerial photo. (TBD)
- highlight “weak target” pixels. (TBD)
- Wavelength (nm) to Hue: Hue = (650 - wavelength)*240/(650-475); adjust this formula for the different spectral ranges you want for true or false images. See http://www.mathworks.com/matlabcentral/answers/17011-color-wave-length-and-hue and http://www.physics.uoguelph.ca/applets/Intro_physics/refraction/LightRefract.java (Partly done. We only specify a fixed visible range at this point, true color only. A sketch appears after this list.)
- HSV to RGB is complicated, but the C code is at http://www.cs.rit.edu/~ncs/color/t_convert.html. But it's also in Qt: the QColor class, whose methods also include alpha. We may also implement alpha blending in software; see setOverlayImage.cpp and alphablend.h.
- Qt has convenient conversion methods: QColorMap, QGraphicsScene, QImage, QImageIOHandler, QImageIOPlugin, QImageReader, QImageWriter.
- CIE. But what you really want – conversion from spectral power distribution (SPD) to RGB – is at http://hyperphysics.phy-astr.gsu.edu/hbase/vision/cie.html. Google “CIE color matching functions”: http://jcgt.org/published/0002/02/01/
- XYZ to RGB: http://www.easyrgb.com/index.php?X=MATH&H=01#text1 (Done)
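The wavelength-to-hue formula above as a C one-liner (the clamp outside 475-650 nm is our choice):

    /* Wavelength (nm) to HSV hue in degrees [0, 240]:
     * 475 nm (blue) -> 240, 650 nm (red) -> 0. */
    double wavelength_to_hue(double nm)
    {
        double hue = (650.0 - nm) * 240.0 / (650.0 - 475.0);
        if (hue < 0.0)   hue = 0.0;
        if (hue > 240.0) hue = 240.0;
        return hue;
    }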
References
[1] C.-I Chang and Q. Du. Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 42(3):608–619, March 2004.
[2] Chein-I Chang. Hyperspectral Data Processing: Algorithm Design and Analysis. John Wiley & Sons, 111 River St, Hoboken, NJ 07030, 2013.
[3] David W. Coulter. Remote Sensing Analysis of Alteration Mineralogy Associated with Natural Acid Drainage in the Grizzly Peak Caldera, Sawatch Range, Colorado. PhD thesis, Colorado School of Mines, Golden, Colorado, 2006.
[4] A.A. Green, M. Berman, P. Switzer, and M.D. Craig. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. IEEE Transactions on Geoscience and Remote Sensing, 26(1):65–74, 1988.
[5] Fred A. Kruse. Comparison of AVIRIS and Hyperion for hyperspectral mineral mapping. http://w.hgimaging.com/PDF/Kruse_JPL2002_AVIRIS_Hyperion.pdf, 2002.
[6] José M. P. Nascimento and José M. Bioucas Dias. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(4), April 2005.
[7] M.O. Smith, P.E. Johnson, and J.B. Adams. Quantitative determination of mineral types and abundances from reflectance spectra using principal component analysis. Journal of Geophysical Research, 90:797–804, 1985.
[8] Frank D. van der Meer and Steven M. de Jong. Imaging Spectroscopy. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.
[9] Frank D. van der Meer, Steven M. de Jong, and W. Bakker. Imaging Spectroscopy: Basic analytical techniques, pages 17–62. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.
[10] R. A. White. Image mean and covariance: http://dbwww.essc.psu.edu/lasdoc/user/covar.html, 2005.
[11] R. A. White. Karhunen-loeve transformation: http://dbwww.essc.psu.edu/lasdoc/user/karlov.html, 2005.
[12] R. A. White. Search unsupervised training site selection: http://dbwww.essc.psu.edu/lasdoc/user/search.html, 2005.