¡article¿ Spectral Unmixing]Optimization in Spectral Unmixing ] > [


Edward Leaver
Icarus Resources LLC

June 15, 2009



1 Introduction
2 Cross-Correlation Spectral Matching
3 Optimized Cross-Correlation Spectral Matching
4 Linear and Non-Linear Least-Squares Fitting
5 Image Classification and Statistics
 5.1 Image Mean Vector and Covariance Matrix
 5.2 Principal Component Transformation
 5.3 Minimum Noise Fraction
6 Vertex Component Analysis
 6.1 VCA Algorithm
  6.1.1 SNR
  6.1.2 Algorithm 1: Vertex Component Analyis (VCA)

1 Introduction


Remote Sensing Analyis of Alteration Minerology Associated With Natural Acid Drainage
in the Grizzly Peak Caldera, Sawatch Range, Colorado
David W. Coulter
Ph.D. Thesis Colorado School of Mines, 2006

West Red (Ruby Mountain) from Enterprise Peak


West Red Fe Endmembers from Aviris
Red: Hematite    Green: Goethite    Blue: Jarosite
(low pH)                                    (high pH)

2 Cross-Correlation Spectral Matching

Cross-Correlation Spectral Matching

Linear Correlation Coefficient (Pearson’s r):

r = i=1N(y i ȳ)(zi z̄) i=1N(yi ȳ)2 i=1N(zi z̄)2 1 r 1[6,eq.(13.7.1)]. (1)

. Generalize: cross-correlation at position m < N:

rm = i=1Nm(y i+m ȳ)(zi z̄) i=1Nm(yi ȳ)2 i=1Nm(zi z̄)2[10,eq.(33)]. (2)

Measures of fit:

  1. r0 relative to 1 is not particularly good, but
    t = rN 2 1 r2[6,eq.(13.7.5)ff.] (3)

    is distributed in null case as Student’s A(t|N 2).

  2. χ2 statistic:
    χ2 = (1 r2) i=1N(y i ȳ)2[6,eq.(14.2.13)] (4)

    For non-uniform weights, weight the sums in eq. (1) by 1σi2. Then goodness-of-fit

    Q = Q N 2 2 , χ2 2 [6,eq.(14.2.12)] (5)

    where the incomplete gamma function Q(a,x) is

    Q(a,x) = Γ(a,x) Γ(a) = 1 Γ(a)xetta1dt (6) Q(a, 0) = 1             and                Q(a,) = 0[6,eq.(6.2.3)] (7)

    If e.g. Q > 0.1, then the goodness-of-fit is believable. If Q is larger than, say, 0.001, the fit may be acceptable if the the errors are non-normal or have been moderately underestimated. If Q < 0.001 then question the model and/or estimation proceedure eq. (2). If latter, consider robust estimation.

Other useful measures Skewness

  1. ¡1-¿ Skewness: cross-correlogram of perfect match is parabola:
    Skew(rm) = 1 M m=1M(r m rm)(2) 0 (8)
  2. RMS Skewness error [10, eq. (34)]:
    RMS = m=0M(rm rm)2 M (9)

    where rm is auto-correlogram of z with itself, M is number of match positions, m is match number.

  1. Another possibly useful statistical measure, if (y,z) is binormal and N is moderately large (N 10), is Fisher’s z-transform z = (12) ln[(1 + r)(1 r)]. Then if you measure the correlation coefficients of the unknown signal y against two reference signals (z1,z2), the significance of a difference between the two measured correlation coefficients r1 and r2 is given by the complementary error function [6, eq. (13.7.10)]
     erfc |r1 r2| 2 N13 + 2 N23 (10)

    This can be useful when trying to assess whether a parameterized fit changes ”significantly” when a given change of parameter produces the two references (z1,z2).

  2. Non-parametric or Rank Correlation (Spearman’s Rank-Order Correlation Coefficient). See [6, Sec. 13.8] and [8].

3 Optimized Cross-Correlation Spectral Matching

Optimized Cross-Correlation Spectral Matching In ”traditional” CCSM, the reference spectrum z to which y is compared is taken to be a single (pure) endmember.
Next assume M endmembers Xk of interest, and linear mixing. Seek weighting factors {ak; k = 1,M}

zi = k=1Ma kXik (11)

(synthesized pixel intensity at band i) that maximize the cross correlogram

rm = i=1Nm(y i+m ȳ)(zi z̄) i=1Nm(yi ȳ)2 i=1Nm(zi z̄)2 (12)

at match position m = 0, places that maximum at m = 0 , and minimizes its skew, subject to 0 ak 1 and k=1Ma k = 1. Significance of different values of r0 resulting from different choices of endmember set {Xk; k = 1,M} may be assessed using eq. (10). Coulter [2] chooses endmember set that maximizes r0. As written (12) is the cross-correlation of two unit vectors:

rm = i=1Nmŷ i+mẑi 1 rm 1  where (13) ẑi = k=1Ma k(Xik Xk ̄) i=1N k=1Mak(Xik Xk ̄) 2 (14)

which is independent of the normalization of a. Likewise, if y is spectrum at a particular image pixel, then the unit normalization ŷ rends the rm somewhat independent of shadow and striping.

Equation (14) makes rm non-linear in the ak; eq (13) may be maximized w.r.t the ak by the non-linear constrained optimizer of one’s choice. It does not give much insight into the relative band-to-band errors inherent in y. As written, it assumes they are all equal, and we can pretend:

σ2 = 1 N i=1N(ŷ i ẑi)2 (15)

Given spectrometer calibration, we can do better than this.

4 Linear and Non-Linear Least-Squares Fitting

Least-Squares Fitting Since rm is expressed in terms of unit vectors, maximizing r0 = ŷ ẑ is equivalent to minimizing

(ŷ ẑ)2 = ŷ ŷ + ẑ ẑ 2ŷ ẑ (16) = 2(1 ŷ ẑ) (17)

Minimizing the squared-difference of unit vectors ŷ and ẑ in (16) is still a non-linear problem in the ak because of how they appear in the denominator of the normalization of ẑ (14). However, if we relax the restrictions that z be a unit vector (and that ak 1), we can define the equivalent problem

min F(ŷ; a1...aM) = ŷ k=1Ma kXk 2 (18)

which is linear in the ak, but not yet quite as general as we want. At each band i there are (at least) three uncertainties in the sensor measurement: those in the received radiance yi, those in the center wavelength xi, and those in the sensor’s bandwidth, assumed as a fwhm of a gaussian. Fwhm is typically on order of band-to-band wavelength spacing. Fwhm errors are usually of secondary significance and will be ignored.

If we allow for uncertain band centers xi and assume normal independent distributions, we can write

P(xi0,y i0) = e(yi0y(xi))2 2σyi2 e(xi0xi)2 2σxi2 (19)

as joint probability sensor records band radiance yi0 at recorded wavelength xi0, when it actually received (unknown) radiance yi at actual (and unknown) wavelength xi. Joint probability of pixel’s measured spectrum across N bands is

P(x0,y0) = i=1Ne1 2 yi0y(xi) σyi 2+xi0xi σxi 2 (20)


χ2 = 2 log P(x0,y0) (21) = i=1N yi0 y(x i) σyi 2 + xi0 x i σxi 2 (22)

Associate the (unknown) actual spectrum y(xi) with the modeled mixture z(xi; a1,aM) = k=1Ma kXk(xi). Then

χ2 = i=1N yi0 z(x i; a1,aM) σyi 2 + xi0 x i σxi 2 (23)

represents a constrained optimization problem wherein we wish to minimize χ2 as a function of N + M variables (a1,aM,x1,xN) subject to ai 0i. The motivation for including the (x1,xN) is to (hopefully) obtain ”more” non-negative ak in the optimal solution.

Normal Equations

χ2 ak = 0k = 1,M (24) χ2 xi = 0i = 1,N (25)


χ2 ak = 2 i=1N yi0 z(x i; a1,aM) σyi2 z(xi; a1,aM) ak (26)


χ2 xi = 2 i=1N yi0 z(x i; a1,aM) σyi2 z(xi; a1,aM) xi + xi xi0 σxi2 (27)


z(xi; a1,aM) = j=1Ma jXj(xi) (28) z(xi; a1,aM) ak = Xk(xi) (29) z(xi; a1,aM) xi = j=1Ma jXj(x) x x=xi (30)

Eqs (28) and (30) make eq. (27) quadratic in the ak, so the optimization including center wavelengths uncertainties is non-linear. The Xj(x) x can be computed efficiently enough, but the Xk(xi) will need to be reconvolved against the new center wavelengths for each new set of center wavelengths {xi; i = 1N} required during the optimization. The problem will obviously be much simpler if we assume the provided center wavelength values are ”good enough” and simply ignore eqs (27). Dropping the superscript 0, equations (26) then become

χ2 ak = 2 i=1N 1 σyi2 yi j=1Ma jXj Xk = 0 (31)

which are the usual linear least-squares normal equations for the ak. For what its worth, for the unconstrained problem the minimal solution vector {ak; k = 1,M} will be unique provided the matrix Mij = Xi Xj is non-singular.

5 Image Classification and Statistics

5.1 Image Mean Vector and Covariance Matrix

Image Mean Vector and Covariance Matrix [11, White 2005] If image has P bands and N pixels, the mean vector is

mp = (m1,m2,...mP )T = 1 N j=1Nf j (32)

where fj is the jth pixel vector of the image

fj = (fi,f2,...fP )jT (33)

The image covariance matrix C PxP is

C PxP = 1 N 1 F PxN M PxN F PxN M PxN T (34)

where F PxN is the matrix of N pixel vectors each of length P

F PxN = f1,f2,f3,...,fN (35)

M PxN is the matrix of N identical mean vectors (P rows by N columns):

M PxN = mP ,mP ,mP ,...,mP = mP 1 PxN (36)

where 1 PxN is an P × N matrix of ones.

5.2 Principal Component Transformation

Principal Component Transformation [7, Smith 1985] Karhunen-Loeve Transformation[12, White 2005] GRASS imagery i.pca: Let

X PxN = (F PxN M PxN) zero-mean image matrix (37)
Z PxN = A PxPX PxN (38)

F  PxN
input-image multi-pixel vector (P bands by N pixels)
M  PxN
mean vector matrix,
Z  PxN
output-image multi-pixel vector,
A  PxP
P × P matrix whose rows are the eigenvectors of the covariance matrix C PxP, arranged by decreasing magnitude of eigenvalue, as typically returned by SVD routines.
Zik = j=1P a ijXjki = 1, 2P; k = 1, 2N (39) ZkT = A PxPX kT  orthogonal w.r.t the N   pixels: (40)

λiδil = k=1NZ ikZlk (41) λi = k=1N j=1P a ijXjk m=1P a lmXmk (42) = j=1P m=1P a ijalm k=1NX jkXmk (43) j=1P m=1P a ijalmCjm (44) = aiT C PxPa l (45)

C PxPthe symmetric positive-definite image covariance matrix
ai are its orthonormal eigenvectors with eigenvalue λi.

Magnitudes of λi impose ordering on transformed component vectors Zik = j=1P a ijXjk. Those with the largest λi s.t. λiλmax > tol are the Principal Components. tol should be related to the noise floor.

5.3 Minimum Noise Fraction

Minimum Noise Fraction[10, pg. 38] [3, ] We wish to find a particular coefficient matrix {aij; i,j = 1,P} that in some sense maximizes the image S/N, assuming the image pixel vectors {Xk,k = 1,N} are the sum of uncorrelated signal and noise:

Xik = Sik + Niki = 1,Pk = 1,N (46) Zik = j=1P a ijXjk = j=1P a ij(Sjk + Njk) (47) ZiT = a iT X PxN     where        a iT = (a i1,ai2,aiP ) (48) = aiT S PxN + a iT N PxN (49)


R = V ar(Zi signal) V ar(Zi noise) = (aiT S PxN)(a iT S PxN)T (aiT N PxN)(aiT N PxN)T (50)
R = (aiT S PxN)(a iT S PxN)T (aiT N PxN)(aiT N PxN)T = aiT (S PxNS PxNT )a i aiT (N PxNN PxNT )ai (51) = aiT C PxPSa i aiT C PxPNai = aiT (C PxP C PxPN)a i aiT C PxPNai (52) = aiT C PxPa i aiT C PxPNai 1;         IF        C PxP = C PxPS + C PxPN (53) = λi 1 (54)

where λi is generalized eigenvalue of C PxP wrt C  PxPN, and ai are corresponding generalized eigenvectors. Compare with PCA:

λiPCA = a iT C PxPa l        (eq. 45) (55)

Noise Covariance C PxPN Green suggests C PxP be of unit variance and band-to-band uncorrelated.

Homogeneous Area Method [9, sec 2.9.1] If possible find homogenous area of Nh pixels in image:

M L = 1 Nk k=1Nh Xk            local mean, vector over bands (60) σ L = 1 Nk 1 k=1Nh (Xk M L)2 12 (61) σLi = 1 Nk 1 k=1Nh (Xik MLi)2 12 (62) C Lij = 1 Nk 1 k=1Nh (Xik MLi)(Xjk MLj)       (general) (63) = δij Nk 1 k=1Nh (Xik MLi)2          (zero band-to-band) (64)

Local Means and Local Variances [9, sec 2.9.2]

  1. Divide image into small Nb pixel blocks (4x4, 5x5,...)
  2. For each (block, band) get local mean and variance: M Li = 1 Nb k=1Nb Xiki = 1, 2,P   bands (65) σ Li2 = 1 Nb 1 k=1Nb (Xik M Li)2 (66) C Lij = (δij) Nk 1 k=1Nb (Xik M Li)(Xjk M Lj) (67)

    C is the local Nb × Nbcovariance matrix.

  3. bin {σ Li2} into classes between band min and max values.
  4. bin with most blocks represents mean noise of image. Hope this bin is same for all bands...

Local Means and Local Variances (con’t)

  1. Suppose ”most popular” bin is a P-cube, each side ranging [σi,σ i + σ i], contains N points. Then the average value over the bin
    C Lij¯ = 1 N k=1N (C Lij)k (68)

    is desired noise covariance matrix.

  2. Caveat: Assumes image is ”slowly varying enough” that enough of the N total blocks are homogeneous, i.e. their covariance is really due to noise, not true features in the image. Blocks and bins must both be ”small enough” – but not too small!

Other methods: ”unsupervised training” derived-endmember classification schemes e.g. LAS’ search ([13, ]) and GRASS’ cluster/maxlik are based upon local covariance minimization.

6 Vertex Component Analysis

Vertex Component Analysis is an “unsupervised training” derived-endmember classification scheme. We use the notation of [5, Nascimento and Dias], which is slightly different than that used in previous sections.

6.1 VCA Algorithm

Assume linear mixing, and let

N number of pixels in image
L number of spectral bands recorded at each pixel
p number of endmembers, p L, and usually p L for hyperspectral images.
mj is an endmember spectral vector of length L, 1 j p
γi is a scale factor modeling illumination variability due to surface topography at pixel i
αi = [αi1,αi2,...αip]T is the abundance vector containing the fractions of each endmember at pixel i. Positivity: αik 0 and 1pT α i = 1 ( k=1pα ik = 1).
xi is the “true” noise-free spectral vector of length L at pixel i.
M Lxp = [m1,m2,mp] is an Lxp mixing matrix that maps an abundance vector α to a “true” spectral vector x.

Then the recorded spectral vector at pixel i may be given by

ri = xi + ni = Mγiαi + nii = 1,N (69) = xi + ni = Msi + nii = 1,N (70) si γiαi scaled abundance vector (71)

Our goal is to find a abundance vectors {αi,i = 1,N} corresponding to some endmember set {mj,j = 1,p}. An appropriate endmember set {m} is to be determined as part of the VCA algorithm. Endmember identification – the matching one or more of the VCA generated endmembers to actual specimen spectral samples such as from USGS spectral library or field ground-truth sampling – may be done in a subsequent processing step.

Since the set {α p : 1T α = 1,α 0} is a simplex, the set Sx = {x L : x = Mα,1T α = 1,α 0; γ 0} is also a simplex. However, even assuming n = 0, th observed vector set belongs to convex cone Cp = {r L : r = Mγα,1T α = 1,α 0; γ 0} owing to different scale factors γ.

But the projective projection of the convec cone Cp onto a properly chosen hyperplane is a simplex with vertices corresponding to the vertices of simplex Sx. The simplex Sp = {y L : y = r(rT u),r C p} is the projective projection of the convex cone Cp onto the hyperplane rT u = 1, where the choice of u assures there are no observed vectors r orthogonal to the hyperplane: riT u|u| < |r|,1 = 1,...N.

6.1.1 SNR

VCA algorithm accuracy is dependent on image SNR. Several ways to estimate:

6.1.2 Algorithm 1: Vertex Component Analyis (VCA)


R [r1,r2,rN] is the image matrix of N pixel vectors rn each of length L spectral bands.
p, desired dimensionality of problem, the number of endmembers. p may be estimated as the number of “large” eigenvalues of the image covariance matrix. p L.
 SNR th is the fixed threshold value 101.5L of the SNR measured wrt the signal subspace. Different values might be used for different SNR estimators. See above.


M̂ is the L × p estimated mixing matrix returned by VCA.


[M̂]:,j is the jth column of M̂
[M̂]:,i:k are the ith through kth columns of M̂
[X]:,[index] is the matrix formed from the columns of X listed in index vector [index]
A+ denotes the pseudo-inverse of pxp matrix A as computed by SVD. A+ is what you think it is. Write the SVD as A = UΣVT where Σ is the pxp diagonal matrix whose diagonal elements are the (ordered) singular values: Σ =  diag{w1,w2,wp} Then the pseudo-inverse may be written A+ = VΣ+UT where Σ+ =  diag{w 1+,w 2+,w p+} and the pseudo-inverse of any scalar w+ = 0 if w 0 and w+ = w1 otherwise. Thus A+ = A1 if A is non-singular. If P = AA+ and Q = A+A, then

See http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse In particular “ the pseudoinverse for matrices related to can be computed by applying the ShermanMorrisonWoodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms[12][13] exist that exploit the relationship.” – a fact that might be quite useful here.

Ud is an Lxd matrix whose columns are the first d eigenvectors of either RT R (which is LxL) or the LxL spectral covariance matrix C. Ud is the left eigenmatrix computed by SVD whose columns are ordered (left to right) in order of decreasing singular value. Its first d columns are the most significant (principal) relative to the rest. We usually select d such that the singular values ωi : {i > d} are in some sense “too small to be significant”.

1: Compute SNR
2: if ( SNR >  SNR th) then

d := p; (72) X := UdT R; U d obtained by SVDR is LxN andX is dxN (73) u :=  mean(X); u is a 1 x d vector (74) [Y]:,j := [X]:,j([X]:,jT u);  {projective projection} (75)


d := p 1; (76) [X]:,j := UdT ([R] :j r̄); Ud obtained by PCAR is LxN andX is dxN (77) c := arg max j=1N[X]:,j; (78) c := [c,cc]; c is a 1 x N vector (79) Y := [cX] (80)

end if
14: A := [eu,0,00]; eu = [0,, 0, 1]T initialize A, a pxp auxiliary matrix, and A+ by SVD
15: for (i = 1 top) do

w :=  randn(0,Ip); {w is a zero-mean random Gaussian (normal) vector of varianceIp} (81) f := ((I AA+)w)((I AA+)w); {f is a vector orthonormal to the subspace spanned by[A] :,1:i} (82) v := fT Y; (83) k := arg max j=1,,N|vj|;  {find the projection extreme, the index of the component of Y which has the maximum projection in the f direction} (84) [A]:,i := [Y]:,k;  (if i = 1 find A+ by SVD, else by an update algorithm) (85) [indice]i := k;  {store the pixel index of the extreme, i.e. the i-th endmember} (86)

22: end for
23: if ( SNR >  SNR th) then
24: M̂ := Ud[X]:,[indice]; M̂ is a L × p estimated mixing matrix.
25: else
26: M̂ := Ud[X]:,[indice] + r̄; M̂ is a L × p estimated mixing matrix.
27: end if


[1]   C.-I Chang and Q. Du. Estimation of number of spectrally distinct signal sources in hyperspectral imagery. IEEE Transactions on Geoscience and Remote Sensing, 42(3):608–619, March 2004.

[2]   David W. Coulter. Remote Sensing Analysis of Alteration Mineralogy Associated with Natural Acid Drainage in the Grizzly Peak Caldera, Sawatch Range, Colorado. PhD thesis, Colorado School of Mines, Golden, Colorado, 2006.

[3]   A.A. Green, M. Berman, P. Switzer, and M.D. Graig. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. Journal of Geophysical Reseach, 90:797 – 804, 1988.

[4]   Fred A. Kruse. Comparison of aviris and hyperion for hyperspectral mineral mapping. http://w.hgimaging.com/PDF/Kruse_JPL2002_AVIRIS_Hyperion.pdf, 2002.

[5]   José M. P. Nascimento and José M. Bioucas Dias. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(4), April 2005.

[6]   William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C. Cambridge University Press, Cambridge, New York, Port Chester, Melbourne, Sidney, 1988.

[7]   M.O. Smith, P.E. Johnson, and J.B. Adams. Quantitative determination of mineral types and abundances from reflectance spectra using principal component analysis. IEEE Transactions on Geoscience and Remote Sensing, 36:65 – 74, 1985.

[8]   Frank D. van der Meer. Extraction of mineral absorbtion features from high-spectral resolution data using non-parameteric geostatistical techniques. International Journal of Remote Sensing, 15:2193–2214, 1994.

[9]   Frank D. van der Meer and Steven M. de Jong. Imaging Spectroscopy. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.

[10]   Frank D. van der Meer, Steven M. de Jong, and W. Bakker. Imaging Spectroscopy: Basic analytical techniques, pages 17–62. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.

[11]   R. A. White. Image mean and covariance: http://dbwww.essc.psu.edu/lasdoc/user/covar.html, 2005.

[12]   R. A. White. Karhunen-loeve transformation: http://dbwww.essc.psu.edu/lasdoc/user/karlov.html, 2005.

[13]   R. A. White. Search unsupervised training site selection: http://dbwww.essc.psu.edu/lasdoc/user/search.html, 2005.