Optimization in Spectral Unmixing

Optimization in Spectral Unmixing

Edward Leaver
Icarus Resources LLC

June 15, 2009



1 Introduction
2 Cross-Correlation Spectral Matching
3 Optimized Cross-Correlation Spectral Matching
4 Linear and Non-Linear Least-Squares Fitting
5 Image Classification and Statistics
 5.1 Image Mean Vector and Covariance Matrix
 5.2 Principal Component Transformation
 5.3 Minimum Noise Fraction

1 Introduction


Remote Sensing Analyis of Alteration Minerology Associated With Natural Acid Drainage
in the Grizzly Peak Caldera, Sawatch Range, Colorado
David W. Coulter
Ph.D. Thesis Colorado School of Mines, 2006

The Gizzly Peak Caldera is located Colorado’s Mineral Belt approximately 15 miles southeast of Aspen. It is the product of a volcanic eruption 3̃5 million years ago that ejected some 600  km3 of magma. With a volcanic explosive index of 7, it was at least four times larger than Tambora.[1, sec 1.5]


Figure 1: West Red (Ruby Mountain) From Enterprise Peak

Coulter sought to identify acidic conditions by the different weathering states of iron oxide, which forms Hematite (red) in acidic conditions, and Goethite (lite yellow) and Jarosite (dark yellow) as pH increases, Jarosite being the most immediate oxidation product of iron sulfide (pyrite).


Figure 2:                                 Red: Hematite    Green: Goethite    Blue: Jarosite
                                          (low pH)                                       (high pH)
                                              West Red Iron Endmembers from Aviris

Note on Partial Unmixing:

”Unmixing is critical in imaging spectrometry since virtually every pixel contains some macroscopic mixture of materials. The theory of, and methods for unmixing of spectroscopic signatures are found in a number of sources. Hapke (1993) provides models for linear and non-linear spectral mixing and a discussion of the criteria for using each approach. In Earth remote sensing, a linear mixing model is typically used. Full unmixing which assumes that the spectra of all possible components are known, is described by van der Meer (2000) and Boardman (1989). Since it is often impossible to identify, a priori, all possible components, partial unmixing is an important tool. Match Filtering (MF) (Harsanyi and Chang 1994), Constrained Energy Minimization (CEM similar to Match Filtering) (Farrand and Harsanyi 1997; Farrand 2001), and Mixture Tuned Match Filtering ( MTMF tm) (Boardman et al. 1995) are commonly used methods for partial unmixing.  MTMF tm is probably the most popular unmixing method used for geologic remote sensing.”[1, Coulter sec 1.4.4]

(Emphasis Added.)

2 Cross-Correlation Spectral Matching

Cross-Correlation Spectral Matching

Linear Correlation Coefficient (Pearson’s r):

r = i=1N(y i ȳ)(zi z̄) i=1N(yi ȳ)2 i=1N(zi z̄)2 1 r 1[3,eq.(13.7.1)]. (1)

See [3, eq. (13.7.1)]. Generalize: cross-correlation at position m < N:

rm = i=1Nm(y i+m ȳ)(zi z̄) i=1Nm(yi ȳ)2 i=1Nm(zi z̄)2[7,eq.(33)]. (2)

Ref. [7, eq. (33)]. Measures of fit:

r0 relative to 1 is not particularly good, but
t = rN 2 1 r2[3,eq.(13.7.5)ff.] (3)

(see [3, eq. (13.7.5) ff.]) is distributed in null case as Student’s A(t|N 2).

χ2 statistic:
χ2 = (1 r2) i=1N(y i ȳ)2[3,eq.(14.2.13)] (4)

See [3, eq. (14.2.13)]. For non-uniform weights, weight the sums in eq. (1) by 1σi2. Then goodness-of-fit

Q = Q N 2 2 , χ2 2 [3,eq.(14.2.12)] (5)

([3, eq. (14.2.12)]) where the incomplete gamma function Q(a,x) is

Q(a,x) = Γ(a,x) Γ(a) = 1 Γ(a)xetta1dt (6) Q(a, 0) = 1             and                Q(a,) = 0[3,eq.(6.2.3)] (7)

See [3, eq. (6.2.3)]. If e.g. Q > 0.1, then the goodness-of-fit is believable. If Q is larger than, say, 0.001, the fit may be acceptable if the the errors are non-normal or have been moderately underestimated. If Q < 0.001 then question the model and/or estimation proceedure eq. (2). If latter, consider robust estimation.

Other useful measures Skewness

Skewness: cross-correlogram of perfect match is parabola:
Skew(rm) = 1 M m=1M(r m rm)(2) 0 (8)
RMS Skewness error [7, eq. (34)]:
RMS = m=0M(rm rm)2 M (9)

where rm is auto-correlogram of z with itself, M is number of match positions, m is match number.

Another possibly useful statistical measure, if (y,z) is binormal and N is moderately large (N 10), is Fisher’s z-transform z = (12) ln[(1 + r)(1 r)]. Then if you measure the correlation coefficients of the unknown signal y against two reference signals (z1,z2), the significance of a difference between the two measured correlation coefficients r1 and r2 is given by the complementary error function [3, eq. (13.7.10)]
 erfc |r1 r2| 2 N13 + 2 N23 (10)

This can be useful when trying to assess whether a parameterized fit changes ”significantly” when a given change of parameter produces the two references (z1,z2).

Non-parametric or Rank Correlation (Spearman’s Rank-Order Correlation Coefficient). See [3, Sec. 13.8] and [5, ].

3 Optimized Cross-Correlation Spectral Matching

Optimized Cross-Correlation Spectral Matching In ”traditional” CCSM, the reference spectrum z to which y is compared is taken to be a single (pure) endmember.
Next assume M endmembers Xk of interest, and linear mixing. Seek weighting factors {ak; k = 1,M}

zi = k=1Ma kXik (11)

(synthesized pixel intensity at band i) that maximize the cross correlogram

rm = i=1Nm(y i+m ȳ)(zi z̄) i=1Nm(yi ȳ)2 i=1Nm(zi z̄)2 (12)

at match position m = 0, places that maximum at m = 0 , and minimizes its skew, subject to 0 ak 1 and k=1Ma k = 1. Significance of different values of r0 resulting from different choices of endmember set {Xk; k = 1,M} may be assessed using eq. (10). Coulter [1, ] chooses endmember set that maximizes r0. As written (12) is the cross-correlation of two unit vectors:

rm = i=1Nmŷ i+mẑi 1 rm 1  where (13) ẑi = k=1Ma k(Xik Xk ̄) i=1N k=1Mak(Xik Xk ̄) 2 (14)

which is independent of the normalization of a. Likewise, if y is spectrum at a particular image pixel, then the unit normalization ŷ rends the rm somewhat independent of shadow and striping.

Equation (14) makes rm non-linear in the ak; eq (13) may be maximized w.r.t the ak by the non-linear constrained optimizer of one’s choice. It does not give much insight into the relative band-to-band errors inherent in y. As written, it assumes they are all equal, and we can pretend:

σ2 = 1 N i=1N(ŷ i ẑi)2 (15)

Given spectrometer calibration, we can do better than this.

4 Linear and Non-Linear Least-Squares Fitting

Least-Squares Fitting Since rm is expressed in terms of unit vectors, maximizing r0 = ŷ ẑ is equivalent to minimizing

(ŷ ẑ)2 = ŷ ŷ + ẑ ẑ 2ŷ ẑ (16) = 2(1 ŷ ẑ) (17)

Minimizing the squared-difference of unit vectors ŷ and ẑ in (16) is still a non-linear problem in the ak because of how they appear in the denominator of the normalization of ẑ (14). However, if we relax the restrictions that z be a unit vector (and that ak 1), we can define the equivalent problem

min F(ŷ; a1...aM) = ŷ k=1Ma kXk 2 (18)

which is linear in the ak, but not yet quite as general as we want. At each band i there are (at least) three uncertainties in the sensor measurement: those in the received radiance yi, those in the center wavelength xi, and those in the sensor’s bandwidth, assumed as a fwhm of a gaussian. Fwhm is typically on order of band-to-band wavelength spacing. Fwhm errors are usually of secondary significance and will be ignored.

If we allow for uncertain band centers xi and assume normal independent distributions, we can write

P(xi0,y i0) = e(yi0y(xi))2 2σyi2 e(xi0xi)2 2σxi2 (19)

as joint probability sensor records band radiance yi0 at recorded wavelength xi0, when it actually received (unknown) radiance yi at actual (and unknown) wavelength xi. Joint probability of pixel’s measured spectrum across N bands is

P(x0,y0) = i=1Ne1 2 yi0y(xi) σyi 2+xi0xi σxi 2 (20)


χ2 = 2 log P(x0,y0) (21) = i=1N yi0 y(x i) σyi 2 + xi0 x i σxi 2 (22)

Associate the (unknown) actual spectrum y(xi) with the modeled mixture z(xi; a1,aM) = k=1Ma kXk(xi). Then

χ2 = i=1N yi0 z(x i; a1,aM) σyi 2 + xi0 x i σxi 2 (23)

represents a constrained optimization problem wherein we wish to minimize χ2 as a function of N + M variables (a1,aM,x1,xN) subject to ai 0i. The motivation for including the (x1,xN) is to (hopefully) obtain ”more” non-negative ak in the optimal solution.

Normal Equations

χ2 ak = 0k = 1,M (24) χ2 xi = 0i = 1,N (25)


χ2 ak = 2 i=1N yi0 z(x i; a1,aM) σyi2 z(xi; a1,aM) ak (26)


χ2 xi = 2 i=1N yi0 z(x i; a1,aM) σyi2 z(xi; a1,aM) xi + xi xi0 σxi2 (27)


z(xi; a1,aM) = j=1Ma jXj(xi) (28) z(xi; a1,aM) ak = Xk(xi) (29) z(xi; a1,aM) xi = j=1Ma jXj(x) x x=xi (30)

Eqs (28) and (30) make eq. (27) quadratic in the ak, so the optimization including center wavelengths uncertainties is non-linear. The Xj(x) x can be computed efficiently enough, but the Xk(xi) will need to be reconvolved against the new center wavelengths for each new set of center wavelengths {xi; i = 1N} required during the optimization. The problem will obviously be much simpler if we assume the provided center wavelength values are ”good enough” and simply ignore eqs (27). Dropping the superscript 0, equations (26) then become

χ2 ak = 2 i=1N 1 σyi2 yi j=1Ma jXj Xk = 0 (31)

which are the usual linear least-squares normal equations for the ak. For what its worth, for the unconstrained problem the minimal solution vector {ak; k = 1,M} will be unique provided the matrix Mij = Xi Xj is non-singular.

5 Image Classification and Statistics

5.1 Image Mean Vector and Covariance Matrix

Image Mean Vector and Covariance Matrix [8, White 2005] If image has P bands and N pixels, the mean vector is

mp = (m1,m2,...mP )T = 1 N j=1Nf j (32)

where fj is the jth pixel vector of the image

fj = (fi,f2,...fP )jT (33)

The image covariance matrix C PxP is

C PxP = 1 N 1 F PxN M PxN F PxN M PxN T (34)

where F PxN is the matrix of N pixel vectors each of length P

F PxN = f1,f2,f3,...,fN (35)

M PxN is the matrix of N identical mean vectors (P rows by N columns):

M PxN = mP ,mP ,mP ,...,mP = mP 1 PxN (36)

where 1 PxN is an P × N matrix of ones.

5.2 Principal Component Transformation

Principal Component Transformation [4, Smith 1985] Karhunen-Loeve Transformation[9, White 2005] GRASS imagery i.pca: Let

X PxN = (F PxN M PxN) zero-mean image matrix (37)
Z PxN = A PxPX PxN (38)

F  PxN
input-image multi-pixel vector (P bands by N pixels)
M  PxN
mean vector matrix,
Z  PxN
output-image multi-pixel vector,
A  PxP
P × P matrix whose rows are the eigenvectors of the covariance matrix C PxP, arranged by decreasing magnitude of eigenvalue, as typically returned by SVD routines.
Zik = j=1P a ijXjki = 1, 2P; k = 1, 2N (39) ZkT = A PxPX kT  orthogonal w.r.t the N   pixels: (40)

λiδil = k=1NZ ikZlk (41) λi = k=1N j=1P a ijXjk m=1P a lmXmk (42) = j=1P m=1P a ijalm k=1NX jkXmk (43) j=1P m=1P a ijalmCjm (44) = aiT C PxPa l (45)

C PxPthe symmetric positive-definite image covariance matrix
ai are its orthonormal eigenvectors with eigenvalue λi.

Magnitudes of λi impose ordering on transformed component vectors Zik = j=1P a ijXjk. Those with the largest λi s.t. λiλmax > tol are the Principal Components. Tolerance tol should be related to the noise floor.

5.3 Minimum Noise Fraction

Minimum Noise Fraction[7, pg. 38] [2, ] We wish to find a particular coefficient matrix {aij; i,j = 1,P} that in some sense maximizes the image S/N, assuming the image pixel vectors {Xk,k = 1,N} are the sum of uncorrelated signal and noise:

Xik = Sik + Niki = 1,Pk = 1,N (46) Zik = j=1P a ijXjk = j=1P a ij(Sjk + Njk) (47) ZiT = a iT X PxN     where        a iT = (a i1,ai2,aiP ) (48) = aiT S PxN + a iT N PxN (49)


R = V ar(Zi signal) V ar(Zi noise) = (aiT S PxN)(a iT S PxN)T (aiT N PxN)(aiT N PxN)T (50)
R = (aiT S PxN)(a iT S PxN)T (aiT N PxN)(aiT N PxN)T = aiT (S PxNS PxNT )a i aiT (N PxNN PxNT )ai (51) = aiT C PxPSa i aiT C PxPNai = aiT (C PxP C PxPN)a i aiT C PxPNai (52) = aiT C PxPa i aiT C PxPNai 1;         IF        C PxP = C PxPS + C PxPN (53) = λi 1 (54)

where λi is generalized eigenvalue of C PxP wrt C  PxPN, and ai are corresponding generalized eigenvectors. Compare with PCA:

λiPCA = a iT C PxPa l        (eq. 45) (55)

Noise Covariance C PxPN Green suggests C PxP be of unit variance and band-to-band uncorrelated.

Homogeneous Area Method [6, sec 2.9.1] If possible find homogenous area of Nh pixels in image:

M L = 1 Nk k=1Nh Xk            local mean, vector over bands (60) σ L = 1 Nk 1 k=1Nh (Xk M L)2 12 (61) σLi = 1 Nk 1 k=1Nh (Xik MLi)2 12 (62) C Lij = 1 Nk 1 k=1Nh (Xik MLi)(Xjk MLj)       (general) (63) = δij Nk 1 k=1Nh (Xik MLi)2          (zero band-to-band) (64)

Local Means and Local Variances [6, sec 2.9.2]

Divide image into small Nb pixel blocks (4x4, 5x5,...)
For each (block, band) get local mean and variance: M Li = 1 Nb k=1Nb Xiki = 1, 2,P   bands (65) σ Li2 = 1 Nb 1 k=1Nb (Xik M Li)2 (66) C Lij = (δij) Nk 1 k=1Nb (Xik M Li)(Xjk M Lj) (67)

C is the local Nb × Nbcovariance matrix.

bin {σ Li2} into classes between band min and max values.
bin with most blocks represents mean noise of image. Hope this bin is same for all bands...

Local Means and Local Variances (con’t)

Suppose ”most popular” bin is a P-cube, each side ranging [σi,σ i + σ i], contains N points. Then the average value over the bin
C Lij¯ = 1 N k=1N (C Lij)k (68)

is desired noise covariance matrix.

Caveat: Assumes image is ”slowly varying enough” that enough of the N total blocks are homogeneous, i.e. their covariance is really due to noise, not true features in the image. Blocks and bins must both be ”small enough” – but not too small!

Other methods: ”unsupervised training” derived-endmember classification schemes e.g. LAS’ search ([10, ]) and GRASS’ cluster/maxlik are based upon local covariance minimization.


[1]   David W. Coulter. Remote Sensing Analysis of Alteration Mineralogy Associated with Natural Acid Drainage in the Grizzly Peak Caldera, Sawatch Range, Colorado. PhD thesis, Colorado School of Mines, Golden, Colorado, 2006.

[2]   A.A. Green, M. Berman, P. Switzer, and M.D. Graig. A transformation for ordering multispectral data in terms of image quality with implications for noise removal. Journal of Geophysical Reseach, 90:797 – 804, 1988.

[3]   William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C. Cambridge University Press, Cambridge, New York, Port Chester, Melbourne, Sidney, 1988.

[4]   M.O. Smith, P.E. Johnson, and J.B. Adams. Quantitative determination of mineral types and abundances from reflectance spectra using principal component analysis. IEEE Transactions on Geoscience and Remote Sensing, 36:65 – 74, 1985.

[5]   Frank D. van der Meer. Extraction of mineral absorbtion features from high-spectral resolution data using non-parameteric geostatistical techniques. International Journal of Remote Sensing, 15:2193–2214, 1994.

[6]   Frank D. van der Meer and Steven M. de Jong. Imaging Spectroscopy. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.

[7]   Frank D. van der Meer, Steven M. de Jong, and W. Bakker. Imaging Spectroscopy: Basic analytical techniques, pages 17–62. Kluwer Academic Publishers, Dordrecht, Boston, London, 2001.

[8]   R. A. White. Image mean and covariance: http://dbwww.essc.psu.edu/lasdoc/user/covar.html, 2005.

[9]   R. A. White. Karhunen-loeve transformation: http://dbwww.essc.psu.edu/lasdoc/user/karlov.html, 2005.

[10]   R. A. White. Search unsupervised training site selection: http://dbwww.essc.psu.edu/lasdoc/user/search.html, 2005.