Icarus Image Vertex Component Analysis (VCA)

Edward W. Leaver

Last update 6 July 2024

Figure 1: Ruby Mountain a.k.a. “West Red” as seen looking east from Enterprise Peak a.k.a. Larson’s Mountain. “East Red” (Mount Blaurock) in background, upper right. Garfield Peak is the craggy highpoint to the left of Ruby. Chaffee County, Colorado. (Photograph by author, August 2008)

Preface and Acknowledgement. Icarus Image and my own studies into hyperspectral image processing were inspired by David Coulter’s Remote Sensing Analysis of Alteration Mineralogy Associated with Natural Acid Drainage in the Grizzly Peak Caldera, Sawatch Range, Colorado, Colorado School of Mines doctoral dissertation, 2006. Apart from the obligatory Cuprite cameo (16), all aerial photography and imagery used herein were taken directly from Coulter’s dissertation, which represents a substantial effort on Coulter’s part organizing his acid-drainage study, conducting the ground mineralogical survey, and coordinating the remote-sensing overflights by the Jet Propulsion Laboratory’s AVIRIS aircraft and SpectIR Corporation’s HST instrument. My own desk-efforts in no way compare with David Coulter’s.

1 Introducing Hyperspectral Imaging and End Member Matching
2 Icarus Image VCA: Vertex Component Analysis with Simplex Refinement
3 Prelude to Spectroscopy
4 Picture Show
5 Initial Synthetic Image Tests
6 Next Steps
A Synthetic Image Tests

1 Introducing Hyperspectral Imaging and End Member Matching

The human eye is sensitive to three primary colors, red, green, and blue, broadly centered around wavelengths 560, 530, and 420nm, with considerable overlap. In contrast, hyperspectral imaging cameras, such as the Jet Propulsion Laboratory’s airborne AVIRIS sensor, may record up to 224 separate wavelengths in narrow bands between 370 (near UV), and 2500nm (near-thermal infrared) with little overlap, thereby affording detailed spectroscopy of reflected light and identification of the chemical composition of the reflecting surface. Hyperspectral or Multispectral Imaging may also feature in some MRI, CT, and LIDAR applications, but airborne and space-based remote sensing are the best-known examples.

Figure (1) is a ground photograph of Coulter’s field targets. Ruby Mountain (aka Red Mountain or “West Red”) and its “East Red” companion lie in the southwest part of Colorado’s Grizzly Caldera, an extensive volcanic feature in Chaffee County fifteen miles southeast of Aspen. Ruby Mountain and mine are readily found on Google Earth.¹ Photo was taken from Enterprise Peak (Larson’s Mountain), on the Sawatch Range separating Grizzly Caldera from Taylor Park. Foreground drainage is the southern end of Lincoln Gulch, which drains north (left) to the Roaring Fork on the western approach to Independence Pass. The distinctive orange color is characteristic of Hematite (a.k.a. “rust”), the primary natural oxidation state of iron. But there are contributions from Goethite and Jarosite as well, which form when iron weathers under more acidic conditions. Coulter sought to map, by remote sensing imagery, the geographic distribution of these three iron oxides in the target areas, and thereby characterize naturally occuring acid seeps.

[Picture]

Figure 2: Visible-IR spectrum of a 2.5mx2.5m pixel near the summit of West Red, together with spectra of optimal mixtures of the three principal iron oxide end-members.

Figure (2) represents a simple example of supervised end-member classification. In this case the supervisor (David Coulter) painstakingly took “ground-truth” samples from select locations in the target area, and identified Hematite, Goethite, and Jarosite as the end-members of interest. These three spectrally pure endmembers were determined by mineralogical estimation of the selected surface rocks, their spectra measured in the lab and recorded for later matching against many (possibly millions) such pixels in the hyperspectral images.

The target pixel’s spectrum is shown in red, the spectrum resulting from the best fits of the three end members by least-squares fitting and by our implementation of the Optimized Cross-Correlation Matching (OCCM) algorithm used by Coulter are shown in blue and green respectively, with vertical offsets added for clarity. The lowest-most curve illustrates goodness-of-fit (Pearson’s $R = 0.998$ ). Matching was done over two wavelength bands: 370-1100 nm, and 2000-2440 nm. The intervening 1100-2000 nm range is characterized by highly variable water absorbtion, and is frequently excluded from this sort of study. (Hence the figure’s flat plateau in this region.)

2 Icarus Image VCA: Vertex Component Analysis with Simplex Refinement

In contrast to OCCM, Icarus Image is part of an ongoing research project to investigate algorithms for unsupervised hyperspectral endmember classification, that is, how to identify and map chemical endmember spectra in an image in absence of local ground-truth sampling. This problem is more difficult, for without ground truth a wider selection of endmember spectra must be applied to match the spectra of each pixel in the image. Restricting the endmember set to a the (relatively) few compounds appropriate to a particular image region becomes an interesting problem.

A common approach, which we follow here, is to first estimate the number M of spectrally distinct endmembers in the area of interest – itself a good problem – then attempt to find a minimal-volume M-dimensional simplex in spectral space that best contains all (or nearly all) the individual spectra from each image pixel. Under the linear-mixing assumption the spectra of each pixel contained within the simplex may then be expressed through a (relatively) low-dimension mixing matrix as a linear combination of the spectra represented by the vertices of the simplex. Linear mixing is generally valid for the remote sensing reflectance spectroscopy considered here, but might not be in all applications. Mathematically, given a hyperpectral image matrix $F_{LxN}$ of $N$ pixels and $L$ spectral channels, we wish to solve

F_{LxN} = {M i x}_{L \times V_{d}} * {A b u n d a n c e}_{V_{d} \times N}

(1)

for a mixture matrix Mix whose column vectors each represent the spectrum of a linear combination of distinct endmembers, and where $P \leq L$ is the number of spectrally distinct endmembers in the scene. All coefficients of Mix must be non-negative, and the norm of each column vector is usually required to be 1. If each column of R is also so normalized, then likewise will be the Abundance matrix

{M i x}_{V_{d} \times L}^{- 1} * F_{LxN} = {A b u n d a n c e}_{V_{d} \times N}

(2)

The parameter $V_{d}$ is termed the virtual dimensionality. Determining an appropriate value for a particlar image is part of the problem. The channel length $L$ can be excessively large: up to 224 for AVIRIS airborne cameras and 2048 for modern MRI. Dimensionality reduction techniques include slective subsampling, Principle Component Analysis and, if additional noise properties are known, Minimum Noise Fraction. Eq. (1) is then reduced to smaller row dimension:

R_{S_{d} \times N} = {M i x}_{S_{d} \times V_{d}} * {A b u n d a n c e}_{V_{d} \times N}

(3)

where $P = S_{d}$ , the subspace dimension with $P \leq L$ and usually $P ≪ L$ . $S_{d}$ and $V_{d}$ are often the same.

Earlier approaches include the N-Findr algorithm proposed by Winter in 1999[4]. N-Findr finds a maximum-volume simplex whose vertices lie within the set of image pixels. A related approach is Vertex Component Analyis (VCA)[2], which we incorporate here as an intermediate step.

Related to our final simplex optimization (SimFit) is the Shrink-Wrap approach discussed by Fuhrmann[1]. But although our goals are similar, the two are not the same. In particular, Shrink-Wrap reportedly can yield simplices that completely contain all the pixel spectra in the area of interest, which is not necessarily what one desires. First there is sensor noise. Even a few mis-read pixels may result in an unsuitably large simplex with vertices in spectral space lying far outside the chemically-realizable spectral set of interest. Second is the possibility of a relatively small number of pixels from endmember compounds that either are not of interest and could expand the simplex in directions not appropriate to the area at large, or which are of interest but whose small number warrant separate independent analysis and classification.

Further noise reduction is afforded by the nature of the SimFit algorithm itself: the simplex vertices are defined by the orientation and location of its faces, which are determined from a weighted average of pixels nearby to each face. The contributions from each member in the set of adjacent pixels are weighted by distance and summed, which tends to smooth out individual pixel noise.

Icarus Image is a research project whose principal algorithmic goal is to find a minimal-volume spectral simplex that contains the spectra of almost all the pixels in the area of interest, and can identify those pixels that lie far outside. “Almost” and “far” are quantifiable parameters of the problem. As with nearly all M-dimensional optimizers, our minimal-volume solution simplices will not be unique, but for a given dataset each solution should yield nearly the same final chemical end-member mixing matrix.

To this end Icarus Image’s graphical interface serves as a flexible test platform, enabling rapid changes in program option editing and display. At this point (November 2023) the GUI is completely standalone, with ability to export image and spectral signature files to Ball Aerospace’ Opticks GIS.

Programmatically, our VCA (Vertex Component Analysis) and SimFit (simplex optimization and refinement) functionalities are implemented in a shared code library, for which there are separate GUI and command-line drivers. The latter allows rapid and scripted execution of simple test cases for development, and batch processing when desired. The Qt GUI introduces flexibility into the command line’s otherwise linear control flow, and enables convenient display of the input and output images. Briefly

An input hyperspectral image is selected. Format may be either ENVI Standard or GeoTiff.
Optionally, the coordinates of a desired sub-image of interest may be given either on the command line, in a configuration file, or set in the GUI’s Options Editor.
Optionally, a suitably georeferenced rgb or greyscale background aerial photo covering the same areas of interest may also be loaded. Rgb bands from the hyperspectral image or its derived PCA, VCA, or SimFit images may then be be overlayed and alpha-blended over the background image.
Dimensionality Reduction. We presently incorporate three traditional eigenthresholding based methods: Harsanyi-Farrand-Chang (HFC), noise-whitened HFC (NWHFC), and noise subspace projection (NSP) to determine the first M PCA bands most likely to represent the major hyperspectral endmembers. VCA-related Othogonal Subspace Projection (OSP), and Simplex Fitting-related condition-number based methods, are also under consideration. Dimensionalty-reduction possibilities are far from exhausted.

In the case of lower-dimension multispectral images, such as from the 16-band World View 3 satellite, or by downsampling hyperspectral images to multispectral dimensions, first-step PCA analysis is still possible if not strictly necessary. (Such down-sampling capability is in progress).
Run Vertex Component Analyis (VCA) to get an initial spectral simplex with vertices associated with specific and identifiable hyperspectral image pixels. Classic VCA starts with an arbitrary direction in spectral space that is not orthogonal to any image pixel spectra, then locates the suitably-projected image pixel spectral with the greatest projection in this chosen spectral direction. This pixel is the first VCA component vertex.

The second VCA vertex is detemined by choosing a second spectral dimension orthogonal to the spectral direction of the first pixel vertex, and finding the next projected pixel with greatest projection in this second spectral direction. A third spectral direction is choosen orthogonal to the spectral direction of the first two, and a third VCA vertex pixel found as that with greatest projected projection in this third direction.
The remaining M-3 VCA vertices are similarly found by recursion.
Vertex Component Analysis is rapid, and Really Neat – provided there neatly exist M image pixels that represent pure and distinct spectral (and chemical) endmembers. Here on planet earth such neatness is rarely obtained save in carefully constructed synthetic images. Most natural hyperspectral images will contain a large number of pixels whose spectral coordinates lie outside (usually far outside) the M-1 dimensional simplex defined by the M VCA coordinates.
Our goal here is to find a Mixing Matrix that defines the spectrum of each image pixel in terms of the spectra of the simplex vertices. But this is only meaningful for image pixels whose spectra lie inside the spectral simplex whose vertices are defined by the M VCA vertices. So while VCA is intuitively appealing, it somewhat lacks in practical geophysical application. Therefore...
Run Simplex Fitting optimization and refinement, which expands the spectral coordinates of the VCA simplex vertices such that the resulting spectral simplex contains most of the image pixels, where “most” is characterized by an input tuning parameter $α$ . In this respect the SimFit algorithm used may be thought of as an improved Shrink-Wrap, skirting the shrink-wrap issues mentioned above.

SimFit finds an optimal simplex with some pixels’ spectra lying on each side of each simplex face. By far most pixels will lie inside the final simplex when converged, and most of the (relatively few) outside pixels will not lie far outside, as illustrated in Figures (10) and (11). The remaining (presumably very few) far outliers, whether noise or otherwise, are another story. Their characterization is TBD.

Icarus Image screenshots are illustrated in Figures (3) through (9):

[Picture]

Figure 3: Icarus Image opens to a blank canvas. File menu allows selection of a program Options file and input images. The input image paths may themselves be set in the Options file, in which case the image file browser opens to those default paths. Others may still be selected.

[Picture]

Figure 4: Open USGS aerial photo of Grizzly Caldera region. The long south-north trending drainage on the far left (west) is Lincoln Gulch, running north to the Roaring Fork. Enterprise Peak (a.k.a. Larson’s Mountain), from which the photograph of Ruby Mountain and mine in Figure (1) was taken, appears in the far southwest corner. Then zoom to the Ruby Mountain (Coulter’s “West Red”) target.

[Picture]

Figure 5: Image overlay and (right) alpha blend where foreground alpha = 155/255

Open an AVIRIS hyperspectral image and enhance contrast. Then overlay its RGB components on the USGS background and adjust alpha channel to taste.

[Picture]

Figure 6: Options Editor

Edit some VCA Options, and read the complete hyperspectral data for the selected subimage and wavelength intervals. Prior to this we had read only three RGB channels: the AVIRIS camera records 224 channels spanning the visible and infrared; a 1.2 megapixel by 224 band ENVI file runs about a gigabyte. After “Estimate Virtual Dimensionality” is run, three or four virtual dimension estimates are displayed in the Options Editor; editable spinners are provided to make an actual selection. You may then select ”Run VCA” to crunch the algorithm.

A simple image band selector is shown at right. Grayscale is provided for visualization of single bands. Color normalization can be either over the entire image (photogenically correct), or over individual bands, which affords simple enhancement of more weakly reflective bands. Linear and non-linear contrast enhancement is also available..

[Picture]

Figure 7: Optimized Cross-Correlation Match result

The endmember map above was generated via the original OCCM algorithm, and the overlay image processed in Geographic Resources Analysis Support System (GRASS). As iron oxides were the target of David Coulter’s investigation, an NDVI of 0.15 was used to mask exposed soil and vegetation. Here the false colors represent reflectance contributions from iron minerals Hematite (red), Geothite (green), and Jarosite (blue) in the AVIRIS hyperspectral regions [374-1100] and [2000-2440] nm. Ground truth data for this image was laboriously field-sampled by Coulter for his 2006 School of Mines PhD dissertation. OCCM is a supervised algorithm, and while ground-truth is not a strict requirement, results are much easier to generate (and are more meaningful) if they are available.

[Picture]

Figure 8: Vertex Component Analysis result

In contrast VCA algorithm is unsupervised and should not require ground-truth to the same degree. However, strictly speaking VCA assumes each pure endmember spectrum is present in at least one pixel in the image, which is frequently not the case. This restriction may be alleviated via iterative refinement, illustrated below. Above left is the raw AVIRIS full-visible spectrum image (SPIE-mapped to RGB) laid over the USGS aerial photo background. The VCA-processed image in the center ignores NDVI, vegetation and soil show as blue and purple. Above right, vegetation and soil pixels are masked with NDVI greater then 0.15; here Goethite and Jarosite show as blue.

[Picture]

Figure 9: VCA, Refined VCA, and PCA results

Grayscale has its merit. Zooming to the sub-region of interest, the first VCA band (left) compared with its refinement (center), shows noticeable sharpening as this vertex is pushed toward spectral purity. 124 AVIRIS bands between [374, 1100] and [2000, 2440] nm are included, ndvi is 0.15 with range parameters as indicated in Figure (6). Although not directly comparable, the first PCA band is included (far right) for direct comparison. Virtual dimensionality was suggested to be 32 by HFC, 85 by noise-whitened HFC, and 42 by noise-subspace projection. For simplicity we chose a virtual dimension of 6, and 24 total PCA channels.

[Picture]

Figure 10: Histograms illustrate how the number of outlying image pixels – those with negative pixel values, indicative of the pixel’s spectral distance outside the spectral vertex simplex – decreases as vertex refinement proceeds. 6 virtual dimensions, 24 channels. Histograms are displayed in Quantum GIS for the VCA (left) and Refined Vertex mixing matrices (center and right), illustrating the progress of the vertex refinement algorithm. Such histograms may be useful aid in determining when refinement is “good enough”, and in the identification and characterization of outlying pixels.

[Picture]

Figure 11: Lastly, from the ever-popular “any image worth processing is worth overprocessing” category, we have corresponding histograms of 64 virtual dimensions and 72 channels.

3 Prelude to Spectroscopy

2015 efforts included refining the refinement algorithm, noting some limitations, and contemplating future modifications to avoid them. Some cpu-intensive portions were moved to custom CUDA kernels. GPU speedup was good enough to suggest other approaches were well within grasp. Year 2022 saw substantial speedups to both CPU and GPU portions of the code, and incorporation of an independent gradient optimizer for confirmation and comparison.

Spectral matching has progressed with introduction of Ball Aerospace’s Opticks GIS. Experience with the earlier OCCM supervised matching algorithms afforded some insight into what an automated matching program might entail; I was relieved to find a robust open-source GIS that will largely fill our needs.

Opticks is written with specific focus on multi- and hyper- spectral image analysis, with underlying image data structure and file formats tailored to that end. Its Spectral Plugin and Library Builder will likely be adequate as-is, after convolution of JPL/USGS spectral libraries to the resolution and band shapes of AVIRIS and other sensors. Some early results:

[Picture]

Figure 12: False color Red Mountain and Ruby Mine area of interest displayed in Opticks GIS with three visible - near IR bands of 884, 675, and 548 nm as RGB. Erstwhile “green” vegetation reflects strongly in the near infrared, hence the strong red signal at lower elevations (far right) on the mountain’s east flank. In the inset on right, the geographic locations of VCA vertices are indicated by yellow triangles.

[Picture]

Figure 13: In contrast to the full VNIR Opticks images shown above, an NDVI index of 0.15 masked the vegetation prior to VCA processing and simplex refinement in the 380 - 1100 nm region. The gravel roadways on the east (right) side were not vegetated and clearly stand out: here the roadways appear green against the white background. Of the 8 VCA vertexes considered, the sixth, second, and third were arbitrarily chosen as RGB for display purposes. There is a clear difference between the VCA (left) and its refinement.

[Picture]

Figure 14: Just what that VCA / SimFit difference signifies is a matter of misinterpretation.

Here we show the spectra at two vertices for increasingly tight values of the refinement parameter

α

, as it varies from no refinement at VCA, to a minimum of 1e-05. The VCA spectra are boldly shown in black. These correspond to real reflectance signatures at real pixel locations, and in important respects represent the “purest” spectra in an image that might not have any actual pure end members. Although less than half the image might be representable as combinations of such pixels, they are real and do represent real signatures. The left and right pixel signatures are located in Figure (12) within the yellow triangles at (361878E, 4320002N) and (361929E, 4319967N) respectively. One can locate them on a real map and send real field tech out to collect real samples.

In contrast, while the refined signatures retain and often exaggerate spectral features of the “real” VCA signatures, and in principle represent purer end members in terms of which larger portions of the image may be represented, they do not correspond to the reflectance spectra of any actual pixel in the image. Rather, they must be matched against a suitably small set of library spectra to see if each may be repesented as a combination of geologically believable end members. If they can, then their corresponding abundance matrices may be used to characterize much larger portions of the image than are representable by VCA. That’s where we’re headed. But we aren’t there yet.

4 Picture Show

We’ve done nearly all our development work referencing the small oxide-rich Ruby Mountain area with AVIRIS data illustrated above. This section includes a few other datasets from the Grizzly Caldera – including some from the SpectIR HST sensor – and the public AVIRIS image from the Cuprite district in Nevada. AVIRIS and HST images both have pixel size 2.5m x 2.5m. This is not analysis, which must await spectral matching capability. We merely illustrate SimFit robustness, and give some flavor for execution speed.

To this end all images in this section were processed using 16 PCA channels and 16 Virtual Dimensions, resulting in 16x16 Mixture Matrices. There were 122 AVIRIS bands in the wavelength intervals of interst. The workstation has a 16-core 3.4 GHz AMD-5950X processor, 128 GB ECC ram, and Nvidia Quadro RTX 4000 (Turing) 8GB graphics. We build both 32-bit and 64-bit floating point (fp32 and fp64) executables. Only the 32-bit version uses the GPU, affording approximately 4x speedup over fp64 for the larger images.

[Picture]

Figure 15: AVIRIS/HST WestRed subimage comparison. AVIRIS images (top row) display first 3 VCA and SimFit bands as rgb, and PCA bands 1, 3, and 4. HST PCA images (bottom row) retain bands 1, 3, and 4, but instead shows VCA and SimFit bands 2, 3, and 4 as the HST data was collected earlier in the season than AVIRIS and its band 1 is dominated by highly reflective snow: note the cornice location of VCA index 1, lower-left of HST hyperspectral true-color image.


Aviris West Red 1.20M pixel image	Hst West Red 998K pixel image	Aviris East Red 498K pixel image	Hst East Red 886K pixel image	Aviris Cuprite Sc01 1.17M pixel subimage

Figure 16: These particular screenshots are recorded as animated MP4. The videos show true-color hyperspectral images followed by PCA, VCA, and SimFit. Most rapid visually apparent SimFit improvement occurs during the first 50 iterations, as the

α

parameter is being decremented. Left-click a video to start in new tab, or optionally select fullscreen mode. Initial segments after the true-color hyperspectral image display the first 3 PCA bands, while subsequent VCA and SimFit segments display bands 1, 3, and 4 – except the Cuprite video which displays bands 1, 2, and 3. Simple contrast enhancement is employed for the Hst and Cuprite hyperspectral images, and for all PCA images. PCA, VCA, and SimFit images are normalized by band. Each video’s SimFit refinement progresses in real-time with GPU assist.

Processing Statistics

Image	Pixels	Bands	Subspace Dimensions	SimFit Iter	Time


Aviris West Red	1.20M	122	16x16	140	26 sec.
Hst West Red	998K	117	16x16	82	14 sec.
Aviris East Red	498K	122	16x16	61	6 sec.
Hst East Red	886K	117	16x16	43	8 sec.
Aviris Cuprite Sc01	1.17M	224	16x16	112	21 sec.
Aviris Cuprite Sc01	4.03M	224	16x16	109	72 sec.
Aviris Cuprite Sc01	4.03M	224	16x16	97	254 sec.

The full 4.03M pixel 9x16 Cuprite Sc01 image is shown separately; its NWHFC virtual dimensionality was estimated to be 9. The final 16x16 results were double precision running on the CPU without GPU assist.

5 Initial Synthetic Image Tests

Brief descriptions of Icarus Image automated tests are listed in Appendix A. The most recent (as of 27 June 2024) tests the final SimFit Mix matrix against the original Synthetic Image’s Mix matrix 6 in the presence of $\pm 50 %$ added gaussian noise, or 100% positive gaussian noise, with variance $σ = 10$ . Nearly all components of final SimFit Mix matrix agree with those of the initial clean Synthetic Mix matrix – the clean mix matrix before noise was added – within 5% of maximum value.

This is a satisfying result. These initial synthetic images are completely random: they have no spatial structure and no spatial noise filtering was done.

The final SimFit Abundance matrix contains negative components corresponding to noisy pixels that lie outside the final fitted simplex. The 12% of Abundance columns that contain such negative components are set to zero. Although there are extremes near unity, nearly all the remaining non-negative SimFitAbundance coefficients lie within 1% of the original (noisy) Synthetic Abundance, and are representable by positive combinations of SimFit Mix matrix columns. It remains to decompose the final SimFit mixture matrix columns into their basis set components.

To this eventual end, as an internal consistency check this testcase also shows how SimFitMix = SynMix*X may be solved for decomposition matrix X via SVD. The decomposition is diagonal-dominant and accurate to within a few tenths of a percent in this test case where the original Synthetic Mix matrix and its dimensions are known. Beyond self-consistency and known original Mix Matrix, future tests will replace SVD with a constrained least-squares solution to satisfy real-world requirements for non-negative mixing coefficients. Analysis of the resulting residual matrices should be interesting.

These encouraging results suggest our next test suite:

Replace SVD placeholder with Constrained Least Squares.
Allow SimFit virtual_dimensions larger or smaller than the number of Synthetic endmembers.
Relax current testcase requirement that SimFit subspace_dimension = virtual_dimension. (This is a testcase limitation; IcarusImage itself allows subspace_dimension $\geq$ virtual_dimension.)
In analysis of converged SimFit Mix Matrix, allow test endmember basis sets not necessarily the same as the original Synthetic:
- Test basis set smaller than the Synthetic basis, cannot span the actual spectral space.
- Test basis larger than the Synthetic basis, assumes endmembers not in evidence.
- Test basis of right size, but with one or more endmembers different from those in the synthetic.
Such Constrained Least Squares tests should inform future endmember unmixing using Cross-Correlation matching, Constrained Energy Minimization, or Spectral Angle Mapping.
When a test SimFit Mix matrix with endmembers not in the Synthetic Image is projected into a noisy Synthetic Image’s PCA space, what to it’s PCA component coefficients look like regarding the Synthetic Image’s singular vectors and values?
How do these all behave as Signal to Noise Ratio drops beneath 1?
Add simple spatial structure to Synthetic Images, such as concentric rings, so one might visualize noise rejection as simplex optimization progresses.

Synthetic and phantom images incorporating real-world chemical surface or LIDAR reflectance, NMR, or X-Ray absorbtion endmember spectra, and sensor-specific noise models, might also be of interest.

6 Next Steps

The Simplex Fitting and optimization discussed here was an open problem when we first initiated this study. The N-Findr and Shrink-Wrap algorthms were introduced in 1999, Vertex Component Analyis in 2005. Upon these SimFit is an improvement. But we are not terribly unique, and SimFit or similar algorithms may well have been implemented by others in the interum. A thorough literature search, and requests for comments, are much in order before the Icarus Image project proceeds much further.

But in the event Icarus Image does offer worthwhile additions, the following might be pursued:

Explore implementing compatibility with data generated by NASA’s EMIT and PACE space-based sensors.
If there is interest, explore implementing compatibility with ENVI or other commercial products.
Compare VCA and SimFit with N-Finder, Shrink-Wrap, and more recent solutions when found.
Finish capability to down-sample hyperspectral (224 channel AVIRIS and HST) images to lower-dimension Multispectral (e.g. 16 channel World View 3, more for newer sensors) as alternative to PCA dimensionality reduction. Explore possible down-sampling to arbitrary Multispectral dimensions with band widths and spacing adjustable on a per-image basis. If such band adjustments can improve the resolution of particlar features, can one visualize how one might optimize their selection?
Spectroscopy: write USGS / JPL spectral database interfaces to Opticks’ Spectral spectral-matching plugin and Library Builder. Government and commercial hyperspectral analysts already have production-quality spectral databases. Reimplementing our own will require some effort, and might be unnecessary depending upon potential collaboration.
Explore possible applications to MRI, multi-energy CT, and multispectral LIDAR.

Improved dimensionality and noise[3] estimations should be considered throughout.

A Synthetic Image Tests

Testcases automated via the Qt Test module:

Pure Legendre Endmembers Tests with virtual_dim = 8 pure endmember image pixels that the generated VCA and Simfit Mixture and Abundance matrices are identical with their respective Synthetic image originals, within roundoff. These tests are trivial, but must be checked.
- virtual_dim: 8
- subspace_dim: 10
- Nchannels: 256
- NpurePixels: 8
- PercentNoise: 0
- Endmember_t: legendre
- DimensionalityReduction_t: correlation
This testcase is significant as it demonstrates expected Legendre polynomial orthogonality, that Vca algorithm behaves as expected with sufficient pure endmember image pixels, and that running SimFit following the Vca analysis does not change this result.
Impure Legendre Tests, with no pure endmember image pixels, that
1. Original image R can be regenerated from Reduced Image X via Correlation Component back projection.
2. Synthetic Mix matrix column inner products have expected Legendre polynomial orthogonality.
3. SimFit Mix matrix can be regenerated from Legendre polynomials and the component matrix C of inner products of the original SimFit Mix matrix with Legendre polynomials, which is possible since the original Synthetic Mix matrix is composed of sums of two Legendre polynomials.
4. The SimFit Mix component matrix C for the above test may be generated both directly from the inner products of SimFit Mix matrix columns with Legendre polynomials, AND by SVD solution of SimFitMix = PolyVal * C, where PolyVal is a 256x8 matrix for Legendre polynomials sampled on [-1, 1]
These tests are relatively trivial, but must be checked.
- virtual_dim: 8
- subspace_dim: 10
- Nchannels: 256
- NpurePixels: 0
- PercentNoise: 0
- Endmember_t: legendre
- DimensionalityReduction_t: correlation
This testcase is significant as it demonstrates expected final SimFit Mixture matrix unmixing properties against endmember polynomials.
Impure Unitvec Tests final SimFit Mix and Abundance matrices against their Synthetic Image originals. Demonstrates the final SimFitMix is is the same as the original SynMix when the original Synthetic Image Mixture matrix is composed of orthogonal unit vectors. In this case each SynMix column j = 0, 1,...7 is unit vector of length 8 of values δ(j)
- virtual_dim: 8
- subspace_dim: 8
- Nchannels: 8
- NpurePixels: 0
- PercentNoise: 0
- Endmember_t: unitvec
- DimensionalityReduction_t: none
Such unit vectors are orthogonal for any length, we choose Nchannels=8 for this test case.
This test is significant as it demonstrates the final SimFit Mixture, Abundance, and Image matrices are identical within round-off to the Synthetic image originals, for noise-free synthetic images generated from unit vector endmembers, with no pure endmember image pixels. Here Vca alone fails as there are insufficient pure endmember image pixels.
Impure Synthetic Mix Tests final SimFit Mix and Abundance matrices against their Synthetic Image originals. Demonstrates the SimFitMix can be transformed back to the original SynMix when one knows the latter, which we do for synthetic tests. In this case each SynMix column j = 0, 1,...7 is a sum of two Legendre polynomials L0(x) + Lj+1(x) where L0(x) provides an offset to make all samples positive, and x is sampled just 8 times on [-1,1]
- virtual_dim: 8
- subspace_dim: 8
- Nchannels: 8
- NpurePixels: 0
- PercentNoise: 0
- Endmember_t: legendre
- DimensionalityReduction_t: none
With so few channels our inner product integral returns true Legendre orthogonality only between the linear polynomials $L_{0}$ and $L_{1}$ . But we can still use Eigen’s Svd.solve() to accurately compute SimFitMix unmixing against the original Synthetic image Mix matrix.
This test is significant as it demonstrates the consistency of final SimFitMix unmixing for noise-free synthetic images with no pure endmember image pixels, a common case where Vca alone fails.
Impure Synthetic Mix with Noise Tests final SimFit Mix and Abundance matrices against their Synthetic Image originals that contain additive white gaussian noise. Demonstrates the SimFitMix can be transformed back to the original noisy synthetic SynMix when the problem is not underspecified, that is, when the number of channels is equal to the size of the endmember basis set. In this case each SynMix column j = 0, 1,...7 is a sum of Legendre polynomials L0(x) + Lj+1(x) where x is sampled just 8 times on [-1,1]
- Nendmembers: 8
- Nchannels: 8
- virtual_dim: 8
- subspace_dim: 8
- NpurePixels: 0
- PercentNoise: 5
- Noise Sigma: 10
- Endmember_t: legendre
- DimensionalityReduction_t: none
With so few channels our inner product integral returns true Legendre orthogonality only between. the linear polynomials $L_{0}$ and $L_{1}$ . But with Nendmembers = Nchannels we can use Eigen’s Svd.solve() to accurately compute SimFitMix unmixing against the original Synthetic image Mix matrix. In this case the resulting SimFit Abundance matrix is the same as the original noisy Synthetic Abundance, and we can set to zero all Abundance columns which contain one or more negative coefficients as these represent noisy pixels that lie outside the fitted simplex.
This test is significant as it demonstrates the consistency of this SimFitMix unmixing for noisy synthetic images with no pure endmember image pixels, a common case where Vca alone fails, and rejection of those noisy pixels that cannot be represented as linear combinations of the origian synthetic basis set. Many remaining pixels will still have noise not present in the original Synthetic Image Clean Abundance matrix, but these remaining pixels lie inside the fitted simplex and are representable by the original basis.
Impure Synthetic Mix with Noise and PCA/Correlation dimensionality reduction Tests final SimFit Mix and Abundance matrices against their Synthetic Image originals that contain additive white gaussian noise (awgn). Demonstrates the SimFitMix can be transformed back to the original noisy synthetic SynMix when the number of channels is large, 64 in this example, using pca via correlation to reduce the dimensionality to the known dimensionality of the synthetic image: 8 in this example. In this case each SynMix column j = 0, 1,...7 is a sum of Legendre polynomials L0(x) + Lj+1(x), where x is sampled 64 times on [-1,1]
- Npixels: 2097152 (2048x1024)
- Nendmembers: 8
- Nchannels: 64
- virtual_dim: 8
- subspace_dim: 8
- NpurePixels: 0
- PercentNoise: 100
- Noise Sigma: 10
- Final alpha: 2.94e-3
- Endmember_t: legendre
- DimensionalityReduction_t: correlation
With 64 channels our inner product integral returns expected Legendre orthogonality for all endmembers within 0.05%.
In this testcase we tuned alpha_final to roughly zero the final SimFitMix.minCoeff(): min,max = (-6e-6, 2.000e+00). Larger alpha results in a larger more positive minimum, while smaller values yield a negative minimum. The root search to zero minCoeff() is applied algorithmically.
In presence of 100% added gaussian noise nearly all components of the final SimFitMix agree with those of the initial clean Synthetic Mix matrix within 5% – the clean Synthetic mix *before* noise was added. Smaller alpha values include more pixels within the simplex, hence fewer with negative mixing coefficients.
Although the extremes are near unity, nearly all non-negative SimFitAbundance coefficients lie within 1% of the original (noisy) Synthetic Abundance.
As a consistency check, with pca and virtual_dim small, we use Eigen’s Svd.solve() to accurately decompose SimFitMix component matrix X with respect to the original Synthetic Mix in the reduced pca subspace. The component matrix X is then applied to the SimFit Abundance Matrix. In this case the resulting SimFit Abundance matrix product is the same as the original noisy Synthetic Abundance. We can again set to zero all Abundance columns which contain one or more negative coefficients, as these represent noisy pixels that lie outside the fitted simplex. In this test case 254643 negative coefficient noise pixels were removed from SimFitAbundance, 12% of the 2097152 pixels total. In practice coefficients within a small negative tolerance e.g. -1e-5 might be allowed.
This test is significant as it demonstrates the internal consistency of this SimFitMix unmixing for noisy synthetic images with no pure endmember image pixels, a common case where Vca alone fails, and rejection of those noisy pixels that cannot be represented as linear combinations of the original synthetic basis set. Many remaining pixels will still have noise not present in the original Synthetic Image Clean Abundance matrix, but these remaining pixels lie inside the fitted simplex and are representable by the original basis.
This test also demonstrates that for this sort of Synthetic, the number of channels in the Synthetic Image may greatly exceed VCA’s subspace_dimension, which nonetheless must still equal its virtual_dimension, for this test.
In the special case where we wish to recover the original Image by back-projecting the PCA Reduced image, virtual_dim must equal Nchannels so that the full set of covariance eigenvectors is retained. However, such large virtual and subspace dimensions might introduce errors into the SimFit Mix matrix, and are unnecessary for the SimFit analysis. Here we used virtual_dim=subspace_dim=8, and Nchannels=64.

(End)

References

[1] Daniel R. Fuhrmann. Simplex shrink-wrap algorithm. SPIE Proceedings, Automatic Target Recognition IX, 3718, 24 August 1999.

[2] José M. P. Nascimento and José M. Bioucas Dias. Vertex component analysis: A fast algorithm to unmix hyperspectral data. IEEE Transactions on Geoscience and Remote Sensing, 43(4), April 2005.

[3] M. Tejaswi, K. Suguna, P. Chandanakala, and B. Lakshmi Devi. Additive white gaussian noise estimation in svd domain for single images. International Journal of Advanced Research in Computer and Communication Engineering, 7(5):150 – 156, May 2018.

[4] Michael E. Winter. N-findr: an algorithm for fast autonomous spectral end-member determination in hyperspectral data. SPIE Proceedings, Imaging Spectrometry V, 3753, 27 October 1999.

¹WGS84 / UTM Zone 13N at [362000E, 4320000N] meters.