Prashant Kumar answered on 2024-12-21 15:49:35
Let me wade in here. From your question, you have a measured aggregate spectrum, and on the side, measured components of which you will assume the aggregate is composed. Since they are measured, they are NOT Gaussian components; a Gaussian is often only a poor approximation to the shape of such components. (Gaussians are symmetric and have a very specific shape.) And since you have them measured, there seems no reason to approximate them with Gaussians anyway.
So, you are given a "function", F, sampled at a discrete set of wavelengths. That is, you have the measured spectrum at a set of n wavelengths. At those same wavelengths, you have 9 separate components, which I'll call S_i. In effect, F is a discretely sampled function of wavelength, lambda, as are the components.
You now pose the mixture model for F,
F = a_1*S_1 + a_2*S_2 + a_3*S_3 + ... + a_9*S_9
Thus at any wavelength, the measured spectrum is presumed to be some (unknown) linear combination of the measured component sub-spectra. You wish to estimate the component fractions, perhaps as a vector
A = [a_1; a_2; ... ; a_9]
Logically, the a_i will be constrained to be non-negative, an issue I'll discuss at some length below. I've defined A as a column vector because that is how most code would return it in MATLAB.
The simple approach to estimation of the mixture coefficients in A is to use a basic linear regression. Here we would minimize the sum of squares of the residuals for the mixture model. The simple solution to that is:
Assuming column vectors for F and the S_i, all of the same length n, define the n by 9 matrix S whose columns are the 9 component sub-spectra.
S = [S_1,S_2,S_3,S_4,S_5,S_6,S_7,S_8,S_9];
Then if F is also a column vector of length n,
A = S\F;
This is a simple linear regression (not unlike that which regress would return), and it will work acceptably SOME of the time, but it will fail terribly on occasion, because it employs no non-negativity constraints on the coefficients in A.
The point is, a negative component makes no physical sense. You cannot have a negative amount of some sub-spectrum in the mixture, yet the simple linear regression will probably yield exactly that. It will happen because there is noise in the measurement process, because your component spectra were not perfectly measured, because there may be some contribution from something you have not actually measured (often described as lack-of-fit), or for a few other reasons I'm forgetting to mention. The point is, it WILL happen.
A negative component here might indicate a serious problem in your data, or it might just be trash. So it is always a good thing to look at the coefficients you generate and at the resulting fit. Plot the residuals. Is there significant lack of fit?
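To make that concrete, here is a minimal sketch on invented data (the wavelength grid, the synthetic sub-spectra, and the "true" mixture fractions below are made up purely for illustration). It builds S, solves the unconstrained regression with backslash, and then looks at the coefficients and the residuals:
% Synthetic example, purely for illustration.
n = 200;                                    % number of sampled wavelengths
lambda = linspace(400,700,n)';              % wavelength grid, as a column vector
centers = linspace(420,680,9);              % fake sub-spectra: 9 smooth bumps
S = exp(-(lambda - centers).^2/(2*20^2));   % n by 9, one sub-spectrum per column
A_true = [0.5; 0; 1.2; 0.3; 0; 0.8; 0.1; 0; 0.4];   % a "true" non-negative mixture
F = S*A_true + 0.01*randn(n,1);             % the "measured" aggregate, with noise
A = S\F;                                    % unconstrained linear regression
disp(A.')                                   % any coefficients that went negative?
resid = F - S*A;                            % residuals of the fit
plot(lambda,resid)                          % look for structure (lack of fit)
xlabel('wavelength'), ylabel('residual')
With noise this small, the backslash solution will usually land close to the true mixture, but the coefficients that are truly zero can, and often will, come back slightly negative; with noisier data or strongly overlapping components those negative excursions get much larger.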
Anyway, a more logical and better approach is a non-negative least squares solution. MATLAB offers such a solver in the form of lsqnonneg.
A = lsqnonneg(S,F);
A will now be a vector with non-negative components that yields the best possible fit, subject to the non-negativity constraints. In fact, sometimes a few components of A may come back as TINY negative numbers, on the order of eps, so roughly -1e-16 or so. That is floating point trash, and nothing to worry about here, but if it bothers you, just use
A = max(0,lsqnonneg(S,F));
instead. (This is where knowing something about numerical analysis helps, in knowing when you can safely discard something as trash, and when it is potentially important.) A nice thing about lsqnonneg is you have it in basic MATLAB, with no toolboxes required.
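Continuing the same synthetic sketch from above, the constrained fit is a one-liner, and comparing residual norms shows how little the constraint usually costs in fit quality:
A_nn = max(0, lsqnonneg(S,F));               % non-negative fit, tiny trash clipped
disp([S\F, A_nn])                            % unconstrained vs constrained coefficients
disp([norm(F - S*(S\F)), norm(F - S*A_nn)])  % residual norms of the two fits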
There are other ways you can do the estimation. One approach would be to minimize the sum of absolute values of the model residuals, instead of the sum of squares of the residuals. This can be achieved using a linear programming tool (linprog for example) with some slack variables, as I recall. (I know I have a solver written for that problem lying around somewhere.) The difference between the lsqnonneg solution and the linprog solution will probably not be that important here, so I would just recommend lsqnonneg.
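For reference, here is one standard way that L1 problem can be posed for linprog, with slack variables t_i bounding |F_i - (S*A)_i| from above. Treat this as a sketch of the usual slack-variable formulation (it also keeps A non-negative), not a polished solver:
% L1 fit: minimize sum(t) subject to -t <= F - S*A <= t, with A >= 0, t >= 0.
[n,k] = size(S);                   % n wavelengths, k = 9 components
f   = [zeros(k,1); ones(n,1)];     % objective: sum of the slack variables t
Ain = [ S, -eye(n);                %   S*A - t <=  F
       -S, -eye(n)];               %  -S*A - t <= -F
bin = [F; -F];
lb  = zeros(k+n,1);                % A >= 0 and t >= 0
x   = linprog(f, Ain, bin, [], [], lb, []);
A_l1 = x(1:k);                     % the mixture coefficients from the L1 fit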
Finally, there is the question of estimates of the standard deviation of the parameters. One nice thing about a tool like regress is that it will offer uncertainty estimates (confidence intervals) for the estimated mixture coefficients. The problem with those estimated uncertainties is that they are based on an approximation that fails when the problem is a bounded one. So if some of your coefficients are zero or near zero, or worse, negative, those uncertainty estimates are no longer really meaningful.
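If you do want those classical (unconstrained) uncertainty estimates to look at anyway, regress (Statistics and Machine Learning Toolbox) returns 95% confidence intervals for the coefficients, and a rough standard error can be backed out of the interval widths. A minimal sketch, with the caveat above very much still applying near the zero bound:
[A_reg, A_ci] = regress(F, S);     % unconstrained fit plus 95% confidence intervals
A_se = diff(A_ci, 1, 2)/(2*1.96);  % rough standard errors from the interval half-widths
disp([A_reg, A_ci, A_se])
% Not trustworthy for coefficients that the constrained fit pins at or near zero.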