nawab_faruk asked . 2022-04-21
Limit divergence of polynomial fitting
I am approximating a set of data with a polynomial to later find the cumulative density function. I am using a polynomial since I do not know the distribution of the set of data, that may even vary, and because it should be fully automatized. The problem is thatat the end of my set of data, once the polynomial reaches the last data, divergest to infinity. Since it should approximate a probability density function it shouldn't happen, because like so it influence all the results. How can I fix that?
p=polyfit(X,N,21);
x=linspace(0,4000,1000);
y=polyval(p,x);
matlab , AI, Data Science, and Statistics , Statistics and Machine Learning Toolbox
Prashant Kumar answered . 2024-12-21 17:56:41
The simple answer for this is to ask a question. WHY IN THE NAME OF GOD AND LITTLE GREEN APPLES ARE YOU USING A POLYNOMIAL????? Ok, maybe I'm being too hard on you. Polyfit is there for you to use. Nothing in the documentation tells you it does silly things, or that this is the fundamental nature of polynomials. Somewhere teachers should teach you, right after they tell you that you CAN use a polynomial, that you probably, usually, should not do so. And then explain why not.
Seriously, DON'T USE A POLYNOMIAL. And NEVER use a high order polynomial. They go to hell faster than a low order one. A polynomial order of 21? WAY TOO HIGH. But then you should not use a polynomial at all here.
Polynomials do exactly what you saw. They go to infinity. Think about it. what other behavior can you possibly expect? Can a polynomial approximate a function that does non-trivial (non-constant) stuff within some region, then goes constant as x goes to plus or minus infinity? Think about it. A polynomial can never be a model for a process that behaves like a CDF. You have this really high order polynomial, with powers as high as 21. X is as large as 4000?
What is 4000^21? How about other numbers near there?
4000^21
ans =
4.398e+75
3950^21
ans =
3.3771e+75
3500^21
ans =
2.6634e+74
The highest power of X with a non-zero coefficient will dominate the behavior of a polynomial as X goes to +/-inf. So your high order polynomial produces complete and utter garbage out there. You know this is the behavior that you see. And simple mathematics tells you that you cannot expect any other behavior from a polyomial model. That is exactly the behavior you expect from a polynomial, the innate, fundamental behavior that exists in any polynomial, and something you cannot change.
The obvious answer to my original question is you have no idea what else you might use. Sorry, but that is clearly true.
And, the answer to that is to use a spline. There are several choices you might use there. The simplest is pchip, which will interpolate your data. (Of course, you could have just tried interp1, which can use pchip, or various options as you would direct it to use.)
pp = pchip(X,N);
x = linspace(0,4000,1000);
y = ppval(pp,x);
DON'T use the function spline, as it will show oscillatory behavior in the tails. If you use interp1 instead, then the 'linear', 'pchip', or 'cubic' methods should all be ok. While 'linear' will just do piecewise linear interpolation, that is probably entirely adequate for this purpose.
IF you feel that you needed to smooth the curve instead of interpolate it, then download my SLM toolbox from the file exchange. It will require the optimization toolbox. But pchip should be entirely adequate here.
Not satisfied with the answer ?? ASK NOW