Neeta Dsouza answered . 2024-12-21 01:47:37
Counting the number of peaks and using that count to infer the written language seems a very broad brush to apply to the problem. It is also a difficult thing to measure, since some samples may easily show a second peak. Looking at the second figure, I see it might easily lead you astray.
Anyway, there are simpler measures that are trivial to compute and not so easily led astray. For example, compute a normalized area under that curve, viewed as x(y). Thus, treating y as the independent variable and x as the value of the curve, compute the area under the curve as trapz(x), then divide that result by max(x).
measure = trapz(x)/max(x);
The point is, figures 1 and 2 have relatively little area under that curve, relative to the maximum value that x attains. Whereas figures 3 and 4 will show a seriously different result from the above trivial computation.
I'm not sure how the above curves are defined, so you might gain the same information from a tool like polyarea, rather than trapz. And since I don't know if the points on the curve are equally spaced, it is hard to be sure how exactly to compute that result. But you should get the general idea.
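As a rough sketch of how the spacing question changes the computation, here is the same measure computed three ways. The vectors x and y are made-up placeholder data, not taken from your figures, and the last variant (also dividing by the extent of y) is my own assumption about a convenient normalization, not something your data requires.

```matlab
% Hypothetical profile: x holds the curve values, y the positions at which
% they were sampled (deliberately NOT equally spaced here).
y = [0 1 2 4 7 8 10];
x = [0 2 5 3 1 4 0];

% Unit spacing assumed, i.e. the same as measure = trapz(x)/max(x).
measureUnit = trapz(x) / max(x);

% Actual spacing honoured: trapz(y, x) integrates x with respect to y.
measureSpaced = trapz(y, x) / max(x);

% Assumption: also divide by the extent of y, so the result is the fraction
% of the bounding box that lies under the curve (dimensionless).
measureFrac = trapz(y, x) / (max(x) * (max(y) - min(y)));

% If the curve starts and ends on the line x = 0, it forms a closed polygon
% with that line, and polyarea returns the same area as trapz(y, x).
areaPoly = polyarea(x, y);
```

If the points are equally spaced, the first two variants differ only by a constant factor, so either will serve for comparing samples.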
The point I'm making is clear though regardless of what simple MATLAB code you use to compute it: counting the number of peaks is a broad brush that will be terribly difficult to evaluate. It will yield a number that is a small integer, thus 1, 2, or 3. While there will be signal in that measure, it will be difficult to use to predict the result that you are looking for.
The alternative is a measure like the one I've proposed, which yields a real number that, at least in the plots I see, seems to be far more strongly correlated with what you want to infer. The signal will be stronger. In the end, what you need is a signal that you can then attach a probability to, i.e., we are looking at a picture of text from language X with probability p.
Again, finding and counting peaks is a terribly difficult thing to do. It is subjective, since what constitutes a peak? How much does it need to protrude to become a peak? And the result is a small integer, so any signal is easily lost in the noise.
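To illustrate how much the count depends on an arbitrary choice, here is a minimal sketch using islocalmax (base MATLAB, R2017b or later) on a synthetic curve. The curve and both prominence thresholds are invented for the illustration; nothing here comes from your data.

```matlab
% Hypothetical profile: one large bump near t = 0.3, one small bump near t = 0.7.
t = linspace(0, 1, 201);
x = exp(-((t - 0.3)/0.05).^2) + 0.15*exp(-((t - 0.7)/0.05).^2);

% Count local maxima that protrude by at least the given prominence.
nLoose  = nnz(islocalmax(x, 'MinProminence', 0.05));  % small bump is counted
nStrict = nnz(islocalmax(x, 'MinProminence', 0.30));  % small bump is ignored
```

The same curve has two peaks or one, depending only on the threshold you happened to pick.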
Look for measures like the one I've proposed: simple to compute, and yielding predictions that correlate highly with your result. Better yet might be to use more sophisticated methods of pattern recognition, but that could become a far more complicated problem to solve.
Finally, look for MULTIPLE measures. Any SINGLE measure will be more easily confused. Now, if you can find other uncorrelated measures, the problem becomes a multi-dimensional one. Methods of statistics, such as discriminant analysis, etc., might now be applied to the problem. Or neural nets, etc.
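As one possible sketch of the multi-measure idea, here is a linear discriminant analysis on two measures using fitcdiscr, which requires the Statistics and Machine Learning Toolbox. The feature values, the choice of a second measure, and the language labels 'A' and 'B' are all placeholders I made up so the code runs; you would substitute measures computed from your own images.

```matlab
% Placeholder training data: one row per image, one column per measure
% (e.g. the normalized area plus some second, ideally uncorrelated, measure).
% These numbers are invented purely to make the sketch runnable.
features = [0.21 1.9; 0.25 2.2; 0.19 2.0; 0.23 1.7; ...
            0.58 3.1; 0.62 2.8; 0.55 3.3; 0.60 3.0];
labels = categorical({'A'; 'A'; 'A'; 'A'; 'B'; 'B'; 'B'; 'B'});

% Fit a linear discriminant classifier to the labelled measures.
mdl = fitcdiscr(features, labels);

% Classify a new image. posterior holds one probability per class,
% in the order given by mdl.ClassNames.
[predicted, posterior] = predict(mdl, [0.5 2.9]);
```

The posterior output is what lets you say "language X with probability p" rather than only reporting a hard label.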