Moritz Hesse asked, 2022-04-11
Does predict function work in parallel when predicting k-nearest neighbour?
Hi,
I have a k-nearest neighbour classifier which I have trained with fitcknn. I am wondering: when predicting labels with the model using predict, does it run in parallel?
I have tested calling predict in a for loop and in a parfor loop. The simple for loop performs a bit faster, which makes me think the predict function is taking advantage of some built-in optimisation and parallelisation. However, the documentation makes no reference to this, and I thought MATLAB always runs in a single thread unless a parallel pool is specifically used. In both cases I am supplying the predict function with 1000 rows of test data at a time. My test dataset is a million rows, but I process it in blocks so I can see progress while the program is making predictions.
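A minimal sketch of that comparison (variable names are illustrative; mdl stands for the model returned by fitcknn and Xtest for the million-row test matrix):

blockSize = 1000;
nBlocks = ceil(size(Xtest,1)/blockSize);
labels = cell(nBlocks,1);

% Plain for loop: predict is free to use whatever multithreading it has built in.
tic
for b = 1:nBlocks
    rows = (b-1)*blockSize + 1 : min(b*blockSize, size(Xtest,1));
    labels{b} = predict(mdl, Xtest(rows,:));
end
toc

% parfor loop: each worker runs single-threaded by default and the model is
% copied to every worker, which may be why this version can end up slower.
tic
parfor b = 1:nBlocks
    rows = (b-1)*blockSize + 1 : min(b*blockSize, size(Xtest,1));
    labels{b} = predict(mdl, Xtest(rows,:));
end
toc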
So basically:
- Does the predict function use parallelisation on k-NN models?
- Any other tips to reduce prediction time when using a large k-NN model?
Neeta Dsouza answered, 2024-12-21 18:05:16
Perhaps you are mistaken. Most high-level tools in MATLAB do not directly and intentionally use parallel processing by splitting the problem up. It is the lower-level computations that do so, and that is where you see the gains. And you can check when that is happening. It is quite easy to create a situation where MATLAB will use all the CPU power you have available. For example, just form a matrix multiply between two very large matrices. MATLAB passes this operation to the BLAS, lower-level routines that can intelligently use multiple cores to do the work more efficiently. So if you watch a CPU monitor when that happens, suddenly your computer will get very busy. But it was not really MATLAB that did the parallel split, it was done more deeply under the hood.
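You can see this for yourself with something like the following (the matrix size is just an example, picked to be big enough that the BLAS bothers to split the work; watch a CPU monitor while it runs):

N = 6000;                 % roughly 300 MB per matrix, big enough for the BLAS to use several cores
A = rand(N);
B = rand(N);
tic
C = A*B;                  % handed off to the multithreaded BLAS, not parallelised by MATLAB-level code
toc
maxNumCompThreads         % reports how many computational threads MATLAB is allowed to use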
Another case where I frequently see this happen is in a large computation where I might want to do a powermod operation, where I choose to compute mod(b^n,p) for huge values of n, and for many millions of numbers in the vector p. (Did you know there are roughly 51 million primes less than 1e9? I do know that.) My system fan kicks on immediately when I do these computations, with all CPUs running flat out. But again, it is not a high level MATLAB code that will decide when to parallelize code, but low level routines that see a process that can be split efficiently when that makes sense.
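A scaled-down version of that computation might look like this (this assumes powermod from the Symbolic Math Toolbox, and that it accepts a vector of moduli; the prime list here is kept far smaller than 51 million entries):

p = primes(1e7);          % roughly 665,000 primes; push the limit toward 1e9 to really load the CPU
b = 3;
n = sym(2)^607 - 1;       % a huge exponent
r = powermod(b, n, p);    % modular exponentiation: mod(b^n, p) for each prime in p, without forming b^n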
These computations are only split when the problem becomes sufficiently large, of course. For small problems the overhead of the parallelization becomes more effort than the gain would be worth. Add 3 or 4 numbers together, and one core is useful, and done in the blink of a computational eye. Add a billion numbers together, and there is real gain to be found in splitting the problem up.
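A crude way to see that threshold (sizes and timings depend entirely on your machine; the point is only that the small case never touches more than one core):

xSmall = rand(1,4);
xBig   = rand(1,1e8);             % about 800 MB; grow toward 1e9 elements if memory allows
tic, sSmall = sum(xSmall); toc    % far too small to be worth splitting across cores
tic, sBig   = sum(xBig);   toc    % large enough that the low-level sum can spread the work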
There are also huge problems where no simple parallelization seems to be available. For example, perform a VERY large symbolic computation. Much of the time this is not so easily broken up. So I can give examples where my computer might be seen to be running flat out, but in only one thread for as long as hours at a time, so only one core is being used for the entire operation. And of course, any code where there would be lots of branches will screw up parallelized processes, which absolutely thrive on feeding the long lists of numbers through adds or multiplies.