Prashant Kumar answered . 2024-12-20 19:11:09
The FFT method may be faster if the two arrays to be convolved are of similar size. You can pad the smaller to be same size as the larger and then use the fft method. The FFT is very well conditioned and will give the same answer as conv2 to within a small tolerance.
If one array (the mask or kernel) is much smaller than the other, then you can sometimes get a big speedup over conv2 (and filter2 etc.) using my fex contribution convolve2.
You can only establish what is fastest for your problems by doing some experiments.
EDIT: Out of interest, I did some experiments, and the results are as follows.
Notes:
- Timings are in ms, and each is the minimum of the times for 10 convolutions. Note that cputime on my machine seems to have a resolution no better than 16 ms.
- I used MATLAB release 2010b, on a Dell E4300 laptop.
- For the CONV2 results, I used the built-in conv2 function, with a wrapper that implemented periodic boundary conditions by padding the initial matrix. The time to do the padding is included in the measurements.
- For the CONVOLVE2 results, I used convolve2 from the file exchange. This calls conv2, after possibly decomposing the mask into a sum of separable masks.
- For the CONV_FFT2 results, I used conv_fft2 from the file exchange. This implements the FFT method straightforwardly, without any additional padding to make the arrays FFT-friendly sizes. There are other FFT-based convolutions on the FEX which may have advantages over this.
- The array arguments were generated using rand or fspecial (from the Image Processing Toolbox).
- X is the first argument to the convolution routines (this would be the image in an image processing application). M is the second argument (this would be the mask).
- In every case, the maximum absolute difference between the results from any pair of methods was less than 1e-11.
FFT-friendly comparison
X = rand(1024)
M = rand(mask_size)
boundary conditions: periodic ('wrap' in convolve2 and conv_fft2)
Mask sizes: 1x1 2x2 4x4 8x8 16x16 32x32 64x64
conv2: 16 16 16 47 140 546 2153
convolve2: 16 16 16 47 156 530 2153
conv_fft2: 312 296 296 296 312 296 265
Periodic boundary conditions and an array size that is a power of 2 are good for the FFT method, which wins for masks NxN and upwards, where N is between 16 and 32. conv2 and convolve2 are essentially identical here, because convolve2 can't do anything useful with a random mask. The FFT method takes a time that is independent of mask size.
A decomposable mask
As above, except
M = fspecial('log', mask_size, min(mask_size)/10)
Mask sizes: 1x1 2x2 4x4 8x8 16x16 32x32 64x64
conv2: 0 0 16 47 140 546 2168
convolve2: 16 16 16 125 125 218 374
conv_fft2: 281 296 312 281 296 312 296
conv2 and conv_fft2 are unaffected by this change, but convolve2 can exploit symmetries in the mask, and so there is now a range of mask sizes where it does best. However, the FFT method is still better once the mask size reaches 64x64.
FFT-hostile comparison
As immediately above, except
X = rand(1031)
Mask sizes: 1x1 2x2 4x4 8x8 16x16 32x32 64x64
conv2: 0 0 16 47 140 562 2200
convolve2: 0 16 16 125 140 218 390
conv_fft2: 858 858 811 827 764 796 780
A prime array size is the worst case for the FFT, and conv_fft2 does nothing to try to help, leaving size optimisation to the caller. Even so, the FFT beats conv2 by the time the mask size is 64x64. convolve2 is the fastest at this point because of the mask structure, but for very large masks conv_fft2 would beat even this. (Note that for separable masks convolve2 and imfilter would do better still.)
As immediately above, except we revert to an apparently FFT-friendly size, and use the 'same' shape option.
X = rand(1024)
boundary conditions: zero-padding ('same' option in all functions)
Mask sizes: 1x1 2x2 4x4 8x8 16x16 32x32 64x64
conv2: 0 0 16 62 281 920 3011
convolve2: 0 16 0 109 125 250 390
conv_fft2: 281 608 421 499 515 296 281
To implement conv2's 'same' option, the arrays have to be extended with zeros for the FFT. This means that the size of the array that is actually transformed is changed, affecting the time of the FFT. For consistent and predictable FFT timing, the user needs to control the array size and use the 'wrap' or 'valid' shape options, which don't require padding.
"Sparse" masks: conv2-friendly
As above, except that the mask is
M = (rand(mask_size) > 0.9) .* rand(mask_size)
Mask sizes: 1x1 2x2 4x4 8x8 16x16 32x32 64x64
conv2: 16 0 0 16 47 156 499
convolve2: 0 16 16 0 31 156 484
conv_fft2: 296 608 437 484 515 250 296
Here we have a mask that is sparse in having many zero elements (not a MATLAB sparse array though). conv2 is evidently optimised to handle this efficiently (and so convolve2 gets this advantage automatically). conv2 beats conv_fft2 up to much larger mask sizes as a result.
Conclusions
Many factors affect the speed of different convolution methods, including array size, mask size, mask sparsity and mask decomposability. Some of these you can control, some you can't. For very large masks the FFT method will be fastest, and for very small masks conv2 will be fastest. For many useful masks at intermediate sizes convolve2 will be fastest. If speed is critical, you probably need to do your own timing tests.
Not satisfied with the answer ?? ASK NOW