Although this really shouldn't happen, let's just add a quick if()
check to silence Coverity.
Fixes:
- CID 1191912
- CID 1191911
- CID 1191910
- CID 1191909
The new files (i386, sse3 and neon) are basically empty and fall back
to the C version. This is just to pave the way for full low-level
optimization... if someone has the time and skills to do it :)
Add both Alpha and RGBA template files.
If the buffer size is smaller than the blurring kernel, then
special precautions must be taken to properly read the source
pixels. Also, fix the corner cases near the left & right edges
(or top & bottom).
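A minimal sketch of one way to handle these cases, assuming the window
is simply clipped to the buffer and the sum is divided by the number of
pixels actually read. The helper name box_blur_span_clipped and this
edge policy are illustrative assumptions, not the code from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: box-blur one 8-bit span with radius r.
 * The window [x - r, x + r] is clipped to [0, len - 1], so it also
 * works when len is smaller than the kernel size (2 * r + 1). */
static void
box_blur_span_clipped(const uint8_t *src, uint8_t *dst, int len, int r)
{
   assert(src && dst && src != dst && len > 0 && r >= 0);

   for (int x = 0; x < len; x++)
     {
        int lo = (x - r < 0) ? 0 : x - r;             /* left edge */
        int hi = (x + r > len - 1) ? len - 1 : x + r; /* right edge */
        unsigned sum = 0;

        for (int k = lo; k <= hi; k++)
          sum += src[k];

        /* Divide by the pixels actually read, not by 2 * r + 1 */
        dst[x] = (uint8_t)(sum / (unsigned)(hi - lo + 1));
     }
}
```

With a 3-pixel span and r = 5, every output pixel is the average of the
whole span, and no read goes outside the buffer.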
Use two optimizable functions for BOX blur: vertical and horizontal.
These functions will run as many times as requested (from 1 to 6 max).
The horizontal case is pretty straightforward as the source is already
contiguous (nice in terms of cache hits). The only catch is to swap
src and dst without ever writing to the input buffer.
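The src/dst swap could look roughly like this. It is only a sketch:
box_pass, box_blur_horizontal and the extra tmp scratch buffer are
assumed names, and the running-sum pass is one possible implementation.
The first target is chosen so that the last pass lands in out, and the
input is only ever read.

```c
#include <assert.h>
#include <stdint.h>

/* One box pass over a contiguous span, using a running sum and a
 * window clipped to the span (assumed edge policy). */
static void
box_pass(const uint8_t *src, uint8_t *dst, int len, int r)
{
   unsigned sum = 0, cnt = 0;

   /* Initial window for x = 0 covers [0, r], clipped to the span */
   for (int k = 0; k <= r && k < len; k++) { sum += src[k]; cnt++; }

   for (int x = 0; x < len; x++)
     {
        dst[x] = (uint8_t)(sum / cnt);
        if (x + r + 1 < len) { sum += src[x + r + 1]; cnt++; } /* enters */
        if (x - r >= 0)      { sum -= src[x - r];     cnt--; } /* leaves */
     }
}

/* Run 1 to 6 passes, ping-ponging between out and tmp. */
static void
box_blur_horizontal(const uint8_t *in, uint8_t *out, uint8_t *tmp,
                    int len, int r, int passes)
{
   assert(len > 0 && r >= 0 && passes >= 1 && passes <= 6);
   assert(in != out && in != tmp && out != tmp);

   const uint8_t *src = in;
   /* Odd pass count: start in out. Even: start in tmp. */
   uint8_t *dst = (passes & 1) ? out : tmp;

   for (int p = 0; p < passes; p++)
     {
        box_pass(src, dst, len, r);
        src = dst;                       /* next pass reads the result */
        dst = (dst == out) ? tmp : out;  /* and writes the other buffer */
     }
}
```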
In case of vertical blur, we apply the same method as above, after
rotating the column into a horizontal (contiguous) span, and rotating
it back afterwards.
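A rough sketch of the column rotation, assuming an 8-bit alpha buffer
stored row by row with a stride equal to its width. The names box_span
and box_blur_vertical, and the two caller-provided scratch spans, are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Naive contiguous box blur with a clipped window (assumed policy). */
static void
box_span(const uint8_t *src, uint8_t *dst, int len, int r)
{
   for (int x = 0; x < len; x++)
     {
        int lo = (x - r < 0) ? 0 : x - r;
        int hi = (x + r > len - 1) ? len - 1 : x + r;
        unsigned sum = 0;

        for (int k = lo; k <= hi; k++) sum += src[k];
        dst[x] = (uint8_t)(sum / (unsigned)(hi - lo + 1));
     }
}

/* Vertical blur: copy each column into a contiguous span, blur it
 * with the horizontal code, then copy it back into the column.
 * col_src and col_dst must each hold h bytes. */
static void
box_blur_vertical(const uint8_t *in, uint8_t *out, int w, int h, int r,
                  uint8_t *col_src, uint8_t *col_dst)
{
   assert(in != out && w > 0 && h > 0 && r >= 0);

   for (int x = 0; x < w; x++)
     {
        for (int y = 0; y < h; y++)       /* column -> span */
          col_src[y] = in[y * w + x];

        box_span(col_src, col_dst, h, r);

        for (int y = 0; y < h; y++)       /* span -> column */
          out[y * w + x] = col_dst[y];
     }
}
```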
Now, the same needs to be done for RGBA :)
Actually, there is a very nice trick with BOX blur.
Apply BOX blur 3 times and you approximate a GAUSSIAN
blur to within about 3%. This is way more than enough
for just a simple graphical effect.
So, despite the crappy quality of BOX blur, we should
optimize it a lot so we can replace large GAUSSIAN blurs
with a series of BOX blurs instead.
Source: Wikipedia's page on box blur :)
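To see why repeated BOX passes look like a GAUSSIAN, here is a small
illustrative sketch (not part of this commit) that computes the
effective kernel of three width-3 box passes by convolving the box
kernel with itself. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Convolve a[0..na) with b[0..nb) into out[0..na + nb - 1). */
static void
convolve(const unsigned *a, size_t na, const unsigned *b, size_t nb,
         unsigned *out)
{
   assert(na > 0 && nb > 0);

   for (size_t i = 0; i < na + nb - 1; i++) out[i] = 0;
   for (size_t i = 0; i < na; i++)
     for (size_t j = 0; j < nb; j++)
       out[i + j] += a[i] * b[j];
}

/* Unnormalized kernel equivalent to three width-3 box passes. */
static void
triple_box3_kernel(unsigned out[7])
{
   static const unsigned box[3] = {1, 1, 1};
   unsigned two[5];

   convolve(box, 3, box, 3, two);   /* two passes:   1 2 3 2 1 */
   convolve(two, 5, box, 3, out);   /* three passes: 1 3 6 7 6 3 1 */
}
```

The resulting weights 1 3 6 7 6 3 1 (sum 27) already have the bell
shape of a Gaussian, with a variance of 2, that is three times the
variance 2/3 of a single width-3 box.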
This commit also moves around some duplicated definitions.
Prepare optimization paths for blur operations, as they are VERY
costly. This simple change, when compiling with gcc -O3, boosts
horizontal blur performance by more than 50%, because STEP is 1
(so memory accesses, increments, etc. are all very simple).
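The idea, as a hedged sketch: if the generic loop is inlined into a
wrapper that passes a constant step of 1, the compiler can fold the
multiplications away and see plain contiguous accesses. Here STEP is
modeled as a function parameter; the function names and the 3-tap
kernel with replicated edges are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Generic 3-tap box over len pixels spaced step bytes apart.
 * Edge pixels are replicated (assumed policy). */
static inline void
box3_generic(const uint8_t *src, uint8_t *dst, int len, const int step)
{
   assert(len > 0 && step > 0);

   for (int i = 0; i < len; i++)
     {
        int l = (i > 0) ? i - 1 : i;
        int n = (i < len - 1) ? i + 1 : i;

        dst[i * step] = (uint8_t)
          ((src[l * step] + src[i * step] + src[n * step]) / 3);
     }
}

/* Contiguous case: step is the constant 1, so after inlining the
 * compiler can drop the multiplies and treat this as a simple loop. */
static void
box3_contiguous(const uint8_t *src, uint8_t *dst, int len)
{
   box3_generic(src, dst, len, 1);
}

/* Strided case (e.g. one channel of interleaved data): step is only
 * known at run time. */
static void
box3_strided(const uint8_t *src, uint8_t *dst, int len, int step)
{
   box3_generic(src, dst, len, step);
}
```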
The objective is to also support NEON, MMX and SSE, with
runtime detection.
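Runtime detection could be wired up along these lines. This is only a
sketch: the CPU_HAS_* flags, the blur_span_* functions and
blur_span_select are hypothetical names, and how the feature mask is
obtained is left out. The SIMD entries are placeholders that call the
C version until real implementations exist.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*blur_span_fn)(const uint8_t *src, uint8_t *dst,
                             int len, int r);

/* Hypothetical feature bits, filled in by some CPU detection code */
enum { CPU_HAS_SSE3 = 1 << 0, CPU_HAS_NEON = 1 << 1 };

/* Reference C version: naive box blur with a clipped window */
static void
blur_span_c(const uint8_t *src, uint8_t *dst, int len, int r)
{
   assert(len > 0 && r >= 0);

   for (int x = 0; x < len; x++)
     {
        int lo = (x - r < 0) ? 0 : x - r;
        int hi = (x + r > len - 1) ? len - 1 : x + r;
        unsigned sum = 0;

        for (int k = lo; k <= hi; k++) sum += src[k];
        dst[x] = (uint8_t)(sum / (unsigned)(hi - lo + 1));
     }
}

/* Placeholders: no SIMD code yet, just fall back to C */
static void
blur_span_sse3(const uint8_t *src, uint8_t *dst, int len, int r)
{
   blur_span_c(src, dst, len, r);
}

static void
blur_span_neon(const uint8_t *src, uint8_t *dst, int len, int r)
{
   blur_span_c(src, dst, len, r);
}

/* Pick an implementation once, based on the detected features */
static blur_span_fn
blur_span_select(unsigned cpu_features)
{
   if (cpu_features & CPU_HAS_SSE3) return blur_span_sse3;
   if (cpu_features & CPU_HAS_NEON) return blur_span_neon;
   return blur_span_c;
}
```

Callers would fetch the function pointer once at init time and use it
for every span, so the feature check stays out of the hot loop.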