The new files (i386, sse3 and neon) are basically empty and fallback
to the C version. This is just to pave the way for full low-level
optimization... if someone has the time and skills to do it :)
Add both Alpha and RGBA template files.
If the buffer size is smaller than the blurring kernel, then
special precautions must be taken to properly read the source
pixels. Also, fix the corner cases near the left & right edges
(or top & bottom).
Use two optimizable functions for BOX blur: vertical and horizontal.
These functions will run as many times as requested (from 1 to 6 max).
The horizontal case is pretty straightforward as the source is already
contiguous (nice in terms of cache hits). The only catch is to swap
src and dst without ever writing to the input buffer.
In case of vertical blur, we apply the same method as above, after
rotating the column into a horizontal (contiguous) span, and rotating
it back afterwards.
Now, the same needs to be done for RGBA :)
Actually, there is a very nice trick with BOX blur.
Pass BOX blur 3 times and you can approximate a GAUSSIAN
blur with up to 3% accuracy. This is way more than enough
for just a simple graphical effect.
So, despite the crappy quality of BOX blur, we should
optimize it a lot so we can replace large GAUSSIAN blurs
with series of BOX blurs instead.
Source: Wikipedia's page on box blur :)
This commit also moves around some duplicated definitions.
Prepare optimization paths for blur operations, as they are VERY
costly. This simple change, when using gcc -O3 flag, boosts
horizontal blur performance by > 50%, because STEP is 1 (and
so, memory accesses, increments, etc... are all very simple)
The objective is to have support for NEON, MMX, SSE, too, with
runtime detection.
Remove true Gaussian kernel code, as it is not usable over 12px and
was disabled because it gives different visual results than the
fake Gaussian curve using sin().
The structure should not be changed, despite the union modification.
I am renaming for consistency with older branches that had a mask
field in RGBA_Image. Also, the mask.data or data8 is really just
a way to avoid casting between DATA8 and DATA32 (and it shows
clearly what kind of data you are dealing with).
If the buffer is smaller than the blur kernel, then artifacts appear
and CRASHES happen because we read/write out of the buffer bounds.
Output buffer must be larger than the kernel diameter.
Input buffer's size is used to reduce the blurring effect on the edges.
Currently supported:
- Box blur for Alpha and RGBA
- Gaussian blur for Alpha and RGBA
Motion blur is not implemented.
These effects are SLOW. Gaussian blur is based on a true gaussian
curve for small radii, or a sine-based approximation.
The true gaussian might need to be removed for consistency, since
it gives slightly different results from the sine one (less blur).
It is not possible to mix Alpha and RGBA surfaces in this filter.