This will reuse existing buffers by resetting only the minimum
required in the filter context (also reused). Work in progress,
as the actual reuse is disabled for now.
This will be most useful in a special case, where a filter is
used in a window decoration, applied to a snapshot object.
Another optimization that might be wanted is passing a list
of update regions (from the proxy or snapshot).
The filters don't support the obscuring region yet, only some
of the high-level logic is implemented.
Summary:
There's problem in Tizen3.0.
1. Clip set mask_obj to obj for masking.
2. Unset mask_obj from obj, and del mask_obj.
3. obj has clip.mask still. So obj is trying to do mask_subrender() for freeed mask_obj.
So reset clip.mask to NULL, If there isn't clipper.
Now, there's no routine for reseting clip.mask when clipper object is freed. isn't it?
Actually I'm not sure that clip.mask=NULL should be there as this patch.
Test Plan: Tizen3.0 wearable
Reviewers: cedric, raster, wonsik, jpeg
Subscribers: scholb.kim, dkdk
Differential Revision: https://phab.enlightenment.org/D4721
Signed-off-by: Jean-Philippe Andre <jp.andre@samsung.com>
This function was moved out of inline (see d7c6fca6c0) but
unfortunately the early checks at its beginning are likely
to result in an early return. Inline this part so we get back a
better performance. Inlining the whole function does not improve
the performance, as GCC simply gives up with inlining.
Note: Between 1.18 and master the number of calls to clip_recalc
has simply blown out. It is thus crucial to find out where those
calls come from but also micro-optimize the function itself. This
patch does the latter only.
@optimize
This avoids calling efl_isa and locking the async mutex.
In callgrind analysis, this reduces the count of calls to
efl_isa from 1.96M to 1.02M.
@optimization
The previous patch (b184874fa5) was preventing
post-event callbacks from triggering any form of input event,
including side-effects due to mouse,in. In fact by tracking
which exact events we want to post-process we can support
proper recursion. This fixes crashes in Bryce.
I'm not changing the documentation as this is still a dubious
code design.
Fixes T3144
Fixes T5157
several calls, specifically evas_object_change_reset,
evas_object_cur_prev, and evas_object_clip_changes_clean that are
called directly or indirectly as part of evas render on at least every
active object if not more, were doing full eo obj lookups when their
calling functions already all had the eo protected data looked up.
tha's silly and just adds overhead we don't need. my test dropped
_eo_obj_pointer_get overhead in perf profiles from 4.48% to 2.65%. see:
4.48% libeo.so.1.18.99 [.] _eo_obj_pointer_get
4.23% libevas.so.1.18.99 [.] evas_render_updates_internal
2.61% libevas.so.1.18.99 [.] evas_render_updates_internal_loop
1.68% libeo.so.1.18.99 [.] efl_data_scope_get
1.57% libc-2.24.so [.] _int_malloc
1.42% libevas.so.1.18.99 [.] evas_object_smart_changed_get
1.09% libevas.so.1.18.99 [.] evas_object_clip_recalc.part.37
1.08% libpthread-2.24.so [.] pthread_getspecific
1.05% libevas.so.1.18.99 [.] efl_canvas_object_class_get
1.01% libevas.so.1.18.99 [.] evas_object_cur_prev
0.99% libeo.so.1.18.99 [.] _efl_object_event_callback_legacy_call
0.87% libevas.so.1.18.99 [.] _evas_render_phase1_object_ctx_render_cache_append
0.82% libpthread-2.24.so [.] pthread_mutex_lock
0.81% libevas.so.1.18.99 [.] _evas_render_phase1_object_process
0.79% libc-2.24.so [.] _int_free
vs now the improved:
4.82% libevas.so.1.18.99 [.] evas_render_updates_internal
3.44% libevas.so.1.18.99 [.] evas_render_updates_internal_loop
2.65% libeo.so.1.18.99 [.] _eo_obj_pointer_get
2.22% libc-2.24.so [.] _int_malloc
1.46% libevas.so.1.18.99 [.] evas_object_smart_changed_get
1.04% libeo.so.1.18.99 [.] _efl_object_event_callback_legacy_call
1.03% libevas.so.1.18.99 [.] _evas_render_phase1_object_ctx_render_cache_append
0.97% libeina.so.1.18.99 [.] eina_chained_mempool_malloc
0.93% libevas.so.1.18.99 [.] evas_object_clip_recalc.part.37
0.92% libpthread-2.24.so [.] pthread_mutex_lock
0.91% libevas.so.1.18.99 [.] _evas_render_phase1_object_process
0.84% libc-2.24.so [.] _int_free
0.84% libevas.so.1.18.99 [.] evas_object_cur_prev
0.83% libeina.so.1.18.99 [.] eina_chained_mempool_free
0.80% libeo.so.1.18.99 [.] efl_data_scope_get
of course other things "increase their percentage" as oe overhead now
dropped, and things seem to move around a bit, but it does make sense
to do this with no downsides i can see as we already are accessing the
protected data ptr in the parent func.
This should save a bit of memory for all image & text
objects. This exploits the previous patch for the post-render
job queue added to evas, and simplifies this bit of code.
Object size hints are stored in a specially allocated struct
(from a mempool) and even a call to size_hint_set(default_values)
will allocate this struct. This patch avoids unnecessary allocations.
Originally I was trying to fix an infinite recalc loop but it
magically vanished...
Only the size of events_whitelist isn't enough, because in some
cases the user may be disabling the usage of a specific seat.
Considering the following scenario, the issue will easy to understand:
- an application with two entries (one to be used by seat 1 and other
by seat 2)
- the first seat is announced - it is enabled for entry 1 and
disabled for entry 2
- the second seat is announced
Before second seat is announced, the first seat would be able
to input the entry 1, because the events_whitelist of such
object will continue empty.
So a flag will be used to identify an object with active
filter.
Reviewed By: iscaro
Differential Revision: https://phab.enlightenment.org/D4498
It doesn't make sense to remove it when a seat is added
to the list. It should be removed only when this seat
is blocked.
But when the list receives its first item, then
it also should be checked if the focused seat
is the one just added, otherwise the previous one
must be removed.
This makes efl_loop_get() work on evas objects, returning the
main loop as expected. Also make the loop a property of the
Loop_User class (shouldn't it be called Efl.Loop.User instead?)
The hash implementation demonstrated that too much memory was being used
to store the Evas_Object_Pointer_Data. In order to reduce the memory usage
this patches now changes the Evas_Object_Pointer_Data storage to an Eina_Inlist and
now Massif profiles shows that the memory usage was drastically reduced.
In order to properly block events from a given seat, Efl.Canvas.Object must
override the efl_event_callback_[legacy]_call to check if the event
is allowed or not.
With this new API one can block or unblock keyboard, mouse and
focus events that was originated from a seat. This is useful to
create applications that wants to establish some kind of seat segregation.
This patch introduces the possibility to set the pointer mode and
query other properties like current position per pointer device.
The old API will still works, however it will only act on the default seat.
evas_object_clip_recalc is big. it's fat. it shouldnt be inline. so
make it a real function. being inline just hurts performance by making
our code bigger, hurting l1 instruction prefetch and cache
performance. this function isn't small. it's huge and should not be
inline basically because of that reason.
also throw in some likely/unlikely hints etc.
@optimize
After a few patches trying to fix clipping of frame or
non-frame objects the icon finally ended up invisible. Even
if the elm_icon was marked as is_frame, its internal evas
object image would not have the flag set, thus it would be
clipped out.
Solution: Propagate the is_frame flag to all smart children,
not only when setting it but also when adding new members.
A hack with the API indicates that the frame edje is a very
special object that does not propagate the flag.
See also:
7ce79be1a10f6c33eff19c9c8809a7ac5ca9281c
Using the multi-seat support, Evas is able to handle multiple focused objects.
This implementation allows one focused object per seat.
This patch introduces new APIs and events to handle this new scenario,
while keeping compatible with the old focus APIs.
See T4749, 11b7cf6b72 introduced an issue and
e1e28ce320 fixed it but caused a massive
performance impact.
This should fix that. Thanks @zmike for the first patch.
Fixes T4840
so i have been doing some profiling on my rpi3 ... and it seems
memcmp() is like the number one top used function - especially running
e in wayland compositor mode. it uses accoring to perf top about 9-15%
of samples (samples are not adding up to 100%). no - i cant seem to
get a call graph because all that happens is the whole kernel locks up
solid if i try, so i can only get the leaf node call stats. what
function was currently active at the sample time. memcmp is the
biggest by far. 2-3 times anything else.
13.47% libarmmem.so [.] memcmp
6.43% libevas.so.1.18.99 [.] _evas_render_phase1_object_pro
4.74% libevas.so.1.18.99 [.] evas_render_updates_internal.c
2.84% libeo.so.1.18.99 [.] _eo_obj_pointer_get
2.49% libevas.so.1.18.99 [.] evas_render_updates_internal_l
2.03% libpthread-2.24.so [.] pthread_getspecific
1.61% libeo.so.1.18.99 [.] efl_data_scope_get
1.60% libevas.so.1.18.99 [.] _evas_event_object_list_raw_in
1.54% libevas.so.1.18.99 [.] evas_object_smart_changed_get
1.32% libgcc_s.so.1 [.] __udivsi3
1.21% libevas.so.1.18.99 [.] evas_object_is_active
1.14% libc-2.24.so [.] malloc
0.96% libevas.so.1.18.99 [.] evas_render_mapped
0.85% libeo.so.1.18.99 [.] efl_isa
yeah. it's perf. it's sampling so not 100% accurate, but close to
"good enough" for the bigger stuff. so interestingly memcmp() is
actually in a special library/module (libarmmem.so) and is a REAL
function call. so doing memcmp's for small bits of memory ESPECIALLY
when we know their size in advance is not great. i am not sure our own
use of memcmp() is the actual culprit because even with this patch
memcmp still is right up there. we use it for stringshare which is
harder to remove as stringshare has variable sized memory blobs to
compare.
but the point remains - memcmp() is an ACTUAL function call. even on
x86 (i checked the assembly). and replacing it with a static inline
custom comparer is better. in fact i did that and benchmarked it as a
sample case for eina_tiler which has 4 ints (16 bytes) to compare
every time. i also compiled to assembly on x86 to inspect and make sure
things made sense.
the text color compare was just comparing 4 bytes as a color (an int
worth) which was silly to use memcmp on as it could just cast to an
int and do a == b. the map was a little more evil as it was 2 ptrs
plus 2 bitfields, but the way bitfields work means i can assume the
last byte is both bitfields combined. i can be a little more evil for
the rect tests as 4 ints compared is the same as comparing 2 long
longs (64bit types). yes. don't get pedantic. all platforms efl works
on work this way and this is a base assumption in efl and it's true
everywhere worth talking about.
yes - i tried __int128 too. it was not faster on x86 anyway and can't
compile on armv7. in my speed tests on x86-64, comparing 2 rects by
casting to a struct of 2 long long's and comparing just those is 70%
faster than comapring 4 ints. and the 2 long longs is 360% faster than
a memcmp. on arm (my rpi3) the long long is 12% faster than the 4 ints,
and it is 226% faster than a memcmp().
it'd be best if we didnt even have to compare at all, but with these
algorithms we do, so doing it faster is better.
we probably should nuke all the memcmp's we have that are not of large
bits of memory or variable sized bits of memory.
i set breakpoints for memcmp and found at least a chunk in efl. but
also it seems the vc4 driver was also doing it too. i have no idea how
much memory it was doing this to and it may ultimately be the biggest
culprit here, BUT we may as well reduce our overhead since i've found
this anyway. less "false positives" when hunting problems.
why am i doing this? i'm setting framerate hiccups. eg like we drop 3,
5 or 10 frames, then drop another bunch, then go back to smooth, then
this hiccup again. finding out WHAT is causing that hiccup is hard. i
can only SEE the hiccups on my rpi3 - not on x86. i am not so sure
it's cpufreq bouncing about as i've locked cpu to 600mhz and it still
happens. it's something else. maybe something we are polling? maybe
it's something in our drm/kms backend? maybe its in the vc4 drivers or
kernel parts? i have no idea. trying to hunt this is hard, but this is
important as this is something that possibly is affecting everyone but
other hw is fast enough to hide it...
in the meantime find and optimize what i find along the way.
@optimize
This is an override of efl_gfx_size_set. Same as before, the
order of operations matter so it is possible that a corner
case will break. In particular, legacy code was:
- intercept
- smart resize (do stuff), super, super, super
- evas object resize
The new code is more like:
- intercept
- super, super, super, evas object resize
- do stuff
But unfortunately this broke elm_widget (read: all widgets) as
the internal resize was done before the object resize. So,
inside the resize event cb, the resize_obj size would not match
the smart object size. >_<
This is an override of efl_gfx_position_set.
As for the other patches, I hope I didn't break anything.
A problem likely to happen is that the super call was inserted
too early or too late in the call flow. For instance:
_myclass_position_set(obj, x, y) {
position_set(super(obj), x, y);
position_get(obj, &prevx, &prevy);
do_something_with_delta_xy();
}
The above code flow is obvisouly wrong, but may have crept in this
patch (such a bug sneaked in inside smart object, breaking
everything at first).
These should be just overrides of Efl.Gfx.visible.set. Many
widgets were handling smart show() and hide() manually, which
means this patch is quite large.
Hopefully this doesn't break anything, obviously. But here are
some widgets known to be problematic, as the old code flow was
really strange (sometimes not calling the efl_super function):
- window
- notify
Widgets should simply override efl_gfx_color_set and call
super all the way up to evas object.
Legacy compatibility with call interceptors and early call
abortion (eg. delete_me or obj->layer == NULL) are implemented
with an internal call. See the previous commit introducing the
API.
This is a poor man's solution to get rid of group functions such
as clip_set, clip_unset, color_set, etc... See the following
commits.
This API needs to be EAPI for elementary but shouldn't be used
outside EFL. This is required purely for legacy compatibility.
Here's the call flow, inside show(obj):
1. if (intercept_show(obj)) return;
2. show(super(obj));
3. do other stuff