if for some reason we fail to validate a class, then we should skip
that extension. This may result in an empty vtable, then check for
that and avoid a crash.
This is very unlike to happen in practice, but I've forced some
validation errors and could get to that.
Instead of 2 sets of macro, one for HAVE_EO_ID and another without,
use a single set of macros and have the implementation of
_eo_class_pointer_get() and _eo_obj_pointer_get() to do the actual
These functions now take the source information so the logs reflect
that and not always the same function.
_eo_pointer_error() was kinda a bitch to debug as it provided a nice
breakpoint location, but did not provide a good output since the file,
line and function were always the same.
Change that to be a thin wrapper on top of eina_log_vprint(), then we
keep the breakpoint location yet provide useful information.
In that sense, change other error messages so they carry as much
information as possible.
Normally when debugging Eo with gdb you can just use any of the internal
eo functions to resolve the id to its internal pointer. However, when
loading a coredump you can't execute any code, not even the id resolve
code.
This change adds a gdb function that resolves the id to its pointer form
without executing any code in the process space. This plugin is
essentially the id resolve code written in python as a gdb function.
Usage:
Print the pointer:
(gdb) print $eo_resolve(obj)
$1 = (_Eo_Object *) 0x5555559bbe70
Use it directly (e.g. to print the class name):
(gdb) $eo_resolve(obj)->klass->desc.name
This plugin requires that the coredump would be loaded with the exact
same libeo.so binary (or at least one that hasn't changed eo internals),
and that the debug symbols for libeo.so would be available for gdb to
use.
Note:
This feature is incomplete and only resolves IDs that are owned by the
main thread and in the main domain. This is not a big issue at the
moment, because almost all of our IDs are like that.
@feature
When resolving a function for an object the object would get reference,
and then in some failure cases won't be freed.
I suspect this is a regression following the reshuffling that was done
in that function recently.
Thanks to zmike for investigating and reporting this.
Fixes T4740
@fix
This happens with shared objects.
The situation seems to be:
1. object has composited object a of class A in thread 1
2. call something on object a from thread 2, deadlock
In fact, do anything from thread 2 on a shared object and you deadlock.
this moves a lot of error case handling into goto's so the code gets
out of the hot path and this should help expecially since variou
smacros do things like:
do { char buf[256]; sprintf(buf, fmt, ptr); _eo_pointer_error(buf); } while (0)
_Efl_Class *klass; \
do { \
klass = _eo_class_pointer_get(klass_id); \
if (!klass) { \
_EO_POINTER_ERR("Class (%p) is an invalid ref.", klass_id); \
return ret; \
} \
} while (0)
so putting quite a chunk of code inside a rare "if this errors"
handler that will cause l1 cache misses and this we don't want, thus
moving stuff in eo core out of hot paths to cut down on overhead. yes
it might not be pretty but it's kind of the right thing at such a core
level of efl. this also does the same to the eo base class as this is
also going to be relatively hot given it's the core of every other
object.
so there were a few issues. one we had a spinlokc on the eoid table
for shared objects AND then had a mutex for accessing those objects
(released on return from any eo function). BUT this missed some funcs
like eo_ref, eo_unref and so on in eo.c ... oops. so fixed. but then i
realized there was a race condition. we locked the eoid table then
unlocked with our pointer THEN locked the sharted object mutex ...
then unlocked it. that was a race condtion gap. so we should share the
same lock anyway - if it's a shared object, grab the shared object
mutex then do a lookup and if the lookup does not fail, KEEP the lock
until it is released by the return from eo function or by some special
macro/funcs that released a matching lock. since its a recursive lock
this is all fine. as its also a universal single lock for all objects
we just need the eoid to know if it's shared and needs locking based
on the domain bits. so now do this locking properly with just a single
mutex, not both a spinlock and mutex and keep the lock around until
totally done with the object. this plugs the race condition holes and
goes from 1 spinlock lock and unlock then a mutex lock and unlokc to
just a single mutex lock and unlock. this means shared objects are
actually truly safe across threads and only have the overhead of a
single recursive mutex to lock and unlock in every api call.
Summary:
as told in _eina_stringshared_key_cmp in eina_hash.c:
originally we want to do this:
return key1 - key2;
but since they are ptrs and an int can't store the different of 2 ptrs in
either 32 or 64bit (signed hasn't got enough range for the diff of 2
32bit values regardless of their type... we'd need 33bits or 65bits)
So changing this to the same logic.
Reviewers: tasn, raster
Subscribers: cedric, jpeg
Differential Revision: https://phab.enlightenment.org/D4298
This change lets us remove a field from the structure that leads to
around 20KiB more of saving in private dirty pages in elementary.
This also looks a bit better and feels a bit cleaner.
Breaks API and ABI.
Before this commit, function overrides were explicit. That is, you'd
have to explicitly state you were overriding a function instead of
creating a new one. This made the code a tad more complex, and was also
a bit more annoying to use. This commit removes this extra piece of
information.
This means we now store much less information per function, that will
let us further optimise out structures in the future.
Now that we have recursive locks, the class creation code can be much simpler.
All the code there was essentially our own implementation of recursive locks,
or rather a special case of those.
This is no longer needed.
this adds a signle mutex (recursive) mutex for all eo objects that is
auto-called by _efl_object_call_resolve() and _efl_object_call_end()
that wrap all eo method calls and since its recursive it can be
blindly called for sub-calls. this will lock all shared objects during
any call to any shared object so only the thread calling now has
access until it releases. not fine-grained but good enough and the
best we can do "simplistically".
This moved all the eoid tables, eoid lookup caches, generation count
information ad eo_isa cache into a TLS segment of memory that is
thread private. There is also a shared domain for EO objects that all
threads can access, but it has an added cost of a lock. This means
objects accessed outside the thread they were created in cannot be
accessed by another thread unless they are adopted in temporarily, or
create4d with the shared domain active at the time of creation. child
objects will use their parent object domain if created with a parent
object passed in. If you were accessing EO (EFL) objects across threads
before then this will actually now cause your code to fail as it was
invalid before to do this as no actual objects were threadsafe in EFL,
so this will force things to "fail early".
ecore_thread_main_loop_begin() and end() still work as this uses the
eo domain adoption features to temporarily adopt a domain during this
section and then return it when done.
This returns speed back to eo brining the overhead in my tests of
lookup for the elm genlist autobounce test in elementary from about
5-7% down to 2.5-2.6%. A steep drop.
This does not mean everything is perfect. Still to do are:
1. Tests in the test suite
2. Some API's to help for sending objects from thread to thread
3. Make the eo call cache TLS data to make it also safe
4. Look at other locks in eo and probably move them to TLS data
5. Make eo resolve and call wrappers that call the real method func do
recursive mutex wrapping of the given object IF it is a shared object
to provide threadsafety transparently for shared objects (but adding
some overhead as a result)
6. Test test est, and that is why this commit is going in now for wider
testing
7. Decide how to make this work with sending IPC (between threads)
8. Deciding what makes an object sendable (a sendable property in base?)
9. Deciding what makes an object shareable (a sharable property in base?)
This is another follow up to the investigations of T4227. As stated
there, in any PIE (a shared library is one), structures, even const ones
end up being written to because of dynamic relocation. This means that
using static const structures has actually lead to no savings, only
waste. Since we never really needed them, using them made things even
worse than just having a different API that doesn't save them.
Thus, this commit changes the way we set the functions. Instead of
passing a pre-populated struct, we now just have an initialiser function
where you set the functions. This on its own doesn't significantly reduce
the amount of dirty memory pages for a reason I have yet to uncover,
though I believe it's done as a misguided compiler optimisation.
However, this design is flexible enough so we can change to another one
that is quite ugly, but I have already tested and proven that does that.
This patch series doesn't include the better improvement (passing
everything on the stack as va_args) because the API was too ugly
for me to bear, and I would rather first make sure there is no way to
force the compiler to do the right thing here.
Unfortunately this commit gives up on useless stricter validation.
Before this commit we would make sure that we are only overriding
functions correctly defined in our hierarchy. With this one, we don't
anymore. This is not a big problem though because this is a check that
is also enforced by Eolian. So as long as you are using Eolian, you
should be fine.
Breaks API and ABI!
@feature
This improve speed of processing events in genlist scrolling benchmark by 30%
inside the efl_object_event_callback_call code. Not a really big deal as it
goes from 0.9% to 0.6% of the total time spend. Welcome to micro optimization.
Rename the type to something more sensible and change it to remove the
last remanent of Eo1. This fixes a fixme that has been there for a
while.
The type doesn't really matter, it just looks nicer with the va_list.
This commit implements a sort of CoW for the vtables. The vtables are
usually just linked to and refcounted. When we need to change them we
allocate new ones and copy them over so we can write to them.
I wrote some code to measure the effectiveness of this change. When
running elementary_test (and immediately exiting) I saw that out of the
total number of vtable chains (561) that were needed by the classes in
the EFL, 79 (14.08%) were reused. Considering that I had to add
refcounting (unsigned short, but let's consider it's the size of a word
because of alignment), I would calculate the saving as such (in bytes):
Number of items in a chain (refcounted block): 32
32 bit:
sizeof(chain_node) = 8
Mem wasted on refcounting: 561 * 4 = 2244
Mem saved because of sharing: 79 * (32 * 8) = 20224
Total save: 17980 bytes
64 bit:
sizeof(chain_node) = 16
Mem wasted on refcounting: 561 * 8 = 4488
Mem saved because of sharing: 79 * (32 * 16) = 40448
Total save: 35960 bytes
Wow, we use a lot of memory in Eo classes, I'm sure we can
save even more if we put our hearts into it (change the shareable units
to be smaller to increase the chance of sharing).
This is internal and doesn't affect API/ABI so we can change this even
further with time.
This also improves efl_object_override(). This should now be quite
memory efficient (don't abuse, but it's not a big hogg as it was), so
feel free to abuse that one and rely on it in API.
@feature
ok. so here's the issue at least now. we have eo objects in the canvas
and they have a refcount of 2 user_refcount is 0. the calls stack does
NOT show we are calling callbacks at that time on these objects. they
are not in the backtrace (the canvas is, the objects themselves are
not).
SOMETHING is keeping 2 eo "internal" refs on these objects and i have
no idea what/how/who. it's a royal pain in the butt to find out as the
only way is lots and lots of logging and you get drowned in the
logging...
so what I have now done is a super ugly workaround that detects these
zombie objects that refuse to die and just FORCES them to die when the
evas canvas frees and clears out layers.
ac10a00acc doesn't really cause the
issue, it just brings it out in the open for all to see far more
easily. but something is deeply wrong SOMEWHERE with SOME objects and
our refcounts.
this fixes T4187
This is a (minor) API & ABI break in Eo.h!
I say minor as eo_override shouldn't be used yet (EO is unstable
and this patch includes all the use cases in EFL).
I'm not very happy about the new form of the macro, but it avoids
two things:
- passing in a struct (valid in C, but never used in EFL)
- using a GCC construct to create structs on the fly
It was inspired by the event array define, but I don't think
we need the runtime memcpy here.
See also:
https://gcc.gnu.org/onlinedocs/gcc/Compound-Literals.html
eo_override would leak the vtable if called multiple times, this
fixes that. Also, it is now possible to revert back to the original
class' vtable by passing in { NULL, 0 }
I believe it is thus possible to incrementally override more
functions on an object. Absolutely not recommended, but should work.
But it is not possible to selectively revert back to the original
class implementation on a single method. Use eo_super for that,
or revert back the entire object overrides.
PS: Is it normal that we pass in a struct? We never do that in EFL...
When using eo_add_ref, it was increasing the refcount before the user
context in the addition has fully ended. This means the object had its
reference increased while still not finalized, which means it was
sometimes passed with an increased refcount to unsuspecting class code.
The correct behaviour is to increase the reference count just before
returning the object to the user at the end of eo_add so the reference
count is only increased for whoever asked for it.
Breaks ABI!
@fix
This commit changes the way refcount is dealt with internally. Before
this commit, there was one refcount shared between Eo internals and
users. Now there is a refcount for eo operations (like for example,
function calls) and one for user refcount (eo_ref).
An example bug that this protects against (which is seemingly rather
common) is:
some_eo_func(obj);
// Inside the implementation of that func:
pd->a = 1; // The object's private data
eo_unref(obj); // To delete the object
eo_unref(obj); // A big one extra unref
pd->a = 2; // Segfault, this data has already been freed
This is a feature, but really just a fix for a class of bugs.
@feature
I was getting a crash in eo_shutdown, inside
_efl_event_pointer_class_destructor as I was calling eo_del
from there. But the parent class was already destroyed.
Assuming class IDs can only go up, and child classes are only
instanciated after all their parents, it is safer to call the
class destructors in reverse order.
Obviously, still pretty sure eo_del() in a class_destructor
is not a good idea...
This was needed when the eo composite object was still in beta. Since commit
d7c45e41d4 this is no longer the case. No beta
part left in eo base so we can safely remove this define.