edje_cc has non-deterministic output #41

Open
opened 2023-10-12 03:13:01 -07:00 by bmwiedemann · 5 comments

While working on reproducible builds for openSUSE, I found that our efl 1.26.3 and enlightenment-theme-openSUSE packages varied in .edj files, even when trying to vary as little as possible.

A similar previous issue was fixed in commit 4d2117ef2a

Steps to reproduce (in the openSUSE package after build):

cd /home/abuild/rpmbuild/BUILD/efl-1.26.3/x86_64-suse-linux
export EFL_RUN_IN_TREE=1
for i in 1 2 3 ; do
  rm -f data/elementary/themes/default.edj
  /home/abuild/rpmbuild/BUILD/efl-1.26.3/x86_64-suse-linux/src/bin/edje/edje_cc -beta -fastdecomp -deps data/elementary/themes/default.edj.d -sd /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/snd -id /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/img -id /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/fdo -fd /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/fnt ../data/elementary/themes/default.edc data/elementary/themes/default.edj
  md5sum data/elementary/themes/default.edj
done

Actual results:

0ade49720591d8922cdf7a3636e73a11  data/elementary/themes/default.edj
7bd0192cdd4cc1fd00a84d2df1dbdb78  data/elementary/themes/default.edj
5b048e1942ec40ad04563098400a2ff6  data/elementary/themes/default.edj

Running the loop 100 times produces 14 different results. This suggests that at least 4 bits of entropy are involved. There is a 6+% chance to get two identical results by chance.

Expected results:
output should be deterministic, so only change when inputs change.

There seems to be some issue with ordering when looking at the diff:

--- strings RPMS.1/usr/share/elementary/themes/default.edj
+++ strings RPMS.2/usr/share/elementary/themes/default.edj
@@ -176,8 +176,8 @@
 edje/collections/191
 edje/collections/183
 edje/scripts/embryo/source/1769/14
-edje/scripts/embryo/source/1733/14
 edje/scripts/embryo/source/1734/7
+edje/scripts/embryo/source/1733/14
 edje/scripts/embryo/source/250/29
 edje/scripts/embryo/source/219/21
 edje/scripts/embryo/source/219/17
[...]
While working on [reproducible builds](https://reproducible-builds.org/) for [openSUSE](https://en.opensuse.org/openSUSE:Reproducible_Builds), I found that our `efl` 1.26.3 and `enlightenment-theme-openSUSE` packages varied in `.edj` files, even when trying to vary as little as possible. A similar previous issue was fixed in commit 4d2117ef2a5344d298dfa7768d01feab0cf86558 Steps to reproduce (in the openSUSE package after build): ```bash cd /home/abuild/rpmbuild/BUILD/efl-1.26.3/x86_64-suse-linux export EFL_RUN_IN_TREE=1 for i in 1 2 3 ; do rm -f data/elementary/themes/default.edj /home/abuild/rpmbuild/BUILD/efl-1.26.3/x86_64-suse-linux/src/bin/edje/edje_cc -beta -fastdecomp -deps data/elementary/themes/default.edj.d -sd /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/snd -id /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/img -id /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/fdo -fd /home/abuild/rpmbuild/BUILD/efl-1.26.3/data/elementary/themes/fnt ../data/elementary/themes/default.edc data/elementary/themes/default.edj md5sum data/elementary/themes/default.edj done ``` Actual results: ``` 0ade49720591d8922cdf7a3636e73a11 data/elementary/themes/default.edj 7bd0192cdd4cc1fd00a84d2df1dbdb78 data/elementary/themes/default.edj 5b048e1942ec40ad04563098400a2ff6 data/elementary/themes/default.edj ``` Running the loop 100 times produces 14 different results. This suggests that at least 4 bits of entropy are involved. There is a 6+% chance to get two identical results by chance. Expected results: output should be deterministic, so only change when inputs change. There seems to be some issue with ordering when looking at the diff: ```diff --- strings RPMS.1/usr/share/elementary/themes/default.edj +++ strings RPMS.2/usr/share/elementary/themes/default.edj @@ -176,8 +176,8 @@ edje/collections/191 edje/collections/183 edje/scripts/embryo/source/1769/14 -edje/scripts/embryo/source/1733/14 edje/scripts/embryo/source/1734/7 +edje/scripts/embryo/source/1733/14 edje/scripts/embryo/source/250/29 edje/scripts/embryo/source/219/21 edje/scripts/embryo/source/219/17 [...] ```
Owner

You won't be able to make it reproducable. Why?

edje_cc uses async threaded loading of image assets and will pack them into the target file as the threads report completion. This is a core feature of evas that is ued many places to keep decode out of the main loop to stall it on that I/O.

Edje_cc also has other threaded encode/compress passes for data blobs too which will do the same thing. Spawn a bunch of worker threads to compress things in parallel. The source blobs in your diff above are packed into the file for decompile purposes and... guess what? They are compressed (to keep the edj file from being too insanely big) and are compressed ... in threads. So there will be non-deterministic creation of the content of the file simply due to timing of scheduling of these threads. All edje cares about is that all the data is actually there and it is encoded right and can be decoded right.

So going to have to close. It's coded to try be faster to compile, not deterministic.

You won't be able to make it reproducable. Why? edje_cc uses async threaded loading of image assets and will pack them into the target file as the threads report completion. This is a core feature of evas that is ued many places to keep decode out of the main loop to stall it on that I/O. Edje_cc also has other threaded encode/compress passes for data blobs too which will do the same thing. Spawn a bunch of worker threads to compress things in parallel. The source blobs in your diff above are packed into the file for decompile purposes and... guess what? They are compressed (to keep the edj file from being too insanely big) and are compressed ... in threads. So there will be non-deterministic creation of the content of the file simply due to timing of scheduling of these threads. All edje cares about is that all the data is actually there and it is encoded right and can be decoded right. So going to have to close. It's coded to try be faster to compile, not deterministic.
Author

I want to mention here that xz -T and zstd -T and pigz all do threaded compression while being able to produce deterministic results.
I did not look into how they do it, but could imagine that they collects results in a specific order.

Also, edje_cc was non-deterministic on 1-core VMs where there is no advantage in using extra threads.

I want to mention here that `xz -T` and `zstd -T` and `pigz` all do threaded compression while being able to produce deterministic results. I did not look into how they do it, but could imagine that they collects results in a specific order. Also, `edje_cc` was non-deterministic on 1-core VMs where there is no advantage in using extra threads.
Author

I had some more looks into the source and something is wrong there.

4cea33945d/src/bin/edje/edje_cc.c (L50) says the default is threads = 0
but the help text in line 136 suggests that -threads is the default, which would be threads = 1.

Then there are 22 instances of if (threads) in edje_cc_out.c that should prevent the use of threads, but strace showed that it still calls the clone syscall 276 times to create as many threads. So at least the -nothreads param is broken.

I had some more looks into the source and something is wrong there. https://git.enlightenment.org/enlightenment/efl/src/commit/4cea33945df5bc82c123e8db88da8403a69912b9/src/bin/edje/edje_cc.c#L50 says the default is `threads = 0` but the help text in line 136 suggests that `-threads` is the default, which would be `threads = 1`. Then there are 22 instances of `if (threads)` in `edje_cc_out.c` that should prevent the use of threads, but strace showed that it still calls the `clone` syscall 276 times to create as many threads. So at least the `-nothreads` param is broken.
Owner

This is not like tar.gz where a tar file is compressed as one unit. edj files areeet files. these have each chunk of data compressed as a separate unit so it can be random-accessed and decompressed on the fluy without decompressing everything up until that point. that means that each key in th edj file (about 6000-7000 of them for the default theme) is throwing into a work queue and compressed on a set of threads which assemble the results when the compression finishes for each key/chunk as eet files (edje uses eet) are built like:

{header}{list-of-keys}{data1}{data2}{data3}...

so list-of-keys doesnt know where data2 will be (offset from that) until data1 is compressed and so on. it assigns a spot when the data comes in from compression.

and it'll still spawn threads to compress AND it'll still use threads to load/decode imagess from disk even with only 1 core. as each image decode comes back from the thread(s) doing the decode it will THEN be put onto the work queue to encode (compress) as above. in addition to all the other data chunks. so async preload alone will alter ordering. in addition to threaded encode. the threads option only affects this latter step. async loading of images is always done via evas and evas always uses threads for this.

This is not like tar.gz where a tar file is compressed as one unit. edj files areeet files. these have each chunk of data compressed as a separate unit so it can be random-accessed and decompressed on the fluy without decompressing everything up until that point. that means that each key in th edj file (about 6000-7000 of them for the default theme) is throwing into a work queue and compressed on a set of threads which assemble the results when the compression finishes for each key/chunk as eet files (edje uses eet) are built like: {header}{list-of-keys}{data1}{data2}{data3}... so list-of-keys doesnt know where data2 will be (offset from that) until data1 is compressed and so on. it assigns a spot when the data comes in from compression. and it'll still spawn threads to compress AND it'll still use threads to load/decode imagess from disk even with only 1 core. as each image decode comes back from the thread(s) doing the decode it will THEN be put onto the work queue to encode (compress) as above. in addition to all the other data chunks. so async preload alone will alter ordering. in addition to threaded encode. the threads option only affects this latter step. async loading of images is always done via evas and evas always uses threads for this.
Author

I still think, it should be possible to do that in a deterministic way.

It could be more like individual .gz files in a .tar or .cpio to allow random access.

Would you accept a patch that ensured determistic output order?

I still think, it should be possible to do that in a deterministic way. It could be more like individual .gz files in a .tar or .cpio to allow random access. Would you accept a patch that ensured determistic output order?
Sign in to join this conversation.
No Label
No Milestone
No project
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: enlightenment/efl#41
No description provided.