evas_textblock: rainbow flag emoji treated as two clusters (update unibreak to version 4.2)
Summary:
With the rainbow flag emoji (🏳️‍🌈), the cursor can be moved inside the emoji
with the mouse or keyboard, because the emoji is split into two clusters with a break before U+1F308.
This is wrong: the emoji should be treated as a single cluster, per the Unicode text segmentation rule “Do not break within emoji modifier sequences or emoji ZWJ sequences” (https://unicode.org/reports/tr29/#GB11 ).
The issue happens because U+1F308 is not given its correct grapheme break property value. I think this is a bug in the unibreak library: U+1F308 should have the word break class value Glue_After_ZWJ (based on https://www.unicode.org/reports/tr29/tr29-31.html#Glue_After_Zwj_WB and http://unicode.org/Public/emoji/5.0/emoji-zwj-sequences.txt), which would prevent the break and yield a single cluster.
I noticed that the unibreak lib currently used in EFL seems to implement Unicode 9 (the latest is Unicode 13), which relies on obsolete, now-unused grapheme break properties such as E_Modifier and Glue_After_ZWJ. Any emoji introduced later (the rainbow flag was introduced after Unicode 9) that would need the E_Modifier or Glue_After_ZWJ property under the Unicode 9 rules will therefore show this issue.
So I have updated the unibreak lib to the latest released version (4.2), which implements Unicode 12.
I needed to remove **BREAK_AFTER(i)** to pass the tests in D1140, as spaces do not break with the latest update (also related to T995).
{F3868712}
This should fix T8665 and T8688.
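To illustrate the rule referenced in the summary, here is a minimal C sketch. It is not the unibreak implementation: the helper names (`decode_utf8`, `is_extend`, `is_zwj`, `is_pictographic`, `count_clusters`) are hypothetical, the property lookups are hard-coded for the four code points of the rainbow flag sequence (U+1F3F3 U+FE0F U+200D U+1F308) instead of using the Unicode data tables, and only rules GB9 and GB11 of UAX #29 are modelled. The `apply_gb11` flag is an assumption made here to contrast the behaviour with and without GB11.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s[*i] and advance *i.
 * Illustration only: assumes well-formed input and does no validation. */
static uint32_t decode_utf8(const char *s, size_t *i)
{
    const unsigned char *p = (const unsigned char *)s + *i;
    if (p[0] < 0x80) { *i += 1; return p[0]; }
    if (p[0] < 0xE0) {
        *i += 2;
        return ((uint32_t)(p[0] & 0x1Fu) << 6) | (p[1] & 0x3Fu);
    }
    if (p[0] < 0xF0) {
        *i += 3;
        return ((uint32_t)(p[0] & 0x0Fu) << 12) |
               ((uint32_t)(p[1] & 0x3Fu) << 6) | (p[2] & 0x3Fu);
    }
    *i += 4;
    return ((uint32_t)(p[0] & 0x07u) << 18) | ((uint32_t)(p[1] & 0x3Fu) << 12) |
           ((uint32_t)(p[2] & 0x3Fu) << 6) | (p[3] & 0x3Fu);
}

/* Hypothetical, drastically reduced property lookups (not real data tables). */
static int is_extend(uint32_t cp)       { return cp == 0xFE0F; } /* VS16 */
static int is_zwj(uint32_t cp)          { return cp == 0x200D; }
static int is_pictographic(uint32_t cp) { return cp == 0x1F3F3 || cp == 0x1F308; }

/* Count grapheme clusters using only GB9 (no break before Extend or ZWJ)
 * and, if apply_gb11 is non-zero, GB11 (ExtPict Extend* ZWJ x ExtPict).
 * Every other boundary is treated as a break. */
static size_t count_clusters(const char *utf8, size_t len, int apply_gb11)
{
    size_t i = 0, clusters = 0;
    int have_prev = 0;
    int in_pict_seq = 0;    /* seen ExtPict followed only by Extend so far */
    int after_pict_zwj = 0; /* previous code point is a ZWJ ending such a run */

    assert(utf8 != NULL);
    while (i < len) {
        uint32_t cp = decode_utf8(utf8, &i);
        int brk;

        if (!have_prev)
            brk = 1;                                   /* start of text */
        else if (is_extend(cp) || is_zwj(cp))
            brk = 0;                                   /* GB9 */
        else if (apply_gb11 && after_pict_zwj && is_pictographic(cp))
            brk = 0;                                   /* GB11 */
        else
            brk = 1;
        if (brk)
            clusters++;

        /* Track the "ExtPict Extend* ZWJ" prefix needed by GB11. */
        if (is_pictographic(cp)) {
            in_pict_seq = 1;
            after_pict_zwj = 0;
        } else if (is_extend(cp)) {
            after_pict_zwj = 0;
        } else if (is_zwj(cp)) {
            after_pict_zwj = in_pict_seq;
            in_pict_seq = 0;
        } else {
            in_pict_seq = 0;
            after_pict_zwj = 0;
        }
        have_prev = 1;
    }
    return clusters;
}
```

Calling `count_clusters` on the UTF-8 bytes of the rainbow flag with `apply_gb11` set to 1 yields one cluster; with it set to 0 the sequence splits before U+1F308, which mirrors the cursor behaviour described above.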
Reviewers: ali.alzyod, woohyun, bowonryu, zmike, segfaultxavi, bu5hm4n
Reviewed By: ali.alzyod
Subscribers: segfaultxavi, cedric, #reviewers, #committers
Tags: #efl
Maniphest Tasks: T8665
Differential Revision: https://phab.enlightenment.org/D11743
2020-09-01 03:33:51 -07:00
New in libunibreak 4.2

- Update the data to conform to Unicode 12

New in libunibreak 4.1

- Update the code and data to conform to Unicode 11.0.0, especially
  adding support for extended pictographs in word and grapheme breaking
- ZWJ support has been much improved (it was broken)
- Make minor tweaks to the project files

evas textblock: add/apply cursor cluster APIs based on grapheme cluster
Summary:
Add a feature for moving the cursor by grapheme cluster.
It is applied to edje_entry.c and elm_entry.c to improve
cursor handling, as in other modern text editors (e.g. gedit).
The Evas patch requires a newer libunibreak library,
so this patch updates libunibreak as well.
@feature
Test Plan:
1. Put "ഹലോ" in your entry.
2. With this feature, the cursor reaches the end of the text from the
beginning in only 2 right-arrow key presses.
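The test plan can be sketched in C as follows. This is an illustration under stated assumptions, not the Evas cursor API: `attaches_to_previous`, `cursor_cluster_next` and `presses_to_end` are hypothetical names, the text is given as already-decoded code points (U+0D39 U+0D32 U+0D4B for "ഹലോ"), and the only property modelled is that U+0D4B (MALAYALAM VOWEL SIGN OO, a SpacingMark) stays in the cluster of the preceding consonant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced property lookup: only the one combining sign
 * needed for this example is recognised. */
static int attaches_to_previous(uint32_t cp)
{
    return cp == 0x0D4B; /* MALAYALAM VOWEL SIGN OO */
}

/* Move a cursor at code point index pos one grapheme cluster to the right
 * and return the new index (n means "end of text"). */
static size_t cursor_cluster_next(const uint32_t *cps, size_t n, size_t pos)
{
    assert(cps != NULL && pos <= n);
    if (pos < n)
        pos++;                                  /* step over the base */
    while (pos < n && attaches_to_previous(cps[pos]))
        pos++;                                  /* keep attached signs */
    return pos;
}

/* Count how many "right" key presses take the cursor from the start of
 * the text to its end. */
static size_t presses_to_end(const uint32_t *cps, size_t n)
{
    size_t pos = 0, presses = 0;
    while (pos < n) {
        pos = cursor_cluster_next(cps, n, pos);
        presses++;
    }
    return presses;
}
```

For the three code points of "ഹലോ", `presses_to_end` returns 2: the first press steps over U+0D39, the second over U+0D32 together with U+0D4B.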
Reviewers: raster, cedric, jpeg, herdsman, zmike, devilhorns
Reviewed By: herdsman, zmike
Subscribers: #reviewers, #committers, zmike, bowonryu, woohyun
Tags: #efl
Differential Revision: https://phab.enlightenment.org/D5490
2018-08-20 04:21:53 -07:00
New in libunibreak 4.0

- Update the code and data to conform to Unicode 9.0.0
- Add grapheme breaking support
- Tested and enhanced according to the Unicode test suite
- Make bug fixes

2015-12-21 03:10:35 -08:00
New in libunibreak 3.0

- Update the code and data to conform to Unicode 7.0.0
- Update build scripts to fix compatibility issues
- Improve code structure
- Make a few bug fixes

2015-05-07 02:40:57 -07:00
New in libunibreak 1.1

- Update the code and data to conform to Unicode 6.2.0
- Update build files to support libtool 2.4
- Adjust code structure
- Make a few bug fixes

New in libunibreak 1.0

- Add word breaking support
- Change the library name to "libunibreak", while keeping maximum
  compatibility
- Add pkg-config support

New in liblinebreak 2.1

- Update the data according to LineBreak-6.0.0.txt
- Fix the bug that an assertion in code can fail if U+FFFC is
  encountered at the beginning of a line

New in liblinebreak 2.0

- Update the algorithm and data according to UAX #14-24 and
  LineBreak-5.2.0.txt
- Rename some functions to reduce namespace pollution
- Make Doxygen documentation better

New in liblinebreak 1.2

- Fix the bug that an assertion in code can fail if an invalid UTF-8 or
  UTF-16 sequence is encountered near the end of input
- Remove the specialization of right single quotation mark as closing
  punctuation mark in English, French, and Spanish, because it can be
  used as apostrophe
- Make Doxygen documentation better

New in liblinebreak 1.1

- Make get_lb_prop_lang static and not an exported symbol
- Define is_line_breakable to alias to is_breakable
- Declare get_next_char_utf* will be changed to lb_get_next_char_utf*
- Move the declarations of get_next_char_utf* from linebreak.h to
  linebreakdef.h
- Add the function documentation comments to the header files

New in liblinebreak 1.0

- Update the line breaking data according to UAX #14-22 and
  LineBreak-5.1.0.txt
- Add autoconfiscation support (./configure, make, make install)
- Add Makefile for MSVC

First public release (0.9.6, or 20080421)

- Implement line breaking algorithm according to UAX #14-19
- Line breaking data is generated from LineBreak-5.0.0.txt
- Makefile only supports GCC