Static deps unibreak: update to what will soon be version 3.

Version 3 has not been released yet, but this snapshot is on track to become it.
This is based on commit: a815e11f7ebf35b59278f783227a829ee4692760.

@feature.
Tom Hacohen 2015-05-07 10:53:11 +01:00
parent ba77a837a3
commit 7a49d23f90
17 changed files with 952 additions and 303 deletions

View File

@@ -90,6 +90,8 @@ lib/evas/canvas/evas_vg_private.h
 # Linebreak
 noinst_HEADERS += \
+static_libs/libunibreak/unibreakbase.h \
+static_libs/libunibreak/unibreakdef.h \
 static_libs/libunibreak/linebreak.h \
 static_libs/libunibreak/linebreakdef.h \
 static_libs/libunibreak/wordbreakdef.h \
@@ -98,6 +100,8 @@ static_libs/libunibreak/wordbreakdata.c
 # Linebreak
 lib_evas_libevas_la_SOURCES = \
+static_libs/libunibreak/unibreakbase.c \
+static_libs/libunibreak/unibreakdef.c \
 static_libs/libunibreak/linebreak.c \
 static_libs/libunibreak/linebreakdata.c \
 static_libs/libunibreak/linebreakdef.c \

View File

@ -1,3 +1,167 @@
2015-04-19 Wu Yongwei <wuyongwei@gmail.com>
* LICENCE: Update copyright information.
2015-04-19 Wu Yongwei <wuyongwei@gmail.com>
* src/linebreakdata2.tmp: Remove the unnecessary inclusion of
"linebreak.h".
* src/linebreakdata.c: Ditto.
2015-04-19 Wu Yongwei <wuyongwei@gmail.com>
Use extended regexp to simplify expressions.
* src/LineBreak1.sed: Simplify with extended regexp.
* src/LineBreak2.sed: Ditto.
* src/Makefile.am: Add `-E' to the command line of sed.
2015-04-19 Wu Yongwei <wuyongwei@gmail.com>
Make further clean-up for the 3.0 release.
* configure.ac (AC_INIT): Change the library version to `3.0'.
* Doxyfile (PROJECT_NUMBER): Change to `3.0'.
(EXCLUDE): Add the missing `src/' before `filter_dup.c'.
* src/wordbreakdata1.tmpl: Remove the inclusion of "linebreak.h".
* src/wordbreakdata.c: Ditto.
2015-04-19 Wu Yongwei <wuyongwei@gmail.com>
* src/wordbreakdef.h: Include "unibreakdef.h".
2015-04-19 Wu Yongwei <wuyongwei@gmail.com>
* purge: Make it remove `compile'.
2015-04-18 Wu Yongwei <wuyongwei@gmail.com>
* src/unibreakdef.c: New file.
* src/unibreakdef.h: New file.
* src/wordbreak.c: Rename reference to `lb_get_next_char...' to
`ub_get_next_char...'.
* src/linebreak.c: Ditto.
(lb_get_next_char_utf8): Remove definition.
(lb_get_next_char_utf16): Ditto.
(lb_get_next_char_utf32): Ditto.
* src/linebreakdef.h: Include "unibreakdef.h".
(EOS): Remove definition.
(get_next_char_t): Remove typedef.
(lb_get_next_char_utf8): Remove declaration.
(lb_get_next_char_utf16): Ditto.
(lb_get_next_char_utf32): Ditto.
* src/Makefile.am (include_HEADERS): Add `unibreakdef.h'.
(libunibreak_la_SOURCES): Add `unibreakdef.c'.
(libunibreak_la_CFLAGS): Define to `-W -Wall'.
2015-04-18 Wu Yongwei <wuyongwei@gmail.com>
* src/unibreakbase.c: New file.
* src/unibreakbase.h: New file.
* src/linebreak.c (linebreak_version): Remove definition.
* src/linebreak.h: Include "unibreakbase.h".
(linebreak_version): Remove declaration.
(LINEBREAK_VERSION): Remove definition.
(utf8_t): Remove typedef.
(utf16_t): Remove typedef.
(utf32_t): Remove typedef.
* src/wordbreak.h: Include "unibreakbase.h" instead of
"linebreak.h".
* src/Makefile.am (include_HEADERS): Add `unibreakbase.h'.
(libunibreak_la_SOURCES): Add `unibreakbase.c'.
(libunibreak_la_LDFLAGS): Set the version-info to `3:0:0'.
2015-04-13 Wu Yongwei <wuyongwei@gmail.com>
* src/wordbreak.c: Update copyright and version information.
* src/wordbreak.h: Ditto.
* src/wordbreakdef.h: Ditto.
2015-04-13 Tom Hacohen <tom@stosb.com>
* src/wordbreakdef.h (enum WordBreakClass): Clean up and reorder.
2015-04-10 Tom Hacohen <tom@stosb.com>
Don't ship internal header.
* src/Makefile.am (include_HEADERS): Remove `wordbreakdef.h'.
(EXTRA_DIST): Add `wordbreakdef.h'.
2015-04-10 Tom Hacohen <tom@stosb.com>
Update files according to UAX #29-29, for Unicode 7.0.0.
* src/wordbreak.c (set_wordbreaks): Take care of Hebrew letters.
* src/wordbreakdata.h (enum WordBreakClass): Add WBP_Hebrew_Letter,
WBP_Single_Quote, and WBP_Double_Quote.
* src/wordbreakdata.c: Regenerate from WordBreakProperty-7.0.0.txt.
2015-04-10 Tom Hacohen <tom@stosb.com>
* src/sort_numeric_hex.py: Fix compatibility issue with new Python.
* src/Makefile.am (wordbreakdata): Fix word break data enum for
names with underscores.
* src/wordbreakdef.h (enum WordBreakClass): Correct WBP_Regional to
WBP_Regional_Indicator.
* src/wordbreak.c: Ditto.
* src/wordbreakdata.c: Ditto.
2015-04-05 Wu Yongwei <wuyongwei@gmail.com>
* src/linebreak.c: Make pointer alignment consistent.
* src/linebreak.h: Ditto.
* src/linebreakdef.h: Ditto.
2015-04-05 Wu Yongwei <wuyongwei@gmail.com>
* src/linebreak.h: Update copyright year and UAX information.
* src/linebreakdef.c: Ditto.
2015-04-05 Wu Yongwei <wuyongwei@gmail.com>
Implement rule LB21a, as introduced by Revision 28 of UAX #14.
* src/linebreakdef.h (struct LineBreakContext): Add new field
fLb21aHebrew.
* src/linebreak.c (treat_first_char): Initialize fLb21aHebrew
properly.
(lb_init_break_context): Clear fLb21aHebrew.
(get_lb_result_lookup): Apply rule LB21a and update fLb21aHebrew.
2014-12-06 Mikhail Polubisok <mpolubisok@gmail.com>
* src/linebreak.c (get_lb_result_lookup): Extend assertion condition
that has been wrong since Unicode 6.2.
2014-09-19 Petr Filipsky <philodej@gmail.com>
* src/LineBreak1.sed: Fix sed expression due to changed
LineBreak.txt file format.
2014-05-24 Wu Yongwei <wuyongwei@gmail.com>
* src/Makefile.gcc (TARGET): Change from `liblinebreak.a' to
`libunibreak.a'.
2014-05-23 Christoph Junghans <junghans@votca.org>
Fix `make install DESTDIR=...'.
* Makefile.am (install-exec-hook): Prefix `$(DESTDIR)/' before
`${libdir}'.
2014-02-16 Wu Yongwei <wuyongwei@gmail.com>
Following https://people.gnome.org/~walters/docs/build-api.txt, add
a quasi-standard autogen.sh, which generates `configure' and runs it
optionally.
* autogen.sh: New file.
2014-02-12 Wu Yongwei <wuyongwei@gmail.com>
* bootstrap: Remove the overkill bits and add back autoreconf.
* purge: Ensure config.cache is removed.
2014-02-10 Tom Hacohen <tom@stosb.com>
* bootstrap: Solve bootstrap problems found on Linux and Mac (thanks
to Nick Shvelidze and Christopher Baker).
2013-11-14 Wu Yongwei <wuyongwei@gmail.com>
* src/linebreak.c: Add/update comments and doc comments.
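
The 2015-04-05 ChangeLog entry above adds rule LB21a (no break after a hyphen or break-after character that follows a Hebrew letter) by means of a flag in the break context. The following is a minimal, self-contained sketch of that flag logic only. The `sketch_*` names, the reduced class set and the `table_brk` parameter (standing in for the result of the pair-table lookup) are assumptions made for illustration; they are not part of libunibreak.

```c
#include <assert.h>

/* Hypothetical, reduced set of line break classes (illustration only). */
enum sketch_class { SK_AL, SK_HL, SK_HY, SK_BA };

/* Values mirror LINEBREAK_ALLOWBREAK and LINEBREAK_NOBREAK in linebreak.h. */
enum sketch_break { SK_ALLOWBREAK = 1, SK_NOBREAK = 2 };

/* Hypothetical stand-in for the relevant part of struct LineBreakContext. */
struct sketch_ctx {
    enum sketch_class cur; /* class of the current character */
    int hebrew;            /* plays the role of fLb21aHebrew */
};

/* Start a context; a leading Hebrew letter arms the flag. */
static void sketch_init(struct sketch_ctx *ctx, enum sketch_class first)
{
    ctx->cur = first;
    ctx->hebrew = (first == SK_HL);
}

/*
 * Process the next class.  table_brk is whatever the ordinary pair-table
 * lookup decided; the flag overrides it when the current character is
 * HY or BA and a Hebrew letter preceded it.
 */
static int sketch_step(struct sketch_ctx *ctx, enum sketch_class next,
                       int table_brk)
{
    int brk = table_brk;

    if (ctx->hebrew && (ctx->cur == SK_HY || ctx->cur == SK_BA)) {
        brk = SK_NOBREAK;  /* HL (HY | BA) x : forbid the break */
        ctx->hebrew = 0;
    } else if (!(next == SK_HY || next == SK_BA)) {
        ctx->hebrew = (next == SK_HL);
    }
    ctx->cur = next;
    return brk;
}
```

Feeding the sequence HL, HY, AL through `sketch_step` turns the break after the hyphen into a no-break, while AL, HY, AL leaves the table result untouched.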

View File

@@ -1,5 +1,6 @@
-Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com>
-Copyright (C) 2012 Tom Hacohen <tom dot hacohen at samsung dot com>
+Copyright (C) 2008-2015 Wu Yongwei <wuyongwei at gmail dot com>
+Copyright (C) 2012-2015 Tom Hacohen <tom at stosb dot com>
+Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
 This software is provided 'as-is', without any express or implied
 warranty. In no event will the author be held liable for any damages

View File

@ -4,7 +4,7 @@
* Line breaking in a Unicode sequence. Designed to be used in a * Line breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2008-2013 Wu Yongwei <wuyongwei at gmail dot com> * Copyright (C) 2008-2015 Wu Yongwei <wuyongwei at gmail dot com>
* Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com> * Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
@ -31,9 +31,9 @@
* Unicode 5.0.0: * Unicode 5.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-19.html> * <URL:http://www.unicode.org/reports/tr14/tr14-19.html>
* *
* This library has been updated according to Revision 30, for * This library has been updated according to Revision 33, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-30.html> * <URL:http://www.unicode.org/reports/tr14/tr14-33.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>
@ -45,7 +45,7 @@
* Implementation of the line breaking algorithm as described in Unicode * Implementation of the line breaking algorithm as described in Unicode
* Standard Annex 14. * Standard Annex 14.
* *
* @version 2.5, 2013/11/14 * @version 2.7, 2015/04/18
* @author Wu Yongwei * @author Wu Yongwei
* @author Petr Filipsky * @author Petr Filipsky
*/ */
@ -66,11 +66,6 @@
*/ */
#define LINEBREAK_INDEX_SIZE 40 #define LINEBREAK_INDEX_SIZE 40
/**
* Version number of the library.
*/
const int linebreak_version = LINEBREAK_VERSION;
/** /**
* Enumeration of break actions. They are used in the break action * Enumeration of break actions. They are used in the break action
* pair table below. * pair table below.
@@ -451,7 +446,7 @@ static enum LineBreakClass resolve_lb_class(
  * @post \a lbpCtx->lbcCur has the updated line break class
  */
 static void treat_first_char(
-        struct LineBreakContext* lbpCtx)
+        struct LineBreakContext *lbpCtx)
 {
     switch (lbpCtx->lbcCur)
     {
@@ -465,6 +460,8 @@ static void treat_first_char(
     case LBP_SP:
         lbpCtx->lbcCur = LBP_WJ; /* Leading space treated as WJ */
         break;
+    case LBP_HL:
+        lbpCtx->fLb21aHebrew = 1; /* Rule LB21a */
     default:
         break;
     }
@ -485,7 +482,7 @@ static void treat_first_char(
* table lookup is needed * table lookup is needed
*/ */
static int get_lb_result_simple( static int get_lb_result_simple(
struct LineBreakContext* lbpCtx) struct LineBreakContext *lbpCtx)
{ {
if (lbpCtx->lbcCur == LBP_BK if (lbpCtx->lbcCur == LBP_BK
|| (lbpCtx->lbcCur == LBP_CR && lbpCtx->lbcNew != LBP_LF)) || (lbpCtx->lbcCur == LBP_CR && lbpCtx->lbcNew != LBP_LF))
@@ -528,13 +525,12 @@ static int get_lb_result_simple(
  * #LINEBREAK_ALLOWBREAK, and #LINEBREAK_NOBREAK
  */
 static int get_lb_result_lookup(
-        struct LineBreakContext* lbpCtx)
+        struct LineBreakContext *lbpCtx)
 {
-    /* TODO: Rule LB21a, as introduced by Revision 28 of UAX#14, is not
-     * yet implemented below. */
     int brk = LINEBREAK_UNDEFINED;
-    assert(lbpCtx->lbcCur <= LBP_JT);
-    assert(lbpCtx->lbcNew <= LBP_JT);
+    assert(lbpCtx->lbcCur <= LBP_RI);
+    assert(lbpCtx->lbcNew <= LBP_RI);
     switch (baTable[lbpCtx->lbcCur - 1][lbpCtx->lbcNew - 1])
     {
     case DIR_BRK:
@@ -555,6 +551,19 @@ static int get_lb_result_lookup(
         brk = LINEBREAK_NOBREAK;
         break;
     }
+    /* Special processing due to rule LB21a */
+    if (lbpCtx->fLb21aHebrew &&
+        (lbpCtx->lbcCur == LBP_HY || lbpCtx->lbcCur == LBP_BA))
+    {
+        brk = LINEBREAK_NOBREAK;
+        lbpCtx->fLb21aHebrew = 0;
+    }
+    else if (!(lbpCtx->lbcNew == LBP_HY || lbpCtx->lbcNew == LBP_BA))
+    {
+        lbpCtx->fLb21aHebrew = (lbpCtx->lbcNew == LBP_HL);
+    }
     lbpCtx->lbcCur = lbpCtx->lbcNew;
     return brk;
 }
@ -568,9 +577,9 @@ static int get_lb_result_lookup(
* @post the line breaking context is initialized * @post the line breaking context is initialized
*/ */
void lb_init_break_context( void lb_init_break_context(
struct LineBreakContext* lbpCtx, struct LineBreakContext *lbpCtx,
utf32_t ch, utf32_t ch,
const char* lang) const char *lang)
{ {
lbpCtx->lang = lang; lbpCtx->lang = lang;
lbpCtx->lbpLang = get_lb_prop_lang(lang); lbpCtx->lbpLang = get_lb_prop_lang(lang);
@ -579,6 +588,7 @@ void lb_init_break_context(
lbpCtx->lbcCur = resolve_lb_class( lbpCtx->lbcCur = resolve_lb_class(
get_char_lb_class_lang(ch, lbpCtx->lbpLang), get_char_lb_class_lang(ch, lbpCtx->lbpLang),
lbpCtx->lang); lbpCtx->lang);
lbpCtx->fLb21aHebrew = 0;
treat_first_char(lbpCtx); treat_first_char(lbpCtx);
} }
@ -593,7 +603,7 @@ void lb_init_break_context(
* @post the line breaking context is updated * @post the line breaking context is updated
*/ */
int lb_process_next_char( int lb_process_next_char(
struct LineBreakContext* lbpCtx, struct LineBreakContext *lbpCtx,
utf32_t ch ) utf32_t ch )
{ {
int brk; int brk;
@ -617,127 +627,6 @@ int lb_process_next_char(
return brk; return brk;
} }
/**
* Gets the next Unicode character in a UTF-8 sequence. The index will
* be advanced to the next complete character, unless the end of string
* is reached in the middle of a UTF-8 sequence.
*
* @param[in] s input UTF-8 string
* @param[in] len length of the string in bytes
* @param[in,out] ip pointer to the index
* @return the Unicode character beginning at the index; or
* #EOS if end of input is encountered
*/
utf32_t lb_get_next_char_utf8(
const utf8_t *s,
size_t len,
size_t *ip)
{
utf8_t ch;
utf32_t res;
assert(*ip <= len);
if (*ip == len)
return EOS;
ch = s[*ip];
if (ch < 0xC2 || ch > 0xF4)
{ /* One-byte sequence, tail (should not occur), or invalid */
*ip += 1;
return ch;
}
else if (ch < 0xE0)
{ /* Two-byte sequence */
if (*ip + 2 > len)
return EOS;
res = ((ch & 0x1F) << 6) + (s[*ip + 1] & 0x3F);
*ip += 2;
return res;
}
else if (ch < 0xF0)
{ /* Three-byte sequence */
if (*ip + 3 > len)
return EOS;
res = ((ch & 0x0F) << 12) +
((s[*ip + 1] & 0x3F) << 6) +
((s[*ip + 2] & 0x3F));
*ip += 3;
return res;
}
else
{ /* Four-byte sequence */
if (*ip + 4 > len)
return EOS;
res = ((ch & 0x07) << 18) +
((s[*ip + 1] & 0x3F) << 12) +
((s[*ip + 2] & 0x3F) << 6) +
((s[*ip + 3] & 0x3F));
*ip += 4;
return res;
}
}
/**
* Gets the next Unicode character in a UTF-16 sequence. The index will
* be advanced to the next complete character, unless the end of string
* is reached in the middle of a UTF-16 surrogate pair.
*
* @param[in] s input UTF-16 string
* @param[in] len length of the string in words
* @param[in,out] ip pointer to the index
* @return the Unicode character beginning at the index; or
* #EOS if end of input is encountered
*/
utf32_t lb_get_next_char_utf16(
const utf16_t *s,
size_t len,
size_t *ip)
{
utf16_t ch;
assert(*ip <= len);
if (*ip == len)
return EOS;
ch = s[(*ip)++];
if (ch < 0xD800 || ch > 0xDBFF)
{ /* If the character is not a high surrogate */
return ch;
}
if (*ip == len)
{ /* If the input ends here (an error) */
--(*ip);
return EOS;
}
if (s[*ip] < 0xDC00 || s[*ip] > 0xDFFF)
{ /* If the next character is not the low surrogate (an error) */
return ch;
}
/* Return the constructed character and advance the index again */
return (((utf32_t)ch & 0x3FF) << 10) + (s[(*ip)++] & 0x3FF) + 0x10000;
}
/**
* Gets the next Unicode character in a UTF-32 sequence. The index will
* be advanced to the next character.
*
* @param[in] s input UTF-32 string
* @param[in] len length of the string in dwords
* @param[in,out] ip pointer to the index
* @return the Unicode character beginning at the index; or
* #EOS if end of input is encountered
*/
utf32_t lb_get_next_char_utf32(
const utf32_t *s,
size_t len,
size_t *ip)
{
assert(*ip <= len);
if (*ip == len)
return EOS;
return s[(*ip)++];
}
/** /**
* Sets the line breaking information for a generic input string. * Sets the line breaking information for a generic input string.
* *
@ -809,7 +698,7 @@ void set_linebreaks_utf8(
char *brks) char *brks)
{ {
set_linebreaks(s, len, lang, brks, set_linebreaks(s, len, lang, brks,
(get_next_char_t)lb_get_next_char_utf8); (get_next_char_t)ub_get_next_char_utf8);
} }
/** /**
@ -829,7 +718,7 @@ void set_linebreaks_utf16(
char *brks) char *brks)
{ {
set_linebreaks(s, len, lang, brks, set_linebreaks(s, len, lang, brks,
(get_next_char_t)lb_get_next_char_utf16); (get_next_char_t)ub_get_next_char_utf16);
} }
/** /**
@ -849,7 +738,7 @@ void set_linebreaks_utf32(
char *brks) char *brks)
{ {
set_linebreaks(s, len, lang, brks, set_linebreaks(s, len, lang, brks,
(get_next_char_t)lb_get_next_char_utf32); (get_next_char_t)ub_get_next_char_utf32);
} }
/** /**
@ -868,7 +757,7 @@ void set_linebreaks_utf32(
int is_line_breakable( int is_line_breakable(
utf32_t char1, utf32_t char1,
utf32_t char2, utf32_t char2,
const char* lang) const char *lang)
{ {
utf32_t s[2]; utf32_t s[2];
char brks[2]; char brks[2];

View File

@ -4,7 +4,7 @@
* Line breaking in a Unicode sequence. Designed to be used in a * Line breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com> * Copyright (C) 2008-2015 Wu Yongwei <wuyongwei at gmail dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages * warranty. In no event will the author be held liable for any damages
@ -30,9 +30,9 @@
* Unicode 5.0.0: * Unicode 5.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-19.html> * <URL:http://www.unicode.org/reports/tr14/tr14-19.html>
* *
* This library has been updated according to Revision 30, for * This library has been updated according to Revision 33, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-30.html> * <URL:http://www.unicode.org/reports/tr14/tr14-33.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>
@ -43,7 +43,7 @@
* *
* Header file for the line breaking algorithm. * Header file for the line breaking algorithm.
* *
* @version 2.2, 2012/10/06 * @version 2.4, 2015/04/18
* @author Wu Yongwei * @author Wu Yongwei
*/ */
@@ -51,21 +51,12 @@
 #define LINEBREAK_H
 #include <stddef.h>
+#include "unibreakbase.h"
 #ifdef __cplusplus
 extern "C" {
 #endif
-#define LINEBREAK_VERSION 0x0202 /**< Version of the library linebreak */
-extern const int linebreak_version;
-#ifndef LINEBREAK_UTF_TYPES_DEFINED
-#define LINEBREAK_UTF_TYPES_DEFINED
-typedef unsigned char utf8_t; /**< Type for UTF-8 data points */
-typedef unsigned short utf16_t; /**< Type for UTF-16 data points */
-typedef unsigned int utf32_t; /**< Type for UTF-32 data points */
-#endif
 #define LINEBREAK_MUSTBREAK 0 /**< Break is mandatory */
 #define LINEBREAK_ALLOWBREAK 1 /**< Break is allowed */
 #define LINEBREAK_NOBREAK 2 /**< No break is possible */
@@ -73,12 +64,12 @@ typedef unsigned int utf32_t; /**< Type for UTF-32 data points */
 void init_linebreak(void);
 void set_linebreaks_utf8(
-        const utf8_t *s, size_t len, const char* lang, char *brks);
+        const utf8_t *s, size_t len, const char *lang, char *brks);
 void set_linebreaks_utf16(
-        const utf16_t *s, size_t len, const char* lang, char *brks);
+        const utf16_t *s, size_t len, const char *lang, char *brks);
 void set_linebreaks_utf32(
-        const utf32_t *s, size_t len, const char* lang, char *brks);
+        const utf32_t *s, size_t len, const char *lang, char *brks);
-int is_line_breakable(utf32_t char1, utf32_t char2, const char* lang);
+int is_line_breakable(utf32_t char1, utf32_t char2, const char *lang);
 #ifdef __cplusplus
 }
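
The `set_linebreaks_*` functions declared above fill a `brks` array with the `LINEBREAK_*` values from this header. As a rough sketch of how a caller might consume such an array, the helper below picks the end of one wrapped line. It assumes that `brks[i]` describes the break opportunity after code unit `i`; `sketch_line_end` and its `width` parameter are hypothetical and not part of the library, and the constants are redefined locally only so the block compiles without the real header.

```c
#include <assert.h>
#include <stddef.h>

/* Same values as in linebreak.h, repeated so the sketch is standalone. */
#define LINEBREAK_MUSTBREAK  0
#define LINEBREAK_ALLOWBREAK 1
#define LINEBREAK_NOBREAK    2

/*
 * Hypothetical helper: given break data for `len` code units, return the
 * exclusive end index of the line that starts at `start` and holds at
 * most `width` units.  Assumes brks[i] refers to the position after
 * unit i.
 */
static size_t sketch_line_end(const char *brks, size_t len,
                              size_t start, size_t width)
{
    size_t last_allowed = 0; /* 0 means "no opportunity seen yet" */
    size_t i;

    for (i = start; i < len && i - start < width; ++i) {
        if (brks[i] == LINEBREAK_MUSTBREAK)
            return i + 1;            /* mandatory break ends the line */
        if (brks[i] == LINEBREAK_ALLOWBREAK)
            last_allowed = i + 1;    /* remember the latest opportunity */
    }
    if (i == len)
        return len;                  /* the remainder fits on the line */
    /* Fall back to a hard cut when no break opportunity was found. */
    return last_allowed ? last_allowed : i;
}
```

For break data corresponding to the text "ab cd ef" and a width of 4, calling the helper repeatedly from each returned index yields line ends 3, 6 and 8 under the stated assumption.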

View File

@ -1,9 +1,8 @@
/* The content of this file is generated from: /* The content of this file is generated from:
# LineBreak-6.3.0.txt # LineBreak-7.0.0.txt
# Date: 2013-02-06, 19:45:00 GMT [KW, LI] # Date: 2014-02-28, 23:15:00 GMT [KW, LI]
*/ */
#include "linebreak.h"
#include "linebreakdef.h" #include "linebreakdef.h"
/** Default line breaking properties as from the Unicode Web site. */ /** Default line breaking properties as from the Unicode Web site. */
@ -93,11 +92,12 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x0363, 0x036F, LBP_CM }, { 0x0363, 0x036F, LBP_CM },
{ 0x0370, 0x037D, LBP_AL }, { 0x0370, 0x037D, LBP_AL },
{ 0x037E, 0x037E, LBP_IS }, { 0x037E, 0x037E, LBP_IS },
{ 0x0384, 0x0482, LBP_AL }, { 0x037F, 0x0482, LBP_AL },
{ 0x0483, 0x0489, LBP_CM }, { 0x0483, 0x0489, LBP_CM },
{ 0x048A, 0x0587, LBP_AL }, { 0x048A, 0x0587, LBP_AL },
{ 0x0589, 0x0589, LBP_IS }, { 0x0589, 0x0589, LBP_IS },
{ 0x058A, 0x058A, LBP_BA }, { 0x058A, 0x058A, LBP_BA },
{ 0x058D, 0x058E, LBP_AL },
{ 0x058F, 0x058F, LBP_PR }, { 0x058F, 0x058F, LBP_PR },
{ 0x0591, 0x05BD, LBP_CM }, { 0x0591, 0x05BD, LBP_CM },
{ 0x05BE, 0x05BE, LBP_BA }, { 0x05BE, 0x05BE, LBP_BA },
@ -159,7 +159,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x0829, 0x082D, LBP_CM }, { 0x0829, 0x082D, LBP_CM },
{ 0x0830, 0x0858, LBP_AL }, { 0x0830, 0x0858, LBP_AL },
{ 0x0859, 0x085B, LBP_CM }, { 0x0859, 0x085B, LBP_CM },
{ 0x085E, 0x08AC, LBP_AL }, { 0x085E, 0x08B2, LBP_AL },
{ 0x08E4, 0x0903, LBP_CM }, { 0x08E4, 0x0903, LBP_CM },
{ 0x0904, 0x0939, LBP_AL }, { 0x0904, 0x0939, LBP_AL },
{ 0x093A, 0x093C, LBP_CM }, { 0x093A, 0x093C, LBP_CM },
@ -171,7 +171,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x0962, 0x0963, LBP_CM }, { 0x0962, 0x0963, LBP_CM },
{ 0x0964, 0x0965, LBP_BA }, { 0x0964, 0x0965, LBP_BA },
{ 0x0966, 0x096F, LBP_NU }, { 0x0966, 0x096F, LBP_NU },
{ 0x0970, 0x097F, LBP_AL }, { 0x0970, 0x0980, LBP_AL },
{ 0x0981, 0x0983, LBP_CM }, { 0x0981, 0x0983, LBP_CM },
{ 0x0985, 0x09B9, LBP_AL }, { 0x0985, 0x09B9, LBP_AL },
{ 0x09BC, 0x09BC, LBP_CM }, { 0x09BC, 0x09BC, LBP_CM },
@ -223,14 +223,14 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x0BF0, 0x0BF8, LBP_AL }, { 0x0BF0, 0x0BF8, LBP_AL },
{ 0x0BF9, 0x0BF9, LBP_PR }, { 0x0BF9, 0x0BF9, LBP_PR },
{ 0x0BFA, 0x0BFA, LBP_AL }, { 0x0BFA, 0x0BFA, LBP_AL },
{ 0x0C01, 0x0C03, LBP_CM }, { 0x0C00, 0x0C03, LBP_CM },
{ 0x0C05, 0x0C3D, LBP_AL }, { 0x0C05, 0x0C3D, LBP_AL },
{ 0x0C3E, 0x0C56, LBP_CM }, { 0x0C3E, 0x0C56, LBP_CM },
{ 0x0C58, 0x0C61, LBP_AL }, { 0x0C58, 0x0C61, LBP_AL },
{ 0x0C62, 0x0C63, LBP_CM }, { 0x0C62, 0x0C63, LBP_CM },
{ 0x0C66, 0x0C6F, LBP_NU }, { 0x0C66, 0x0C6F, LBP_NU },
{ 0x0C78, 0x0C7F, LBP_AL }, { 0x0C78, 0x0C7F, LBP_AL },
{ 0x0C82, 0x0C83, LBP_CM }, { 0x0C81, 0x0C83, LBP_CM },
{ 0x0C85, 0x0CB9, LBP_AL }, { 0x0C85, 0x0CB9, LBP_AL },
{ 0x0CBC, 0x0CBC, LBP_CM }, { 0x0CBC, 0x0CBC, LBP_CM },
{ 0x0CBD, 0x0CBD, LBP_AL }, { 0x0CBD, 0x0CBD, LBP_AL },
@ -239,7 +239,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x0CE2, 0x0CE3, LBP_CM }, { 0x0CE2, 0x0CE3, LBP_CM },
{ 0x0CE6, 0x0CEF, LBP_NU }, { 0x0CE6, 0x0CEF, LBP_NU },
{ 0x0CF1, 0x0CF2, LBP_AL }, { 0x0CF1, 0x0CF2, LBP_AL },
{ 0x0D02, 0x0D03, LBP_CM }, { 0x0D01, 0x0D03, LBP_CM },
{ 0x0D05, 0x0D3D, LBP_AL }, { 0x0D05, 0x0D3D, LBP_AL },
{ 0x0D3E, 0x0D4D, LBP_CM }, { 0x0D3E, 0x0D4D, LBP_CM },
{ 0x0D4E, 0x0D4E, LBP_AL }, { 0x0D4E, 0x0D4E, LBP_AL },
@ -252,7 +252,9 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x0D7A, 0x0D7F, LBP_AL }, { 0x0D7A, 0x0D7F, LBP_AL },
{ 0x0D82, 0x0D83, LBP_CM }, { 0x0D82, 0x0D83, LBP_CM },
{ 0x0D85, 0x0DC6, LBP_AL }, { 0x0D85, 0x0DC6, LBP_AL },
{ 0x0DCA, 0x0DF3, LBP_CM }, { 0x0DCA, 0x0DDF, LBP_CM },
{ 0x0DE6, 0x0DEF, LBP_NU },
{ 0x0DF2, 0x0DF3, LBP_CM },
{ 0x0DF4, 0x0DF4, LBP_AL }, { 0x0DF4, 0x0DF4, LBP_AL },
{ 0x0E01, 0x0E3A, LBP_SA }, { 0x0E01, 0x0E3A, LBP_SA },
{ 0x0E3F, 0x0E3F, LBP_PR }, { 0x0E3F, 0x0E3F, LBP_PR },
@ -363,7 +365,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x1810, 0x1819, LBP_NU }, { 0x1810, 0x1819, LBP_NU },
{ 0x1820, 0x18A8, LBP_AL }, { 0x1820, 0x18A8, LBP_AL },
{ 0x18A9, 0x18A9, LBP_CM }, { 0x18A9, 0x18A9, LBP_CM },
{ 0x18AA, 0x191C, LBP_AL }, { 0x18AA, 0x191E, LBP_AL },
{ 0x1920, 0x193B, LBP_CM }, { 0x1920, 0x193B, LBP_CM },
{ 0x1940, 0x1940, LBP_AL }, { 0x1940, 0x1940, LBP_AL },
{ 0x1944, 0x1945, LBP_EX }, { 0x1944, 0x1945, LBP_EX },
@ -378,7 +380,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x1A7F, 0x1A7F, LBP_CM }, { 0x1A7F, 0x1A7F, LBP_CM },
{ 0x1A80, 0x1A99, LBP_NU }, { 0x1A80, 0x1A99, LBP_NU },
{ 0x1AA0, 0x1AAD, LBP_SA }, { 0x1AA0, 0x1AAD, LBP_SA },
{ 0x1B00, 0x1B04, LBP_CM }, { 0x1AB0, 0x1B04, LBP_CM },
{ 0x1B05, 0x1B33, LBP_AL }, { 0x1B05, 0x1B33, LBP_AL },
{ 0x1B34, 0x1B44, LBP_CM }, { 0x1B34, 0x1B44, LBP_CM },
{ 0x1B45, 0x1B4B, LBP_AL }, { 0x1B45, 0x1B4B, LBP_AL },
@ -412,7 +414,9 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x1CED, 0x1CED, LBP_CM }, { 0x1CED, 0x1CED, LBP_CM },
{ 0x1CEE, 0x1CF1, LBP_AL }, { 0x1CEE, 0x1CF1, LBP_AL },
{ 0x1CF2, 0x1CF4, LBP_CM }, { 0x1CF2, 0x1CF4, LBP_CM },
{ 0x1CF5, 0x1DBF, LBP_AL }, { 0x1CF5, 0x1CF6, LBP_AL },
{ 0x1CF8, 0x1CF9, LBP_CM },
{ 0x1D00, 0x1DBF, LBP_AL },
{ 0x1DC0, 0x1DFF, LBP_CM }, { 0x1DC0, 0x1DFF, LBP_CM },
{ 0x1E00, 0x1FFC, LBP_AL }, { 0x1E00, 0x1FFC, LBP_AL },
{ 0x1FFD, 0x1FFD, LBP_BB }, { 0x1FFD, 0x1FFD, LBP_BB },
@ -475,7 +479,9 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x20A7, 0x20A7, LBP_PO }, { 0x20A7, 0x20A7, LBP_PO },
{ 0x20A8, 0x20B5, LBP_PR }, { 0x20A8, 0x20B5, LBP_PR },
{ 0x20B6, 0x20B6, LBP_PO }, { 0x20B6, 0x20B6, LBP_PO },
{ 0x20B7, 0x20CF, LBP_PR }, { 0x20B7, 0x20BA, LBP_PR },
{ 0x20BB, 0x20BB, LBP_PO },
{ 0x20BC, 0x20CF, LBP_PR },
{ 0x20D0, 0x20F0, LBP_CM }, { 0x20D0, 0x20F0, LBP_CM },
{ 0x2100, 0x2102, LBP_AL }, { 0x2100, 0x2102, LBP_AL },
{ 0x2103, 0x2103, LBP_PO }, { 0x2103, 0x2103, LBP_PO },
@ -564,7 +570,12 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x22A5, 0x22A5, LBP_AI }, { 0x22A5, 0x22A5, LBP_AI },
{ 0x22A6, 0x22BE, LBP_AL }, { 0x22A6, 0x22BE, LBP_AL },
{ 0x22BF, 0x22BF, LBP_AI }, { 0x22BF, 0x22BF, LBP_AI },
{ 0x22C0, 0x2311, LBP_AL }, { 0x22C0, 0x2307, LBP_AL },
{ 0x2308, 0x2308, LBP_OP },
{ 0x2309, 0x2309, LBP_CL },
{ 0x230A, 0x230A, LBP_OP },
{ 0x230B, 0x230B, LBP_CL },
{ 0x230C, 0x2311, LBP_AL },
{ 0x2312, 0x2312, LBP_AI }, { 0x2312, 0x2312, LBP_AI },
{ 0x2313, 0x2319, LBP_AL }, { 0x2313, 0x2319, LBP_AL },
{ 0x231A, 0x231B, LBP_ID }, { 0x231A, 0x231B, LBP_ID },
@ -573,7 +584,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x232A, 0x232A, LBP_CL }, { 0x232A, 0x232A, LBP_CL },
{ 0x232B, 0x23EF, LBP_AL }, { 0x232B, 0x23EF, LBP_AL },
{ 0x23F0, 0x23F3, LBP_ID }, { 0x23F0, 0x23F3, LBP_ID },
{ 0x2400, 0x244A, LBP_AL }, { 0x23F4, 0x244A, LBP_AL },
{ 0x2460, 0x24FE, LBP_AI }, { 0x2460, 0x24FE, LBP_AI },
{ 0x24FF, 0x24FF, LBP_AL }, { 0x24FF, 0x24FF, LBP_AL },
{ 0x2500, 0x254B, LBP_AI }, { 0x2500, 0x254B, LBP_AI },
@ -671,8 +682,8 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x270E, 0x2756, LBP_AL }, { 0x270E, 0x2756, LBP_AL },
{ 0x2757, 0x2757, LBP_AI }, { 0x2757, 0x2757, LBP_AI },
{ 0x2758, 0x275A, LBP_AL }, { 0x2758, 0x275A, LBP_AL },
{ 0x275B, 0x275E, LBP_QU }, { 0x275B, 0x2760, LBP_QU },
{ 0x275F, 0x2761, LBP_AL }, { 0x2761, 0x2761, LBP_AL },
{ 0x2762, 0x2763, LBP_EX }, { 0x2762, 0x2763, LBP_EX },
{ 0x2764, 0x2767, LBP_AL }, { 0x2764, 0x2767, LBP_AL },
{ 0x2768, 0x2768, LBP_OP }, { 0x2768, 0x2768, LBP_OP },
@ -737,7 +748,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x29FD, 0x29FD, LBP_CL }, { 0x29FD, 0x29FD, LBP_CL },
{ 0x29FE, 0x2B54, LBP_AL }, { 0x29FE, 0x2B54, LBP_AL },
{ 0x2B55, 0x2B59, LBP_AI }, { 0x2B55, 0x2B59, LBP_AI },
{ 0x2C00, 0x2CEE, LBP_AL }, { 0x2B5A, 0x2CEE, LBP_AL },
{ 0x2CEF, 0x2CF1, LBP_CM }, { 0x2CEF, 0x2CF1, LBP_CM },
{ 0x2CF2, 0x2CF3, LBP_AL }, { 0x2CF2, 0x2CF3, LBP_AL },
{ 0x2CF9, 0x2CF9, LBP_EX }, { 0x2CF9, 0x2CF9, LBP_EX },
@ -776,6 +787,10 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x2E33, 0x2E34, LBP_BA }, { 0x2E33, 0x2E34, LBP_BA },
{ 0x2E35, 0x2E39, LBP_AL }, { 0x2E35, 0x2E39, LBP_AL },
{ 0x2E3A, 0x2E3B, LBP_B2 }, { 0x2E3A, 0x2E3B, LBP_B2 },
{ 0x2E3C, 0x2E3E, LBP_BA },
{ 0x2E3F, 0x2E3F, LBP_AL },
{ 0x2E40, 0x2E41, LBP_BA },
{ 0x2E42, 0x2E42, LBP_OP },
{ 0x2E80, 0x2FFB, LBP_ID }, { 0x2E80, 0x2FFB, LBP_ID },
{ 0x3000, 0x3000, LBP_BA }, { 0x3000, 0x3000, LBP_BA },
{ 0x3001, 0x3002, LBP_CL }, { 0x3001, 0x3002, LBP_CL },
@ -882,7 +897,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0xA66F, 0xA672, LBP_CM }, { 0xA66F, 0xA672, LBP_CM },
{ 0xA673, 0xA673, LBP_AL }, { 0xA673, 0xA673, LBP_AL },
{ 0xA674, 0xA67D, LBP_CM }, { 0xA674, 0xA67D, LBP_CM },
{ 0xA67E, 0xA697, LBP_AL }, { 0xA67E, 0xA69D, LBP_AL },
{ 0xA69F, 0xA69F, LBP_CM }, { 0xA69F, 0xA69F, LBP_CM },
{ 0xA6A0, 0xA6EF, LBP_AL }, { 0xA6A0, 0xA6EF, LBP_AL },
{ 0xA6F0, 0xA6F1, LBP_CM }, { 0xA6F0, 0xA6F1, LBP_CM },
@ -923,7 +938,11 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0xA9C7, 0xA9C9, LBP_BA }, { 0xA9C7, 0xA9C9, LBP_BA },
{ 0xA9CA, 0xA9CF, LBP_AL }, { 0xA9CA, 0xA9CF, LBP_AL },
{ 0xA9D0, 0xA9D9, LBP_NU }, { 0xA9D0, 0xA9D9, LBP_NU },
{ 0xA9DE, 0xAA28, LBP_AL }, { 0xA9DE, 0xA9DF, LBP_AL },
{ 0xA9E0, 0xA9EF, LBP_SA },
{ 0xA9F0, 0xA9F9, LBP_NU },
{ 0xA9FA, 0xA9FE, LBP_SA },
{ 0xAA00, 0xAA28, LBP_AL },
{ 0xAA29, 0xAA36, LBP_CM }, { 0xAA29, 0xAA36, LBP_CM },
{ 0xAA40, 0xAA42, LBP_AL }, { 0xAA40, 0xAA42, LBP_AL },
{ 0xAA43, 0xAA43, LBP_CM }, { 0xAA43, 0xAA43, LBP_CM },
@ -1753,8 +1772,8 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0xFB29, 0xFB29, LBP_AL }, { 0xFB29, 0xFB29, LBP_AL },
{ 0xFB2A, 0xFB4F, LBP_HL }, { 0xFB2A, 0xFB4F, LBP_HL },
{ 0xFB50, 0xFD3D, LBP_AL }, { 0xFB50, 0xFD3D, LBP_AL },
{ 0xFD3E, 0xFD3E, LBP_OP }, { 0xFD3E, 0xFD3E, LBP_CL },
{ 0xFD3F, 0xFD3F, LBP_CL }, { 0xFD3F, 0xFD3F, LBP_OP },
{ 0xFD50, 0xFDFB, LBP_AL }, { 0xFD50, 0xFDFB, LBP_AL },
{ 0xFDFC, 0xFDFC, LBP_PO }, { 0xFDFC, 0xFDFC, LBP_PO },
{ 0xFDFD, 0xFDFD, LBP_AL }, { 0xFDFD, 0xFDFD, LBP_AL },
@ -1766,7 +1785,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0xFE17, 0xFE17, LBP_OP }, { 0xFE17, 0xFE17, LBP_OP },
{ 0xFE18, 0xFE18, LBP_CL }, { 0xFE18, 0xFE18, LBP_CL },
{ 0xFE19, 0xFE19, LBP_IN }, { 0xFE19, 0xFE19, LBP_IN },
{ 0xFE20, 0xFE26, LBP_CM }, { 0xFE20, 0xFE2D, LBP_CM },
{ 0xFE30, 0xFE34, LBP_ID }, { 0xFE30, 0xFE34, LBP_ID },
{ 0xFE35, 0xFE35, LBP_OP }, { 0xFE35, 0xFE35, LBP_OP },
{ 0xFE36, 0xFE36, LBP_CL }, { 0xFE36, 0xFE36, LBP_CL },
@ -1852,13 +1871,17 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x10100, 0x10102, LBP_BA }, { 0x10100, 0x10102, LBP_BA },
{ 0x10107, 0x101FC, LBP_AL }, { 0x10107, 0x101FC, LBP_AL },
{ 0x101FD, 0x101FD, LBP_CM }, { 0x101FD, 0x101FD, LBP_CM },
{ 0x10280, 0x1039D, LBP_AL }, { 0x10280, 0x102D0, LBP_AL },
{ 0x102E0, 0x102E0, LBP_CM },
{ 0x102E1, 0x10375, LBP_AL },
{ 0x10376, 0x1037A, LBP_CM },
{ 0x10380, 0x1039D, LBP_AL },
{ 0x1039F, 0x1039F, LBP_BA }, { 0x1039F, 0x1039F, LBP_BA },
{ 0x103A0, 0x103CF, LBP_AL }, { 0x103A0, 0x103CF, LBP_AL },
{ 0x103D0, 0x103D0, LBP_BA }, { 0x103D0, 0x103D0, LBP_BA },
{ 0x103D1, 0x1049D, LBP_AL }, { 0x103D1, 0x1049D, LBP_AL },
{ 0x104A0, 0x104A9, LBP_NU }, { 0x104A0, 0x104A9, LBP_NU },
{ 0x10800, 0x10855, LBP_AL }, { 0x10500, 0x10855, LBP_AL },
{ 0x10857, 0x10857, LBP_BA }, { 0x10857, 0x10857, LBP_BA },
{ 0x10858, 0x1091B, LBP_AL }, { 0x10858, 0x1091B, LBP_AL },
{ 0x1091F, 0x1091F, LBP_BA }, { 0x1091F, 0x1091F, LBP_BA },
@ -1868,7 +1891,12 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x10A38, 0x10A3F, LBP_CM }, { 0x10A38, 0x10A3F, LBP_CM },
{ 0x10A40, 0x10A47, LBP_AL }, { 0x10A40, 0x10A47, LBP_AL },
{ 0x10A50, 0x10A57, LBP_BA }, { 0x10A50, 0x10A57, LBP_BA },
{ 0x10A58, 0x10B35, LBP_AL }, { 0x10A58, 0x10AE4, LBP_AL },
{ 0x10AE5, 0x10AE6, LBP_CM },
{ 0x10AEB, 0x10AEF, LBP_AL },
{ 0x10AF0, 0x10AF5, LBP_BA },
{ 0x10AF6, 0x10AF6, LBP_IN },
{ 0x10B00, 0x10B35, LBP_AL },
{ 0x10B39, 0x10B3F, LBP_BA }, { 0x10B39, 0x10B3F, LBP_BA },
{ 0x10B40, 0x10E7E, LBP_AL }, { 0x10B40, 0x10E7E, LBP_AL },
{ 0x11000, 0x11002, LBP_CM }, { 0x11000, 0x11002, LBP_CM },
@ -1877,7 +1905,7 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x11047, 0x11048, LBP_BA }, { 0x11047, 0x11048, LBP_BA },
{ 0x11049, 0x11065, LBP_AL }, { 0x11049, 0x11065, LBP_AL },
{ 0x11066, 0x1106F, LBP_NU }, { 0x11066, 0x1106F, LBP_NU },
{ 0x11080, 0x11082, LBP_CM }, { 0x1107F, 0x11082, LBP_CM },
{ 0x11083, 0x110AF, LBP_AL }, { 0x11083, 0x110AF, LBP_AL },
{ 0x110B0, 0x110BA, LBP_CM }, { 0x110B0, 0x110BA, LBP_CM },
{ 0x110BB, 0x110BD, LBP_AL }, { 0x110BB, 0x110BD, LBP_AL },
@ -1889,6 +1917,11 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x11127, 0x11134, LBP_CM }, { 0x11127, 0x11134, LBP_CM },
{ 0x11136, 0x1113F, LBP_NU }, { 0x11136, 0x1113F, LBP_NU },
{ 0x11140, 0x11143, LBP_BA }, { 0x11140, 0x11143, LBP_BA },
{ 0x11150, 0x11172, LBP_AL },
{ 0x11173, 0x11173, LBP_CM },
{ 0x11174, 0x11174, LBP_AL },
{ 0x11175, 0x11175, LBP_BB },
{ 0x11176, 0x11176, LBP_AL },
{ 0x11180, 0x11182, LBP_CM }, { 0x11180, 0x11182, LBP_CM },
{ 0x11183, 0x111B2, LBP_AL }, { 0x11183, 0x111B2, LBP_AL },
{ 0x111B3, 0x111C0, LBP_CM }, { 0x111B3, 0x111C0, LBP_CM },
@ -1896,12 +1929,46 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x111C5, 0x111C6, LBP_BA }, { 0x111C5, 0x111C6, LBP_BA },
{ 0x111C7, 0x111C7, LBP_AL }, { 0x111C7, 0x111C7, LBP_AL },
{ 0x111C8, 0x111C8, LBP_BA }, { 0x111C8, 0x111C8, LBP_BA },
{ 0x111CD, 0x111CD, LBP_AL },
{ 0x111D0, 0x111D9, LBP_NU }, { 0x111D0, 0x111D9, LBP_NU },
{ 0x111DA, 0x1122B, LBP_AL },
{ 0x1122C, 0x11237, LBP_CM },
{ 0x11238, 0x11239, LBP_BA },
{ 0x1123A, 0x1123A, LBP_AL },
{ 0x1123B, 0x1123C, LBP_BA },
{ 0x1123D, 0x112DE, LBP_AL },
{ 0x112DF, 0x112EA, LBP_CM },
{ 0x112F0, 0x112F9, LBP_NU },
{ 0x11301, 0x11303, LBP_CM },
{ 0x11305, 0x11339, LBP_AL },
{ 0x1133C, 0x1133C, LBP_CM },
{ 0x1133D, 0x1133D, LBP_AL },
{ 0x1133E, 0x11357, LBP_CM },
{ 0x1135D, 0x11361, LBP_AL },
{ 0x11362, 0x11374, LBP_CM },
{ 0x11480, 0x114AF, LBP_AL },
{ 0x114B0, 0x114C3, LBP_CM },
{ 0x114C4, 0x114C7, LBP_AL },
{ 0x114D0, 0x114D9, LBP_NU },
{ 0x11580, 0x115AE, LBP_AL },
{ 0x115AF, 0x115C0, LBP_CM },
{ 0x115C1, 0x115C1, LBP_BB },
{ 0x115C2, 0x115C3, LBP_BA },
{ 0x115C4, 0x115C5, LBP_EX },
{ 0x115C6, 0x115C8, LBP_AL },
{ 0x115C9, 0x115C9, LBP_BA },
{ 0x11600, 0x1162F, LBP_AL },
{ 0x11630, 0x11640, LBP_CM },
{ 0x11641, 0x11642, LBP_BA },
{ 0x11643, 0x11644, LBP_AL },
{ 0x11650, 0x11659, LBP_NU },
{ 0x11680, 0x116AA, LBP_AL }, { 0x11680, 0x116AA, LBP_AL },
{ 0x116AB, 0x116B7, LBP_CM }, { 0x116AB, 0x116B7, LBP_CM },
{ 0x116C0, 0x116C9, LBP_NU }, { 0x116C0, 0x116C9, LBP_NU },
{ 0x12000, 0x12462, LBP_AL }, { 0x118A0, 0x118DF, LBP_AL },
{ 0x12470, 0x12473, LBP_BA }, { 0x118E0, 0x118E9, LBP_NU },
{ 0x118EA, 0x1246E, LBP_AL },
{ 0x12470, 0x12474, LBP_BA },
{ 0x13000, 0x13257, LBP_AL }, { 0x13000, 0x13257, LBP_AL },
{ 0x13258, 0x1325A, LBP_OP }, { 0x13258, 0x1325A, LBP_OP },
{ 0x1325B, 0x1325D, LBP_CL }, { 0x1325B, 0x1325D, LBP_CL },
@ -1915,10 +1982,27 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x1328A, 0x13378, LBP_AL }, { 0x1328A, 0x13378, LBP_AL },
{ 0x13379, 0x13379, LBP_OP }, { 0x13379, 0x13379, LBP_OP },
{ 0x1337A, 0x1337B, LBP_CL }, { 0x1337A, 0x1337B, LBP_CL },
{ 0x1337C, 0x16F50, LBP_AL }, { 0x1337C, 0x16A5E, LBP_AL },
{ 0x16A60, 0x16A69, LBP_NU },
{ 0x16A6E, 0x16A6F, LBP_BA },
{ 0x16AD0, 0x16AED, LBP_AL },
{ 0x16AF0, 0x16AF4, LBP_CM },
{ 0x16AF5, 0x16AF5, LBP_BA },
{ 0x16B00, 0x16B2F, LBP_AL },
{ 0x16B30, 0x16B36, LBP_CM },
{ 0x16B37, 0x16B39, LBP_BA },
{ 0x16B3A, 0x16B43, LBP_AL },
{ 0x16B44, 0x16B44, LBP_BA },
{ 0x16B45, 0x16B45, LBP_AL },
{ 0x16B50, 0x16B59, LBP_NU },
{ 0x16B5B, 0x16F50, LBP_AL },
{ 0x16F51, 0x16F92, LBP_CM }, { 0x16F51, 0x16F92, LBP_CM },
{ 0x16F93, 0x16F9F, LBP_AL }, { 0x16F93, 0x16F9F, LBP_AL },
{ 0x1B000, 0x1B001, LBP_ID }, { 0x1B000, 0x1B001, LBP_ID },
{ 0x1BC00, 0x1BC9C, LBP_AL },
{ 0x1BC9D, 0x1BC9E, LBP_CM },
{ 0x1BC9F, 0x1BC9F, LBP_BA },
{ 0x1BCA0, 0x1BCA3, LBP_CM },
{ 0x1D000, 0x1D164, LBP_AL }, { 0x1D000, 0x1D164, LBP_AL },
{ 0x1D165, 0x1D169, LBP_CM }, { 0x1D165, 0x1D169, LBP_CM },
{ 0x1D16A, 0x1D16C, LBP_AL }, { 0x1D16A, 0x1D16C, LBP_AL },
@ -1931,15 +2015,19 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x1D242, 0x1D244, LBP_CM }, { 0x1D242, 0x1D244, LBP_CM },
{ 0x1D245, 0x1D7CB, LBP_AL }, { 0x1D245, 0x1D7CB, LBP_AL },
{ 0x1D7CE, 0x1D7FF, LBP_NU }, { 0x1D7CE, 0x1D7FF, LBP_NU },
{ 0x1E800, 0x1E8CF, LBP_AL },
{ 0x1E8D0, 0x1E8D6, LBP_CM },
{ 0x1EE00, 0x1EEF1, LBP_AL }, { 0x1EE00, 0x1EEF1, LBP_AL },
{ 0x1F000, 0x1F0DF, LBP_ID }, { 0x1F000, 0x1F0F5, LBP_ID },
{ 0x1F100, 0x1F12D, LBP_AI }, { 0x1F100, 0x1F12D, LBP_AI },
{ 0x1F12E, 0x1F12E, LBP_AL }, { 0x1F12E, 0x1F12E, LBP_AL },
{ 0x1F130, 0x1F169, LBP_AI }, { 0x1F130, 0x1F169, LBP_AI },
{ 0x1F16A, 0x1F16B, LBP_AL }, { 0x1F16A, 0x1F16B, LBP_AL },
{ 0x1F170, 0x1F19A, LBP_AI }, { 0x1F170, 0x1F19A, LBP_AI },
{ 0x1F1E6, 0x1F1FF, LBP_RI }, { 0x1F1E6, 0x1F1FF, LBP_RI },
{ 0x1F200, 0x1F3B4, LBP_ID }, { 0x1F200, 0x1F39B, LBP_ID },
{ 0x1F39C, 0x1F39D, LBP_AL },
{ 0x1F39E, 0x1F3B4, LBP_ID },
{ 0x1F3B5, 0x1F3B6, LBP_AL }, { 0x1F3B5, 0x1F3B6, LBP_AL },
{ 0x1F3B7, 0x1F3BB, LBP_ID }, { 0x1F3B7, 0x1F3BB, LBP_ID },
{ 0x1F3BC, 0x1F3BC, LBP_AL }, { 0x1F3BC, 0x1F3BC, LBP_AL },
@ -1953,14 +2041,23 @@ struct LineBreakProperties lb_prop_default[] = {
{ 0x1F4AF, 0x1F4AF, LBP_AL }, { 0x1F4AF, 0x1F4AF, LBP_AL },
{ 0x1F4B0, 0x1F4B0, LBP_ID }, { 0x1F4B0, 0x1F4B0, LBP_ID },
{ 0x1F4B1, 0x1F4B2, LBP_AL }, { 0x1F4B1, 0x1F4B2, LBP_AL },
{ 0x1F4B3, 0x1F4FC, LBP_ID }, { 0x1F4B3, 0x1F4FE, LBP_ID },
{ 0x1F500, 0x1F506, LBP_AL }, { 0x1F500, 0x1F506, LBP_AL },
{ 0x1F507, 0x1F516, LBP_ID }, { 0x1F507, 0x1F516, LBP_ID },
{ 0x1F517, 0x1F524, LBP_AL }, { 0x1F517, 0x1F524, LBP_AL },
{ 0x1F525, 0x1F531, LBP_ID }, { 0x1F525, 0x1F531, LBP_ID },
{ 0x1F532, 0x1F543, LBP_AL }, { 0x1F532, 0x1F549, LBP_AL },
{ 0x1F550, 0x1F6C5, LBP_ID }, { 0x1F54A, 0x1F5D3, LBP_ID },
{ 0x1F700, 0x1F773, LBP_AL }, { 0x1F5D4, 0x1F5DB, LBP_AL },
{ 0x1F5DC, 0x1F5F3, LBP_ID },
{ 0x1F5F4, 0x1F5F9, LBP_AL },
{ 0x1F5FA, 0x1F64F, LBP_ID },
{ 0x1F650, 0x1F675, LBP_AL },
{ 0x1F676, 0x1F678, LBP_QU },
{ 0x1F679, 0x1F67B, LBP_NS },
{ 0x1F67C, 0x1F67F, LBP_AL },
{ 0x1F680, 0x1F6F3, LBP_ID },
{ 0x1F700, 0x1F8AD, LBP_AL },
{ 0x20000, 0x3FFFD, LBP_ID }, { 0x20000, 0x3FFFD, LBP_ID },
{ 0xE0001, 0xE01EF, LBP_CM }, { 0xE0001, 0xE01EF, LBP_CM },
{ 0xF0000, 0x10FFFD, LBP_XX }, { 0xF0000, 0x10FFFD, LBP_XX },
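
The table above is a list of inclusive code point ranges in ascending order, each mapped to one line breaking class. A minimal sketch of how such a table can be searched follows. It assumes a plain binary search; the library's own lookup may be organised differently. The names `SketchClass`, `SketchRange`, `sketch_table` and `sketch_lookup` are hypothetical and exist only for this illustration; the rows are a small excerpt of the updated column.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the library's types; not part of libunibreak. */
enum SketchClass { SK_XX, SK_AL, SK_BA, SK_CM, SK_NU };

struct SketchRange
{
    unsigned int start;     /* first code point of the range (inclusive) */
    unsigned int end;       /* last code point of the range (inclusive) */
    enum SketchClass prop;  /* class shared by every code point in range */
};

/* A few rows taken from the updated table, kept in ascending order. */
static const struct SketchRange sketch_table[] = {
    { 0x10280, 0x102D0, SK_AL },
    { 0x102E0, 0x102E0, SK_CM },
    { 0x102E1, 0x10375, SK_AL },
    { 0x10376, 0x1037A, SK_CM },
    { 0x10380, 0x1039D, SK_AL },
    { 0x1039F, 0x1039F, SK_BA },
    { 0x104A0, 0x104A9, SK_NU },
};

/* Binary search over the sorted ranges; returns SK_XX when the code
 * point falls in a gap between ranges or outside the table. */
static enum SketchClass sketch_lookup(unsigned int ch)
{
    size_t lo = 0;
    size_t hi = sizeof sketch_table / sizeof sketch_table[0];

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < sketch_table[mid].start)
            hi = mid;
        else if (ch > sketch_table[mid].end)
            lo = mid + 1;
        else
            return sketch_table[mid].prop;
    }
    return SK_XX;
}
```

Splitting the old `0x10280..0x1039D` row into several rows, as this hunk does, only works with such a search if the rows stay sorted and do not overlap.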


@ -4,7 +4,7 @@
* Line breaking in a Unicode sequence. Designed to be used in a * Line breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com> * Copyright (C) 2008-2015 Wu Yongwei <wuyongwei at gmail dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages * warranty. In no event will the author be held liable for any damages
@ -30,9 +30,9 @@
* Unicode 5.0.0: * Unicode 5.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-19.html> * <URL:http://www.unicode.org/reports/tr14/tr14-19.html>
* *
* This library has been updated according to Revision 30, for * This library has been updated according to Revision 33, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-30.html> * <URL:http://www.unicode.org/reports/tr14/tr14-33.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>


@ -4,7 +4,7 @@
* Line breaking in a Unicode sequence. Designed to be used in a * Line breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2008-2013 Wu Yongwei <wuyongwei at gmail dot com> * Copyright (C) 2008-2015 Wu Yongwei <wuyongwei at gmail dot com>
* Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com> * Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
@ -31,9 +31,9 @@
* Unicode 5.0.0: * Unicode 5.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-19.html> * <URL:http://www.unicode.org/reports/tr14/tr14-19.html>
* *
* This library has been updated according to Revision 30, for * This library has been updated according to Revision 33, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-30.html> * <URL:http://www.unicode.org/reports/tr14/tr14-33.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>
@ -45,16 +45,12 @@
* Definitions of internal data structures, declarations of global * Definitions of internal data structures, declarations of global
* variables, and function prototypes for the line breaking algorithm. * variables, and function prototypes for the line breaking algorithm.
* *
* @version 2.4, 2013/11/10 * @version 2.6, 2015/04/18
* @author Wu Yongwei * @author Wu Yongwei
* @author Petr Filipsky * @author Petr Filipsky
*/ */
/** #include "unibreakdef.h"
* Constant value to mark the end of string. It is not a valid Unicode
* character.
*/
#define EOS 0xFFFFFFFF
/** /**
* Line break classes. This is a direct mapping of Table 1 of Unicode * Line break classes. This is a direct mapping of Table 1 of Unicode
@ -143,28 +139,20 @@ struct LineBreakContext
enum LineBreakClass lbcCur; /**< Breaking class of current codepoint */ enum LineBreakClass lbcCur; /**< Breaking class of current codepoint */
enum LineBreakClass lbcNew; /**< Breaking class of next codepoint */ enum LineBreakClass lbcNew; /**< Breaking class of next codepoint */
enum LineBreakClass lbcLast; /**< Breaking class of last codepoint */ enum LineBreakClass lbcLast; /**< Breaking class of last codepoint */
int fLb21aHebrew; /**< Flag for Hebrew letters (LB21a) */
}; };
/**
* Abstract function interface for #lb_get_next_char_utf8,
* #lb_get_next_char_utf16, and #lb_get_next_char_utf32.
*/
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);
/* Declarations */ /* Declarations */
extern struct LineBreakProperties lb_prop_default[]; extern struct LineBreakProperties lb_prop_default[];
extern struct LineBreakPropertiesLang lb_prop_lang_map[]; extern struct LineBreakPropertiesLang lb_prop_lang_map[];
/* Function Prototype */ /* Function Prototype */
utf32_t lb_get_next_char_utf8(const utf8_t *s, size_t len, size_t *ip);
utf32_t lb_get_next_char_utf16(const utf16_t *s, size_t len, size_t *ip);
utf32_t lb_get_next_char_utf32(const utf32_t *s, size_t len, size_t *ip);
void lb_init_break_context( void lb_init_break_context(
struct LineBreakContext* lbpCtx, struct LineBreakContext *lbpCtx,
utf32_t ch, utf32_t ch,
const char* lang); const char *lang);
int lb_process_next_char( int lb_process_next_char(
struct LineBreakContext* lbpCtx, struct LineBreakContext *lbpCtx,
utf32_t ch); utf32_t ch);
void set_linebreaks( void set_linebreaks(
const void *s, const void *s,


@ -0,0 +1,41 @@
/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */
/*
* Break processing in a Unicode sequence. Designed to be used in a
* generic text renderer.
*
* Copyright (C) 2015 Wu Yongwei <wuyongwei at gmail dot com>
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute
* it freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must
* not claim that you wrote the original software. If you use this
* software in a product, an acknowledgement in the product
* documentation would be appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must
* not be misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source
* distribution.
*/
/**
* @file unibreakbase.c
*
* Definition of basic libunibreak information.
*
* @version 1.0, 2015/04/18
* @author Wu Yongwei
*/
#include "unibreakbase.h"
/**
* Version number of the library.
*/
const int unibreak_version = UNIBREAK_VERSION;


@ -0,0 +1,73 @@
/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */
/*
* Break processing in a Unicode sequence. Designed to be used in a
* generic text renderer.
*
* Copyright (C) 2015 Wu Yongwei <wuyongwei at gmail dot com>
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute
* it freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must
* not claim that you wrote the original software. If you use this
* software in a product, an acknowledgement in the product
* documentation would be appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must
* not be misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source
* distribution.
*
* The main reference is Unicode Standard Annex 14 (UAX #14):
* <URL:http://www.unicode.org/reports/tr14/>
*
* When this library was designed, this annex was at Revision 19, for
* Unicode 5.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-19.html>
*
* This library has been updated according to Revision 33, for
* Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-33.html>
*
* The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html>
*/
/**
* @file unibreakbase.h
*
* Header file for common definitions in the libunibreak library.
*
* @version 1.0, 2015/04/18
* @author Wu Yongwei
*/
#ifndef UNIBREAKBASE_H
#define UNIBREAKBASE_H
#include <stddef.h>
#ifdef __cplusplus
extern "C" {
#endif
#define UNIBREAK_VERSION 0x0300 /**< Version of the library linebreak */
extern const int unibreak_version;
#ifndef UNIBREAK_UTF_TYPES_DEFINED
#define UNIBREAK_UTF_TYPES_DEFINED
typedef unsigned char utf8_t; /**< Type for UTF-8 data points */
typedef unsigned short utf16_t; /**< Type for UTF-16 data points */
typedef unsigned int utf32_t; /**< Type for UTF-32 data points */
#endif
#ifdef __cplusplus
}
#endif
#endif /* UNIBREAKBASE_H */
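
The header above exposes the version twice: as the compile-time macro `UNIBREAK_VERSION` and as the run-time constant `unibreak_version`. A caller could compare the two to detect a header/library mismatch. The sketch below illustrates that idea without linking the library, so the run-time value is passed in as a parameter. The split into a major high byte and a minor low byte is an assumption inferred from `0x0300` standing for version 3.0, and all `sketch_` names are hypothetical.

```c
/* Value copied from unibreakbase.h in this commit. */
#define UNIBREAK_VERSION 0x0300

/* Assumption: the high byte holds the major version. */
static int sketch_version_major(int version)
{
    return (version >> 8) & 0xFF;
}

/* Assumption: the low byte holds the minor version. */
static int sketch_version_minor(int version)
{
    return version & 0xFF;
}

/* Returns nonzero when the version seen at compile time equals the one
 * reported at run time. In real code the second argument would be the
 * library's unibreak_version constant. */
static int sketch_versions_match(int header_version, int runtime_version)
{
    return header_version == runtime_version;
}
```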


@ -0,0 +1,159 @@
/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */
/*
* Break processing in a Unicode sequence. Designed to be used in a
* generic text renderer.
*
* Copyright (C) 2015 Wu Yongwei <wuyongwei at gmail dot com>
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute
* it freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must
* not claim that you wrote the original software. If you use this
* software in a product, an acknowledgement in the product
* documentation would be appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must
* not be misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source
* distribution.
*/
/**
* @file unibreakdef.c
*
* Definition of utility functions used by the libunibreak library.
*
* @version 1.0, 2015/04/18
* @author Wu Yongwei
*/
#include <assert.h>
#include <stddef.h>
#include "unibreakdef.h"
/**
* Gets the next Unicode character in a UTF-8 sequence. The index will
* be advanced to the next complete character, unless the end of string
* is reached in the middle of a UTF-8 sequence.
*
* @param[in] s input UTF-8 string
* @param[in] len length of the string in bytes
* @param[in,out] ip pointer to the index
* @return the Unicode character beginning at the index; or
* #EOS if end of input is encountered
*/
utf32_t ub_get_next_char_utf8(
const utf8_t *s,
size_t len,
size_t *ip)
{
utf8_t ch;
utf32_t res;
assert(*ip <= len);
if (*ip == len)
return EOS;
ch = s[*ip];
if (ch < 0xC2 || ch > 0xF4)
{ /* One-byte sequence, tail (should not occur), or invalid */
*ip += 1;
return ch;
}
else if (ch < 0xE0)
{ /* Two-byte sequence */
if (*ip + 2 > len)
return EOS;
res = ((ch & 0x1F) << 6) + (s[*ip + 1] & 0x3F);
*ip += 2;
return res;
}
else if (ch < 0xF0)
{ /* Three-byte sequence */
if (*ip + 3 > len)
return EOS;
res = ((ch & 0x0F) << 12) +
((s[*ip + 1] & 0x3F) << 6) +
((s[*ip + 2] & 0x3F));
*ip += 3;
return res;
}
else
{ /* Four-byte sequence */
if (*ip + 4 > len)
return EOS;
res = ((ch & 0x07) << 18) +
((s[*ip + 1] & 0x3F) << 12) +
((s[*ip + 2] & 0x3F) << 6) +
((s[*ip + 3] & 0x3F));
*ip += 4;
return res;
}
}
/**
* Gets the next Unicode character in a UTF-16 sequence. The index will
* be advanced to the next complete character, unless the end of string
* is reached in the middle of a UTF-16 surrogate pair.
*
* @param[in] s input UTF-16 string
* @param[in] len length of the string in words
* @param[in,out] ip pointer to the index
* @return the Unicode character beginning at the index; or
* #EOS if end of input is encountered
*/
utf32_t ub_get_next_char_utf16(
const utf16_t *s,
size_t len,
size_t *ip)
{
utf16_t ch;
assert(*ip <= len);
if (*ip == len)
return EOS;
ch = s[(*ip)++];
if (ch < 0xD800 || ch > 0xDBFF)
{ /* If the character is not a high surrogate */
return ch;
}
if (*ip == len)
{ /* If the input ends here (an error) */
--(*ip);
return EOS;
}
if (s[*ip] < 0xDC00 || s[*ip] > 0xDFFF)
{ /* If the next character is not the low surrogate (an error) */
return ch;
}
/* Return the constructed character and advance the index again */
return (((utf32_t)ch & 0x3FF) << 10) + (s[(*ip)++] & 0x3FF) + 0x10000;
}
/**
* Gets the next Unicode character in a UTF-32 sequence. The index will
* be advanced to the next character.
*
* @param[in] s input UTF-32 string
* @param[in] len length of the string in dwords
* @param[in,out] ip pointer to the index
* @return the Unicode character beginning at the index; or
* #EOS if end of input is encountered
*/
utf32_t ub_get_next_char_utf32(
const utf32_t *s,
size_t len,
size_t *ip)
{
assert(*ip <= len);
if (*ip == len)
return EOS;
return s[(*ip)++];
}
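
The decoders above share one contract: return the code point at `*ip` and advance the index, or return `EOS` without advancing when the input ends, including when it ends in the middle of a multi-byte sequence. The following self-contained sketch mirrors the UTF-8 branch logic and the surrogate arithmetic so the behaviour can be tried without the library. The `sketch_` names and `SKETCH_EOS` are hypothetical, and unlike the original, the sketch returns the end marker instead of asserting when the index is past the end.

```c
#include <stddef.h>

#define SKETCH_EOS 0xFFFFFFFFu  /* same value as EOS in unibreakdef.h */

/* "h", U+00E9, U+20AC, U+1F600 encoded as UTF-8 (1 + 2 + 3 + 4 bytes). */
static const unsigned char sketch_sample[] = {
    'h', 0xC3, 0xA9, 0xE2, 0x82, 0xAC, 0xF0, 0x9F, 0x98, 0x80
};

/* Mirrors the branch structure of ub_get_next_char_utf8. */
static unsigned int sketch_next_utf8(const unsigned char *s, size_t len,
                                     size_t *ip)
{
    unsigned char ch;
    size_t need, k;
    unsigned int res;

    if (*ip >= len)
        return SKETCH_EOS;
    ch = s[*ip];
    if (ch < 0xC2 || ch > 0xF4)
    {   /* One-byte sequence, stray tail byte, or invalid lead byte */
        *ip += 1;
        return ch;
    }
    need = (ch < 0xE0) ? 2 : (ch < 0xF0) ? 3 : 4;
    if (*ip + need > len)
        return SKETCH_EOS;      /* truncated sequence: index not advanced */
    /* Keep the payload bits of the lead byte, then append 6 bits per tail. */
    res = ch & (0xFFu >> (need + 1));
    for (k = 1; k < need; ++k)
        res = (res << 6) + (s[*ip + k] & 0x3Fu);
    *ip += need;
    return res;
}

/* Counts complete characters before the end marker is returned. */
static size_t sketch_count_chars(const unsigned char *s, size_t len)
{
    size_t i = 0, n = 0;
    while (sketch_next_utf8(s, len, &i) != SKETCH_EOS)
        ++n;
    return n;
}

/* Returns the n-th (0-based) character, or SKETCH_EOS if there is none. */
static unsigned int sketch_nth_char(const unsigned char *s, size_t len,
                                    size_t n)
{
    size_t i = 0;
    for (;;)
    {
        unsigned int ch = sketch_next_utf8(s, len, &i);
        if (ch == SKETCH_EOS || n == 0)
            return ch;
        --n;
    }
}

/* Same arithmetic as the last line of ub_get_next_char_utf16. */
static unsigned int sketch_combine_surrogates(unsigned int hi,
                                              unsigned int lo)
{
    return ((hi & 0x3FFu) << 10) + (lo & 0x3FFu) + 0x10000u;
}
```

Because a truncated sequence leaves the index untouched, a caller that loops until the end marker terminates cleanly on cut-off input rather than reading past the buffer.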


@ -0,0 +1,80 @@
/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */
/*
* Break processing in a Unicode sequence. Designed to be used in a
* generic text renderer.
*
* Copyright (C) 2015 Wu Yongwei <wuyongwei at gmail dot com>
*
* This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages
* arising from the use of this software.
*
* Permission is granted to anyone to use this software for any purpose,
* including commercial applications, and to alter it and redistribute
* it freely, subject to the following restrictions:
*
* 1. The origin of this software must not be misrepresented; you must
* not claim that you wrote the original software. If you use this
* software in a product, an acknowledgement in the product
* documentation would be appreciated but is not required.
* 2. Altered source versions must be plainly marked as such, and must
* not be misrepresented as being the original software.
* 3. This notice may not be removed or altered from any source
* distribution.
*
* The main reference is Unicode Standard Annex 14 (UAX #14):
* <URL:http://www.unicode.org/reports/tr14/>
*
* When this library was designed, this annex was at Revision 19, for
* Unicode 5.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-19.html>
*
* This library has been updated according to Revision 33, for
* Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr14/tr14-33.html>
*
* The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html>
*/
/**
* @file unibreakdef.h
*
* Header file for private definitions in the libunibreak library.
*
* @version 1.1, 2015/04/19
* @author Wu Yongwei
*/
#ifndef UNIBREAKDEF_H
#define UNIBREAKDEF_H
#include "unibreakbase.h"
#ifdef __cplusplus
extern "C" {
#endif
/**
* Constant value to mark the end of string. It is not a valid Unicode
* character.
*/
#define EOS 0xFFFFFFFF
/**
* Abstract function interface for #ub_get_next_char_utf8,
* #ub_get_next_char_utf16, and #ub_get_next_char_utf32.
*/
typedef utf32_t (*get_next_char_t)(const void *, size_t, size_t *);
/* Function Prototype */
utf32_t ub_get_next_char_utf8(const utf8_t *s, size_t len, size_t *ip);
utf32_t ub_get_next_char_utf16(const utf16_t *s, size_t len, size_t *ip);
utf32_t ub_get_next_char_utf32(const utf32_t *s, size_t len, size_t *ip);
#ifdef __cplusplus
}
#endif
#endif /* UNIBREAKDEF_H */


@ -4,7 +4,7 @@
* Word breaking in a Unicode sequence. Designed to be used in a * Word breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2013 Tom Hacohen <tom at stosb dot com> * Copyright (C) 2013-2015 Tom Hacohen <tom at stosb dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages * warranty. In no event will the author be held liable for any damages
@ -30,9 +30,9 @@
* Unicode 6.0.0: * Unicode 6.0.0:
* <URL:http://www.unicode.org/reports/tr29/tr29-17.html> * <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
* *
* This library has been updated according to Revision 21, for * This library has been updated according to Revision 25, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr29/tr29-21.html> * <URL:http://www.unicode.org/reports/tr29/tr29-25.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>
@ -44,16 +44,14 @@
* Implementation of the word breaking algorithm as described in Unicode * Implementation of the word breaking algorithm as described in Unicode
* Standard Annex 29. * Standard Annex 29.
* *
* @version 2.4, 2013/09/28 * @version 2.6, 2015/04/18
* @author Tom Hacohen * @author Tom Hacohen
*/ */
#include <assert.h> #include <assert.h>
#include <stddef.h> #include <stddef.h>
#include <string.h> #include <string.h>
#include "linebreak.h" #include "unibreakdef.h"
#include "linebreakdef.h"
#include "wordbreak.h" #include "wordbreak.h"
#include "wordbreakdata.c" #include "wordbreakdata.c"
@ -128,7 +126,6 @@ static void set_brks_to(
while (posNext < posEnd) while (posNext < posEnd)
{ {
utf32_t ch; utf32_t ch;
(void)ch;
ch = get_next_char(s, len, &posNext); ch = get_next_char(s, len, &posNext);
assert(ch != EOS); assert(ch != EOS);
for (; posStart < posNext - 1; ++posStart) for (; posStart < posNext - 1; ++posStart)
@ -257,8 +254,24 @@ static void set_wordbreaks(
posLast = posCur; posLast = posCur;
break; break;
case WBP_Hebrew_Letter:
case WBP_ALetter: case WBP_ALetter:
if ((wbcSeqStart == WBP_ALetter) || /* WB5,6,7 */ if ((wbcSeqStart == WBP_Hebrew_Letter) &&
(wbcLast == WBP_Double_Quote)) /* WB7b,c */
{
if (wbcCur == WBP_Hebrew_Letter)
{
set_brks_to(s, brks, posLast, posCur, len,
WORDBREAK_NOBREAK, get_next_char);
}
else
{
set_brks_to(s, brks, posLast, posCur, len,
WORDBREAK_BREAK, get_next_char);
}
}
else if (((wbcSeqStart == WBP_ALetter) ||
(wbcSeqStart == WBP_Hebrew_Letter)) || /* WB5,6,7 */
(wbcLast == WBP_Numeric) || /* WB10 */ (wbcLast == WBP_Numeric) || /* WB10 */
(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */ (wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
{ {
@ -275,8 +288,18 @@ static void set_wordbreaks(
posLast = posCur; posLast = posCur;
break; break;
case WBP_Single_Quote:
if (wbcLast == WBP_Hebrew_Letter) /* WB7a */
{
set_brks_to(s, brks, posLast, posCur, len,
WORDBREAK_NOBREAK, get_next_char);
wbcSeqStart = wbcCur;
posLast = posCur;
}
/* No break on purpose */
case WBP_MidNumLet: case WBP_MidNumLet:
if ((wbcLast == WBP_ALetter) || /* WB6,7 */ if (((wbcLast == WBP_ALetter) ||
(wbcLast == WBP_Hebrew_Letter)) || /* WB6,7 */
(wbcLast == WBP_Numeric)) /* WB11,12 */ (wbcLast == WBP_Numeric)) /* WB11,12 */
{ {
/* Go on */ /* Go on */
@ -291,7 +314,8 @@ static void set_wordbreaks(
break; break;
case WBP_MidLetter: case WBP_MidLetter:
if (wbcLast == WBP_ALetter) /* WB6,7 */ if ((wbcLast == WBP_ALetter) ||
(wbcLast == WBP_Hebrew_Letter)) /* WB6,7 */
{ {
/* Go on */ /* Go on */
} }
@ -320,7 +344,8 @@ static void set_wordbreaks(
case WBP_Numeric: case WBP_Numeric:
if ((wbcSeqStart == WBP_Numeric) || /* WB8,11,12 */ if ((wbcSeqStart == WBP_Numeric) || /* WB8,11,12 */
(wbcLast == WBP_ALetter) || /* WB9 */ ((wbcLast == WBP_ALetter) ||
(wbcLast == WBP_Hebrew_Letter)) || /* WB9 */
(wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */ (wbcSeqStart == WBP_ExtendNumLet)) /* WB13b */
{ {
set_brks_to(s, brks, posLast, posCur, len, set_brks_to(s, brks, posLast, posCur, len,
@ -340,6 +365,7 @@ static void set_wordbreaks(
/* WB13a,13b */ /* WB13a,13b */
if ((wbcSeqStart == wbcLast) && if ((wbcSeqStart == wbcLast) &&
((wbcLast == WBP_ALetter) || ((wbcLast == WBP_ALetter) ||
(wbcLast == WBP_Hebrew_Letter) ||
(wbcLast == WBP_Numeric) || (wbcLast == WBP_Numeric) ||
(wbcLast == WBP_Katakana) || (wbcLast == WBP_Katakana) ||
(wbcLast == WBP_ExtendNumLet))) (wbcLast == WBP_ExtendNumLet)))
@ -357,9 +383,9 @@ static void set_wordbreaks(
posLast = posCur; posLast = posCur;
break; break;
case WBP_Regional: case WBP_Regional_Indicator:
/* WB13c */ /* WB13c */
if (wbcSeqStart == WBP_Regional) if (wbcSeqStart == WBP_Regional_Indicator)
{ {
set_brks_to(s, brks, posLast, posCur, len, set_brks_to(s, brks, posLast, posCur, len,
WORDBREAK_NOBREAK, get_next_char); WORDBREAK_NOBREAK, get_next_char);
@ -368,6 +394,20 @@ static void set_wordbreaks(
posLast = posCur; posLast = posCur;
break; break;
case WBP_Double_Quote:
if (wbcLast == WBP_Hebrew_Letter) /* WB7b,c */
{
/* Go on */
}
else
{
set_brks_to(s, brks, posLast, posCur, len,
WORDBREAK_BREAK, get_next_char);
wbcSeqStart = wbcCur;
posLast = posCur;
}
break;
case WBP_Any: case WBP_Any:
/* Allow breaks and reset */ /* Allow breaks and reset */
set_brks_to(s, brks, posLast, posCur, len, set_brks_to(s, brks, posLast, posCur, len,
@ -409,7 +449,7 @@ void set_wordbreaks_utf8(
char *brks) char *brks)
{ {
set_wordbreaks(s, len, lang, brks, set_wordbreaks(s, len, lang, brks,
(get_next_char_t)lb_get_next_char_utf8); (get_next_char_t)ub_get_next_char_utf8);
} }
/** /**
@ -429,7 +469,7 @@ void set_wordbreaks_utf16(
char *brks) char *brks)
{ {
set_wordbreaks(s, len, lang, brks, set_wordbreaks(s, len, lang, brks,
(get_next_char_t)lb_get_next_char_utf16); (get_next_char_t)ub_get_next_char_utf16);
} }
/** /**
@ -449,5 +489,5 @@ void set_wordbreaks_utf32(
char *brks) char *brks)
{ {
set_wordbreaks(s, len, lang, brks, set_wordbreaks(s, len, lang, brks,
(get_next_char_t)lb_get_next_char_utf32); (get_next_char_t)ub_get_next_char_utf32);
} }


@ -4,7 +4,7 @@
* Word breaking in a Unicode sequence. Designed to be used in a * Word breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2013 Tom Hacohen <tom at stosb dot com> * Copyright (C) 2013-2015 Tom Hacohen <tom at stosb dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages * warranty. In no event will the author be held liable for any damages
@ -30,9 +30,9 @@
* Unicode 6.0.0: * Unicode 6.0.0:
* <URL:http://www.unicode.org/reports/tr29/tr29-17.html> * <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
* *
* This library has been updated according to Revision 21, for * This library has been updated according to Revision 25, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr29/tr29-21.html> * <URL:http://www.unicode.org/reports/tr29/tr29-25.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>
@ -43,7 +43,7 @@
* *
* Header file for the word breaking (segmentation) algorithm. * Header file for the word breaking (segmentation) algorithm.
* *
* @version 2.3, 2013/09/28 * @version 2.5, 2015/04/18
* @author Tom Hacohen * @author Tom Hacohen
*/ */
@ -51,7 +51,7 @@
#define WORDBREAK_H #define WORDBREAK_H
#include <stddef.h> #include <stddef.h>
#include "linebreak.h" #include "unibreakbase.h"
#ifdef __cplusplus #ifdef __cplusplus
extern "C" { extern "C" {


@ -1,16 +1,16 @@
/* The content of this file is generated from: /* The content of this file is generated from:
# WordBreakProperty-6.2.0.txt # WordBreakProperty-7.0.0.txt
# Date: 2012-08-13, 19:12:09 GMT [MD] # Date: 2014-02-19, 15:51:39 GMT [MD]
*/ */
#include "linebreak.h"
#include "wordbreakdef.h" #include "wordbreakdef.h"
static struct WordBreakProperties wb_prop_default[] = { static struct WordBreakProperties wb_prop_default[] = {
{0x000A, 0x000A, WBP_LF}, {0x000A, 0x000A, WBP_LF},
{0x000B, 0x000C, WBP_Newline}, {0x000B, 0x000C, WBP_Newline},
{0x000D, 0x000D, WBP_CR}, {0x000D, 0x000D, WBP_CR},
{0x0027, 0x0027, WBP_MidNumLet}, {0x0022, 0x0022, WBP_Double_Quote},
{0x0027, 0x0027, WBP_Single_Quote},
{0x002C, 0x002C, WBP_MidNum}, {0x002C, 0x002C, WBP_MidNum},
{0x002E, 0x002E, WBP_MidNumLet}, {0x002E, 0x002E, WBP_MidNumLet},
{0x0030, 0x0039, WBP_Numeric}, {0x0030, 0x0039, WBP_Numeric},
@ -36,6 +36,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0295, 0x02AF, WBP_ALetter}, {0x0295, 0x02AF, WBP_ALetter},
{0x02B0, 0x02C1, WBP_ALetter}, {0x02B0, 0x02C1, WBP_ALetter},
{0x02C6, 0x02D1, WBP_ALetter}, {0x02C6, 0x02D1, WBP_ALetter},
{0x02D7, 0x02D7, WBP_MidLetter},
{0x02E0, 0x02E4, WBP_ALetter}, {0x02E0, 0x02E4, WBP_ALetter},
{0x02EC, 0x02EC, WBP_ALetter}, {0x02EC, 0x02EC, WBP_ALetter},
{0x02EE, 0x02EE, WBP_ALetter}, {0x02EE, 0x02EE, WBP_ALetter},
@ -46,6 +47,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x037A, 0x037A, WBP_ALetter}, {0x037A, 0x037A, WBP_ALetter},
{0x037B, 0x037D, WBP_ALetter}, {0x037B, 0x037D, WBP_ALetter},
{0x037E, 0x037E, WBP_MidNum}, {0x037E, 0x037E, WBP_MidNum},
{0x037F, 0x037F, WBP_ALetter},
{0x0386, 0x0386, WBP_ALetter}, {0x0386, 0x0386, WBP_ALetter},
{0x0387, 0x0387, WBP_MidLetter}, {0x0387, 0x0387, WBP_MidLetter},
{0x0388, 0x038A, WBP_ALetter}, {0x0388, 0x038A, WBP_ALetter},
@ -55,7 +57,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x03F7, 0x0481, WBP_ALetter}, {0x03F7, 0x0481, WBP_ALetter},
{0x0483, 0x0487, WBP_Extend}, {0x0483, 0x0487, WBP_Extend},
{0x0488, 0x0489, WBP_Extend}, {0x0488, 0x0489, WBP_Extend},
{0x048A, 0x0527, WBP_ALetter}, {0x048A, 0x052F, WBP_ALetter},
{0x0531, 0x0556, WBP_ALetter}, {0x0531, 0x0556, WBP_ALetter},
{0x0559, 0x0559, WBP_ALetter}, {0x0559, 0x0559, WBP_ALetter},
{0x0561, 0x0587, WBP_ALetter}, {0x0561, 0x0587, WBP_ALetter},
@ -65,13 +67,14 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x05C1, 0x05C2, WBP_Extend}, {0x05C1, 0x05C2, WBP_Extend},
{0x05C4, 0x05C5, WBP_Extend}, {0x05C4, 0x05C5, WBP_Extend},
{0x05C7, 0x05C7, WBP_Extend}, {0x05C7, 0x05C7, WBP_Extend},
{0x05D0, 0x05EA, WBP_ALetter}, {0x05D0, 0x05EA, WBP_Hebrew_Letter},
{0x05F0, 0x05F2, WBP_ALetter}, {0x05F0, 0x05F2, WBP_Hebrew_Letter},
{0x05F3, 0x05F3, WBP_ALetter}, {0x05F3, 0x05F3, WBP_ALetter},
{0x05F4, 0x05F4, WBP_MidLetter}, {0x05F4, 0x05F4, WBP_MidLetter},
{0x0600, 0x0604, WBP_Format}, {0x0600, 0x0605, WBP_Format},
{0x060C, 0x060D, WBP_MidNum}, {0x060C, 0x060D, WBP_MidNum},
{0x0610, 0x061A, WBP_Extend}, {0x0610, 0x061A, WBP_Extend},
{0x061C, 0x061C, WBP_Format},
{0x0620, 0x063F, WBP_ALetter}, {0x0620, 0x063F, WBP_ALetter},
{0x0640, 0x0640, WBP_ALetter}, {0x0640, 0x0640, WBP_ALetter},
{0x0641, 0x064A, WBP_ALetter}, {0x0641, 0x064A, WBP_ALetter},
@ -117,10 +120,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0829, 0x082D, WBP_Extend}, {0x0829, 0x082D, WBP_Extend},
{0x0840, 0x0858, WBP_ALetter}, {0x0840, 0x0858, WBP_ALetter},
{0x0859, 0x085B, WBP_Extend}, {0x0859, 0x085B, WBP_Extend},
{0x08A0, 0x08A0, WBP_ALetter}, {0x08A0, 0x08B2, WBP_ALetter},
{0x08A2, 0x08AC, WBP_ALetter}, {0x08E4, 0x0902, WBP_Extend},
{0x08E4, 0x08FE, WBP_Extend},
{0x0900, 0x0902, WBP_Extend},
{0x0903, 0x0903, WBP_Extend}, {0x0903, 0x0903, WBP_Extend},
{0x0904, 0x0939, WBP_ALetter}, {0x0904, 0x0939, WBP_ALetter},
{0x093A, 0x093A, WBP_Extend}, {0x093A, 0x093A, WBP_Extend},
@ -138,8 +139,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0962, 0x0963, WBP_Extend}, {0x0962, 0x0963, WBP_Extend},
{0x0966, 0x096F, WBP_Numeric}, {0x0966, 0x096F, WBP_Numeric},
{0x0971, 0x0971, WBP_ALetter}, {0x0971, 0x0971, WBP_ALetter},
{0x0972, 0x0977, WBP_ALetter}, {0x0972, 0x0980, WBP_ALetter},
{0x0979, 0x097F, WBP_ALetter},
{0x0981, 0x0981, WBP_Extend}, {0x0981, 0x0981, WBP_Extend},
{0x0982, 0x0983, WBP_Extend}, {0x0982, 0x0983, WBP_Extend},
{0x0985, 0x098C, WBP_ALetter}, {0x0985, 0x098C, WBP_ALetter},
@ -247,12 +247,12 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0BD0, 0x0BD0, WBP_ALetter}, {0x0BD0, 0x0BD0, WBP_ALetter},
{0x0BD7, 0x0BD7, WBP_Extend}, {0x0BD7, 0x0BD7, WBP_Extend},
{0x0BE6, 0x0BEF, WBP_Numeric}, {0x0BE6, 0x0BEF, WBP_Numeric},
{0x0C00, 0x0C00, WBP_Extend},
{0x0C01, 0x0C03, WBP_Extend}, {0x0C01, 0x0C03, WBP_Extend},
{0x0C05, 0x0C0C, WBP_ALetter}, {0x0C05, 0x0C0C, WBP_ALetter},
{0x0C0E, 0x0C10, WBP_ALetter}, {0x0C0E, 0x0C10, WBP_ALetter},
{0x0C12, 0x0C28, WBP_ALetter}, {0x0C12, 0x0C28, WBP_ALetter},
{0x0C2A, 0x0C33, WBP_ALetter}, {0x0C2A, 0x0C39, WBP_ALetter},
{0x0C35, 0x0C39, WBP_ALetter},
{0x0C3D, 0x0C3D, WBP_ALetter}, {0x0C3D, 0x0C3D, WBP_ALetter},
{0x0C3E, 0x0C40, WBP_Extend}, {0x0C3E, 0x0C40, WBP_Extend},
{0x0C41, 0x0C44, WBP_Extend}, {0x0C41, 0x0C44, WBP_Extend},
@ -263,6 +263,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0C60, 0x0C61, WBP_ALetter}, {0x0C60, 0x0C61, WBP_ALetter},
{0x0C62, 0x0C63, WBP_Extend}, {0x0C62, 0x0C63, WBP_Extend},
{0x0C66, 0x0C6F, WBP_Numeric}, {0x0C66, 0x0C6F, WBP_Numeric},
{0x0C81, 0x0C81, WBP_Extend},
{0x0C82, 0x0C83, WBP_Extend}, {0x0C82, 0x0C83, WBP_Extend},
{0x0C85, 0x0C8C, WBP_ALetter}, {0x0C85, 0x0C8C, WBP_ALetter},
{0x0C8E, 0x0C90, WBP_ALetter}, {0x0C8E, 0x0C90, WBP_ALetter},
@ -284,6 +285,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0CE2, 0x0CE3, WBP_Extend}, {0x0CE2, 0x0CE3, WBP_Extend},
{0x0CE6, 0x0CEF, WBP_Numeric}, {0x0CE6, 0x0CEF, WBP_Numeric},
{0x0CF1, 0x0CF2, WBP_ALetter}, {0x0CF1, 0x0CF2, WBP_ALetter},
{0x0D01, 0x0D01, WBP_Extend},
{0x0D02, 0x0D03, WBP_Extend}, {0x0D02, 0x0D03, WBP_Extend},
{0x0D05, 0x0D0C, WBP_ALetter}, {0x0D05, 0x0D0C, WBP_ALetter},
{0x0D0E, 0x0D10, WBP_ALetter}, {0x0D0E, 0x0D10, WBP_ALetter},
@ -311,6 +313,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x0DD2, 0x0DD4, WBP_Extend}, {0x0DD2, 0x0DD4, WBP_Extend},
{0x0DD6, 0x0DD6, WBP_Extend}, {0x0DD6, 0x0DD6, WBP_Extend},
{0x0DD8, 0x0DDF, WBP_Extend}, {0x0DD8, 0x0DDF, WBP_Extend},
{0x0DE6, 0x0DEF, WBP_Numeric},
{0x0DF2, 0x0DF3, WBP_Extend}, {0x0DF2, 0x0DF3, WBP_Extend},
{0x0E31, 0x0E31, WBP_Extend}, {0x0E31, 0x0E31, WBP_Extend},
{0x0E34, 0x0E3A, WBP_Extend}, {0x0E34, 0x0E3A, WBP_Extend},
@ -391,6 +394,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1681, 0x169A, WBP_ALetter}, {0x1681, 0x169A, WBP_ALetter},
{0x16A0, 0x16EA, WBP_ALetter}, {0x16A0, 0x16EA, WBP_ALetter},
{0x16EE, 0x16F0, WBP_ALetter}, {0x16EE, 0x16F0, WBP_ALetter},
{0x16F1, 0x16F8, WBP_ALetter},
{0x1700, 0x170C, WBP_ALetter}, {0x1700, 0x170C, WBP_ALetter},
{0x170E, 0x1711, WBP_ALetter}, {0x170E, 0x1711, WBP_ALetter},
{0x1712, 0x1714, WBP_Extend}, {0x1712, 0x1714, WBP_Extend},
@ -411,6 +415,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x17DD, 0x17DD, WBP_Extend}, {0x17DD, 0x17DD, WBP_Extend},
{0x17E0, 0x17E9, WBP_Numeric}, {0x17E0, 0x17E9, WBP_Numeric},
{0x180B, 0x180D, WBP_Extend}, {0x180B, 0x180D, WBP_Extend},
{0x180E, 0x180E, WBP_Format},
{0x1810, 0x1819, WBP_Numeric}, {0x1810, 0x1819, WBP_Numeric},
{0x1820, 0x1842, WBP_ALetter}, {0x1820, 0x1842, WBP_ALetter},
{0x1843, 0x1843, WBP_ALetter}, {0x1843, 0x1843, WBP_ALetter},
@ -419,7 +424,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x18A9, 0x18A9, WBP_Extend}, {0x18A9, 0x18A9, WBP_Extend},
{0x18AA, 0x18AA, WBP_ALetter}, {0x18AA, 0x18AA, WBP_ALetter},
{0x18B0, 0x18F5, WBP_ALetter}, {0x18B0, 0x18F5, WBP_ALetter},
{0x1900, 0x191C, WBP_ALetter}, {0x1900, 0x191E, WBP_ALetter},
{0x1920, 0x1922, WBP_Extend}, {0x1920, 0x1922, WBP_Extend},
{0x1923, 0x1926, WBP_Extend}, {0x1923, 0x1926, WBP_Extend},
{0x1927, 0x1928, WBP_Extend}, {0x1927, 0x1928, WBP_Extend},
@ -434,7 +439,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x19D0, 0x19D9, WBP_Numeric}, {0x19D0, 0x19D9, WBP_Numeric},
{0x1A00, 0x1A16, WBP_ALetter}, {0x1A00, 0x1A16, WBP_ALetter},
{0x1A17, 0x1A18, WBP_Extend}, {0x1A17, 0x1A18, WBP_Extend},
{0x1A19, 0x1A1B, WBP_Extend}, {0x1A19, 0x1A1A, WBP_Extend},
{0x1A1B, 0x1A1B, WBP_Extend},
{0x1A55, 0x1A55, WBP_Extend}, {0x1A55, 0x1A55, WBP_Extend},
{0x1A56, 0x1A56, WBP_Extend}, {0x1A56, 0x1A56, WBP_Extend},
{0x1A57, 0x1A57, WBP_Extend}, {0x1A57, 0x1A57, WBP_Extend},
@ -449,6 +455,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1A7F, 0x1A7F, WBP_Extend}, {0x1A7F, 0x1A7F, WBP_Extend},
{0x1A80, 0x1A89, WBP_Numeric}, {0x1A80, 0x1A89, WBP_Numeric},
{0x1A90, 0x1A99, WBP_Numeric}, {0x1A90, 0x1A99, WBP_Numeric},
{0x1AB0, 0x1ABD, WBP_Extend},
{0x1ABE, 0x1ABE, WBP_Extend},
{0x1B00, 0x1B03, WBP_Extend}, {0x1B00, 0x1B03, WBP_Extend},
{0x1B04, 0x1B04, WBP_Extend}, {0x1B04, 0x1B04, WBP_Extend},
{0x1B05, 0x1B33, WBP_ALetter}, {0x1B05, 0x1B33, WBP_ALetter},
@ -471,8 +479,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1BA6, 0x1BA7, WBP_Extend}, {0x1BA6, 0x1BA7, WBP_Extend},
{0x1BA8, 0x1BA9, WBP_Extend}, {0x1BA8, 0x1BA9, WBP_Extend},
{0x1BAA, 0x1BAA, WBP_Extend}, {0x1BAA, 0x1BAA, WBP_Extend},
{0x1BAB, 0x1BAB, WBP_Extend}, {0x1BAB, 0x1BAD, WBP_Extend},
{0x1BAC, 0x1BAD, WBP_Extend},
{0x1BAE, 0x1BAF, WBP_ALetter}, {0x1BAE, 0x1BAF, WBP_ALetter},
{0x1BB0, 0x1BB9, WBP_Numeric}, {0x1BB0, 0x1BB9, WBP_Numeric},
{0x1BBA, 0x1BE5, WBP_ALetter}, {0x1BBA, 0x1BE5, WBP_ALetter},
@ -504,13 +511,14 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1CF2, 0x1CF3, WBP_Extend}, {0x1CF2, 0x1CF3, WBP_Extend},
{0x1CF4, 0x1CF4, WBP_Extend}, {0x1CF4, 0x1CF4, WBP_Extend},
{0x1CF5, 0x1CF6, WBP_ALetter}, {0x1CF5, 0x1CF6, WBP_ALetter},
{0x1CF8, 0x1CF9, WBP_Extend},
{0x1D00, 0x1D2B, WBP_ALetter}, {0x1D00, 0x1D2B, WBP_ALetter},
{0x1D2C, 0x1D6A, WBP_ALetter}, {0x1D2C, 0x1D6A, WBP_ALetter},
{0x1D6B, 0x1D77, WBP_ALetter}, {0x1D6B, 0x1D77, WBP_ALetter},
{0x1D78, 0x1D78, WBP_ALetter}, {0x1D78, 0x1D78, WBP_ALetter},
{0x1D79, 0x1D9A, WBP_ALetter}, {0x1D79, 0x1D9A, WBP_ALetter},
{0x1D9B, 0x1DBF, WBP_ALetter}, {0x1D9B, 0x1DBF, WBP_ALetter},
{0x1DC0, 0x1DE6, WBP_Extend}, {0x1DC0, 0x1DF5, WBP_Extend},
{0x1DFC, 0x1DFF, WBP_Extend}, {0x1DFC, 0x1DFF, WBP_Extend},
{0x1E00, 0x1F15, WBP_ALetter}, {0x1E00, 0x1F15, WBP_ALetter},
{0x1F18, 0x1F1D, WBP_ALetter}, {0x1F18, 0x1F1D, WBP_ALetter},
@ -544,7 +552,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x2044, 0x2044, WBP_MidNum}, {0x2044, 0x2044, WBP_MidNum},
{0x2054, 0x2054, WBP_ExtendNumLet}, {0x2054, 0x2054, WBP_ExtendNumLet},
{0x2060, 0x2064, WBP_Format}, {0x2060, 0x2064, WBP_Format},
{0x206A, 0x206F, WBP_Format}, {0x2066, 0x206F, WBP_Format},
{0x2071, 0x2071, WBP_ALetter}, {0x2071, 0x2071, WBP_ALetter},
{0x207F, 0x207F, WBP_ALetter}, {0x207F, 0x207F, WBP_ALetter},
{0x2090, 0x209C, WBP_ALetter}, {0x2090, 0x209C, WBP_ALetter},
@ -631,7 +639,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xA670, 0xA672, WBP_Extend}, {0xA670, 0xA672, WBP_Extend},
{0xA674, 0xA67D, WBP_Extend}, {0xA674, 0xA67D, WBP_Extend},
{0xA67F, 0xA67F, WBP_ALetter}, {0xA67F, 0xA67F, WBP_ALetter},
{0xA680, 0xA697, WBP_ALetter}, {0xA680, 0xA69B, WBP_ALetter},
{0xA69C, 0xA69D, WBP_ALetter},
{0xA69F, 0xA69F, WBP_Extend}, {0xA69F, 0xA69F, WBP_Extend},
{0xA6A0, 0xA6E5, WBP_ALetter}, {0xA6A0, 0xA6E5, WBP_ALetter},
{0xA6E6, 0xA6EF, WBP_ALetter}, {0xA6E6, 0xA6EF, WBP_ALetter},
@ -642,8 +651,9 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xA771, 0xA787, WBP_ALetter}, {0xA771, 0xA787, WBP_ALetter},
{0xA788, 0xA788, WBP_ALetter}, {0xA788, 0xA788, WBP_ALetter},
{0xA78B, 0xA78E, WBP_ALetter}, {0xA78B, 0xA78E, WBP_ALetter},
{0xA790, 0xA793, WBP_ALetter}, {0xA790, 0xA7AD, WBP_ALetter},
{0xA7A0, 0xA7AA, WBP_ALetter}, {0xA7B0, 0xA7B1, WBP_ALetter},
{0xA7F7, 0xA7F7, WBP_ALetter},
{0xA7F8, 0xA7F9, WBP_ALetter}, {0xA7F8, 0xA7F9, WBP_ALetter},
{0xA7FA, 0xA7FA, WBP_ALetter}, {0xA7FA, 0xA7FA, WBP_ALetter},
{0xA7FB, 0xA801, WBP_ALetter}, {0xA7FB, 0xA801, WBP_ALetter},
@ -683,6 +693,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xA9BD, 0xA9C0, WBP_Extend}, {0xA9BD, 0xA9C0, WBP_Extend},
{0xA9CF, 0xA9CF, WBP_ALetter}, {0xA9CF, 0xA9CF, WBP_ALetter},
{0xA9D0, 0xA9D9, WBP_Numeric}, {0xA9D0, 0xA9D9, WBP_Numeric},
{0xA9E5, 0xA9E5, WBP_Extend},
{0xA9F0, 0xA9F9, WBP_Numeric},
{0xAA00, 0xAA28, WBP_ALetter}, {0xAA00, 0xAA28, WBP_ALetter},
{0xAA29, 0xAA2E, WBP_Extend}, {0xAA29, 0xAA2E, WBP_Extend},
{0xAA2F, 0xAA30, WBP_Extend}, {0xAA2F, 0xAA30, WBP_Extend},
@ -696,6 +708,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xAA4D, 0xAA4D, WBP_Extend}, {0xAA4D, 0xAA4D, WBP_Extend},
{0xAA50, 0xAA59, WBP_Numeric}, {0xAA50, 0xAA59, WBP_Numeric},
{0xAA7B, 0xAA7B, WBP_Extend}, {0xAA7B, 0xAA7B, WBP_Extend},
{0xAA7C, 0xAA7C, WBP_Extend},
{0xAA7D, 0xAA7D, WBP_Extend},
{0xAAB0, 0xAAB0, WBP_Extend}, {0xAAB0, 0xAAB0, WBP_Extend},
{0xAAB2, 0xAAB4, WBP_Extend}, {0xAAB2, 0xAAB4, WBP_Extend},
{0xAAB7, 0xAAB8, WBP_Extend}, {0xAAB7, 0xAAB8, WBP_Extend},
@ -714,6 +728,9 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xAB11, 0xAB16, WBP_ALetter}, {0xAB11, 0xAB16, WBP_ALetter},
{0xAB20, 0xAB26, WBP_ALetter}, {0xAB20, 0xAB26, WBP_ALetter},
{0xAB28, 0xAB2E, WBP_ALetter}, {0xAB28, 0xAB2E, WBP_ALetter},
{0xAB30, 0xAB5A, WBP_ALetter},
{0xAB5C, 0xAB5F, WBP_ALetter},
{0xAB64, 0xAB65, WBP_ALetter},
{0xABC0, 0xABE2, WBP_ALetter}, {0xABC0, 0xABE2, WBP_ALetter},
{0xABE3, 0xABE4, WBP_Extend}, {0xABE3, 0xABE4, WBP_Extend},
{0xABE5, 0xABE5, WBP_Extend}, {0xABE5, 0xABE5, WBP_Extend},
@ -728,15 +745,16 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xD7CB, 0xD7FB, WBP_ALetter}, {0xD7CB, 0xD7FB, WBP_ALetter},
{0xFB00, 0xFB06, WBP_ALetter}, {0xFB00, 0xFB06, WBP_ALetter},
{0xFB13, 0xFB17, WBP_ALetter}, {0xFB13, 0xFB17, WBP_ALetter},
{0xFB1D, 0xFB1D, WBP_ALetter}, {0xFB1D, 0xFB1D, WBP_Hebrew_Letter},
{0xFB1E, 0xFB1E, WBP_Extend}, {0xFB1E, 0xFB1E, WBP_Extend},
{0xFB1F, 0xFB28, WBP_ALetter}, {0xFB1F, 0xFB28, WBP_Hebrew_Letter},
{0xFB2A, 0xFB36, WBP_ALetter}, {0xFB2A, 0xFB36, WBP_Hebrew_Letter},
{0xFB38, 0xFB3C, WBP_ALetter}, {0xFB38, 0xFB3C, WBP_Hebrew_Letter},
{0xFB3E, 0xFB3E, WBP_ALetter}, {0xFB3E, 0xFB3E, WBP_Hebrew_Letter},
{0xFB40, 0xFB41, WBP_ALetter}, {0xFB40, 0xFB41, WBP_Hebrew_Letter},
{0xFB43, 0xFB44, WBP_ALetter}, {0xFB43, 0xFB44, WBP_Hebrew_Letter},
{0xFB46, 0xFBB1, WBP_ALetter}, {0xFB46, 0xFB4F, WBP_Hebrew_Letter},
{0xFB50, 0xFBB1, WBP_ALetter},
{0xFBD3, 0xFD3D, WBP_ALetter}, {0xFBD3, 0xFD3D, WBP_ALetter},
{0xFD50, 0xFD8F, WBP_ALetter}, {0xFD50, 0xFD8F, WBP_ALetter},
{0xFD92, 0xFDC7, WBP_ALetter}, {0xFD92, 0xFDC7, WBP_ALetter},
@ -745,7 +763,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0xFE10, 0xFE10, WBP_MidNum}, {0xFE10, 0xFE10, WBP_MidNum},
{0xFE13, 0xFE13, WBP_MidLetter}, {0xFE13, 0xFE13, WBP_MidLetter},
{0xFE14, 0xFE14, WBP_MidNum}, {0xFE14, 0xFE14, WBP_MidNum},
{0xFE20, 0xFE26, WBP_Extend}, {0xFE20, 0xFE2D, WBP_Extend},
{0xFE33, 0xFE34, WBP_ExtendNumLet}, {0xFE33, 0xFE34, WBP_ExtendNumLet},
{0xFE4D, 0xFE4F, WBP_ExtendNumLet}, {0xFE4D, 0xFE4F, WBP_ExtendNumLet},
{0xFE50, 0xFE50, WBP_MidNum}, {0xFE50, 0xFE50, WBP_MidNum},
@ -784,11 +802,14 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x101FD, 0x101FD, WBP_Extend}, {0x101FD, 0x101FD, WBP_Extend},
{0x10280, 0x1029C, WBP_ALetter}, {0x10280, 0x1029C, WBP_ALetter},
{0x102A0, 0x102D0, WBP_ALetter}, {0x102A0, 0x102D0, WBP_ALetter},
{0x10300, 0x1031E, WBP_ALetter}, {0x102E0, 0x102E0, WBP_Extend},
{0x10300, 0x1031F, WBP_ALetter},
{0x10330, 0x10340, WBP_ALetter}, {0x10330, 0x10340, WBP_ALetter},
{0x10341, 0x10341, WBP_ALetter}, {0x10341, 0x10341, WBP_ALetter},
{0x10342, 0x10349, WBP_ALetter}, {0x10342, 0x10349, WBP_ALetter},
{0x1034A, 0x1034A, WBP_ALetter}, {0x1034A, 0x1034A, WBP_ALetter},
{0x10350, 0x10375, WBP_ALetter},
{0x10376, 0x1037A, WBP_Extend},
{0x10380, 0x1039D, WBP_ALetter}, {0x10380, 0x1039D, WBP_ALetter},
{0x103A0, 0x103C3, WBP_ALetter}, {0x103A0, 0x103C3, WBP_ALetter},
{0x103C8, 0x103CF, WBP_ALetter}, {0x103C8, 0x103CF, WBP_ALetter},
@ -796,12 +817,19 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x10400, 0x1044F, WBP_ALetter}, {0x10400, 0x1044F, WBP_ALetter},
{0x10450, 0x1049D, WBP_ALetter}, {0x10450, 0x1049D, WBP_ALetter},
{0x104A0, 0x104A9, WBP_Numeric}, {0x104A0, 0x104A9, WBP_Numeric},
{0x10500, 0x10527, WBP_ALetter},
{0x10530, 0x10563, WBP_ALetter},
{0x10600, 0x10736, WBP_ALetter},
{0x10740, 0x10755, WBP_ALetter},
{0x10760, 0x10767, WBP_ALetter},
{0x10800, 0x10805, WBP_ALetter}, {0x10800, 0x10805, WBP_ALetter},
{0x10808, 0x10808, WBP_ALetter}, {0x10808, 0x10808, WBP_ALetter},
{0x1080A, 0x10835, WBP_ALetter}, {0x1080A, 0x10835, WBP_ALetter},
{0x10837, 0x10838, WBP_ALetter}, {0x10837, 0x10838, WBP_ALetter},
{0x1083C, 0x1083C, WBP_ALetter}, {0x1083C, 0x1083C, WBP_ALetter},
{0x1083F, 0x10855, WBP_ALetter}, {0x1083F, 0x10855, WBP_ALetter},
{0x10860, 0x10876, WBP_ALetter},
{0x10880, 0x1089E, WBP_ALetter},
{0x10900, 0x10915, WBP_ALetter}, {0x10900, 0x10915, WBP_ALetter},
{0x10920, 0x10939, WBP_ALetter}, {0x10920, 0x10939, WBP_ALetter},
{0x10980, 0x109B7, WBP_ALetter}, {0x10980, 0x109B7, WBP_ALetter},
@ -816,9 +844,14 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x10A38, 0x10A3A, WBP_Extend}, {0x10A38, 0x10A3A, WBP_Extend},
{0x10A3F, 0x10A3F, WBP_Extend}, {0x10A3F, 0x10A3F, WBP_Extend},
{0x10A60, 0x10A7C, WBP_ALetter}, {0x10A60, 0x10A7C, WBP_ALetter},
{0x10A80, 0x10A9C, WBP_ALetter},
{0x10AC0, 0x10AC7, WBP_ALetter},
{0x10AC9, 0x10AE4, WBP_ALetter},
{0x10AE5, 0x10AE6, WBP_Extend},
{0x10B00, 0x10B35, WBP_ALetter}, {0x10B00, 0x10B35, WBP_ALetter},
{0x10B40, 0x10B55, WBP_ALetter}, {0x10B40, 0x10B55, WBP_ALetter},
{0x10B60, 0x10B72, WBP_ALetter}, {0x10B60, 0x10B72, WBP_ALetter},
{0x10B80, 0x10B91, WBP_ALetter},
{0x10C00, 0x10C48, WBP_ALetter}, {0x10C00, 0x10C48, WBP_ALetter},
{0x11000, 0x11000, WBP_Extend}, {0x11000, 0x11000, WBP_Extend},
{0x11001, 0x11001, WBP_Extend}, {0x11001, 0x11001, WBP_Extend},
@ -826,7 +859,7 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x11003, 0x11037, WBP_ALetter}, {0x11003, 0x11037, WBP_ALetter},
{0x11038, 0x11046, WBP_Extend}, {0x11038, 0x11046, WBP_Extend},
{0x11066, 0x1106F, WBP_Numeric}, {0x11066, 0x1106F, WBP_Numeric},
{0x11080, 0x11081, WBP_Extend}, {0x1107F, 0x11081, WBP_Extend},
{0x11082, 0x11082, WBP_Extend}, {0x11082, 0x11082, WBP_Extend},
{0x11083, 0x110AF, WBP_ALetter}, {0x11083, 0x110AF, WBP_ALetter},
{0x110B0, 0x110B2, WBP_Extend}, {0x110B0, 0x110B2, WBP_Extend},
@ -842,6 +875,9 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1112C, 0x1112C, WBP_Extend}, {0x1112C, 0x1112C, WBP_Extend},
{0x1112D, 0x11134, WBP_Extend}, {0x1112D, 0x11134, WBP_Extend},
{0x11136, 0x1113F, WBP_Numeric}, {0x11136, 0x1113F, WBP_Numeric},
{0x11150, 0x11172, WBP_ALetter},
{0x11173, 0x11173, WBP_Extend},
{0x11176, 0x11176, WBP_ALetter},
{0x11180, 0x11181, WBP_Extend}, {0x11180, 0x11181, WBP_Extend},
{0x11182, 0x11182, WBP_Extend}, {0x11182, 0x11182, WBP_Extend},
{0x11183, 0x111B2, WBP_ALetter}, {0x11183, 0x111B2, WBP_ALetter},
@ -850,6 +886,68 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x111BF, 0x111C0, WBP_Extend}, {0x111BF, 0x111C0, WBP_Extend},
{0x111C1, 0x111C4, WBP_ALetter}, {0x111C1, 0x111C4, WBP_ALetter},
{0x111D0, 0x111D9, WBP_Numeric}, {0x111D0, 0x111D9, WBP_Numeric},
{0x111DA, 0x111DA, WBP_ALetter},
{0x11200, 0x11211, WBP_ALetter},
{0x11213, 0x1122B, WBP_ALetter},
{0x1122C, 0x1122E, WBP_Extend},
{0x1122F, 0x11231, WBP_Extend},
{0x11232, 0x11233, WBP_Extend},
{0x11234, 0x11234, WBP_Extend},
{0x11235, 0x11235, WBP_Extend},
{0x11236, 0x11237, WBP_Extend},
{0x112B0, 0x112DE, WBP_ALetter},
{0x112DF, 0x112DF, WBP_Extend},
{0x112E0, 0x112E2, WBP_Extend},
{0x112E3, 0x112EA, WBP_Extend},
{0x112F0, 0x112F9, WBP_Numeric},
{0x11301, 0x11301, WBP_Extend},
{0x11302, 0x11303, WBP_Extend},
{0x11305, 0x1130C, WBP_ALetter},
{0x1130F, 0x11310, WBP_ALetter},
{0x11313, 0x11328, WBP_ALetter},
{0x1132A, 0x11330, WBP_ALetter},
{0x11332, 0x11333, WBP_ALetter},
{0x11335, 0x11339, WBP_ALetter},
{0x1133C, 0x1133C, WBP_Extend},
{0x1133D, 0x1133D, WBP_ALetter},
{0x1133E, 0x1133F, WBP_Extend},
{0x11340, 0x11340, WBP_Extend},
{0x11341, 0x11344, WBP_Extend},
{0x11347, 0x11348, WBP_Extend},
{0x1134B, 0x1134D, WBP_Extend},
{0x11357, 0x11357, WBP_Extend},
{0x1135D, 0x11361, WBP_ALetter},
{0x11362, 0x11363, WBP_Extend},
{0x11366, 0x1136C, WBP_Extend},
{0x11370, 0x11374, WBP_Extend},
{0x11480, 0x114AF, WBP_ALetter},
{0x114B0, 0x114B2, WBP_Extend},
{0x114B3, 0x114B8, WBP_Extend},
{0x114B9, 0x114B9, WBP_Extend},
{0x114BA, 0x114BA, WBP_Extend},
{0x114BB, 0x114BE, WBP_Extend},
{0x114BF, 0x114C0, WBP_Extend},
{0x114C1, 0x114C1, WBP_Extend},
{0x114C2, 0x114C3, WBP_Extend},
{0x114C4, 0x114C5, WBP_ALetter},
{0x114C7, 0x114C7, WBP_ALetter},
{0x114D0, 0x114D9, WBP_Numeric},
{0x11580, 0x115AE, WBP_ALetter},
{0x115AF, 0x115B1, WBP_Extend},
{0x115B2, 0x115B5, WBP_Extend},
{0x115B8, 0x115BB, WBP_Extend},
{0x115BC, 0x115BD, WBP_Extend},
{0x115BE, 0x115BE, WBP_Extend},
{0x115BF, 0x115C0, WBP_Extend},
{0x11600, 0x1162F, WBP_ALetter},
{0x11630, 0x11632, WBP_Extend},
{0x11633, 0x1163A, WBP_Extend},
{0x1163B, 0x1163C, WBP_Extend},
{0x1163D, 0x1163D, WBP_Extend},
{0x1163E, 0x1163E, WBP_Extend},
{0x1163F, 0x11640, WBP_Extend},
{0x11644, 0x11644, WBP_ALetter},
{0x11650, 0x11659, WBP_Numeric},
{0x11680, 0x116AA, WBP_ALetter}, {0x11680, 0x116AA, WBP_ALetter},
{0x116AB, 0x116AB, WBP_Extend}, {0x116AB, 0x116AB, WBP_Extend},
{0x116AC, 0x116AC, WBP_Extend}, {0x116AC, 0x116AC, WBP_Extend},
@ -859,16 +957,36 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x116B6, 0x116B6, WBP_Extend}, {0x116B6, 0x116B6, WBP_Extend},
{0x116B7, 0x116B7, WBP_Extend}, {0x116B7, 0x116B7, WBP_Extend},
{0x116C0, 0x116C9, WBP_Numeric}, {0x116C0, 0x116C9, WBP_Numeric},
{0x12000, 0x1236E, WBP_ALetter}, {0x118A0, 0x118DF, WBP_ALetter},
{0x12400, 0x12462, WBP_ALetter}, {0x118E0, 0x118E9, WBP_Numeric},
{0x118FF, 0x118FF, WBP_ALetter},
{0x11AC0, 0x11AF8, WBP_ALetter},
{0x12000, 0x12398, WBP_ALetter},
{0x12400, 0x1246E, WBP_ALetter},
{0x13000, 0x1342E, WBP_ALetter}, {0x13000, 0x1342E, WBP_ALetter},
{0x16800, 0x16A38, WBP_ALetter}, {0x16800, 0x16A38, WBP_ALetter},
{0x16A40, 0x16A5E, WBP_ALetter},
{0x16A60, 0x16A69, WBP_Numeric},
{0x16AD0, 0x16AED, WBP_ALetter},
{0x16AF0, 0x16AF4, WBP_Extend},
{0x16B00, 0x16B2F, WBP_ALetter},
{0x16B30, 0x16B36, WBP_Extend},
{0x16B40, 0x16B43, WBP_ALetter},
{0x16B50, 0x16B59, WBP_Numeric},
{0x16B63, 0x16B77, WBP_ALetter},
{0x16B7D, 0x16B8F, WBP_ALetter},
{0x16F00, 0x16F44, WBP_ALetter}, {0x16F00, 0x16F44, WBP_ALetter},
{0x16F50, 0x16F50, WBP_ALetter}, {0x16F50, 0x16F50, WBP_ALetter},
{0x16F51, 0x16F7E, WBP_Extend}, {0x16F51, 0x16F7E, WBP_Extend},
{0x16F8F, 0x16F92, WBP_Extend}, {0x16F8F, 0x16F92, WBP_Extend},
{0x16F93, 0x16F9F, WBP_ALetter}, {0x16F93, 0x16F9F, WBP_ALetter},
{0x1B000, 0x1B000, WBP_Katakana}, {0x1B000, 0x1B000, WBP_Katakana},
{0x1BC00, 0x1BC6A, WBP_ALetter},
{0x1BC70, 0x1BC7C, WBP_ALetter},
{0x1BC80, 0x1BC88, WBP_ALetter},
{0x1BC90, 0x1BC99, WBP_ALetter},
{0x1BC9D, 0x1BC9E, WBP_Extend},
{0x1BCA0, 0x1BCA3, WBP_Format},
{0x1D165, 0x1D166, WBP_Extend}, {0x1D165, 0x1D166, WBP_Extend},
{0x1D167, 0x1D169, WBP_Extend}, {0x1D167, 0x1D169, WBP_Extend},
{0x1D16D, 0x1D172, WBP_Extend}, {0x1D16D, 0x1D172, WBP_Extend},
@ -908,6 +1026,8 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1D7AA, 0x1D7C2, WBP_ALetter}, {0x1D7AA, 0x1D7C2, WBP_ALetter},
{0x1D7C4, 0x1D7CB, WBP_ALetter}, {0x1D7C4, 0x1D7CB, WBP_ALetter},
{0x1D7CE, 0x1D7FF, WBP_Numeric}, {0x1D7CE, 0x1D7FF, WBP_Numeric},
{0x1E800, 0x1E8C4, WBP_ALetter},
{0x1E8D0, 0x1E8D6, WBP_Extend},
{0x1EE00, 0x1EE03, WBP_ALetter}, {0x1EE00, 0x1EE03, WBP_ALetter},
{0x1EE05, 0x1EE1F, WBP_ALetter}, {0x1EE05, 0x1EE1F, WBP_ALetter},
{0x1EE21, 0x1EE22, WBP_ALetter}, {0x1EE21, 0x1EE22, WBP_ALetter},
@ -941,7 +1061,10 @@ static struct WordBreakProperties wb_prop_default[] = {
{0x1EEA1, 0x1EEA3, WBP_ALetter}, {0x1EEA1, 0x1EEA3, WBP_ALetter},
{0x1EEA5, 0x1EEA9, WBP_ALetter}, {0x1EEA5, 0x1EEA9, WBP_ALetter},
{0x1EEAB, 0x1EEBB, WBP_ALetter}, {0x1EEAB, 0x1EEBB, WBP_ALetter},
{0x1F1E6, 0x1F1FF, WBP_Regional}, {0x1F130, 0x1F149, WBP_ALetter},
{0x1F150, 0x1F169, WBP_ALetter},
{0x1F170, 0x1F189, WBP_ALetter},
{0x1F1E6, 0x1F1FF, WBP_Regional_Indicator},
{0xE0001, 0xE0001, WBP_Format}, {0xE0001, 0xE0001, WBP_Format},
{0xE0020, 0xE007F, WBP_Format}, {0xE0020, 0xE007F, WBP_Format},
{0xE0100, 0xE01EF, WBP_Extend}, {0xE0100, 0xE01EF, WBP_Extend},
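
Reviewer note: the rows above are inclusive `{start, end, class}` ranges in ascending code point order, so a lookup can be done with a binary search. The following is a minimal sketch under stated assumptions, not necessarily how the library itself performs the lookup. Every `Demo`/`demo_` name is hypothetical, and the table holds only a few rows copied from the new side of the Hebrew/Arabic hunk. Rows not copied, such as `{0x060C, 0x060D, WBP_MidNum}`, fall through to the default class here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the word break classes, for illustration only. */
enum DemoWordBreakClass {
    DEMO_WBP_Extend,
    DEMO_WBP_Format,
    DEMO_WBP_Hebrew_Letter,
    DEMO_WBP_ALetter,
    DEMO_WBP_MidLetter,
    DEMO_WBP_Any
};

/* Hypothetical row layout: inclusive code point range plus its class. */
struct DemoWordBreakProperties {
    uint32_t start;
    uint32_t end;
    enum DemoWordBreakClass prop;
};

/* A few rows taken from the new side of the diff, in ascending order. */
static const struct DemoWordBreakProperties demo_wb_prop[] = {
    {0x05D0, 0x05EA, DEMO_WBP_Hebrew_Letter},
    {0x05F0, 0x05F2, DEMO_WBP_Hebrew_Letter},
    {0x05F3, 0x05F3, DEMO_WBP_ALetter},
    {0x05F4, 0x05F4, DEMO_WBP_MidLetter},
    {0x0600, 0x0605, DEMO_WBP_Format},
    {0x0610, 0x061A, DEMO_WBP_Extend},
    {0x061C, 0x061C, DEMO_WBP_Format},
    {0x0620, 0x063F, DEMO_WBP_ALetter},
};

#define DEMO_WB_PROP_LEN (sizeof demo_wb_prop / sizeof demo_wb_prop[0])

/* Returns 1 if every range is well formed and strictly after the previous one. */
int demo_table_is_sorted(void)
{
    size_t i;
    for (i = 0; i < DEMO_WB_PROP_LEN; i++) {
        if (demo_wb_prop[i].start > demo_wb_prop[i].end)
            return 0;
        if (i > 0 && demo_wb_prop[i - 1].end >= demo_wb_prop[i].start)
            return 0;
    }
    return 1;
}

/* Binary search over the sorted ranges; unlisted code points map to Any. */
enum DemoWordBreakClass demo_get_word_break_class(uint32_t ch)
{
    size_t lo = 0;
    size_t hi = DEMO_WB_PROP_LEN;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ch < demo_wb_prop[mid].start)
            hi = mid;
        else if (ch > demo_wb_prop[mid].end)
            lo = mid + 1;
        else
            return demo_wb_prop[mid].prop;
    }
    return DEMO_WBP_Any;
}
```

With these rows, `demo_get_word_break_class(0x05D0)` yields `DEMO_WBP_Hebrew_Letter` rather than the generic letter class, which is the visible effect of the `WBP_ALetter` to `WBP_Hebrew_Letter` changes in this hunk. The sortedness check is worth running whenever ranges are merged or split, as several rows in this update are.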

@ -4,8 +4,7 @@
* Word breaking in a Unicode sequence. Designed to be used in a * Word breaking in a Unicode sequence. Designed to be used in a
* generic text renderer. * generic text renderer.
* *
* Copyright (C) 2013 Tom Hacohen <tom at stosb dot com> * Copyright (C) 2013-15 Tom Hacohen <tom at stosb dot com>
* Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
* *
* This software is provided 'as-is', without any express or implied * This software is provided 'as-is', without any express or implied
* warranty. In no event will the author be held liable for any damages * warranty. In no event will the author be held liable for any damages
@ -31,9 +30,8 @@
* Unicode 6.0.0: * Unicode 6.0.0:
* <URL:http://www.unicode.org/reports/tr29/tr29-17.html> * <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
* *
* This library has been updated according to Revision 21, for * This library has been updated according to Revision 25, for
* Unicode 6.2.0: * Unicode 7.0.0:
* <URL:http://www.unicode.org/reports/tr29/tr29-21.html>
* *
* The Unicode Terms of Use are available at * The Unicode Terms of Use are available at
* <URL:http://www.unicode.org/copyright.html> * <URL:http://www.unicode.org/copyright.html>
@ -45,11 +43,12 @@
* Definitions of internal data structures, declarations of global * Definitions of internal data structures, declarations of global
* variables, and function prototypes for the word breaking algorithm. * variables, and function prototypes for the word breaking algorithm.
* *
* @version 2.4, 2013/11/10 * @version 2.6, 2015/04/19
* @author Tom Hacohen * @author Tom Hacohen
* @author Petr Filipsky
*/ */
#include "unibreakdef.h"
/** /**
* Word break classes. This is a direct mapping of Table 3 of Unicode * Word break classes. This is a direct mapping of Table 3 of Unicode
* Standard Annex 29, Revision 23. * Standard Annex 29, Revision 23.
@ -61,18 +60,18 @@ enum WordBreakClass
WBP_LF, WBP_LF,
WBP_Newline, WBP_Newline,
WBP_Extend, WBP_Extend,
WBP_Regional_Indicator,
WBP_Format, WBP_Format,
WBP_Katakana, WBP_Katakana,
WBP_Hebrew_Letter,
WBP_ALetter, WBP_ALetter,
WBP_Single_Quote,
WBP_Double_Quote,
WBP_MidNumLet, WBP_MidNumLet,
WBP_MidLetter, WBP_MidLetter,
WBP_MidNum, WBP_MidNum,
WBP_Numeric, WBP_Numeric,
WBP_ExtendNumLet, WBP_ExtendNumLet,
WBP_Regional,
WBP_Hebrew,
WBP_Single,
WBP_Double,
WBP_Any WBP_Any
}; };