Synced libunibreak local copy with upstream.

This fixes T805.
2014-01-21 16:41:06 +00:00 · cff1a9a59f
parent cc8fa1da45
commit cff1a9a59f
12 changed files with 1359 additions and 1100 deletions
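
The main functional change in this sync is the incremental line-breaking API (`lb_init_break_context` and `lb_process_next_char`), which `set_linebreaks` is now built on. The following is a minimal, self-contained sketch of that init/process pattern, not the library's code. Every `toy_`/`Toy`/`TOY_` name is hypothetical, and the three-class rule set is a deliberate simplification standing in for `enum LineBreakClass` and the pair-table lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy classes; the real library uses enum LineBreakClass. */
enum ToyClass { TOY_AL, TOY_SP, TOY_BK };

/* Hypothetical result codes standing in for the LINEBREAK_* values. */
enum { TOY_MUSTBREAK = 0, TOY_ALLOWBREAK = 1, TOY_NOBREAK = 2 };

/* Toy counterpart of struct LineBreakContext: state kept between calls. */
struct ToyBreakContext {
    enum ToyClass cur;  /* class of the character before the break position */
};

/* Simplified classification: newline, space, or "everything else". */
static enum ToyClass toy_classify(unsigned long ch)
{
    if (ch == '\n')
        return TOY_BK;
    if (ch == ' ')
        return TOY_SP;
    return TOY_AL;
}

/* Plays the role of lb_init_break_context: seed the state with the
 * first character of the text. */
void toy_init_break_context(struct ToyBreakContext *ctx, unsigned long ch)
{
    assert(ctx != NULL);
    ctx->cur = toy_classify(ch);
}

/* Plays the role of lb_process_next_char: given the next character,
 * return the break result for the position just before it, then
 * update the state. */
int toy_process_next_char(struct ToyBreakContext *ctx, unsigned long ch)
{
    enum ToyClass next = toy_classify(ch);
    int brk;

    assert(ctx != NULL);
    if (ctx->cur == TOY_BK)
        brk = TOY_MUSTBREAK;            /* mandatory break after newline */
    else if (ctx->cur == TOY_SP && next == TOY_AL)
        brk = TOY_ALLOWBREAK;           /* may break after a space */
    else
        brk = TOY_NOBREAK;
    ctx->cur = next;
    return brk;
}

/* Batch driver written on top of the two incremental calls, in the same
 * shape as the refactored set_linebreaks loop. */
void toy_set_linebreaks(const char *s, size_t len, char *brks)
{
    struct ToyBreakContext ctx;
    size_t i;

    if (len == 0)
        return;
    toy_init_break_context(&ctx, (unsigned char)s[0]);
    for (i = 1; i < len; ++i)
        brks[i - 1] = (char)toy_process_next_char(&ctx, (unsigned char)s[i]);
    brks[len - 1] = TOY_MUSTBREAK;      /* this sketch always breaks at the end */
}
```

Because all state lives in the context struct, a caller can feed characters one at a time as they arrive (for example from a stream) instead of handing over a complete buffer. For the input `"a b\nc"`, this sketch allows a break after the space and requires one after the newline and at the end of the text.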
--- a/src/static_libs/libunibreak/AUTHORS
+++ b/src/static_libs/libunibreak/AUTHORS
@@ -1,4 +1,5 @@
-Wu Yongwei.  Designed and implemented liblinebreak.
+Wu Yongwei.  Designed and implemented the original liblinebreak.
+Current maintainer of libunibreak.

 Nikolay Pultsin.  Put forward the original requirements on liblinebreak,
 performed tests, and made a lot of suggestions on the initial versions.
@@ -6,3 +7,5 @@ performed tests, and made a lot of suggestions on the initial versions.
 Thomas Klausner.  Autoconfiscated and libtoolized liblinebreak.

 Tom Hacohen.  Added word boundaries support.
+
+Petr Filipsky.  Added incremental processing for line-breaking.
--- a/src/static_libs/libunibreak/ChangeLog
+++ b/src/static_libs/libunibreak/ChangeLog
@@ -1,3 +1,116 @@
+2013-11-14  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/linebreak.c: Add/update comments and doc comments.
+	(lb_init_breaking_class): Rename to treat_first_char.
+	(lb_classify_break_simple): Rename to get_lb_result_simple.
+	(lb_classify_break_lookup): Rename to get_lb_result_lookup.
+	(set_linebreaks): Remove an unused local variable.
+
+2013-11-14  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/linebreakdata.c: Regenerate from LineBreak-6.3.0.txt.
+
+2013-11-13  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Fix compilation problems under MSVC.
+	* src/linebreak.c (lb_init_breaking_class): Remove `inline'.
+	(lb_classify_break_simple): Ditto.
+	(lb_classify_break_lookup): Ditto.
+	(lb_classify_break_lookup): Move local variable declaration before
+	assertions.
+
+2013-11-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/Makefile.am (libunibreak_la_LDFLAGS): Set the version-info to
+	`2:0:1'.
+
+2013-11-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/linebreakdef.c: Adjust the order of code.
+	(lb_process_next_char): Make its return type int.
+	* src/linebreak.c (lb_process_next_char): Ditto.
+
+2013-11-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/linebreak.c: Make minor changes in doc comments, formatting,
+	and names.
+	* src/linebreakdef.c: Ditto.
+
+2013-11-10  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* AUTHORS: Add `Petr Filipsky'.
+
+2013-11-10  Petr Filipsky  <philodej@gmail.com>
+
+	Expose low level line-breaking API for incremental processing.
+	* src/linebreak.h: Add prototype declarations for
+	lb_init_break_context and lb_process_next_char.
+	(struct LineBreakContext): New struct.
+	* src/linebreak.h (LINEBREAK_UNDEFINED): New macro constant.
+	(lb_init_breaking_class): New static function.
+	(lb_classify_break_simple): New static function.
+	(lb_classify_break_lookup): New static function.
+	(lb_init_break_context): New function.
+	(lb_process_next_char): New function.
+	(set_linebreaks): Implement with lb_init_break_context and
+	lb_process_next_char.
+
+2013-11-05  Petr Filipsky  <philodej@gmail.com>
+
+	* src/wordbreakdef.h (enum WordBreakClass): Update according to
+	Table 3 of Unicode Standard Annex 29, Revision 23.
+
+2013-09-30  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Update for the libunibreak 1.1 release.
+	* configure.ac (AC_INIT): Change the library version to `1.1'.
+	* Doxyfile (PROJECT_NUMBER): Change to `1.1'.
+	* Makefile.am (EXTRA_DIST): Add the `tools' directory.
+	* NEWS: Add information about libunibreak 1.1.
+	* src/Makefile.am (libunibreak_la_LDFLAGS): Set the version to `1:1'.
+
+2013-09-29  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/Makefile.msvc: Modernize obsolete/deprecated MSVC options.
+
+2013-09-28  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/wordbreak.c: Update copyright year and UAX information.
+	* src/wordbreak.h: Ditto.
+	* src/wordbreakdef.h: Ditto.
+
+2013-09-28  Wu Yongwei  <wuyongwei@gmail.com>
+
+	Fix the errors caused by libtool 2.4 (really annoying to the level
+	of WTF for making me add the foolish dependency on m4).
+	* Makefile.am (ACLOCAL_AMFLAGS): Add `-I m4'.
+	* bootstrap: Add a line to execute autoreconf.
+	* configure.ac (AC_CONFIG_MACRO_DIR): Set to `[m4]'.
+	* purge: Make it remove also the m4 directory.
+
+2013-09-28  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* Makefile.am (EXTRA_DIST): Add `README.md'.
+
+2013-09-28  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* README.md: New Markdown version of README.
+	* README: Remove.
+
+2013-05-13  Tom Hacohen  <tom@stosb.com>
+
+	Update files according to UAX #29-21, for Unicode 6.2.0.
+	* README: Update the reference to UAX #29-21.
+	* src/wordbreak.c (set_wordbreaks): Update for WBP_Regional.
+	* src/wordbreakdef.h (WBP_Regional): New enumerator for the new
+	property `RI' as defined in UAX #29-21.
+	* src/wordbreakdata.c: Regenerate from WordBreakProperty-6.2.0.txt.
+
+2013-05-06  Wu Yongwei  <wuyongwei@gmail.com>
+
+	* src/Makefile.am (install-exec-hook): Make sure `--disable-static'
+	can work (thanks to Eugene V. Lyubimkin).
+
 2012-10-06  Wu Yongwei  <wuyongwei@gmail.com>

 	Update files according to UAX #14-30, for Unicode 6.2.0.
@@ -82,11 +195,12 @@

 2012-08-11  Wu Yongwei  <wuyongwei@gmail.com>

+	Update for the libunibreak 1.0 release.
 	* configure.ac (AC_INIT): Change the library name and version to
 	`libunibreak' and `1.0'.
 	(AC_PROG_LN_S): New macro.
 	(AC_OUTPUT): Change to `libunibreak.pc'.
-	* Doxyfile: (PROJECT_NAME): Change to `libunibreak'.
+	* Doxyfile (PROJECT_NAME): Change to `libunibreak'.
 	(PROJECT_NUMBER): Change to `1.0'.
 	* LICENCE: Add copyright information about Tom Hacohen.
 	* Makefile.am (lib_LTLIBRARIES): Change to `libunibreak.la'.
@@ -96,7 +210,7 @@
 	a symlink to libunibreak.a.
 	* Makefile.msvc: Change the library name to `libunibreak', and the
 	output library to `unibreak.lib'.
-	* NEW: Add information about libunibreak 1.0.
+	* NEWS: Add information about libunibreak 1.0.
 	* README: Change the library name, and add information about word
 	break.

--- a/src/static_libs/libunibreak/NEWS
+++ b/src/static_libs/libunibreak/NEWS
@@ -1,3 +1,10 @@
+New in libunibreak 1.1
+
+- Update the code and data to conform to Unicode 6.2.0
+- Update build files to support libtool 2.4
+- Adjust code structure
+- Make a few bug fixes
+
 New in libunibreak 1.0

 - Add word breaking support
--- a/src/static_libs/libunibreak/README
+++ b/src/static_libs/libunibreak/README
@@ -1,31 +1,30 @@
-                         L I B U N I B R E A K
-                         =====================
+LIBUNIBREAK
+===========

 Overview
 --------

 This is the README file for libunibreak, an implementation of the line
-breaking and word breaking algorithms as described in Unicode
-Standard Annex 14 and Unicode Standard Annex 30, available at
-         <URL:http://www.unicode.org/reports/tr14/tr14-30.html>
-         <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
+breaking and word breaking algorithms as described in [Unicode Standard
+Annex 14] [1] and [Unicode Standard Annex 29] [2].  Check the project's
+[home page] [3] for up-to-date information.

-Check this URL for up-to-date information:
-             <URL:https://github.com/adah1972/libunibreak>
+  [1]: http://www.unicode.org/reports/tr14/tr14-30.html
+  [2]: http://www.unicode.org/reports/tr29/tr29-21.html
+  [3]: https://github.com/adah1972/libunibreak


 Licence
 -------

 This library is released under an open-source licence, the zlib/libpng
-licence.  Please check the file LICENCE for details.
+licence.  Please check the file *LICENCE* for details.

 Apart from using the algorithm, part of the code is derived from the
-data provided under
-                  <URL:http://www.unicode.org/Public/>
+[Unicode Public Data] [4], and the [Unicode Terms of Use] [5] may apply.

-And the Unicode Terms of Use may apply:
-              <URL:http://www.unicode.org/copyright.html>
+  [4]: http://www.unicode.org/Public/
+  [5]: http://www.unicode.org/copyright.html


 Installation
@@ -33,7 +32,7 @@ Installation

 There are three ways to build the library:

-1) On *NIX systems supported by the autoconfiscation tools, do the
+1. On \*NIX systems supported by the autoconfiscation tools, do the
   normal

        ./configure
@@ -42,30 +41,28 @@ There are three ways to build the library:

   to build and install both the dynamic and static libraries.  In
   addition, one may
+   - type `make doc` to generate the doxygen documentation; or
+   - type `make linebreakdata` to regenerate *linebreakdata.c* from
+     *LineBreak.txt*.
+   - type `make wordbreakdata` to regenerate *wordbreakdata.c* from
+     *WordBreakProperty.txt*.

-   - type `make doc' to generate the doxygen documentation; or
-   - type `make linebreakdata' to regenerate linebreakdata.c from
-     LineBreak.txt.
-   - type `make wordbreakdata' to regenerate wordbreakdata.c from
-     WordBreakProperty.txt.
-
-2) On systems where GCC and Binutils are supported, one can type
+2. On systems where GCC and Binutils are supported, one can type

        cd src
        cp -p Makefile.gcc Makefile
        make

   to build the static library.  In addition, one may
-
-   - type `make debug' or `make release' to explicitly generate the
+   - type `make debug` or `make release` to explicitly generate the
     debug or release build;
-   - type `make doc' to generate the doxygen documentation; or
-   - type `make linebreakdata' to regenerate linebreakdata.c from
-     LineBreak.txt.
-   - type `make wordbreakdata' to regenerate wordbreakdata.c from
-     WordBreakProperty.txt.
+   - type `make doc` to generate the doxygen documentation; or
+   - type `make linebreakdata` to regenerate *linebreakdata.c* from
+     *LineBreak.txt*.
+   - type `make wordbreakdata` to regenerate *wordbreakdata.c* from
+     *WordBreakProperty.txt*.

-3) On Windows, apart from using method 1 (Cygwin/MSYS) and method 2
+3. On Windows, apart from using method 1 (Cygwin/MSYS) and method 2
   (MinGW), MSVC can also be used.  Type

        cd src
@@ -80,9 +77,11 @@ There are three ways to build the library:
 Documentation
 -------------

-Check the generated document doc/html/linebreak_8h.html and
-doc/html/wordbreak_8h.html in the downloaded file for the public
+Check the generated document *doc/html/linebreak\_8h.html* and
+*doc/html/wordbreak\_8h.html* in the downloaded file for the public
 interfaces exposed to applications.


+<!--
 vim:autoindent:expandtab:formatoptions=tcqlmn:textwidth=72:
+-->
--- a/src/static_libs/libunibreak/linebreak.c
+++ b/src/static_libs/libunibreak/linebreak.c
@@ -1,10 +1,11 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
- * Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com>
+ * Copyright (C) 2008-2013 Wu Yongwei <wuyongwei at gmail dot com>
+ * Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
@@ -44,8 +45,9 @@
 * Implementation of the line breaking algorithm as described in Unicode
 * Standard Annex 14.
 *
- * @version	2.3, 2012/10/06
+ * @version 2.5, 2013/11/14
 * @author  Wu Yongwei
+ * @author  Petr Filipsky
 */

 #include <assert.h>
@@ -54,6 +56,11 @@
 #include "linebreak.h"
 #include "linebreakdef.h"

+/**
+ * Special value used internally to indicate an undefined break result.
+ */
+#define LINEBREAK_UNDEFINED -1
+
 /**
 * Size of the second-level index to the line breaking properties.
 */
@@ -424,7 +431,7 @@ static enum LineBreakClass resolve_lb_class(
        }
    case LBP_CJ:
        /* Simplified for `normal' line breaking.  See
-		 * <url:http://www.unicode.org/reports/tr14/tr14-28.html#CJ>
+         * <url:http://www.unicode.org/reports/tr14/tr14-30.html#CJ>
         * for details. */
        return LBP_ID;
    case LBP_SA:
@@ -436,6 +443,180 @@ static enum LineBreakClass resolve_lb_class(
    }
 }

+/**
+ * Treats specially for the first character in a line.
+ *
+ * @param[in,out] lbpCtx  pointer to the line breaking context
+ * @pre                   \a lbpCtx->lbcCur has a valid line break class
+ * @post                  \a lbpCtx->lbcCur has the updated line break class
+ */
+static void treat_first_char(
+        struct LineBreakContext* lbpCtx)
+{
+    switch (lbpCtx->lbcCur)
+    {
+    case LBP_LF:
+    case LBP_NL:
+        lbpCtx->lbcCur = LBP_BK;        /* Rule LB5 */
+        break;
+    case LBP_CB:
+        lbpCtx->lbcCur = LBP_BA;        /* Rule LB20 */
+        break;
+    case LBP_SP:
+        lbpCtx->lbcCur = LBP_WJ;        /* Leading space treated as WJ */
+        break;
+    default:
+        break;
+    }
+}
+
+/**
+ * Tries telling the line break opportunity by simple rules.
+ *
+ * @param[in,out] lbpCtx  pointer to the line breaking context
+ * @pre                   \a lbpCtx->lbcCur has the current line break
+ *                        class; and \a lbpCtx->lbcNew has the line
+ *                        break class for the next character
+ * @post                  \a lbpCtx->lbcCur has the updated line break
+ *                        class
+ * @return                break result, one of #LINEBREAK_MUSTBREAK,
+ *                        #LINEBREAK_ALLOWBREAK, and #LINEBREAK_NOBREAK
+ *                        if identified; or #LINEBREAK_UNDEFINED if
+ *                        table lookup is needed
+ */
+static int get_lb_result_simple(
+        struct LineBreakContext* lbpCtx)
+{
+    if (lbpCtx->lbcCur == LBP_BK
+        || (lbpCtx->lbcCur == LBP_CR && lbpCtx->lbcNew != LBP_LF))
+    {
+        return LINEBREAK_MUSTBREAK;     /* Rules LB4 and LB5 */
+    }
+
+    switch (lbpCtx->lbcNew)
+    {
+    case LBP_SP:
+        return LINEBREAK_NOBREAK;       /* Rule LB7; no change to lbcCur */
+    case LBP_BK:
+    case LBP_LF:
+    case LBP_NL:
+        lbpCtx->lbcCur = LBP_BK;        /* Mandatory break after */
+        return LINEBREAK_NOBREAK;       /* Rule LB6 */
+    case LBP_CR:
+        lbpCtx->lbcCur = LBP_CR;
+        return LINEBREAK_NOBREAK;       /* Rule LB6 */
+    case LBP_CB:
+        lbpCtx->lbcCur = LBP_BA;
+        return LINEBREAK_ALLOWBREAK;    /* Rule LB20 */
+    default:
+        return LINEBREAK_UNDEFINED;     /* Table lookup is needed */
+    }
+}
+
+/**
+ * Tells the line break opportunity by table lookup.
+ *
+ * @param[in,out] lbpCtx  pointer to the line breaking context
+ * @pre                   \a lbpCtx->lbcCur has the current line break
+ *                        class; \a lbpCtx->lbcLast has the line break
+ *                        class for the last character; and \a
+ *                        lbpCtx->lbcNew has the line break class for
+ *                        the next character
+ * @post                  \a lbpCtx->lbcCur has the updated line break
+ *                        class
+ * @return                break result, one of #LINEBREAK_MUSTBREAK,
+ *                        #LINEBREAK_ALLOWBREAK, and #LINEBREAK_NOBREAK
+ */
+static int get_lb_result_lookup(
+        struct LineBreakContext* lbpCtx)
+{
+    /* TODO: Rule LB21a, as introduced by Revision 28 of UAX#14, is not
+     * yet implemented below. */
+    int brk = LINEBREAK_UNDEFINED;
+    assert(lbpCtx->lbcCur <= LBP_JT);
+    assert(lbpCtx->lbcNew <= LBP_JT);
+    switch (baTable[lbpCtx->lbcCur - 1][lbpCtx->lbcNew - 1])
+    {
+    case DIR_BRK:
+        brk = LINEBREAK_ALLOWBREAK;
+        break;
+    case CMI_BRK:
+    case IND_BRK:
+        brk = (lbpCtx->lbcLast == LBP_SP)
+            ? LINEBREAK_ALLOWBREAK
+            : LINEBREAK_NOBREAK;
+        break;
+    case CMP_BRK:
+        brk = LINEBREAK_NOBREAK;
+        if (lbpCtx->lbcLast != LBP_SP)
+            return brk;                 /* Do not update lbcCur */
+        break;
+    case PRH_BRK:
+        brk = LINEBREAK_NOBREAK;
+        break;
+    }
+    lbpCtx->lbcCur = lbpCtx->lbcNew;
+    return brk;
+}
+
+/**
+ * Initializes line breaking context for a given language.
+ *
+ * @param[in,out] lbpCtx  pointer to the line breaking context
+ * @param[in]     ch      the first character to process
+ * @param[in]     lang    language of the input
+ * @post                  the line breaking context is initialized
+ */
+void lb_init_break_context(
+        struct LineBreakContext* lbpCtx,
+        utf32_t ch,
+        const char* lang)
+{
+    lbpCtx->lang = lang;
+    lbpCtx->lbpLang = get_lb_prop_lang(lang);
+    lbpCtx->lbcLast = LBP_Undefined;
+    lbpCtx->lbcNew = LBP_Undefined;
+    lbpCtx->lbcCur = resolve_lb_class(
+                        get_char_lb_class_lang(ch, lbpCtx->lbpLang),
+                        lbpCtx->lang);
+    treat_first_char(lbpCtx);
+}
+
+/**
+ * Updates LineBreakContext for the next code point and returns
+ * the detected break.
+ *
+ * @param[in,out] lbpCtx  pointer to the line breaking context
+ * @param[in]     ch      Unicode code point
+ * @return                break result, one of #LINEBREAK_MUSTBREAK,
+ *                        #LINEBREAK_ALLOWBREAK, and #LINEBREAK_NOBREAK
+ * @post                  the line breaking context is updated
+ */
+int lb_process_next_char(
+        struct LineBreakContext* lbpCtx,
+        utf32_t ch )
+{
+    int brk;
+
+    lbpCtx->lbcLast = lbpCtx->lbcNew;
+    lbpCtx->lbcNew = get_char_lb_class_lang(ch, lbpCtx->lbpLang);
+    brk = get_lb_result_simple(lbpCtx);
+    switch (brk)
+    {
+    case LINEBREAK_MUSTBREAK:
+        lbpCtx->lbcCur = resolve_lb_class(lbpCtx->lbcNew, lbpCtx->lang);
+        treat_first_char(lbpCtx);
+        break;
+    case LINEBREAK_UNDEFINED:
+        lbpCtx->lbcNew = resolve_lb_class(lbpCtx->lbcNew, lbpCtx->lang);
+        brk = get_lb_result_lookup(lbpCtx);
+        break;
+    default:
+        break;
+    }
+    return brk;
+}
+
 /**
 * Gets the next Unicode character in a UTF-8 sequence.  The index will
 * be advanced to the next complete character, unless the end of string
@@ -577,10 +758,7 @@ void set_linebreaks(
        get_next_char_t get_next_char)
 {
    utf32_t ch;
-	enum LineBreakClass lbcCur;
-	enum LineBreakClass lbcNew;
-	enum LineBreakClass lbcLast;
-	struct LineBreakProperties *lbpLang;
+    struct LineBreakContext lbCtx;
    size_t posCur = 0;
    size_t posLast = 0;

@@ -588,28 +766,7 @@ void set_linebreaks(
    ch = get_next_char(s, len, &posCur);
    if (ch == EOS)
        return;
-	lbpLang = get_lb_prop_lang(lang);
-	lbcCur = resolve_lb_class(get_char_lb_class_lang(ch, lbpLang), lang);
-	lbcNew = LBP_Undefined;
-
-nextline:
-
-	/* Special treatment for the first character */
-	switch (lbcCur)
-	{
-	case LBP_LF:
-	case LBP_NL:
-		lbcCur = LBP_BK;
-		break;
-	case LBP_CB:
-		lbcCur = LBP_BA;
-		break;
-	case LBP_SP:
-		lbcCur = LBP_WJ;
-		break;
-	default:
-		break;
-	}
+    lb_init_break_context(&lbCtx, ch, lang);

    /* Process a line till an explicit break or end of string */
    for (;;)
@@ -619,75 +776,10 @@ nextline:
            brks[posLast] = LINEBREAK_INSIDEACHAR;
        }
        assert(posLast == posCur - 1);
-		lbcLast = lbcNew;
        ch = get_next_char(s, len, &posCur);
        if (ch == EOS)
            break;
-		lbcNew = get_char_lb_class_lang(ch, lbpLang);
-		if (lbcCur == LBP_BK || (lbcCur == LBP_CR && lbcNew != LBP_LF))
-		{
-			brks[posLast] = LINEBREAK_MUSTBREAK;
-			lbcCur = resolve_lb_class(lbcNew, lang);
-			goto nextline;
-		}
-
-		switch (lbcNew)
-		{
-		case LBP_SP:
-			brks[posLast] = LINEBREAK_NOBREAK;
-			continue;
-		case LBP_BK:
-		case LBP_LF:
-		case LBP_NL:
-			brks[posLast] = LINEBREAK_NOBREAK;
-			lbcCur = LBP_BK;
-			continue;
-		case LBP_CR:
-			brks[posLast] = LINEBREAK_NOBREAK;
-			lbcCur = LBP_CR;
-			continue;
-		case LBP_CB:
-			brks[posLast] = LINEBREAK_ALLOWBREAK;
-			lbcCur = LBP_BA;
-			continue;
-		default:
-			break;
-		}
-
-		lbcNew = resolve_lb_class(lbcNew, lang);
-
-		/* TODO: LB21a, as introduced by Revision 28 of UAX#14, is not
-		 * yet implemented below. */
-
-		assert(lbcCur <= LBP_JT);
-		assert(lbcNew <= LBP_JT);
-		switch (baTable[lbcCur - 1][lbcNew - 1])
-		{
-		case DIR_BRK:
-			brks[posLast] = LINEBREAK_ALLOWBREAK;
-			break;
-		case CMI_BRK:
-		case IND_BRK:
-			if (lbcLast == LBP_SP)
-			{
-				brks[posLast] = LINEBREAK_ALLOWBREAK;
-			}
-			else
-			{
-				brks[posLast] = LINEBREAK_NOBREAK;
-			}
-			break;
-		case CMP_BRK:
-			brks[posLast] = LINEBREAK_NOBREAK;
-			if (lbcLast != LBP_SP)
-				continue;
-			break;
-		case PRH_BRK:
-			brks[posLast] = LINEBREAK_NOBREAK;
-			break;
-		}
-
-		lbcCur = lbcNew;
+        brks[posLast] = lb_process_next_char(&lbCtx, ch);
    }

    assert(posLast == posCur - 1 && posCur <= len);
--- a/src/static_libs/libunibreak/linebreak.h
+++ b/src/static_libs/libunibreak/linebreak.h
@@ -1,4 +1,4 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
--- a/src/static_libs/libunibreak/linebreakdata.c
+++ b/src/static_libs/libunibreak/linebreakdata.c
@@ -1,6 +1,6 @@
 /* The content of this file is generated from:
-# LineBreak-6.2.0.txt
-# Date: 2012-08-08, 19:26:00 GMT [KW]
+# LineBreak-6.3.0.txt
+# Date: 2013-02-06, 19:45:00 GMT [KW, LI]
 */

 #include "linebreak.h"
@@ -114,7 +114,9 @@ struct LineBreakProperties lb_prop_default[] = {
 	{ 0x060C, 0x060D, LBP_IS },
 	{ 0x060E, 0x060F, LBP_AL },
 	{ 0x0610, 0x061A, LBP_CM },
-	{ 0x061B, 0x061F, LBP_EX },
+	{ 0x061B, 0x061B, LBP_EX },
+	{ 0x061C, 0x061C, LBP_CM },
+	{ 0x061E, 0x061F, LBP_EX },
 	{ 0x0620, 0x064A, LBP_AL },
 	{ 0x064B, 0x065F, LBP_CM },
 	{ 0x0660, 0x0669, LBP_NU },
@@ -456,7 +458,7 @@ struct LineBreakProperties lb_prop_default[] = {
 	{ 0x205D, 0x205F, LBP_BA },
 	{ 0x2060, 0x2060, LBP_WJ },
 	{ 0x2061, 0x2064, LBP_AL },
-	{ 0x206A, 0x206F, LBP_CM },
+	{ 0x2066, 0x206F, LBP_CM },
 	{ 0x2070, 0x2071, LBP_AL },
 	{ 0x2074, 0x2074, LBP_AI },
 	{ 0x2075, 0x207C, LBP_AL },
@@ -473,7 +475,7 @@ struct LineBreakProperties lb_prop_default[] = {
 	{ 0x20A7, 0x20A7, LBP_PO },
 	{ 0x20A8, 0x20B5, LBP_PR },
 	{ 0x20B6, 0x20B6, LBP_PO },
-	{ 0x20B7, 0x20BA, LBP_PR },
+	{ 0x20B7, 0x20CF, LBP_PR },
 	{ 0x20D0, 0x20F0, LBP_CM },
 	{ 0x2100, 0x2102, LBP_AL },
 	{ 0x2103, 0x2103, LBP_PO },
@@ -774,7 +776,8 @@ struct LineBreakProperties lb_prop_default[] = {
 	{ 0x2E33, 0x2E34, LBP_BA },
 	{ 0x2E35, 0x2E39, LBP_AL },
 	{ 0x2E3A, 0x2E3B, LBP_B2 },
-	{ 0x2E80, 0x3000, LBP_ID },
+	{ 0x2E80, 0x2FFB, LBP_ID },
+	{ 0x3000, 0x3000, LBP_BA },
 	{ 0x3001, 0x3002, LBP_CL },
 	{ 0x3003, 0x3004, LBP_ID },
 	{ 0x3005, 0x3005, LBP_NS },
@@ -803,7 +806,9 @@ struct LineBreakProperties lb_prop_default[] = {
 	{ 0x301E, 0x301F, LBP_CL },
 	{ 0x3020, 0x3029, LBP_ID },
 	{ 0x302A, 0x302F, LBP_CM },
-	{ 0x3030, 0x303A, LBP_ID },
+	{ 0x3030, 0x3034, LBP_ID },
+	{ 0x3035, 0x3035, LBP_CM },
+	{ 0x3036, 0x303A, LBP_ID },
 	{ 0x303B, 0x303C, LBP_NS },
 	{ 0x303D, 0x303F, LBP_ID },
 	{ 0x3041, 0x3041, LBP_CJ },
--- a/src/static_libs/libunibreak/linebreakdef.c
+++ b/src/static_libs/libunibreak/linebreakdef.c
@@ -1,4 +1,4 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
--- a/src/static_libs/libunibreak/linebreakdef.h
+++ b/src/static_libs/libunibreak/linebreakdef.h
@@ -1,10 +1,11 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Line breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
- * Copyright (C) 2008-2012 Wu Yongwei <wuyongwei at gmail dot com>
+ * Copyright (C) 2008-2013 Wu Yongwei <wuyongwei at gmail dot com>
+ * Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
@@ -44,15 +45,16 @@
 * Definitions of internal data structures, declarations of global
 * variables, and function prototypes for the line breaking algorithm.
 *
- * @version	2.3, 2012/10/06
+ * @version 2.4, 2013/11/10
 * @author  Wu Yongwei
+ * @author  Petr Filipsky
 */

 /**
 * Constant value to mark the end of string.  It is not a valid Unicode
 * character.
 */
-#define EOS 0xFFFF
+#define EOS 0xFFFFFFFF

 /**
 * Line break classes.  This is a direct mapping of Table 1 of Unicode
@@ -130,6 +132,19 @@ struct LineBreakPropertiesLang
    struct LineBreakProperties *lbp;    /**< Pointer to associated data */
 };

+/**
+ * Context representing internal state of the line breaking algorithm.
+ * This is useful to callers if incremental analysis is wanted.
+ */
+struct LineBreakContext
+{
+    const char *lang;               /**< Language name */
+    struct LineBreakProperties *lbpLang;/**< Pointer to LineBreakProperties */
+    enum LineBreakClass lbcCur;     /**< Breaking class of current codepoint */
+    enum LineBreakClass lbcNew;     /**< Breaking class of next codepoint */
+    enum LineBreakClass lbcLast;    /**< Breaking class of last codepoint */
+};
+
 /**
 * Abstract function interface for #lb_get_next_char_utf8,
 * #lb_get_next_char_utf16, and #lb_get_next_char_utf32.
@@ -144,6 +159,13 @@ extern struct LineBreakPropertiesLang lb_prop_lang_map[];
 utf32_t lb_get_next_char_utf8(const utf8_t *s, size_t len, size_t *ip);
 utf32_t lb_get_next_char_utf16(const utf16_t *s, size_t len, size_t *ip);
 utf32_t lb_get_next_char_utf32(const utf32_t *s, size_t len, size_t *ip);
+void lb_init_break_context(
+        struct LineBreakContext* lbpCtx,
+        utf32_t ch,
+        const char* lang);
+int lb_process_next_char(
+        struct LineBreakContext* lbpCtx,
+        utf32_t ch);
 void set_linebreaks(
        const void *s,
        size_t len,
--- a/src/static_libs/libunibreak/wordbreak.c
+++ b/src/static_libs/libunibreak/wordbreak.c
@@ -1,10 +1,10 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Word breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
- * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
+ * Copyright (C) 2013 Tom Hacohen <tom at stosb dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
@@ -30,6 +30,10 @@
 * Unicode 6.0.0:
 *      <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 *
+ * This library has been updated according to Revision 21, for
+ * Unicode 6.2.0:
+ *      <URL:http://www.unicode.org/reports/tr29/tr29-21.html>
+ *
 * The Unicode Terms of Use are available at
 *      <URL:http://www.unicode.org/copyright.html>
 */
@@ -40,7 +44,7 @@
 * Implementation of the word breaking algorithm as described in Unicode
 * Standard Annex 29.
 *
- * @version	2.3, 2013/05/14
+ * @version 2.4, 2013/09/28
 * @author  Tom Hacohen
 */

--- a/src/static_libs/libunibreak/wordbreak.h
+++ b/src/static_libs/libunibreak/wordbreak.h
@@ -1,10 +1,10 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Word breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
- * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
+ * Copyright (C) 2013 Tom Hacohen <tom at stosb dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
@@ -30,6 +30,10 @@
 * Unicode 6.0.0:
 *      <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 *
+ * This library has been updated according to Revision 21, for
+ * Unicode 6.2.0:
+ *      <URL:http://www.unicode.org/reports/tr29/tr29-21.html>
+ *
 * The Unicode Terms of Use are available at
 *      <URL:http://www.unicode.org/copyright.html>
 */
@@ -39,7 +43,7 @@
 *
 * Header file for the word breaking (segmentation) algorithm.
 *
- * @version	2.2, 2012/02/04
+ * @version 2.3, 2013/09/28
 * @author  Tom Hacohen
 */

--- a/src/static_libs/libunibreak/wordbreakdef.h
+++ b/src/static_libs/libunibreak/wordbreakdef.h
@@ -1,10 +1,11 @@
-/* vim: set tabstop=4 shiftwidth=4: */
+/* vim: set expandtab tabstop=4 softtabstop=4 shiftwidth=4: */

 /*
 * Word breaking in a Unicode sequence.  Designed to be used in a
 * generic text renderer.
 *
- * Copyright (C) 2012 Tom Hacohen <tom@stosb.com>
+ * Copyright (C) 2013 Tom Hacohen <tom at stosb dot com>
+ * Copyright (C) 2013 Petr Filipsky <philodej at gmail dot com>
 *
 * This software is provided 'as-is', without any express or implied
 * warranty.  In no event will the author be held liable for any damages
@@ -30,6 +31,10 @@
 * Unicode 6.0.0:
 *      <URL:http://www.unicode.org/reports/tr29/tr29-17.html>
 *
+ * This library has been updated according to Revision 21, for
+ * Unicode 6.2.0:
+ *      <URL:http://www.unicode.org/reports/tr29/tr29-21.html>
+ *
 * The Unicode Terms of Use are available at
 *      <URL:http://www.unicode.org/copyright.html>
 */
@@ -40,13 +45,14 @@
 * Definitions of internal data structures, declarations of global
 * variables, and function prototypes for the word breaking algorithm.
 *
- * @version	2.2, 2013/05/14
+ * @version 2.4, 2013/11/10
 * @author  Tom Hacohen
+ * @author  Petr Filipsky
 */

 /**
 * Word break classes.  This is a direct mapping of Table 3 of Unicode
- * Standard Annex 29, Revision 17.
+ * Standard Annex 29, Revision 23.
 */
 enum WordBreakClass
 {
@@ -64,6 +70,9 @@ enum WordBreakClass
    WBP_Numeric,
    WBP_ExtendNumLet,
    WBP_Regional,
+    WBP_Hebrew,
+    WBP_Single,
+    WBP_Double,
    WBP_Any
 };