Date:      Mon, 23 Jun 2014 02:34:01 +0400
From:      Dmitry Selyutin <ghostman.sd@gmail.com>
To:        soc-status@FreeBSD.org, Pedro Giffuni <pfg@FreeBSD.org>,  David Chisnall <theraven@freebsd.org>
Subject:   Report #3: Unicode support
Message-ID:  <53A759D9.1010804@gmail.com>

Hello everyone!

I'm glad to report that I've finished a first sketch of the Unicode
Normalization Algorithm, which seems to work. The data files were
recently updated to the latest version of Unicode (7.0.0).
Of course this code needs some tuning: in the worst case one has to
iterate over the whole table to check whether a character can be
normalized. I'm going to fix this with a different structure, a bitmap
in which each byte covers 8 characters and each bit is a flag telling
whether the corresponding character may be normalized. Thus we need two
arrays of 139264 bytes (1114112 code points / 8; one array for
composition and one for decomposition), where the state of each
character can be determined by a simple division: the byte index is the
code point divided by 8, and the bit index is the remainder. That's just
a proposal; everyone is welcome to suggest a better way to handle such
things.
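
To make the proposal concrete, here is a minimal sketch of such a bitmap
in C. It is only an illustration under assumptions: the names
(decomposable, bitmap_set, bitmap_test) are hypothetical and are not
taken from the actual project code, and the real table would be
generated from the Unicode data files rather than filled at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Unicode code points run from U+0000 to U+10FFFF, i.e. 0x110000 values. */
#define UNICODE_CODE_POINTS 0x110000u
/* One bit per code point: 1114112 / 8 = 139264 bytes per table. */
#define BITMAP_BYTES (UNICODE_CODE_POINTS / 8u)

/* Hypothetical table: bit set means "this character may be decomposed".
 * A second array of the same size would cover composition. */
static uint8_t decomposable[BITMAP_BYTES];

/* Mark a code point in the bitmap (used here only to fill sample data). */
static void
bitmap_set(uint8_t *map, uint32_t cp)
{
	assert(cp < UNICODE_CODE_POINTS);
	map[cp / 8u] |= (uint8_t)(1u << (cp % 8u));
}

/* Constant-time lookup: byte index by division, bit index by remainder.
 * Returns 1 if the flag is set, 0 otherwise (including out-of-range input). */
static int
bitmap_test(const uint8_t *map, uint32_t cp)
{
	if (cp >= UNICODE_CODE_POINTS)
		return (0);
	return ((map[cp / 8u] >> (cp % 8u)) & 1u);
}
```

For example, after bitmap_set(decomposable, 0x00C5), a call to
bitmap_test(decomposable, 0x00C5) returns 1 without scanning any table,
while neighbouring code points that were never set still return 0.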
The other important part is preparing a test suite, but for that I need
to consult my mentors, Pedro and David.

-- 
With best regards,
Dmitry Selyutin


