Date: Mon, 23 Jun 2014 02:34:01 +0400
From: Dmitry Selyutin <ghostman.sd@gmail.com>
To: soc-status@FreeBSD.org, Pedro Giffuni <pfg@FreeBSD.org>, David Chisnall <theraven@freebsd.org>
Subject: Report #3: Unicode support
Message-ID: <53A759D9.1010804@gmail.com>
Hello everyone!

I'm glad to report that I've finished a first sketch of the Unicode Normalization Algorithm, and it seems to work. The data files have been updated to the latest version of Unicode (7.0.0).

Of course, this code needs some tuning. For example, in the worst case one has to iterate over the whole table to check whether a character can be normalized. I'm going to fix this with a different structure: a bitmap in which each byte covers 8 characters and each bit flags whether the corresponding character may be normalized. This requires two arrays of 139264 bytes each (1114112 code points / 8), one for composition and one for decomposition. The state of each character can then be determined by a simple division: the quotient of the code point by 8 selects the byte, and the remainder selects the bit. That's just a proposal; everyone is welcome to suggest a better way to handle this.

The other important part is to prepare a test suite, but for that I have to consult my mentors, Pedro and David.

-- 
With best regards,
Dmitry Selyutin
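
For illustration, the bit-per-character lookup proposed above might look roughly like the following C sketch. It is not the project's code: the names `UNICODE_MAX`, `BITMAP_SIZE`, `decomposable`, `bitmap_set` and `bitmap_test` are hypothetical, and a real table would be generated from the Unicode data files rather than filled in at run time.

```c
#include <stdint.h>

/* Assumption: the whole code space U+0000..U+10FFFF is covered. */
#define UNICODE_MAX 0x110000u          /* 1114112 code points */
#define BITMAP_SIZE (UNICODE_MAX / 8)  /* 139264 bytes, one bit per code point */

/* Hypothetical table: bit set means the code point has a decomposition. */
static uint8_t decomposable[BITMAP_SIZE];

/* Mark a code point in the bitmap; out-of-range values are ignored. */
static void
bitmap_set(uint8_t *map, uint32_t cp)
{
	if (cp < UNICODE_MAX)
		map[cp / 8] |= (uint8_t)(1u << (cp % 8));
}

/* Return 1 if the code point is marked, 0 otherwise (constant time). */
static int
bitmap_test(const uint8_t *map, uint32_t cp)
{
	if (cp >= UNICODE_MAX)
		return (0);
	return ((map[cp / 8] >> (cp % 8)) & 1);
}
```

With a second array of the same size for composition, each check costs one division, one remainder and one shift, instead of a scan over the whole table.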