Date: Mon, 21 Jul 2014 15:23:50 +0400
From: Dmitry Selyutin <ghostman.sd@gmail.com>
To: soc-status@freebsd.org, Pedro Giffuni <pfg@freebsd.org>, David Chisnall <theraven@freebsd.org>
Subject: Report #4: Unicode support
Message-ID: <CAMqzjetbe7x-mWYjVL5OPu39pv4xG4Zmt%2Bj8Hyi1cvPRxmWVSw@mail.gmail.com>
Hello everyone,

Here is my report on progress over the past two weeks. Pedro, David, please excuse the duplication: I should have simply included you in this letter instead of sending you two separate letters. I've just realized that I forgot to write the report. :-(

I've been intensively testing my normalization implementation and discovered that it was working incorrectly. Moreover, its code seemed completely cryptic, so I've rewritten it from scratch. Now it seems to work correctly (at least it passes the Unicode tests). The things I had completely ignored were canonicalization and combining character classes. I've decided to publish it in a git repo and integrate it into head later, since it's a real pain to recompile the entire system every few hours after changes to the source code (especially when the changes are small).

I've also thought about your message in which you expressed doubts about the project structure. We'll have a `uniext.h' header, which is included if the UNICODE_ADDENDA macro is defined. This header declares the following functions: strcanon, strcanon_l, wcscanon, strnorm, strnorm_l, wcsnorm, wcclass. The last one was written as a helper function used inside wcscanon and wcsnorm, but I thought it might also be useful as a standalone function.

I've rewritten the algorithms: everything is now done using binary search and hashes instead of the previous linear search, so it's really fast (e.g., decomposition runs 10 to 12 times faster than Python's decomposition algorithm). I've also tested it on wide strings, and it works as expected (at least so far!). So this part seems to be finished. The last thing to do is to put everything in the right places in the FreeBSD source tree.

Here is my testing repo: https://github.com/ghostmansd/uniext. Just use `git clone https://github.com/ghostmansd/uniext'.

P.S. You need to use gmake if you want to use my Makefile (I don't know BSD Makefile syntax well). However, all you really need to do is add the `-Iinclude' flag to CFLAGS, compile everything in `src', compile `main.c', and link it all together.

--
With best regards,
Dmitry Selyutin
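The report says the rewritten code replaces linear search with binary search and hashing. As a rough illustration of the binary-search part only (this is not the code in the uniext repo, and no hashing is shown), here is a minimal C sketch of canonical decomposition followed by canonical ordering by combining class. The names `nfd_sketch`, `ccc_of`, `cmp_cp` and both tables are hypothetical. The tables hold a tiny hand-picked excerpt of UnicodeData.txt, and the sketch assumes each listed decomposition is already fully decomposed; real data needs recursive decomposition plus the algorithmic Hangul rules. It makes no claim about the real signatures of wcclass or wcsnorm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tiny excerpt of UnicodeData.txt. Both tables MUST be sorted
 * by code point, because lookups use bsearch(). */
struct ccc_entry {
    uint32_t cp;  /* code point (must be the first member) */
    uint8_t ccc;  /* canonical combining class */
};

struct decomp_entry {
    uint32_t cp;     /* precomposed code point (must be the first member) */
    uint32_t seq[2]; /* its canonical decomposition (assumed final) */
};

static const struct ccc_entry ccc_table[] = {
    { 0x0300, 230 }, /* COMBINING GRAVE ACCENT */
    { 0x0301, 230 }, /* COMBINING ACUTE ACCENT */
    { 0x0303, 230 }, /* COMBINING TILDE */
    { 0x030A, 230 }, /* COMBINING RING ABOVE */
    { 0x0323, 220 }, /* COMBINING DOT BELOW */
    { 0x0327, 202 }, /* COMBINING CEDILLA */
};

static const struct decomp_entry decomp_table[] = {
    { 0x00C5, { 0x0041, 0x030A } }, /* LATIN CAPITAL LETTER A WITH RING ABOVE */
    { 0x00E9, { 0x0065, 0x0301 } }, /* LATIN SMALL LETTER E WITH ACUTE */
    { 0x00F1, { 0x006E, 0x0303 } }, /* LATIN SMALL LETTER N WITH TILDE */
};

/* bsearch() comparator: the key is a uint32_t, and each table element
 * begins with its code point, so it can be read through a uint32_t pointer. */
static int cmp_cp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t e = *(const uint32_t *)elem;
    return (k > e) - (k < e);
}

/* Canonical combining class of cp; code points absent from the table are
 * treated as starters (class 0). O(log n) instead of a linear scan. */
static uint8_t ccc_of(uint32_t cp)
{
    const struct ccc_entry *e = bsearch(&cp, ccc_table,
        sizeof ccc_table / sizeof ccc_table[0], sizeof ccc_table[0], cmp_cp);
    return e != NULL ? e->ccc : 0;
}

/* Hypothetical NFD-style pass: decompose each code point via binary search,
 * then put every run of non-starters into canonical order.
 * Returns the number of code points written, or (size_t)-1 if cap is too small. */
size_t nfd_sketch(const uint32_t *src, size_t n, uint32_t *out, size_t cap)
{
    size_t len = 0;

    assert(src != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct decomp_entry *d = bsearch(&src[i], decomp_table,
            sizeof decomp_table / sizeof decomp_table[0],
            sizeof decomp_table[0], cmp_cp);
        size_t need = d != NULL ? 2 : 1;

        if (len + need > cap)
            return (size_t)-1;
        if (d != NULL) {
            out[len++] = d->seq[0];
            out[len++] = d->seq[1];
        } else {
            out[len++] = src[i];
        }
    }

    /* Canonical ordering: stable insertion sort by ccc. A mark moves left
     * only past marks with a strictly higher class and never past a starter
     * (class 0), so each run of non-starters is sorted in place. */
    for (size_t i = 1; i < len; i++) {
        uint32_t c = out[i];
        uint8_t cc = ccc_of(c);
        size_t j = i;

        if (cc == 0)
            continue;
        while (j > 0 && ccc_of(out[j - 1]) > cc) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = c;
    }
    return len;
}
```

For example, U+00E9 followed by U+0323 becomes U+0065 U+0323 U+0301: the acute accent (class 230) is moved after the dot below (class 220), while the starter U+0065 stays in front.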