Date: Wed, 27 Aug 2014 14:51:01 +0400
From: Dmitry Selyutin <ghostman.sd@gmail.com>
To: Pedro Giffuni <pfg@freebsd.org>
Cc: soc-status@freebsd.org, Konrad Jankowski <versus@freebsd.org>, freebsd-i18n@freebsd.org
Subject: Re: Report #9: Unicode support
Message-ID: <CAMqzjeuUrpOfkX41bTY62NRNap0NetCKzTpSv5JaUC4Qvh59sA@mail.gmail.com>
In-Reply-To: <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com> <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org> <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
I've just seen EuroBSDCon's calendar page, and it seems that it is
impossible to join it (i.e. I missed the application deadline).[0] Well,
maybe next year? :-)

2014-08-27 14:48 GMT+04:00 Dmitry Selyutin <ghostman.sd@gmail.com>:
> Hi, Pedro, Baptiste,
>
> first of all, thanks for your congratulations and kind words! The
> project was really harder than anything I've ever met in my life, but
> at the same time it was the most interesting one. :-) And it still
> is! ;-)
>
>> That is not really uncommon :)
> Well, so I can leave it as it is. :-)
>
>> The project does have access to sparc64 machines so if you have some
>> self-contained test we can run it for you or we can test it as a routine libc
>> test after committing.
> Hopefully I can finish it today or in the next two days.
>
>> You never answered my question concerning the fallback options.
> Really? I thought that I answered. :-D Well, I'll try to explain
> again. DUCET seems to be a somewhat obsolete collation table, which can be
> more or less successfully used with real languages. However, in the real
> world it is completely unusable, so ICU and others use the CLDR collation
> table, which supports more levels. I started with DUCET since there
> was much more information about it, but then I found that it doesn't
> fit well, so I switched to CLDR. We still have the DUCET table somewhere in
> our revisions, though; as a fallback option it may still be useful, so I
> can restore it if you want.
>
>> Changing it to use NetBSD's cdb support[2] shouldn't be difficult.
> Well, I think I'll do it right after exams. bdb AFAIK is deprecated
> on Linux (though it can be used as bdb46 or something similar). I
> don't know the reasons why they did such a thing; it would be great if we
> could use a tool which can be used on different platforms without
> modifications and tons of conditional #defines and #undefs.
>
>> It has a simple API, way easier to use; the db format is endian safe and the
>> final file is smaller than the equivalent in bdb format.
> It sounds great!
>
>> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
>> FreeBSD Foundation will be allocating funds for students that want to go.
>> I won’t be there (I am a bit far away) but David and other developers will
>> likely be.
> Well, that depends on whether I pass my exams for the postgraduate
> course or not. I'd really like to listen to more experienced
> developers and maybe even talk to other people about the work I did,
> to better understand the community's opinions.
>
> 2014-08-27 3:17 GMT+04:00 Pedro Giffuni <pfg@freebsd.org>:
>> Hi Baptiste;
>>
>>
>> On 08/26/14 17:16, Baptiste Daroussin wrote:
>>>
>>> On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>>>>
>>>> Hello everyone!
>>>>
>>>> Here is the latest news about the Unicode support project[0].
>>>> You can always check my repository[1].
>>>>
>>>> During these days I had hardware problems (my HDD peacefully died), so
>>>> development didn't progress as much as before. However, I've
>>>> eliminated these problems, so I tried to fix bugs and reorganize the
>>>> code as much as possible. Now everything should compile.
>>>>
>>>> I decided to use __attribute__((constructor)) and
>>>> __attribute__((destructor)), since I don't know if there exists a
>>>> better way to open a file once at startup and close it when all
>>>> routines are done. I've found one or two occurrences of this construction
>>>> in FreeBSD code; AFAICT it is rather common in clang and gcc, so I
>>>> decided to use it. Hopefully it will also allow us to use the root
>>>> collation database on embedded systems (if any such system really
>>>> needs the collation algorithm).
>>>>
>>>> As you may know, we need a tool that can convert collation text files
>>>> obtained from unicode.org to the new collation database (colldb) format.
>>>> There is a version of this tool written in Python
>>>> (share/examples/colldb/colldb.py). IIRC we can't use Python in the
>>>> base system, though, so it seems that we need to write such a
>>>> tool in C. I was thinking of a lex/yacc combo; I've never
>>>> tried it, but I think it shouldn't be too hard to write a tool using
>>>> it. I'd like to know your opinions about this task.
>>>> I've already written a man page (bin/colldb/colldb.1). The only thing
>>>> which seems dubious is that I decided to use the same name as for the
>>>> library itself (well, it seems I have a lack of imagination). So we
>>>> have both colldb.1 and colldb.3 man pages.
>>>>
>>>> The other thing I'd really like to do is to really force network byte
>>>> order in the collation database format (I'm sure I've seen a way to do it
>>>> in Berkeley databases). It's a pity that I have no platform with
>>>> big-endian (or even PDP!) byte order. Any help here is highly
>>>> appreciated (as are your thoughts about lex/yacc, i.e. thoughts on
>>>> whether it fits my task well).
>>>>
>>>> Since the Google Summer of Code period has passed, I'd like to thank both
>>>> my mentors, Pedro and David, who gave me a helping hand during this
>>>> project, and especially Konrad Jankowski, who found time to answer my
>>>> questions and help me too. Though GSoC is closed, I'd like to stay
>>>> with the FreeBSD project. First of all, I want to finish this project and
>>>> get it into good shape: I don't think it's really finished, especially its
>>>> testing part, though it seems that the new collation algorithm can already
>>>> be used. Then I'd like to work on other parts of my project,
>>>> especially the internationalization parts. I'd also like to improve my
>>>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>>>> since I acutely feel the lack of such an API. If it is possible to stay
>>>> with the project, I'd be very happy to do it. :-)
>>>>
>>>> P.S. Does anyone know how to get a diff only for my branch
>>>> (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
>>>> give everything that all of FreeBSD's GSoC students have done, so I need
>>>> some other command. Thanks for your help!
>>>>
>>>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>>>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>>>
>>> First, thank you very much for your work on this subject; this is highly
>>> needed.
>>>
>>> Concerning the db format, have you thought about using the new NetBSD
>>> constant database format?
>>>
>>> It has a simple API, way easier to use; the db format is endian safe and
>>> the final file is smaller than the equivalent in bdb format.
>>>
>>> Lots of areas of FreeBSD could benefit from using this cdb format as well,
>>> imho.
>>
>>
>> While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
>> not something easy/fun to work with.
>>
>> Indeed, both David and Konrad suggested it (or tinycdb). The reason for
>> going with bdb was that we had time constraints and bdb is already in libc.
>>
>> FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
>> in Illumos, which basically implements their own very efficient format. We
>> ended up re-using the tools that libc already has to better focus on the
>> collation part.
>>
>> Changing it to use NetBSD's cdb support [2] shouldn't be difficult.
>>
>> As Dmitry noted, there are still details to work out, and we have to run
>> tests and get the code reviewed, but all in all I am very satisfied with
>> the progress in this GSoC.
>>
>> Best regards,
>>
>> Pedro.
>>
>> [1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
>> [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
>>
>
>
>
> --
> With best regards,
> Dmitry Selyutin

-- 
With best regards,
Dmitry Selyutin
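The report also raises forcing network byte order in the collation database format. One portable approach, sketched here under the assumption of a hypothetical fixed-size record (a code point plus three collation weights; this is not the real colldb layout), is to serialize every integer byte by byte in big-endian order, so the file reads back identically on little-, big- and PDP-endian hosts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record (assumption); not the actual colldb format. */
struct coll_record {
	uint32_t codepoint;
	uint32_t weights[3];	/* e.g. primary, secondary, tertiary */
};

#define COLL_RECORD_SIZE 16	/* 4 fields x 4 bytes, no padding on disk */

/* Store v most-significant byte first, independent of host byte order. */
static void
put_be32(unsigned char *buf, uint32_t v)
{
	buf[0] = (unsigned char)(v >> 24);
	buf[1] = (unsigned char)(v >> 16);
	buf[2] = (unsigned char)(v >> 8);
	buf[3] = (unsigned char)v;
}

/* Reassemble a big-endian value; works on any alignment. */
static uint32_t
get_be32(const unsigned char *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	    ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}

void
coll_record_encode(const struct coll_record *r,
    unsigned char out[COLL_RECORD_SIZE])
{
	put_be32(out, r->codepoint);
	for (size_t i = 0; i < 3; i++)
		put_be32(out + 4 * (i + 1), r->weights[i]);
}

void
coll_record_decode(const unsigned char in[COLL_RECORD_SIZE],
    struct coll_record *r)
{
	r->codepoint = get_be32(in);
	for (size_t i = 0; i < 3; i++)
		r->weights[i] = get_be32(in + 4 * (i + 1));
}
```

`htonl()`/`ntohl()` from `<arpa/inet.h>` would also work for 32-bit fields, but the explicit shifts avoid any alignment assumptions when reading records straight out of a memory-mapped file.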