From: Dmitry Selyutin
Reply-To: ghostmansd@gmail.com
Date: Wed, 27 Aug 2014 14:51:01 +0400
Subject: Re: Report #9: Unicode support
To: Pedro Giffuni
Cc: soc-status@freebsd.org, Konrad Jankowski, freebsd-i18n@freebsd.org
List-Id: Summer of Code Status Reports and Discussion

I've just seen EuroBSDCon's calendar page and it seems that it is
impossible to join it (i.e. I missed the application deadline).[0]
Well, maybe next year? :-)

2014-08-27 14:48 GMT+04:00 Dmitry Selyutin:
> Hi, Pedro, Baptiste,
>
> first of all thanks for your congratulations and kind words! The
> project was really harder than anything I've ever met in my life, but
> at the same time it was the most interesting one. :-) And still
> remains! ;-)
>
>> That is not really uncommon :)
> Well, so I can leave it as it is. :-)
>
>> The project does have access to sparc64 machines so if you have some
>> self-contained test we can run it for you or we can test it as a routine libc
>> test after committing.
> Hopefully I can finish it today or in the next two days.
>
>> You never answered my question concerning the fallback options.
> Really? I thought that I answered. :-D Well, I'll try to explain
> again. DUCET seems to be a somewhat obsolete collation table, which can
> be more or less successfully used with real languages. However, in the
> real world it is completely unusable, so ICU and others use the CLDR
> collation table, which supports more levels. I started with DUCET since
> there was much more information about it, but then I found that it
> doesn't fit well, so I switched to CLDR. We have the DUCET table
> somewhere in our revisions though; as a fallback option, it still may
> be useful, so I can restore it if you want.
>
>> Changing it to use the NetBSD's cdb support[2] shouldn't be difficult.
> Well, I think I'll do it right after exams. bdb AFAIK is deprecated
> on Linux (though it can be used as bdb46 or something similar).
> I don't know the reasons why they did such a thing; it would be great
> if we could use a tool which can be used on different platforms without
> modifications and tons of conditional #define's and #undef's.
>
>> It has simple API way easier to use, the db format is endian safe and final file
>> is smaller than equivalent in bdb format.
> It sounds great!
>
>> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
>> FreeBSD Foundation will be allocating funds for students that want to go.
>> I won’t be there (I am a bit far away) but David and other developers will
>> likely be.
> Well, that depends on whether I pass my exams for the postgraduate
> course or not. I'd really like to listen to more experienced
> developers and maybe even talk to other people about the work I did,
> to better understand the community's opinions.
>
> 2014-08-27 3:17 GMT+04:00 Pedro Giffuni:
>> Hi Baptiste;
>>
>>
>> On 08/26/14 17:16, Baptiste Daroussin wrote:
>>>
>>> On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>>>>
>>>> Hello everyone!
>>>>
>>>> Here is the latest news about the Unicode support project[0].
>>>> You can always check my repository[1].
>>>>
>>>> During these days I had hardware problems (my HDD peacefully died), so
>>>> development didn't progress as much as before. However, I've
>>>> eliminated these problems, so I tried to fix bugs and reorganize the
>>>> code as much as possible. Now everything should compile.
>>>>
>>>> I decided to use __attribute__((constructor)) and
>>>> __attribute__((destructor)), since I don't know if there exists a
>>>> better way to open a file once at startup and close it when all
>>>> routines are done. I've found one or two occurrences of this construction
>>>> in FreeBSD code; AFAICT it is rather common in clang and gcc, so I
>>>> decided to use it.
>>>> Hopefully it will also allow us to use the root
>>>> collation database in embedded systems (if any such system really
>>>> needs a collation algorithm).
>>>>
>>>> As you may know we need a tool that can convert collation text files
>>>> obtained from unicode.org to the new collation database (colldb) format.
>>>> There is a version of this tool written in Python
>>>> (share/examples/colldb/colldb.py). IIRC we can't use Python when we
>>>> have only a base system though, so it seems that we need to write such
>>>> a tool in C. I was thinking of a lex/yacc combo; I've never
>>>> tried it, but I think it shouldn't be too hard to write a tool using
>>>> it. I'd like to know your opinions about this task.
>>>> I've already written a man page (bin/colldb/colldb.1). The only thing
>>>> which seems dubious is that I decided to use the same name as for the
>>>> library itself (well, it seems I have a lack of imagination). So we
>>>> have both colldb.1 and colldb.3 man pages.
>>>>
>>>> The other thing I'd really like to do is to force network byte
>>>> order in the collation database format (I'm sure I've seen a way to do it
>>>> in Berkeley databases). It's a pity that I have no platform with
>>>> big-endian (or even PDP!) byte order. Any help here is highly
>>>> appreciated (as well as your thoughts about lex/yacc, i.e. thoughts
>>>> on whether it fits my task well).
>>>>
>>>> Since the Google Summer of Code period has passed, I'd like to thank both
>>>> my mentors, Pedro and David, who gave me a helping hand during this
>>>> project, and especially Konrad Jankowski, who found time to answer my
>>>> questions and help me too. Though GSoC is closed, I'd like to stay
>>>> with the FreeBSD project. First of all, I want to finish and polish
>>>> this project: I don't think it's really finished, especially its
>>>> testing part, though it seems that the new collation algorithm can already
>>>> be used.
>>>> Then I'd like to work on other parts of my project,
>>>> especially the internationalization parts. I'd also like to improve my
>>>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>>>> since I acutely feel the lack of such an API. If it is possible to stay
>>>> with the project, I'd be very happy to do it. :-)
>>>>
>>>> P.S. Does anyone know how to get a diff for my branch only
>>>> (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
>>>> give everything that all of FreeBSD's GSoC students have done, so I need
>>>> some other command. Thanks for your help!
>>>>
>>>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>>>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>>>
>>> First thank you very much for your work on this subject this is highly
>>> needed.
>>>
>>> Concerning the db format have you thought about using the new netbsd
>>> constant database format?
>>>
>>> It has simple API way easier to use, the db format is endian safe and
>>> final file is smaller than equivalent in bdb format.
>>>
>>> Lots of areas of FreeBSD could benefit from using this cdb format as well
>>> imho.
>>
>>
>> While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
>> not something easy/fun to work with.
>>
>> Indeed both David and Konrad suggested it (or tinycdb). The reason for
>> going bdb was that we had time constraints and bdb is already in libc.
>>
>> FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
>> in Illumos which basically implements their own very efficient format. We
>> ended up re-using the tools that libc already has to better focus on the
>> collation part.
>>
>> Changing it to use the NetBSD's cdb support[2] shouldn't be difficult.
>>
>> As Dmitry noted there are still details to work out and we have to run tests
>> and get the code reviewed but all in all I am very satisfied with the
>> advance in this GSoC.
>>
>> Best regards,
>>
>> Pedro.
>>
>> [1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
>> [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
>>
>
>
>
> --
> With best regards,
> Dmitry Selyutin

--
With best regards,
Dmitry Selyutin
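
On forcing network byte order in the collation database format: below is a
minimal sketch in C, assuming the stored fields are fixed-width 32-bit
integers (the thread does not describe the actual colldb record layout).
The helper names colldb_put_u32 and colldb_get_u32 are hypothetical and do
not come from the repository. Writing the bytes with explicit shifts
should produce the same file on little-endian, big-endian and PDP-endian
hosts, so a big-endian machine would only be needed to confirm the result,
not to develop it.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not part of the
 * colldb code discussed in this thread.
 */

/* Store a 32-bit value as four bytes, most significant byte first. */
static void
colldb_put_u32(unsigned char *buf, uint32_t value)
{
	buf[0] = (unsigned char)((value >> 24) & 0xff);
	buf[1] = (unsigned char)((value >> 16) & 0xff);
	buf[2] = (unsigned char)((value >> 8) & 0xff);
	buf[3] = (unsigned char)(value & 0xff);
}

/* Rebuild a 32-bit value from four bytes stored most significant first. */
static uint32_t
colldb_get_u32(const unsigned char *buf)
{
	return (((uint32_t)buf[0] << 24) |
	    ((uint32_t)buf[1] << 16) |
	    ((uint32_t)buf[2] << 8) |
	    (uint32_t)buf[3]);
}
```

A writer would call colldb_put_u32 on each integer before handing the
buffer to the database, and a reader would call colldb_get_u32 on the
stored bytes, so neither side depends on the byte order of the host.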