Message-ID: <53FD1599.7040708@freebsd.org>
Date: Tue, 26 Aug 2014 18:17:45 -0500
From: Pedro Giffuni
To: Baptiste Daroussin, ghostmansd@gmail.com
Cc: soc-status@freebsd.org, Konrad Jankowski, freebsd-i18n@freebsd.org
Subject: Re: Report #9: Unicode support
References: <20140826221610.GD65120@ivaldir.etoilebsd.net>
In-Reply-To: <20140826221610.GD65120@ivaldir.etoilebsd.net>
List-Id: Summer of Code Status Reports and Discussion

Hi Baptiste;

On 08/26/14 17:16, Baptiste Daroussin wrote:
> On Wed,
> Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>> Hello everyone!
>>
>> Here is the latest news about the Unicode support project[0].
>> You can always check my repository[1].
>>
>> During these days I had hardware problems (my HDD peacefully died), so
>> development didn't progress as much as before. However, I've
>> eliminated these problems, so I tried to fix bugs and reorganize the
>> code as much as possible. Now everything should compile.
>>
>> I decided to use __attribute__((constructor)) and
>> __attribute__((destructor)), since I don't know whether there is a
>> better way to open a file once at startup and close it when all
>> routines have finished. I've found one or two occurrences of this
>> construction in FreeBSD code; AFAICT it is rather common in clang and
>> gcc, so I decided to use it. Hopefully it will also allow us to use the
>> root collation database in embedded systems (if any such system really
>> needs the collation algorithm).
>>
>> As you may know, we need a tool that can convert collation text files
>> obtained from unicode.org to the new collation database (colldb) format.
>> There is a version of this tool written in Python
>> (share/examples/colldb/colldb.py). IIRC we can't use Python in the
>> base system though, so it seems that we need to write such a
>> tool in C. I was thinking of the lex/yacc combo; I've never
>> tried it, but I think it shouldn't be too hard to write a tool using
>> it. I'd like to know your opinions about this task.
>> I've already written a man page (bin/colldb/colldb.1). The only thing
>> which seems dubious is that I decided to use the same name as for the
>> library itself (well, it seems I have a lack of imagination). So we
>> have both colldb.1 and colldb.3 man pages.
>>
>> The other thing I'd really like to do is to enforce network byte
>> order in the collation database format (I'm sure I've seen a way to do
>> it in Berkeley databases).
>> It's a pity that I have no platform with
>> big-endian (or even PDP!) byte order. Any help here is highly
>> appreciated (as well as your thoughts about lex/yacc, i.e. thoughts
>> on whether it fits my task well).
>>
>> Since the Google Summer of Code period has ended, I'd like to thank both
>> my mentors, Pedro and David, who gave me a helping hand during this
>> project, and especially Konrad Jankowski, who found time to answer my
>> questions and help me too. Though GSoC is closed, I'd like to stay
>> with the FreeBSD project. First of all, I want to finish and polish
>> this project: I don't think it's really finished, especially its
>> testing part, though it seems that the new collation algorithm can
>> already be used. Then I'd like to work on other parts of my project,
>> especially the internationalization parts. I'd also like to improve my
>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>> since I acutely feel the lack of such an API. If it is possible to stay
>> with the project, I'd be very happy to do it. :-)
>>
>> P.S. Does anyone know how to get a diff for my branch only
>> (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
>> give everything that all of FreeBSD's GSoC students have done, so I
>> need some other command. Thanks for your help!
>>
>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>
> First, thank you very much for your work on this subject; it is highly needed.
>
> Concerning the db format, have you thought about using the new NetBSD constant
> database format?
>
> It has a simple API that is way easier to use, the db format is endian safe, and
> the final file is smaller than the equivalent in bdb format.
>
> Lots of areas of FreeBSD could benefit from using this cdb format as well imho.

While here, let me congratulate Dmitry. The Unicode Collation Algorithm
is not something easy/fun to work with.

Indeed both David and Konrad suggested cdb (or tinycdb).
The reason for going with bdb was that we had time constraints and bdb is
already in libc.

FWIW, Nexenta kindly re-licensed localedef [1] and their collation
support in Illumos, which basically implements their own very
efficient format. We ended up re-using the tools that libc already
has to better focus on the collation part.

Changing it to use NetBSD's cdb support [2] shouldn't be difficult.

As Dmitry noted, there are still details to work out, and we have to run
tests and get the code reviewed, but all in all I am very satisfied with
the progress made in this GSoC.

Best regards,

Pedro.

[1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
[2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
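
On the byte-order question raised in the report: one common way to keep
an on-disk format independent of host endianness is to write every
integer byte by byte, most significant byte first, and to read it back
the same way. The sketch below only illustrates that idea. The function
names and the notion of a 32-bit field are hypothetical and are not taken
from the colldb sources. On FreeBSD, be32enc() and be32dec() from
<sys/endian.h> do the same job.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from colldb): store a 32-bit value in
 * network (big-endian) byte order, regardless of host byte order.
 */
static void
sketch_be32enc(unsigned char *buf, uint32_t v)
{
	buf[0] = (unsigned char)(v >> 24);	/* most significant byte first */
	buf[1] = (unsigned char)(v >> 16);
	buf[2] = (unsigned char)(v >> 8);
	buf[3] = (unsigned char)v;		/* least significant byte last */
}

/*
 * Hypothetical helper (not from colldb): read back a 32-bit value that
 * was stored in network byte order.
 */
static uint32_t
sketch_be32dec(const unsigned char *buf)
{
	return (((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	    ((uint32_t)buf[2] << 8) | (uint32_t)buf[3]);
}

/* Round-trip check: the decoded value must equal the encoded one. */
void
sketch_selfcheck(void)
{
	unsigned char buf[4];

	sketch_be32enc(buf, 0x01020304u);
	assert(buf[0] == 0x01 && buf[3] == 0x04);
	assert(sketch_be32dec(buf) == 0x01020304u);
}
```

Because the shifts operate on values rather than on memory layout, the
same bytes end up in the file on little-endian, big-endian and PDP-endian
hosts, so no big-endian test machine is needed to get the format right.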