From owner-soc-status@FreeBSD.ORG Tue Aug 26 22:16:16 2014
Date: Wed, 27 Aug 2014 00:16:10 +0200
From: Baptiste Daroussin
To: ghostmansd@gmail.com
Subject: Re: Report #9: Unicode support
Message-ID: <20140826221610.GD65120@ivaldir.etoilebsd.net>
Cc: soc-status@freebsd.org, Pedro Giffuni, Konrad Jankowski, freebsd-i18n@freebsd.org
List-Id: Summer of Code Status Reports and Discussion

On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
> Hello everyone!
>
> Here is the latest news about the Unicode support project[0].
> You can always check my repository[1].
>
> Over the last few days I had hardware problems (my HDD peacefully
> died), so development didn't progress as much as before. However, I've
> eliminated these problems, so I tried to fix bugs and reorganize the
> code as much as possible. Now everything should compile.
>
> I decided to use __attribute__((constructor)) and
> __attribute__((destructor)), since I don't know whether there is a
> better way to open a file once at startup and close it when all
> routines are done. I've found one or two occurrences of this
> construction in FreeBSD code; AFAICT it is rather common in clang and
> gcc, so I decided to use it. Hopefully it will also allow us to use the
> root collation database on embedded systems (if any such system really
> needs the collation algorithm).
>
> As you may know, we need a tool that can convert collation text files
> obtained from unicode.org to the new collation database (colldb)
> format. There is a version of this tool written in Python
> (share/examples/colldb/colldb.py).
> IIRC we can't use Python in the base system, though, so it seems that
> we need to write such a tool in C. I was thinking of a lex/yacc combo;
> I've never tried it, but I think it shouldn't be too hard to write a
> tool using it. I'd like to know your opinions about this task.
> I've already written a man page (bin/colldb/colldb.1). The only thing
> which seems dubious is that I decided to use the same name as for the
> library itself (well, it seems I have a lack of imagination). So we
> have both colldb.1 and colldb.3 man pages.
>
> The other thing I'd really like to do is to force network byte order
> in the collation database format (I'm sure I've seen a way to do it
> in Berkeley databases). It's a pity that I have no platform with
> big-endian (or even PDP!) byte order. Any help here is highly
> appreciated (as well as your thoughts about lex/yacc, i.e. whether it
> fits my task well).
>
> Since the Google Summer of Code period has passed, I'd like to thank
> both my mentors, Pedro and David, who gave me a helping hand during
> this project, and especially Konrad Jankowski, who found time to answer
> my questions and help me too. Though GSoC is closed, I'd like to stay
> with the FreeBSD project. First of all, I want to finish and polish
> this project: I don't think it's really finished, especially its
> testing part, though it seems that the new collation algorithm can
> already be used. Then I'd like to work on other parts of my project,
> especially the internationalization parts. I'd also like to improve my
> own library, qc, to provide a rich API for *BSD and POSIX systems,
> since I acutely feel the lack of such an API. If it is possible to stay
> with the project, I'd be very happy to do it. :-)
>
> P.S. Does anyone know how to get a diff for only my branch
> (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
> give everything that all of FreeBSD's GSoC projects have done, so I
> need some other command.
> Thanks for your help!
>
> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>

First, thank you very much for your work on this subject; this is highly
needed.

Concerning the db format, have you thought about using the new NetBSD
constant database (cdb) format?

It has a simple API that is much easier to use, the db format is endian
safe, and the final file is smaller than the equivalent in bdb format.

Lots of areas of FreeBSD could benefit from using this cdb format as
well, imho.

regards,
Bapt
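
For the byte-order question discussed above, here is a minimal sketch of
what an endian-safe record encoding can look like. The two-field record,
its 8-byte layout, and the names colldb_rec, put_be32 and get_be32 are
made up for illustration; this is neither the actual colldb layout nor
the cdb on-disk format. Each 32-bit field is stored in network byte
order with htonl() and read back with ntohl(), so a file written on a
little-endian machine should read the same on a big-endian one.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() (POSIX) */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory record; NOT the real colldb layout. */
struct colldb_rec {
	uint32_t codepoint;
	uint32_t weight;
};

/* Size of the serialized record: two big-endian 32-bit fields. */
#define COLLDB_REC_SIZE 8

/* Store v at p in network (big-endian) byte order. */
void
put_be32(unsigned char *p, uint32_t v)
{
	uint32_t be = htonl(v);

	/* memcpy avoids alignment assumptions about p. */
	memcpy(p, &be, sizeof(be));
}

/* Load a network-order 32-bit value from p into host order. */
uint32_t
get_be32(const unsigned char *p)
{
	uint32_t be;

	memcpy(&be, p, sizeof(be));
	return (ntohl(be));
}

/* Serialize a record into exactly COLLDB_REC_SIZE bytes. */
void
colldb_rec_encode(const struct colldb_rec *r, unsigned char *buf)
{
	assert(r != NULL && buf != NULL);
	put_be32(buf, r->codepoint);
	put_be32(buf + 4, r->weight);
}

/* Deserialize a record from COLLDB_REC_SIZE bytes. */
void
colldb_rec_decode(struct colldb_rec *r, const unsigned char *buf)
{
	assert(r != NULL && buf != NULL);
	r->codepoint = get_be32(buf);
	r->weight = get_be32(buf + 4);
}
```

Encoding { 0x41, 0x1234ABCD } gives the bytes 00 00 00 41 12 34 AB CD on
any host, so the format can be checked byte by byte even without access
to big-endian hardware.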