From: Dmitry Selyutin
Reply-To: ghostmansd@gmail.com
Date: Wed, 27 Aug 2014 01:08:58 +0400
Subject: Report #9: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni, David Chisnall, Konrad Jankowski, freebsd-i18n@freebsd.org
List-Id: FreeBSD Internationalization Effort

Hello everyone!

Here is the latest news about the Unicode support project[0]. You can always check my repository[1]. Over the past days I had hardware problems (my HDD peacefully died), so development didn't progress as much as before. However, I've resolved these problems and tried to fix bugs and reorganize the code as much as possible. Everything should compile now.

I decided to use __attribute__((constructor)) and __attribute__((destructor)), since I don't know whether there is a better way to open a file once at startup and close it when all routines have finished. I've found one or two occurrences of this construct in the FreeBSD code; AFAICT it is rather common with clang and gcc, so I decided to use it. Hopefully it will also allow us to use the root collation database on embedded systems (if any such system really needs the collation algorithm).

As you may know, we need a tool that can convert the collation text files obtained from unicode.org to the new collation database (colldb) format. There is a version of this tool written in Python (share/examples/colldb/colldb.py). IIRC we can't use Python in the base system though, so it seems that we need to write such a tool in C. I was thinking of the lex/yacc combo; I've never tried it, but I think it shouldn't be too hard to write a tool using it. I'd like to know your opinions about this task. I've already written a man page (bin/colldb/colldb.1). The only thing that seems dubious is that I decided to use the same name as for the library itself (well, it seems I lack imagination). So we have both colldb.1 and colldb.3 man pages.

The other thing I'd really like to do is to force network byte order in the collation database format (I'm sure I've seen a way to do it in Berkeley databases). It's a pity that I have no platform with big-endian (or even PDP!) byte order.
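
To make the byte order idea concrete, here is a minimal sketch of storing a 32-bit field in network (big-endian) byte order independently of the host. The helper names put_be32, get_be32 and be32_selfcheck are hypothetical and are not part of colldb; the actual on-disk layout of the database is not shown here. Because the bytes are written one by one with shifts, the same code produces the same file on little-endian, big-endian and PDP-endian hosts, so it can be checked without big-endian hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of colldb): store v into buf[0..3]
 * in network (big-endian) byte order, most significant byte first. */
static void put_be32(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Hypothetical helper: load a big-endian 32-bit value from buf[0..3]. */
static uint32_t get_be32(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}

/* Round-trip check: aborts through assert() if the encoding is wrong.
 * The value 0x0001F600 is just an arbitrary code point for the demo. */
static void be32_selfcheck(void)
{
    unsigned char buf[4];

    put_be32(buf, 0x0001F600u);
    assert(buf[0] == 0x00 && buf[1] == 0x01);
    assert(buf[2] == 0xF6 && buf[3] == 0x00);
    assert(get_be32(buf) == 0x0001F600u);
}
```

On FreeBSD the same effect should be available without hand-written helpers through be32enc() and be32dec() from <sys/endian.h>, or through htonl() and ntohl(); the shift-based version above is only meant to show what those calls guarantee.
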
Any help here is highly appreciated (as well as your thoughts about lex/yacc, i.e. whether it fits my task well).

Since the Google Summer of Code period has ended, I'd like to thank both my mentors, Pedro and David, who gave me a helping hand during this project, and especially Konrad Jankowski, who found time to answer my questions and help me too.

Though GSoC is over, I'd like to stay with the FreeBSD project. First of all, I want to finish and polish this project: I don't think it's really finished, especially its testing part, though it seems that the new collation algorithm can already be used. Then I'd like to work on other parts of the project, especially the internationalization parts. I'd also like to improve my own library, qc, to provide a rich API for *BSD and POSIX systems, since I acutely feel the lack of such an API. If it is possible to stay with the project, I'd be very happy to do so. :-)

P.S. Does anyone know how to get a diff for my branch only (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to include everything that all of FreeBSD's GSoC students have done, so I need some other command. Thanks for your help!

[0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
[1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd

--
With best regards,
Dmitry Selyutin