From owner-soc-status@FreeBSD.ORG  Wed Aug 27 10:51:24 2014
Return-Path: <owner-soc-status@FreeBSD.ORG>
Delivered-To: soc-status@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by hub.freebsd.org (Postfix) with ESMTPS id 328135C5;
 Wed, 27 Aug 2014 10:51:24 +0000 (UTC)
Received: from mail-wg0-x22a.google.com (mail-wg0-x22a.google.com
 [IPv6:2a00:1450:400c:c00::22a])
 (using TLSv1 with cipher ECDHE-RSA-RC4-SHA (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 0D58B3118;
 Wed, 27 Aug 2014 10:51:22 +0000 (UTC)
Received: by mail-wg0-f42.google.com with SMTP id l18so26061wgh.25
 for <multiple recipients>; Wed, 27 Aug 2014 03:51:21 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113;
 h=mime-version:reply-to:in-reply-to:references:from:date:message-id
 :subject:to:cc:content-type:content-transfer-encoding;
 bh=xs76FovOPQA6k17V9jDJw/utYC9/mTQ60nnu07rrW6A=;
 b=E/ddpuRoflzeJ9ReuzKCnabiaEVnwS5LHyY5ZH1iWNFxv90SwBrf6n/YUtZeT1qmh8
 7WchhmhGnEC9Dnct57EEmHKClntGqDnBHUkTVteqfRnvlvunODYH0vY1buDraDmbVgks
 QqVJ8973EFYs0uel6uhhcx4PEeVKHYdw7LKyd3HoHq5I4F8lJgr1PZnZLsvoS2NjDq4T
 OG4epC8uriVesasH3kP9OBy0NBWmURAjD9ytr+vqmMs2BQxOBimFRk6r0h9a/w6b4iUR
 9Hpvqr0jv6Qdrmwh5xgl2cGH8ee46M2rhczmT/TquZNZuNicaYTXkQPCV7sTmsC8OPef
 0i9A==
X-Received: by 10.180.92.134 with SMTP id cm6mr28112076wib.72.1409136681311;
 Wed, 27 Aug 2014 03:51:21 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.194.48.9 with HTTP; Wed, 27 Aug 2014 03:51:01 -0700 (PDT)
Reply-To: ghostmansd@gmail.com
In-Reply-To: <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
References: <CAMqzjesx=uhUzmTEJEq8zoxkhWXBtYOXVXQ1bmiTiEw0=-gF0w@mail.gmail.com>
 <20140826221610.GD65120@ivaldir.etoilebsd.net> <53FD1599.7040708@freebsd.org>
 <CAMqzjesGZmpXgHHvOQqOHzTwZJK=KZNyDaC9QkTX+6j=wpO7zw@mail.gmail.com>
From: Dmitry Selyutin <ghostman.sd@gmail.com>
Date: Wed, 27 Aug 2014 14:51:01 +0400
Message-ID: <CAMqzjeuUrpOfkX41bTY62NRNap0NetCKzTpSv5JaUC4Qvh59sA@mail.gmail.com>
Subject: Re: Report #9: Unicode support
To: Pedro Giffuni <pfg@freebsd.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: soc-status@freebsd.org, Konrad Jankowski <versus@freebsd.org>,
 freebsd-i18n@freebsd.org
X-BeenThere: soc-status@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: Summer of Code Status Reports and Discussion <soc-status.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/soc-status>,
 <mailto:soc-status-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/soc-status/>
List-Post: <mailto:soc-status@freebsd.org>
List-Help: <mailto:soc-status-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/soc-status>,
 <mailto:soc-status-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Aug 2014 10:51:24 -0000

I've just seen EuroBSDCon's calendar page and it seems that it is
impossible to join it (i.e. I missed the application deadline).[0]
Well, maybe next year? :-)

2014-08-27 14:48 GMT+04:00 Dmitry Selyutin <ghostman.sd@gmail.com>:
> Hi, Pedro, Baptiste,
>
> first of all thanks for your congratulations and kind words! The
> project was really harder than anything I've ever met in my life, but
> at the same time it was the most interesting one. :-) And still
> remains! ;-)
>
>> That is not really uncommon :)
> Well, so I can leave it as it is. :-)
>
>> The project does have access to sparc64 machines so if you have some
>> self-contained test we can run it for you or we can test it as a routine libc
>> test after committing.
> Hopefully I can finish it today or in the next two days.
>
>> You never answered my question concerning the fallback options.
> Really? I thought that I answered. :-D Well, I'll try to explain
> again. DUCET seems to be a somewhat obsolete collation table, which
> can be used more or less successfully with real languages. However, in
> the real world it is completely unusable, so ICU and others use the
> CLDR collation table, which supports more levels. I started with DUCET
> since there was much more information about it, but then I found that
> it doesn't fit well, so I switched to CLDR. We still have the DUCET
> table somewhere in our revisions though; as a fallback option, it may
> still be useful, so I can restore it if you want.
>
>> Changing it to use the NetBSD's cdb support[2] shouldn't be difficult.
> Well, I think I'll do it right after exams. AFAIK bdb is deprecated
> on Linux (though it can still be used as bdb46 or something similar). I
> don't know why they did that; it would be great if we could use a
> tool that works on different platforms without modifications and
> tons of conditional define's and undef's.
>
>> It has simple API way easier to use, the db format is endian safe and final file
>> is smaller than equivalent in bdb format.
> It sounds great!
>
>> I do want to encourage you to go to EuroBSDCon 2014 in Sofia. The
>> FreeBSD Foundation will be allocating funds for students that want to go.
>> I won’t be there (I am a bit far away) but David and other developers will
>> likely be.
> Well, that depends on whether I pass my exams for the postgraduate
> course or not. I'd really like to listen to more experienced
> developers and maybe even talk to other people about work which I did
> to better understand the community's opinions.
>
> 2014-08-27 3:17 GMT+04:00 Pedro Giffuni <pfg@freebsd.org>:
>> Hi Baptiste;
>>
>>
>> On 08/26/14 17:16, Baptiste Daroussin wrote:
>>>
>>> On Wed, Aug 27, 2014 at 01:08:58AM +0400, Dmitry Selyutin wrote:
>>>>
>>>> Hello everyone!
>>>>
>>>> Here are the last news about the Unicode support project[0].
>>>> You can always check my repository[1].
>>>>
>>>> During these days I had hardware problems (my HDD peacefully died), so
>>>> development didn't progress as much as before. However, I've
>>>> eliminated these problems, so I tried to fix bugs and reorganize the
>>>> code as much as possible. Now everything should compile.
>>>>
>>>> I decided to use __attribute__((constructor)) and
>>>> __attribute__((destructor)), since I don't know if there exists a
>>>> better way to open a file once at startup and close it when all
>>>> routines finish. I've found one or two occurrences of this construction
>>>> in FreeBSD code; AFAICT it is rather common in clang and gcc, so I
>>>> decided to use it. Hopefully it will also allow us to use the root
>>>> collation database in embedded systems (if any such system really
>>>> needs a collation algorithm).
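
A minimal sketch of the constructor/destructor approach described above,
assuming a GCC- or clang-compatible compiler. The path and all colldb_*
names here are hypothetical and are not taken from the project's code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical path and names; the real colldb code may differ. */
#define COLLDB_ROOT_PATH "/usr/share/colldb/root.db"

static FILE *colldb_root = NULL;   /* shared handle, opened once */
static int colldb_init_calls = 0;  /* counts constructor invocations */

/* Runs automatically before main() (GCC/clang extension). */
__attribute__((constructor))
static void colldb_startup(void)
{
    assert(colldb_init_calls == 0);  /* expected to run only once */
    colldb_init_calls++;
    /* A failed open is not fatal here; colldb_root simply stays NULL. */
    colldb_root = fopen(COLLDB_ROOT_PATH, "rb");
}

/* Runs automatically at normal process exit. */
__attribute__((destructor))
static void colldb_shutdown(void)
{
    if (colldb_root != NULL) {
        fclose(colldb_root);
        colldb_root = NULL;
    }
}

/* Accessors so callers need not touch the static state directly. */
int colldb_root_available(void) { return colldb_root != NULL; }
int colldb_init_count(void) { return colldb_init_calls; }
```

Callers would check colldb_root_available() before any lookup, since the
constructor cannot report an error to anyone.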
>>>>
>>>> As you may know we need a tool that can convert collation text files
>>>> obtained from unicode.org to the new collation database (colldb) format.
>>>> There is a version of this tool written in Python
>>>> (share/examples/colldb/colldb.py). IIRC we can't use Python in the
>>>> base system though, so it seems that we need to write such a
>>>> tool in C. I was thinking of a lex/yacc combo; I've never
>>>> tried it, but I think it shouldn't be too hard to write a tool using
>>>> it. I'd like to know your opinions about this task.
>>>> I've already written a man page (bin/colldb/colldb.1). The only thing
>>>> which seems dubious is that I decided to use the same name as for the
>>>> library itself (well, it seems I have a lack of imagination). So we
>>>> have both colldb.1 and colldb.3 man pages.
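
For scale, a hand-written sketch of parsing one line of such a text file
in plain C, using sscanf rather than lex/yacc. It assumes the simple
"CODEPOINT ; [.PRIMARY.SECONDARY.TERTIARY]" line shape used by the
unicode.org allkeys.txt file and deliberately rejects contractions and
lines with extra fields. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record: one code point and its first collation element. */
struct coll_entry {
    unsigned int codepoint;
    unsigned int primary, secondary, tertiary;
    int variable;  /* 1 if the element starts with '*', 0 if with '.' */
};

/* Returns 1 on success, 0 if the line does not have the simple form. */
int colldb_parse_line(const char *line, struct coll_entry *out)
{
    char marker;
    int n;

    assert(line != NULL && out != NULL);
    /* Comments, directives and multi-code-point keys fail to match. */
    n = sscanf(line, " %x ; [%c%x.%x.%x]", &out->codepoint, &marker,
               &out->primary, &out->secondary, &out->tertiary);
    if (n != 5 || (marker != '.' && marker != '*'))
        return 0;
    out->variable = (marker == '*');
    return 1;
}
```

A real converter would also have to handle contractions, expansions and
the file's header directives, which is where a generated lexer might
start to pay off.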
>>>>
>>>> The other thing I'd really like to do is to really force network byte
>>>> order in the collation database format (I'm sure I've seen a way to do it
>>>> in Berkeley databases). It's a pity that I have no platform with
>>>> big-endian (or even PDP!) byte order. Any help here is highly
>>>> appreciated (as well as your thoughts about lex/yacc, i.e. thoughts
>>>> on whether it fits my task well).
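
One way to force network byte order, sketched under the assumption that
values are stored as 32-bit integers: convert with htonl() on write and
ntohl() on read. This is not the Berkeley DB mechanism mentioned above,
and the colldb_put32/colldb_get32 names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value into buf in network (big-endian) byte order. */
void colldb_put32(unsigned char *buf, uint32_t value)
{
    uint32_t be;

    assert(buf != NULL);
    be = htonl(value);
    memcpy(buf, &be, sizeof(be));  /* memcpy avoids alignment issues */
}

/* Load a 32-bit value that was stored in network byte order. */
uint32_t colldb_get32(const unsigned char *buf)
{
    uint32_t be;

    assert(buf != NULL);
    memcpy(&be, buf, sizeof(be));
    return ntohl(be);
}
```

Because the on-disk bytes are the same on every host, a file written on
a little-endian machine reads back identically on a big-endian one.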
>>>>
>>>> Since the Google Summer of Code period has ended, I'd like to thank both
>>>> my mentors, Pedro and David, who gave me a helping hand during this
>>>> project, and especially Konrad Jankowski, who found time to answer my
>>>> questions and help me too. Though GSoC is closed, I'd like to stay
>>>> with the FreeBSD project. First of all, I want to finish and polish
>>>> this project: I don't think it's really finished, especially its
>>>> testing part, though it seems that the new collation algorithm can already
>>>> be used. Then I'd like to work on other parts of the project,
>>>> especially the internationalization parts. I'd also like to improve my
>>>> own library, qc, to provide a rich API for *BSD and POSIX systems,
>>>> since I acutely feel the lack of such an API. If it is possible to stay
>>>> with the project, I'd be very happy to do it. :-)
>>>>
>>>> P.S. Does anyone know how to get a diff only for my branch
>>>> (i.e. for my part of the repository)? svn diff -r $FIRST:$LAST seems to
>>>> give everything that all FreeBSD GSoC students have done, so I need some
>>>> other command. Thanks for your help!
>>>>
>>>> [0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
>>>> [1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd
>>>>
>>> First, thank you very much for your work on this subject; this is highly
>>> needed.
>>>
>>> Concerning the db format have you thought about using the new netbsd
>>> constant database format?
>>>
>>> It has simple API way easier to use, the db format is endian safe and
>>> final file is smaller than equivalent in bdb format.
>>>
>>> Lots of areas of FreeBSD could benefit from using this cdb format as well
>>> imho.
>>
>>
>> While here, let me congratulate Dmitry. The Unicode Collation Algorithm is
>> not something easy/fun to work with.
>>
>> Indeed both David and Konrad suggested it (or tinycdb). The reason for
>> going bdb was that we had time constraints and bdb is already in libc.
>>
>> FWIW, Nexenta kindly re-licensed localedef [1] and their collation support
>> in Illumos which basically implements their own very efficient format. We
>> ended up re-using the tools that libc already has to better focus on the
>> collation part.
>>
>> Changing it to use the NetBSD's cdb support[2] shouldn't be difficult.
>>
>> As Dmitry noted there are still details to work out and we have to run tests
>> and get the code reviewed but all in all I am very satisfied with the
>> advance in this GSoC.
>>
>> Best regards,
>>
>> Pedro.
>>
>> [1] https://github.com/Nexenta/illumos-nexenta/tree/republish-localedef
>> [2] http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libc/cdb/
>>
>
>
>
> --
> With best regards,
> Dmitry Selyutin



-- 
With best regards,
Dmitry Selyutin