From: Dmitry Selyutin
Reply-To: ghostmansd@gmail.com
Date: Wed, 6 Aug 2014 05:29:49 +0400
Subject: Report #6: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni, David Chisnall, Konrad Jankowski

Hello everyone!

Here is the latest news about the Unicode support project[0]. You can always check my repository[1].

Over the past few days I've been working on storing the Unicode Collation data in a more appropriate format than before (strange tables with binary search right in the C source files). Following one of Pedro's suggestions, I've used types and functions to finish it. I've tried to achieve portability for all platforms that support in the way that FreeBSD does (I took care of some subtle things like endianness too).

A full set of functions for working with collation databases is provided, as well as Python bindings (they were written while creating the CLDR database, but turned out to be so useful that I decided to commit them too). Right now the code lives under the lib/libcolldb directory, though there may be a better place for it (especially for the Python bindings). Any suggestions? I'd like to leave this stuff visible to other developers (at first I wanted to keep it hidden in xlocale_private.h, but I found it really useful), and the first thing that came to mind was a library.

I was too tired to rewrite all the existing functions to make them support collation databases; I hope to finish that tomorrow. The normalization and canonicalization parts are already done; collation itself also seems to be nearing completion, though there is still much to be done.

I'd like to thank Pedro and especially Konrad Jankowski, who found the strength to return to his project and lent a helping hand (and still does). Since I got hooked on this part of the work, I couldn't respond to all of Pedro's and Konrad's mails during these days. So the nearest goals are to rewrite the collation algorithms again so that they work, and to begin testing.

[0] https://wiki.freebsd.org/SummerOfCode2014/Unicode
[1] https://socsvn.freebsd.org/socsvn/soc2014/ghostmansd

-- 
With best regards,
Dmitry Selyutin