From: Dmitry Selyutin
Reply-To: ghostmansd@gmail.com
Date: Mon, 21 Jul 2014 15:23:50 +0400
Subject: Report #4: Unicode support
To: soc-status@freebsd.org, Pedro Giffuni, David Chisnall

Hello everyone,

here is my report on progress over the past two weeks. Pedro, David, please excuse the duplication: I should have just included you in this letter instead of sending you two letters. I've just realized that I'd forgotten to write the report. :-(

I've been intensively testing my normalization implementation and discovered that it was working incorrectly. Moreover, its code seemed completely cryptic, so I've rewritten it from scratch. Now it seems to work correctly (at least it passes the Unicode tests). The things that I've completely ignored are canonicalization and combining character classes. I've decided to publish it in a git repo and integrate it into head later, since it's a real pain to recompile the entire system every few hours after changes in the source code (especially if the changes are not large).

I've also thought about your message where you raised doubts about the project structure. We'll have a `uniext.h' header, which is included if the UNICODE_ADDENDA macro is defined. This header declares the following functions: strcanon, strcanon_l, wcscanon, strnorm, strnorm_l, wcsnorm, wcclass. The last one was written as a helper function used inside wcscanon and wcsnorm, but I thought it might also be useful as a standalone function.

I've rewritten the algorithms: everything is now performed using binary search and hashes instead of the earlier linear search, so it's really fast (e.g. decomposition runs 10 to 12 times faster than Python's decomposition algorithm). I've also tested it on wide strings, and it works as expected (at least!). So this part seems to be finished. The last thing to do is to put everything in the right place in the FreeBSD source tree.

Here is my testing repo: https://github.com/ghostmansd/uniext. Just use `git clone https://github.com/ghostmansd/uniext'.

P.S.
You need to use gmake if you want to use my Makefile (I don't know BSD Makefile syntax well). However, all you need to do is add the `-Iinclude' flag to CFLAGS, compile everything in `src', compile `main.c' and link it all together.

--
With best regards,
Dmitry Selyutin
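
To illustrate the binary-search lookup mentioned in the report, here is a minimal sketch in C. It is not taken from the uniext repository: the `struct decomp' layout, the names `decomp_find' and `decomp_check_sorted', and the five-entry sample table are all assumptions made for this example. A real table would be generated from the Unicode Character Database and would be far larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a code point and its canonical decomposition. */
struct decomp {
    uint32_t cp;      /* composed code point */
    uint32_t seq[2];  /* decomposition; second slot is 0 when unused */
};

/* Tiny sample table, sorted by cp (assumption: the real data is much larger). */
static const struct decomp table[] = {
    { 0x00C0, { 0x0041, 0x0300 } },  /* A with grave = A + combining grave */
    { 0x00C5, { 0x0041, 0x030A } },  /* A with ring  = A + combining ring  */
    { 0x00E9, { 0x0065, 0x0301 } },  /* e with acute = e + combining acute */
    { 0x00F1, { 0x006E, 0x0303 } },  /* n with tilde = n + combining tilde */
    { 0x212B, { 0x00C5, 0x0000 } },  /* Angstrom sign = A with ring        */
};

#define TABLE_LEN (sizeof(table) / sizeof(table[0]))

/* Binary search is only valid if the table is strictly sorted by cp. */
void decomp_check_sorted(void)
{
    for (size_t i = 1; i < TABLE_LEN; i++)
        assert(table[i - 1].cp < table[i].cp);
}

/* Return the entry for cp, or NULL if cp has no decomposition in the table. */
const struct decomp *decomp_find(uint32_t cp)
{
    size_t lo = 0, hi = TABLE_LEN;  /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid].cp == cp)
            return &table[mid];
        if (table[mid].cp < cp)
            lo = mid + 1;
        else
            hi = mid;
    }
    return NULL;
}
```

With this sample table, `decomp_find(0x00E9)' returns the entry whose sequence is U+0065 U+0301, while `decomp_find(0x0041)' returns NULL because a plain `A' has no decomposition. Each lookup takes a number of steps logarithmic in the table size, which is where the gain over a linear scan comes from.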