From: Dmitry Selyutin
Message-ID: <53A759D9.1010804@gmail.com>
Date: Mon, 23 Jun 2014 02:34:01 +0400
To: 
soc-status@FreeBSD.org, Pedro Giffuni, David Chisnall
Subject: Report #3: Unicode support

Hello everyone!

I'm glad to report that I've finished a basic sketch of the Unicode Normalization Algorithm, which seems to work. The files were recently updated to the most recent version of Unicode (7.0.0).

Of course, this code needs some tuning. For example, in the worst case one has to iterate over the whole table to check whether a character can be normalized. I'm going to fix this with a different structure in which each byte covers 8 characters and each bit of that byte is a flag saying whether the corresponding character may be normalized. Thus we need two arrays of 139264 bytes each (1114112 code points / 8; one array for composition and one for decomposition), where the state of each character can be determined by a simple division: the quotient of the code point by 8 selects the byte, and the remainder selects the bit. That's just a proposal; everyone is welcome to propose a better way to handle such things.

Of course, the other important part is to prepare a test suite, but for that part I have to consult with my mentors, Pedro and David.

-- 
With best regards,
Dmitry Selyutin
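
A minimal sketch of the bitmap lookup proposed above, in C. It is an illustration under stated assumptions, not the project's actual code: the names `composable`, `decomposable`, `bitmap_set` and `bitmap_test` are hypothetical, and real tables would presumably be generated from the Unicode data files rather than filled in at run time.

```c
#include <assert.h>
#include <stdint.h>

/* The Unicode code space is U+0000..U+10FFFF: 0x110000 = 1114112 code points. */
#define UNICODE_CODE_POINTS 0x110000u
/* One bit per code point, so 1114112 / 8 = 139264 bytes per table. */
#define BITMAP_BYTES (UNICODE_CODE_POINTS / 8)

static_assert(BITMAP_BYTES == 139264, "bitmap size matches the figure in the report");

/* Hypothetical tables, one per direction (names are assumptions). */
uint8_t composable[BITMAP_BYTES];
uint8_t decomposable[BITMAP_BYTES];

/* Mark code point cp in the bitmap; out-of-range values are ignored. */
static inline void
bitmap_set(uint8_t *map, uint32_t cp)
{
	if (cp < UNICODE_CODE_POINTS)
		map[cp / 8] |= (uint8_t)(1u << (cp % 8));
}

/* Return 1 if cp is marked in the bitmap, 0 otherwise. */
static inline int
bitmap_test(const uint8_t *map, uint32_t cp)
{
	if (cp >= UNICODE_CODE_POINTS)
		return (0);
	/* Quotient selects the byte, remainder selects the bit. */
	return ((map[cp / 8] >> (cp % 8)) & 1);
}
```

For example, after `bitmap_set(decomposable, 0x00C5)` (U+00C5 has a canonical decomposition), `bitmap_test(decomposable, 0x00C5)` returns 1, while `bitmap_test(decomposable, 0x0041)` returns 0. The check is constant-time, at the cost of about 136 KiB per table; only characters whose bit is set would need the full table lookup.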