From owner-freebsd-i18n@freebsd.org Sun Apr 9 11:06:47 2017
Date: Sun, 9 Apr 2017 14:06:46 +0300
From: Lev Serebryakov
Organization: FreeBSD
Message-ID: <137414834.20170409140646@serebryakov.spb.ru>
To: freebsd-i18n@freebsd.org
Subject: citrus/BSD iconv doesn't respect ICONV_SET_DISCARD_ILSEQ flag
List-Id: FreeBSD Internationalization Effort

Hello Freebsd-i18n,

I understand that iconvctl(3) is a GNU extension, but since the citrus
iconv used by FreeBSD libc formally supports this API and the
ICONV_SET_DISCARD_ILSEQ flag, they should work, IMHO. But they don't. If I
try to convert a simple UTF-8 string containing an illegal sequence to
ASCII (all legal characters in this string are ASCII), it stops at the
illegal sequence and returns an error. GNU iconv from ports works
correctly. I didn't try UTF-16 and UTF-32/UCS-4, but from looking at the
code, I'm afraid they have the same problem.

Here is a simple program which reproduces the problem:

===============
#include <iconv.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
	/* 'X', one illegal UTF-8 byte (0x80), 'Y' */
	const char *src = "X\x80Y";
	char dst[64] = {0, 0, 0, 0, 0, 0, 0};
	char *s = (char*)src;
	char *d = &dst[0];
	size_t ss = strlen(src) + 1;	/* include the terminating NUL */
	size_t ds = sizeof(dst);
	int flag;
	iconv_t ic = iconv_open("ascii", "utf-8");

	/* Ask iconv to discard illegal sequences instead of failing */
	flag = 1;
	iconvctl(ic, ICONV_SET_DISCARD_ILSEQ, &flag);

	printf("Result: %ld\n", (long)iconv(ic, &s, &ss, &d, &ds));
	printf("Converted: from %zu to %zu bytes\n", strlen(src) + 1 - ss, sizeof(dst) - ds);
	printf("Out: \"%s\"\n", &dst[0]);
	iconv_close(ic);
	return 0;
}
===============

% cc ic.c
% ./a.out
Result: -1
Converted: from 1 to 1 bytes
Out: "X"
% cc -L/usr/local/lib -I/usr/local/include ic.c -liconv
% ./a.out
Result: 0
Converted: from 4 to 3 bytes
Out: "XY"
%
% uname -a
FreeBSD blob.home.serebryakov.spb.ru 11.0-STABLE FreeBSD 11.0-STABLE #13 r315153M: Sun Mar 12 20:11:36 MSK 2017 root@blob.home.serebryakov.spb.ru:/usr/obj/usr/src/sys/BLOB amd64
%

-- 
Best regards,
 Lev mailto:lev@FreeBSD.org

From owner-freebsd-i18n@freebsd.org Sun Apr 9 13:06:32 2017
Date: Sun, 9 Apr 2017 16:06:24 +0300
From: Andrey Chernov
To: Lev Serebryakov, freebsd-i18n@freebsd.org
Subject: Re: citrus/BSD iconv doesn't respect ICONV_SET_DISCARD_ILSEQ flag
Message-ID: <53b57139-8e68-da83-8fe8-e132ea524b6d@freebsd.org>
In-Reply-To: <137414834.20170409140646@serebryakov.spb.ru>
References: <137414834.20170409140646@serebryakov.spb.ru>

On 09.04.2017 14:06, Lev Serebryakov wrote:
> I understand that iconvctl(3) is a GNU extension, but since the citrus
> iconv used by FreeBSD libc formally supports this API and the
> ICONV_SET_DISCARD_ILSEQ flag, they should work, IMHO. But they don't. If I
> try to convert a simple UTF-8 string containing an illegal sequence to
> ASCII (all legal characters in this string are ASCII), it stops at the
> illegal sequence and returns an error. GNU iconv from ports works
> correctly. I didn't try UTF-16 and UTF-32/UCS-4, but from looking at the
> code, I'm afraid they have the same problem.

I need to add that our iconv also doesn't support anything after //,
forcing ports to use GNU iconv:

/*
 * Remove anything following a //, as these are options (like
 * //ignore, //translate, etc) and we just don't handle them.
 * This is for compatibility with software that uses these
 * blindly.
 */

//ignore is the analogue of ICONV_SET_DISCARD_ILSEQ on.

From owner-freebsd-i18n@freebsd.org Mon Apr 10 10:42:03 2017
Date: Mon, 10 Apr 2017 13:41:59 +0300
From: Lev Serebryakov
Organization: FreeBSD
Reply-To: lev@FreeBSD.org
To: Andrey Chernov, freebsd-i18n@freebsd.org
Subject: Re: citrus/BSD iconv doesn't respect ICONV_SET_DISCARD_ILSEQ flag
In-Reply-To: <53b57139-8e68-da83-8fe8-e132ea524b6d@freebsd.org>
References: <137414834.20170409140646@serebryakov.spb.ru> <53b57139-8e68-da83-8fe8-e132ea524b6d@freebsd.org>

On 09.04.2017 16:06, Andrey Chernov wrote:

> I need to add that our iconv also doesn't support anything after //,
> forcing ports to use GNU iconv:
> /*
>  * Remove anything following a //, as these are options (like
>  * //ignore, //translate, etc) and we just don't handle them.
>  * This is for compatibility with software that uses these
>  * blindly.
>  */
> //ignore is the analogue of ICONV_SET_DISCARD_ILSEQ on.

But later in the code:

handle->cv_shared->ci_discard_ilseq = strcasestr(out, "//IGNORE");

So, "//IGNORE" is "supported" (in the same way as ICONV_SET_DISCARD_ILSEQ
is "supported")

-- 
// Lev Serebryakov

From owner-freebsd-i18n@freebsd.org Mon Apr 10 11:59:51 2017
Date: Mon, 10 Apr 2017 14:59:43 +0300
From: Lev Serebryakov
Organization: FreeBSD
Reply-To: lev@FreeBSD.org
To: Andrey Chernov, freebsd-i18n@freebsd.org
Subject: Re: citrus/BSD iconv doesn't respect ICONV_SET_DISCARD_ILSEQ flag
References: <137414834.20170409140646@serebryakov.spb.ru> <53b57139-8e68-da83-8fe8-e132ea524b6d@freebsd.org>

On 10.04.2017 14:51, Andrey Chernov wrote:

>>> I need to add that our iconv also doesn't support anything after //,
>>> forcing ports to use GNU iconv:
>>> /*
>>>  * Remove anything following a //, as these are options (like
>>>  * //ignore, //translate, etc) and we just don't handle them.
>>>  * This is for compatibility with software that uses these
>>>  * blindly.
>>>  */
>>> //ignore is the analogue of ICONV_SET_DISCARD_ILSEQ on.
>> But later in the code:
>>
>> handle->cv_shared->ci_discard_ilseq = strcasestr(out, "//IGNORE");
>>
>> So, "//IGNORE" is "supported" (in the same way as ICONV_SET_DISCARD_ILSEQ
>> is "supported")
>>
> It isn't; it is just being set/unset without affecting anything. Grep
> ci_discard_ilseq through the BSD iconv sources.

That is why I wrote "supported", not supported :(

-- 
// Lev Serebryakov

From owner-freebsd-i18n@freebsd.org Mon Apr 10 12:24:36 2017
Date: Mon, 10 Apr 2017 14:51:49 +0300
From: Andrey Chernov
To: lev@FreeBSD.org, freebsd-i18n@freebsd.org
Subject: Re: citrus/BSD iconv doesn't respect ICONV_SET_DISCARD_ILSEQ flag
References: <137414834.20170409140646@serebryakov.spb.ru> <53b57139-8e68-da83-8fe8-e132ea524b6d@freebsd.org>

On 10.04.2017 13:41, Lev Serebryakov wrote:
> On 09.04.2017 16:06, Andrey Chernov wrote:
>
>> I need to add that our iconv also doesn't support anything after //,
>> forcing ports to use GNU iconv:
>> /*
>>  * Remove anything following a //, as these are options (like
>>  * //ignore, //translate, etc) and we just don't handle them.
>>  * This is for compatibility with software that uses these
>>  * blindly.
>>  */
>> //ignore is the analogue of ICONV_SET_DISCARD_ILSEQ on.
> But later in the code:
>
> handle->cv_shared->ci_discard_ilseq = strcasestr(out, "//IGNORE");
>
> So, "//IGNORE" is "supported" (in the same way as ICONV_SET_DISCARD_ILSEQ
> is "supported")
>

It isn't; it is just being set/unset without affecting anything. Grep
ci_discard_ilseq through the BSD iconv sources.
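
For code that has to run on the base-system iconv in the meantime, one
possible workaround for the behaviour discussed in this thread is to skip
illegal input by hand: call iconv(), and whenever it fails with EILSEQ,
step over one input byte and call it again. The following is only a
minimal sketch of that idea, not a fix for citrus iconv itself. The helper
name convert_skipping_ilseq and its parameters are hypothetical, and it
assumes a stateless source encoding such as the UTF-8 input from the
report.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any iconv implementation).
 * Converts the NUL-terminated string src with the descriptor cd, writes a
 * NUL-terminated result to dst, and skips one input byte each time iconv()
 * reports EILSEQ or EINVAL. Assumes a stateless source encoding.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * on any other error such as E2BIG.
 */
static size_t
convert_skipping_ilseq(iconv_t cd, const char *src, char *dst,
    size_t dstsize, size_t *skipped)
{
	char *in = (char *)src;		/* iconv() takes a non-const pointer */
	char *out = dst;
	size_t inleft, outleft;

	assert(src != NULL && dst != NULL && dstsize > 0);
	inleft = strlen(src);
	outleft = dstsize - 1;		/* reserve room for the terminator */
	if (skipped != NULL)
		*skipped = 0;

	while (inleft > 0) {
		if (iconv(cd, &in, &inleft, &out, &outleft) != (size_t)-1)
			break;		/* everything left was converted */
		if (errno != EILSEQ && errno != EINVAL)
			return ((size_t)-1);	/* e.g. E2BIG: dst is too small */
		/* in points at the offending byte: step over it and retry */
		in++;
		inleft--;
		if (skipped != NULL)
			(*skipped)++;
	}
	*out = '\0';
	return ((size_t)(out - dst));
}
```

Called with a descriptor from iconv_open("ASCII", "UTF-8") and the input
"X\x80Y" from the original report, this should leave "XY" in dst and count
one skipped byte. It relies only on iconv() stopping at the offending
byte, which is what the base-system output in the report shows, so it does
not depend on ICONV_SET_DISCARD_ILSEQ or //IGNORE being honoured.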