From owner-freebsd-current@freebsd.org Thu Jul 14 09:53:40 2016
Subject: Re: svn commit: r302601 - in head/sys: arm/include arm64/include [clang 3.8.0: powerpc int instead of 32-bit SYSVR4's long and 64-bit ELF V2 long]
From: Mark Millard
In-Reply-To: <3DFF1DC9-2AE6-498A-9FE0-4970E76F8AB5@dsl-only.net>
Date: Thu, 14 Jul 2016 02:53:29 -0700
Cc: svn-src-head@freebsd.org, FreeBSD Current, freebsd-stable@freebsd.org, freebsd-arm, FreeBSD PowerPC ML, FreeBSD Toolchain, Bruce Evans
Message-Id: <580A746B-3F02-44FA-AB2E-20CC71A1E9D2@dsl-only.net>
References: <46153340-D2F4-48BD-B738-4792BC25FA3F@dsl-only.net> <38CF2C28-3BD1-4D09-939F-4DD0C2E8B58F@dsl-only.net> <3DFF1DC9-2AE6-498A-9FE0-4970E76F8AB5@dsl-only.net>
To: Andrey Chernov
List-Id: Discussions about the use of FreeBSD-current

[Top post of a history note for powerpc and wchar_t's type in FreeBSD. The history is from looking around in svn.]

[The below is not a complaint or a request for a change. It just looks like int for wchar_t for powerpc was a choice made long ago for simpler code given FreeBSD's pre-existing structure.]

int being used for powerpc wchar_t on FreeBSD goes back to at least 2001-Jan-1. [FYI: "27 February, 2008: FreeBSD 7.0 is the first release to officially support the FreeBSD/ppc port". So long before official support.]

wchar_t's type is one place where FreeBSD chose to override the powerpc (and powerpc64) ABI standards (which indicate long, not int). I'm not sure whether this was done implicitly or with explicit awareness of the ABI mismatch. [The SYSVR4 32-bit powerpc ABI goes back to 1995.]

I first traced the history back to 2002-Aug-23: -r102315 of sys/sys/_types.h standardized FreeBSD on the following until the ARM change:

    typedef int             __ct_rune_t;
    typedef __ct_rune_t     __rune_t;
    typedef __ct_rune_t     __wchar_t;
    typedef __ct_rune_t     __wint_t;

Prior to this there was 2002-Aug-21's -r102227 sys/powerpc/include/_types.h that used __int32_t.

Prior to that, FreeBSD had ansi.h and types.h instead of _types.h, and ansi.h had:

    #define _BSD_WCHAR_T_   _BSD_CT_RUNE_T_ /* wchar_t (see below) */
    . . .
    #define _BSD_CT_RUNE_T_ int             /* arg type for ctype funcs */

Going back to sys/powerpc/include/ansi.h's -r70571 (2001-Jan-1 creation in svn):

    #define _BSD_WCHAR_T_   int             /* wchar_t */

And the comments back then say:

    . . . It is not
     * unsigned so that EOF (-1) can be naturally assigned to it and used.
    . . . The reason an int was
     * chosen over a long is that the is*() and to*() routines take ints (says
     * ANSI C), but they use __ct_rune_t instead of int.

I've decided not to go any farther back in time (if there is prior history for wchar_t for powerpc).

Ignoring the temporary __int32_t use: FreeBSD has had its own powerpc wchar_t type (int) for at least the last 15 years, at least when viewed just relative to the powerpc ABI(s) FreeBSD is based on for powerpc.

Modern gcc versions even have the FreeBSD wchar_t type correct for powerpc variants in recent times: int. Previously some notation (L based notation) used the wrong type for one of the powerpc variants (32-bit vs. 64-bit), causing lots of false-positive compiler notices. gcc had followed the ABI involved (long int) until the correction.

===
Mark Millard
markmi at dsl-only.net

On 2016-Jul-13, at 11:46 PM, Mark Millard wrote:

> On 2016-Jul-13, at 6:00 PM, Andrey Chernov wrote:
>
>> On 13.07.2016 11:53, Mark Millard wrote:
>>> [The below does note that TARGET=powerpc has a mix of signed wchar_t and unsigned char types and most architectures have both being signed types.]
>>
>> POSIX says nothing about wchar_t and char should be the same (un)signed.
>> It is arm ABI docs may say so only. They are different entities
>> differently encoded and cross assigning between wchar_t and char is not
>> recommended.
>
> [My "odd" would better have been the longer phrase "unusual for FreeBSD" for the signed type mismatch point.]
>
> C11 (9899:2011[2012]) and C++11 (14882:2011(E)) agree with your POSIX note: no constraint to have the same signed type status as char.
>
> But when I then looked at the "System V Application Binary Interface PowerPC Processor Supplement" (1995-Sept SunSoft document) that I believe FreeBSD uses for powerpc (32-bit only: TARGET_ARCH=powerpc) it has:
>
> typedef long wchar_t;
>
> as part of: Figure 6-39 (page labeled 6-38).
>
> While agreeing about the signed-type status for wchar_t this does not agree with FreeBSD 11.0's use of int as the type:
>
> sys/powerpc/include/_types.h:typedef int ___wchar_t;
> sys/powerpc/include/_types.h:#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */
> sys/powerpc/include/_types.h:#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */
>
> # clang --target=powerpc-freebsd11 -std=c99 -E -dM - < /dev/null | more
> . . .
> #define __WCHAR_MAX__ 2147483647
> #define __WCHAR_TYPE__ int
> #define __WCHAR_WIDTH__ 32
> . . .
>
> I'm not as sure of which document is official for TARGET_ARCH=powerpc64 but using "Power Architecture 64-bit ELF V2 ABI Specification" (Open POWER ABI for Linux Supplement) as an example of what likely is common for that context: 5.1.3 Types Defined in Standard header lists:
>
> typedef long wchar_t;
>
> which again does not agree with FreeBSD 11.0's use of int as the type:
>
> # clang --target=powerpc64-freebsd11 -std=c99 -E -dM - < /dev/null | more
> . . .
> #define __WCHAR_MAX__ 2147483647
> #define __WCHAR_TYPE__ int
> #define __WCHAR_WIDTH__ 32
> . . .
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
>
>>
>> On 2016-Jul-11, at 8:57 PM, Andrey Chernov wrote:
>>
>>> On 12.07.2016 5:44, Mark Millard wrote:
>>>> My understanding of the criteria for __WCHAR_MIN and __WCHAR_MAX:
>>>>
>>>> A) __WCHAR_MIN and __WCHAR_MAX: same type as the integer promotion of
>>>> ___wchar_t (if that is distinct).
>>>> B) __WCHAR_MIN is the low value for ___wchar_t as an integer type; not
>>>> necessarily a valid char value
>>>> C) __WCHAR_MAX is the high value for ___wchar_t as an integer type; not
>>>> necessarily a valid char value
>>>
>>> It seems you are right about "not a valid char value", I'll back this
>>> change out.
>>>
>>>> As far as I know arm FreeBSD uses unsigned character types (of whatever
>>>> width).
>>>
>>> Probably it should be unsigned for other architectures too, clang does
>>> not generate negative values with L'' literals and locale use only
>>> positive values too.
>>
>> Looking around:
>>
>> # grep -i wchar sys/*/include/_types.h
>> sys/arm/include/_types.h:typedef unsigned int ___wchar_t;
>> sys/arm/include/_types.h:#define __WCHAR_MIN 0 /* min value for a wchar_t */
>> sys/arm/include/_types.h:#define __WCHAR_MAX __UINT_MAX /* max value for a wchar_t */
>> sys/arm64/include/_types.h:typedef unsigned int ___wchar_t;
>> sys/arm64/include/_types.h:#define __WCHAR_MIN 0 /* min value for a wchar_t */
>> sys/arm64/include/_types.h:#define __WCHAR_MAX __UINT_MAX /* max value for a wchar_t */
>> sys/mips/include/_types.h:typedef int ___wchar_t;
>> sys/mips/include/_types.h:#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */
>> sys/mips/include/_types.h:#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */
>> sys/powerpc/include/_types.h:typedef int ___wchar_t;
>> sys/powerpc/include/_types.h:#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */
>> sys/powerpc/include/_types.h:#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */
>> sys/riscv/include/_types.h:typedef int ___wchar_t;
>> sys/riscv/include/_types.h:#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */
>> sys/riscv/include/_types.h:#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */
>> sys/sparc64/include/_types.h:typedef int ___wchar_t;
>> sys/sparc64/include/_types.h:#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */
>> sys/sparc64/include/_types.h:#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */
>> sys/x86/include/_types.h:typedef int ___wchar_t;
>> sys/x86/include/_types.h:#define __WCHAR_MIN __INT_MIN /* min value for a wchar_t */
>> sys/x86/include/_types.h:#define __WCHAR_MAX __INT_MAX /* max value for a wchar_t */
>>
>> So only arm and arm64 have unsigned wchar_t types.
>>
>> [NOTE: __CHAR16_TYPE__ and __CHAR32_TYPE__ are always unsigned: in C++11 terms char16_t is like std::uint_least16_t and char32_t is like std::uint_least32_t despite being distinct types. So __CHAR16_TYPE__ and __CHAR32_TYPE__ are ignored below.]
>>
>> The clang 3.8.0 compiler output has an odd mix for TARGET_ARCH=powerpc and TARGET_ARCH=powerpc64 . . .
>>
>> armv6 has unsigned types for both char and __WCHAR_TYPE__.
>> aarch64 has unsigned types for both char and __WCHAR_TYPE__.
>> powerpc has unsigned for char but signed for __WCHAR_TYPE__.
>> powerpc64 has unsigned for char but signed for __WCHAR_TYPE__.
>> amd64 has signed types for both char and __WCHAR_TYPE__.
>> i386 has signed types for both char and __WCHAR_TYPE__.
>> mips has signed types for both char and __WCHAR_TYPE__.
>> sparc64 has signed types for both char and __WCHAR_TYPE__.
>> (riscv is not covered by clang as I understand)
>>
>> The details via compiler #define's. . .
>>
>> # clang --target=armv6-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 4294967295U
>> #define __WCHAR_TYPE__ unsigned int
>> #define __WCHAR_UNSIGNED__ 1
>> #define __WCHAR_WIDTH__ 32
>> . . .
>>
>> # clang --target=aarch64-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 4294967295U
>> #define __WCHAR_TYPE__ unsigned int
>> #define __WCHAR_UNSIGNED__ 1
>> #define __WCHAR_WIDTH__ 32
>> . . .
>>
>> # clang --target=powerpc-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>>
>> Is powerpc wrong?
>>
>> # clang --target=powerpc64-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> #define __CHAR_UNSIGNED__ 1
>> . . .
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>>
>> Is powerpc64 wrong?
>>
>>
>> # clang --target=amd64-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>>
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>>
>> # clang --target=i386-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>>
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>>
>>
>> # clang --target=mips-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>>
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>>
>> # clang --target=sparc64-freebsd11 -std=c99 -E -dM - < /dev/null | more
>> . . .
>> #define __BYTE_ORDER__ __ORDER_BIG_ENDIAN__
>> . . .
>> #define __CHAR_BIT__ 8
>> . . . (note the lack of __CHAR_UNSIGNED__) . . .
>>
>> #define __WCHAR_MAX__ 2147483647
>> #define __WCHAR_TYPE__ int
>> #define __WCHAR_WIDTH__ 32
>> . . . (note the lack of __WCHAR_UNSIGNED__) . . .
>>
>>
>>
>> ===
>> Mark Millard
>> markmi at dsl-only.net