From owner-freebsd-arm@freebsd.org Thu Mar 16 19:06:13 2017 Return-Path: Delivered-To: freebsd-arm@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id D3595D0F50E for ; Thu, 16 Mar 2017 19:06:13 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-177.reflexion.net [208.70.211.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 92F8417A2 for ; Thu, 16 Mar 2017 19:06:13 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 17915 invoked from network); 16 Mar 2017 19:06:06 -0000 Received: from unknown (HELO mail-cs-01.app.dca.reflexion.local) (10.81.19.1) by 0 (rfx-qmail) with SMTP; 16 Mar 2017 19:06:06 -0000 Received: by mail-cs-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Thu, 16 Mar 2017 15:06:06 -0400 (EDT) Received: (qmail 32300 invoked from network); 16 Mar 2017 19:06:06 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 16 Mar 2017 19:06:06 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 8C793EC9026; Thu, 16 Mar 2017 12:06:05 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: Is CPUTYPE=cortex-A7 supposed to work? From: Mark Millard In-Reply-To: <6D4C0F5B-6F13-44DE-A1C6-83A1D3608EB4@dsl-only.net> Date: Thu, 16 Mar 2017 12:06:04 -0700 Cc: freebsd-arm , mmel@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <91BB5C78-B2F8-458E-B5F4-DE31068D5943@dsl-only.net> References: <871suc3nv8.fsf@news-spur.riddles.org.uk> <8737ely05c.fsf@news-spur.riddles.org.uk> <87wpbxw3yd.fsf@news-spur.riddles.org.uk> <79EBD44B-2C2D-4394-A90C-DF494A049F20@dsl-only.net> <674facba-68cd-8ce1-887a-1ef3c51520bc@freebsd.org> <87pohh2n3c.fsf@news-spur.riddles.org.uk> <6D4C0F5B-6F13-44DE-A1C6-83A1D3608EB4@dsl-only.net> To: Andrew Gierth , Sylvain Garrigues X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 19:06:13 -0000 [I accidentally included some fpregs32 material, so this corrects the impression that might leave.] On 2017-Mar-16, at 11:44 AM, Mark Millard = wrote: > On 2017-Mar-16, at 10:27 AM, Andrew Gierth wrote: >>=20 >>>>>>> "Sylvain" =3D=3D Sylvain Garrigues = writes: >>=20 >> Sylvain> Thank you so much for your quick feedback Michal. Good to = know >> Sylvain> this matter is into good hands. I=E2=80=99m afraid I'm still = afraid >> Sylvain> about `basic=E2=80=99 programs like git being still = potentially broken >> Sylvain> when kernel+world+ports have CPUTYPE=3Dcortex-a7 in = make.conf - >> Sylvain> Andrew said a simple `git clone' could fail, more precisely >> Sylvain> (quoting him): >>=20 >>>> I have determined that the sha1 failures occur only if the >>>> NEON-enabled SHA1 block function is interrupted by a signal. This >>>> explains why it fails in git (which is using SIGALRM to set a >>>> "display progress" flag) but not in standalone SHA1 tests; >>=20 >> Sylvain> Removing CPUTYPE apparently fixes things hence I=E2=80=99m = not 100% >> Sylvain> confident yet of keeping CPUTYPE=3Dcortex-a7 myself even if = only >> Sylvain> a few ports might be affected. Git is an important port, who >> Sylvain> knows what other ports are broken :-) >>=20 >> Let me clarify this. >>=20 >> Without CPUTYPE, things _appear_ to work because only explicit use of >> floating-point exposes the bug, and it's extremely rare for programs = to >> use floating-point in signal handlers, and even then the only result >> would be incorrect floating-point calculations in the interrupted = code. >>=20 >> With CPUTYPE, the compiler can generate NEON instructions for integer >> code; even without heavy optimization enabled, it might choose to use >> NEON register load/store to copy small data structures for example. = One >> piece of code which is affected by this is the signal handling = functions >> in libthr, which wrap the program's own signal handler functions; so >> now, _every_ signal handler in a program linked with libthr uses the >> NEON registers, and the result isn't limited to corrupting >> floating-point calculations but can corrupt data structures or copied >> memory, or the results of vectorized code. >>=20 >> The specific failures that I saw -- git failing, emacs crashing, = errors >> from openssl speed -- were all of this second kind and therefore not >> directly reproducible without CPUTYPE; but the test program I gave = for >> the bug report demonstrates the problem by using explicit = floating-point >> (in a highly contrived way) and therefore reproduces the issue even >> without CPUTYPE. >>=20 >> So even though the bug is in the kernel and affects all armv6 targets >> whether NEON is in use or not, the chances of actually hitting it are >> pretty negligible if you built the world without CPUTYPE. But if you >> build with CPUTYPE, then potentially any code that catches a signal = is >> affected; it's just that programs (like git) that combine signal >> handling with vectorized crypto code, or programs (like emacs) that = use >> signal handling very heavily, have the greatest probability of = failure. >>=20 >> tl/dr: building without CPUTYPE is a workaround that simply reduces = both >> the chance and severity of failure; building with CPUTYPE currently >> breaks almost everything, but with a probability that varies wildly >> depending on what the application does. >>=20 >> --=20 >> Andrew. >=20 > As I understand there are also issues beyond the fix for signal > delivery. >=20 > On 2017-Mar-12, at 12:17 AM, Michal Meloun = wrote: >=20 >> The struct fpreg is also wrong and I'm not sure if >> or how we can to fix this in compatible way. >=20 > Looking up some details shows sys/arm/include/reg.h with: >=20 > struct fpreg { > unsigned int fpr_fpsr; > fp_reg_t fpr[8]; > }; >=20 > [Covers only 8 floating point registers?] >=20 > and shows sys/arm/include/fp.h with: >=20 > typedef struct fp_extended_precision { > u_int32_t fp_exponent; > u_int32_t fp_mantissa_hi; > u_int32_t fp_mantissa_lo; > } fp_extended_precision_t; >=20 > typedef struct fp_extended_precision fp_reg_t; > . . . > /* > * Type for a saved FP context, if we want to translate the context to = a > * user-readable form > */ >=20 > typedef struct { > u_int32_t fpsr; > fp_extended_precision_t regs[8]; > } fp_state_t; >=20 >=20 > So each of: >=20 > struct fpreg > fp_state_t >=20 > has room for 8 instances of 96 bits (beyond fpsr), not sufficient > for 32 double precision (i.e., 64-bit) registers. >=20 > The arm code also has: >=20 > # grep -r "\" /usr/src/sys/arm/ | more > /usr/src/sys/arm/arm/machdep.c:fill_fpregs(struct thread *td, struct = fpreg *regs) > /usr/src/sys/arm/arm/machdep.c:set_fpregs(struct thread *td, struct = fpreg *regs) > /usr/src/sys/arm/include/reg.h:struct fpreg { > /usr/src/sys/arm/include/reg.h:int fill_fpregs(struct thread *, = struct fpreg *); > /usr/src/sys/arm/include/reg.h:int set_fpregs(struct thread *, = struct fpreg *); >=20 > And the system has: >=20 > /usr/src/sys/sys/procfs.h:typedef struct fpreg fpregset_t; > /usr/src/sys/sys/procfs.h: size_t pr_fpregsetsz; /* = sizeof(fpregset_t) (1) */ > /usr/src/sys/sys/procfs.h:typedef fpregset_t prfpregset_t; > /usr/src/sys/sys/ptrace.h:struct fpreg; > /usr/src/sys/sys/ptrace.h:int proc_read_fpregs(struct thread *_td, = struct fpreg *_fpreg); > /usr/src/sys/sys/ptrace.h:int proc_write_fpregs(struct thread *_td, = struct fpreg *_fpreg); >=20 > and: >=20 > /usr/src/sys/kern/sys_process.c: * Ptrace doesn't support fpregs at = all, and there are no security holes > /usr/src/sys/kern/sys_process.c: * or translations for fpregs, so we = can just copy them. > /usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, = struct fpreg *fpregs) > /usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs(td, = fpregs)); > /usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, = struct fpreg *fpregs) > /usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs(td, = fpregs)); > /usr/src/sys/kern/sys_process.c: struct fpreg fpreg; > /usr/src/sys/kern/sys_process.c: error =3D = COPYIN(uap->addr, &r.fpreg, sizeof r.fpreg); > /usr/src/sys/kern/sys_process.c: error =3D = COPYOUT(&r.fpreg, uap->addr, sizeof r.fpreg); > /usr/src/sys/kern/sys_process.c: error =3D = PROC_WRITE(fpregs, td2, addr); > /usr/src/sys/kern/sys_process.c: error =3D = PROC_READ(fpregs, td2, addr); >=20 > and there is use of some of the above in: >=20 > /usr/src/sys/kern/sys_process.c: * proc_read_fpregs, proc_write_fpregs > /usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, = struct fpreg *fpregs) > /usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs(td, = fpregs)); > /usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, = struct fpreg *fpregs) > /usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs(td, = fpregs)); Sorry: I did not intend to include the "fpregs32" material: > /usr/src/sys/kern/sys_process.c:proc_read_fpregs32(struct thread *td, = struct fpreg32 *fpregs32) > /usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs32(td, = fpregs32)); > /usr/src/sys/kern/sys_process.c:proc_write_fpregs32(struct thread *td, = struct fpreg32 *fpregs32) > /usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs32(td, = fpregs32)); I have not looked into fpregs32 at all. As far as I saw looking at fpregs material fpregs32 seemed to be separate. My guess would be fpregs32 is for amd64, powerpc64, sparc64, mips and their supporting 32 bit processes as well as 64-bit on 64-bit processors. So far as I know FreeBSD does not have such support for 32-bit code in arm64/aarch64 contexts. > I may not have found everything relevant. >=20 > It appears that fp_state_t is unused in /usr/src/sys/ . >=20 >=20 > Note: I was looking at /usr/src/sys/ for. . . >=20 > # uname -paKU > FreeBSD FreeBSDx64 12.0-CURRENT FreeBSD 12.0-CURRENT r314687M amd64 = amd64 1200023 1200023 =3D=3D=3D Mark Millard markmi at dsl-only.net