From owner-freebsd-arm@freebsd.org Thu Mar 16 18:44:31 2017 Return-Path: Delivered-To: freebsd-arm@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3FD78D0FD4C for ; Thu, 16 Mar 2017 18:44:31 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-211-178.reflexion.net [208.70.211.178]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id F405B18C6 for ; Thu, 16 Mar 2017 18:44:30 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 18902 invoked from network); 16 Mar 2017 18:44:28 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 16 Mar 2017 18:44:28 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.30.2) with SMTP; Thu, 16 Mar 2017 14:44:28 -0400 (EDT) Received: (qmail 7637 invoked from network); 16 Mar 2017 18:44:28 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 16 Mar 2017 18:44:28 -0000 Received: from [192.168.1.111] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id 7C05EEC90DB; Thu, 16 Mar 2017 11:44:27 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 10.2 \(3259\)) Subject: Re: Is CPUTYPE=cortex-A7 supposed to work? From: Mark Millard In-Reply-To: <87pohh2n3c.fsf@news-spur.riddles.org.uk> Date: Thu, 16 Mar 2017 11:44:26 -0700 Cc: freebsd-arm , mmel@freebsd.org Content-Transfer-Encoding: quoted-printable Message-Id: <6D4C0F5B-6F13-44DE-A1C6-83A1D3608EB4@dsl-only.net> References: <871suc3nv8.fsf@news-spur.riddles.org.uk> <8737ely05c.fsf@news-spur.riddles.org.uk> <87wpbxw3yd.fsf@news-spur.riddles.org.uk> <79EBD44B-2C2D-4394-A90C-DF494A049F20@dsl-only.net> <674facba-68cd-8ce1-887a-1ef3c51520bc@freebsd.org> <87pohh2n3c.fsf@news-spur.riddles.org.uk> To: Andrew Gierth , Sylvain Garrigues X-Mailer: Apple Mail (2.3259) X-BeenThere: freebsd-arm@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Porting FreeBSD to ARM processors." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 16 Mar 2017 18:44:31 -0000 On 2017-Mar-16, at 10:27 AM, Andrew Gierth = wrote: >=20 >>>>>> "Sylvain" =3D=3D Sylvain Garrigues = writes: >=20 > Sylvain> Thank you so much for your quick feedback Michal. Good to = know > Sylvain> this matter is into good hands. I=E2=80=99m afraid I'm still = afraid > Sylvain> about `basic=E2=80=99 programs like git being still = potentially broken > Sylvain> when kernel+world+ports have CPUTYPE=3Dcortex-a7 in make.conf = - > Sylvain> Andrew said a simple `git clone' could fail, more precisely > Sylvain> (quoting him): >=20 >>> I have determined that the sha1 failures occur only if the >>> NEON-enabled SHA1 block function is interrupted by a signal. This >>> explains why it fails in git (which is using SIGALRM to set a >>> "display progress" flag) but not in standalone SHA1 tests; >=20 > Sylvain> Removing CPUTYPE apparently fixes things hence I=E2=80=99m = not 100% > Sylvain> confident yet of keeping CPUTYPE=3Dcortex-a7 myself even if = only > Sylvain> a few ports might be affected. Git is an important port, who > Sylvain> knows what other ports are broken :-) >=20 > Let me clarify this. >=20 > Without CPUTYPE, things _appear_ to work because only explicit use of > floating-point exposes the bug, and it's extremely rare for programs = to > use floating-point in signal handlers, and even then the only result > would be incorrect floating-point calculations in the interrupted = code. >=20 > With CPUTYPE, the compiler can generate NEON instructions for integer > code; even without heavy optimization enabled, it might choose to use > NEON register load/store to copy small data structures for example. = One > piece of code which is affected by this is the signal handling = functions > in libthr, which wrap the program's own signal handler functions; so > now, _every_ signal handler in a program linked with libthr uses the > NEON registers, and the result isn't limited to corrupting > floating-point calculations but can corrupt data structures or copied > memory, or the results of vectorized code. >=20 > The specific failures that I saw -- git failing, emacs crashing, = errors > from openssl speed -- were all of this second kind and therefore not > directly reproducible without CPUTYPE; but the test program I gave for > the bug report demonstrates the problem by using explicit = floating-point > (in a highly contrived way) and therefore reproduces the issue even > without CPUTYPE. >=20 > So even though the bug is in the kernel and affects all armv6 targets > whether NEON is in use or not, the chances of actually hitting it are > pretty negligible if you built the world without CPUTYPE. But if you > build with CPUTYPE, then potentially any code that catches a signal is > affected; it's just that programs (like git) that combine signal > handling with vectorized crypto code, or programs (like emacs) that = use > signal handling very heavily, have the greatest probability of = failure. >=20 > tl/dr: building without CPUTYPE is a workaround that simply reduces = both > the chance and severity of failure; building with CPUTYPE currently > breaks almost everything, but with a probability that varies wildly > depending on what the application does. >=20 > --=20 > Andrew. As I understand there are also issues beyond the fix for signal delivery. On 2017-Mar-12, at 12:17 AM, Michal Meloun = wrote: > The struct fpreg is also wrong and I'm not sure if > or how we can to fix this in compatible way. Looking up some details shows sys/arm/include/reg.h with: struct fpreg { unsigned int fpr_fpsr; fp_reg_t fpr[8]; }; [Covers only 8 floating point registers?] and shows sys/arm/include/fp.h with: typedef struct fp_extended_precision { u_int32_t fp_exponent; u_int32_t fp_mantissa_hi; u_int32_t fp_mantissa_lo; } fp_extended_precision_t; typedef struct fp_extended_precision fp_reg_t; . . . /* * Type for a saved FP context, if we want to translate the context to a * user-readable form */ typedef struct { u_int32_t fpsr; fp_extended_precision_t regs[8]; } fp_state_t; So each of: struct fpreg fp_state_t has room for 8 instances of 96 bits (beyond fpsr), not sufficient for 32 double precision (i.e., 64-bit) registers. The arm code also has: # grep -r "\" /usr/src/sys/arm/ | more /usr/src/sys/arm/arm/machdep.c:fill_fpregs(struct thread *td, struct = fpreg *regs) /usr/src/sys/arm/arm/machdep.c:set_fpregs(struct thread *td, struct = fpreg *regs) /usr/src/sys/arm/include/reg.h:struct fpreg { /usr/src/sys/arm/include/reg.h:int fill_fpregs(struct thread *, = struct fpreg *); /usr/src/sys/arm/include/reg.h:int set_fpregs(struct thread *, = struct fpreg *); And the system has: /usr/src/sys/sys/procfs.h:typedef struct fpreg fpregset_t; /usr/src/sys/sys/procfs.h: size_t pr_fpregsetsz; /* = sizeof(fpregset_t) (1) */ /usr/src/sys/sys/procfs.h:typedef fpregset_t prfpregset_t; /usr/src/sys/sys/ptrace.h:struct fpreg; /usr/src/sys/sys/ptrace.h:int proc_read_fpregs(struct thread *_td, = struct fpreg *_fpreg); /usr/src/sys/sys/ptrace.h:int proc_write_fpregs(struct thread *_td, = struct fpreg *_fpreg); and: /usr/src/sys/kern/sys_process.c: * Ptrace doesn't support fpregs at all, = and there are no security holes /usr/src/sys/kern/sys_process.c: * or translations for fpregs, so we can = just copy them. /usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, = struct fpreg *fpregs) /usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs(td, = fpregs)); /usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, = struct fpreg *fpregs) /usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs(td, = fpregs)); /usr/src/sys/kern/sys_process.c: struct fpreg fpreg; /usr/src/sys/kern/sys_process.c: error =3D = COPYIN(uap->addr, &r.fpreg, sizeof r.fpreg); /usr/src/sys/kern/sys_process.c: error =3D = COPYOUT(&r.fpreg, uap->addr, sizeof r.fpreg); /usr/src/sys/kern/sys_process.c: error =3D = PROC_WRITE(fpregs, td2, addr); /usr/src/sys/kern/sys_process.c: error =3D = PROC_READ(fpregs, td2, addr); and there is use of some of the above in: /usr/src/sys/kern/sys_process.c: * proc_read_fpregs, proc_write_fpregs /usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, = struct fpreg *fpregs) /usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs(td, = fpregs)); /usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, = struct fpreg *fpregs) /usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs(td, = fpregs)); /usr/src/sys/kern/sys_process.c:proc_read_fpregs32(struct thread *td, = struct fpreg32 *fpregs32) /usr/src/sys/kern/sys_process.c: PROC_ACTION(fill_fpregs32(td, = fpregs32)); /usr/src/sys/kern/sys_process.c:proc_write_fpregs32(struct thread *td, = struct fpreg32 *fpregs32) /usr/src/sys/kern/sys_process.c: PROC_ACTION(set_fpregs32(td, = fpregs32)); I may not have found everything relevant. It appears that fp_state_t is unused in /usr/src/sys/ . Note: I was looking at /usr/src/sys/ for. . . # uname -paKU FreeBSD FreeBSDx64 12.0-CURRENT FreeBSD 12.0-CURRENT r314687M amd64 = amd64 1200023 1200023 =3D=3D=3D Mark Millard markmi at dsl-only.net