From owner-freebsd-ppc@freebsd.org Wed Jun 7 04:49:45 2017 Return-Path: Delivered-To: freebsd-ppc@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B1C64BFB249 for ; Wed, 7 Jun 2017 04:49:45 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-13.reflexion.net [208.70.210.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7761111AA for ; Wed, 7 Jun 2017 04:49:45 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 7555 invoked from network); 7 Jun 2017 04:49:43 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 7 Jun 2017 04:49:43 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.0) with SMTP; Wed, 07 Jun 2017 00:49:43 -0400 (EDT) Received: (qmail 5344 invoked from network); 7 Jun 2017 04:49:43 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 7 Jun 2017 04:49:43 -0000 Received: from [192.168.1.114] (c-76-115-7-162.hsd1.or.comcast.net [76.115.7.162]) by iron2.pdx.net (Postfix) with ESMTPSA id AC81FEC7A6F; Tue, 6 Jun 2017 21:49:42 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: 32-bit powerpc FreeBSD head -r317820 panic example: Example showing register r10 having been trashed during set_affinity's time frame Message-Id: <1738C902-7A9C-4711-A2F7-D302249C9AF8@dsl-only.net> Date: Tue, 6 Jun 2017 21:49:42 -0700 To: Justin Hibbits , Nathan Whitehorn , FreeBSD PowerPC ML , freebsd-hackers@freebsd.org X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-ppc@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Porting FreeBSD to the PowerPC List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Jun 2017 04:49:45 -0000 Summary of the analysis below: Something trashed the r10 value during set_affinity, effectively replacing a struct td_sched * value with what happened to be a struct thread * value. That in turn lead to what should have been a cpuid (for one of 4 CPUs) to instead be a struct td_sched * value. And that in turn lead to a data storage interrupt from the indexing into cpuid_to_pcpu being way out of bounds. Context: 32-bit powerpc head -r317820 used on a old PowerMac G5 so-called "Quad Core". The analysis from a panic's dump and from pictures of ddb output and using the kernel debug file. . . I got a panic that it looked like I could back calculate a ways to see what was odd somewhat earlier. This is that back calculation sequence. This will start in ipi_cpu where the panic occurred and progress backwards into sched_affinity. void ipi_cpu(int cpu, u_int ipi) { ipi_send(cpuid_to_pcpu[cpu], ipi); } was: (kgdb) x/90i ipi_cpu 0x8eb0dc : stwu r1,-32(r1) 0x8eb0e0 : mflr r0 0x8eb0e4 : stw r30,24(r1) 0x8eb0e8 : stw r31,28(r1) 0x8eb0ec : stw r0,36(r1) 0x8eb0f0 : mr r31,r1 0x8eb0f4 : bcl- 20,4*cr7+so,0x8eb0f8 0x8eb0f8 : mflr r30 0x8eb0fc : lwz r0,-32(r30) 0x8eb100 : add r30,r0,r30 0x8eb104 : rlwinm r3,r3,2,0,29 0x8eb108 : lwz r9,-32756(r30) 0x8eb10c : lwzx r3,r3,r9 0x8eb110 : bl 0x8eb018 0x8eb114 : lwz r11,0(r1) 0x8eb118 : lwz r0,4(r11) 0x8eb11c : mtlr r0 0x8eb120 : lwz r30,-8(r11) 0x8eb124 : lwz r31,-4(r11) 0x8eb128 : mr r1,r11 0x8eb12c : blr 0x8eb130 : .long 0x43ce40 The point of failure (srr0 value) is: 0x8eb10c : lwzx r3,r3,r9 (Note: This is a form of r3:=3DMEM(r3+r9,4). Luckily r3 was not updated because of the bad r3+r9 value.) Note that the only prior instruction modifying r3 in ipi_cpu is: 0x8eb104 : rlwinm r3,r3,2,0,29 The "show reg r3" result was: 0x51f2880. So prior to the shift by 2 and masking off of the least significant 2 bits its value was: r3=3D0x147CA20 I will note that the 0x51f2880 + the "show reg r9" value 0xf2c9fc (i.e., &cpuid_to_pcpu) produces: 0x611f27c -- and the exception was reported as being for virtual address 0x611f27c . By the source code structure r3 was supposed to be one of the values: 0x0, 0x1, 0x2 or 0x3 instead. (The PowerMac G5 "Quad Core" has 4 cpus.) With the "show reg r4" result being 0x1 it means that the call to ipi_cpu was apparently (by values passed): ipi_cpu(0x147CA20, 0x1) "Show allpcpu" reported that cpuid=3D3 had: curthread =3D 0x147ca20: pid 11 tid 100006 "idle: cpu3" idlethread =3D 0x147ca20: did 100006 "idle: cpu3" where 0x147d000 (for cpuid 2) has 0x147d008 holding a 0x147ca20 value. So somehow a thread address for cpuid 3 was given to ipi_cpu instead of 0x3 . The backtrace shows that the caller of ipi_cpu was sched_affinity: void sched_affinity(struct thread *td) { #ifdef SMP struct td_sched *ts; THREAD_LOCK_ASSERT(td, MA_OWNED); ts =3D td_get_sched(td); if (THREAD_CAN_SCHED(td, ts->ts_cpu)) return; if (TD_ON_RUNQ(td)) { sched_rem(td); sched_add(td, SRQ_BORING); return; } if (!TD_IS_RUNNING(td)) return; /* * Force a switch before returning to userspace. If the * target thread is not running locally send an ipi to force * the issue. */ td->td_flags |=3D TDF_NEEDRESCHED; if (td !=3D curthread) ipi_cpu(ts->ts_cpu, IPI_PREEMPT); #endif } So the implication is that ts->ts_cpu had the value 0x147ca20 instead of the value 3 at the time of the call sequence. sched_affinity was: 0x535ab8 : stwu r1,-32(r1) 0x535abc : mflr r0 0x535ac0 : stw r29,20(r1) 0x535ac4 : stw r30,24(r1) 0x535ac8 : stw r31,28(r1) 0x535acc : stw r0,36(r1) 0x535ad0 : mr r31,r1 0x535ad4 : mr r29,r3 0x535ad8 : lwz r0,0(r3) 0x535adc : addi r10,r3,808 0x535ae0 : lwz r9,8(r10) 0x535ae4 : lwz r11,48(r3) 0x535ae8 : rlwinm r0,r9,29,3,29 0x535aec : clrlwi r9,r9,27 0x535af0 : lwzx r0,r11,r0 0x535af4 : sraw r0,r0,r9 0x535af8 : andi. r9,r0,1 0x535afc : bne- 0x535b4c 0x535b00 : lwz r0,652(r3) 0x535b04 : cmpwi cr7,r0,3 0x535b08 : bne+ cr7,0x535b20 = 0x535b0c : bl 0x5341c0 0x535b10 : mr r3,r29 0x535b14 : li r4,0 0x535b18 : bl 0x53583c 0x535b1c : b 0x535b4c 0x535b20 : cmpwi cr7,r0,4 0x535b24 : bne- cr7,0x535b4c = 0x535b28 : lwz r0,156(r3) 0x535b2c : oris r0,r0,1 0x535b30 : stw r0,156(r3) 0x535b34 : mr r0,r2 0x535b38 : cmpw cr7,r3,r0 0x535b3c : beq- cr7,0x535b4c = 0x535b40 : lwz r3,8(r10) 0x535b44 : li r4,1 0x535b48 : bl 0x8eb0dc 0x535b4c : lwz r11,0(r1) 0x535b50 : lwz r0,4(r11) 0x535b54 : mtlr r0 0x535b58 : lwz r29,-12(r11) 0x535b5c : lwz r30,-8(r11) 0x535b60 : lwz r31,-4(r11) 0x535b64 : mr r1,r11 0x535b68 : blr 0x535b6c : .long 0x7e7f18 The only call executed by sched_affinity was to ipi_cpu given what is inlined and the fact that it made it to do the ipi_cpu bl instruction. Since ipi_cpu leaves various registers alone some register values from sched_affinity were preserved for "show reg" use. r10 is an example in: 0x535ad4 : mr r29,r3 . . . 0x535adc : addi r10,r3,808 . . . 0x535b40 : lwz r3,8(r10) 0x535b44 : li r4,1 0x535b48 : bl 0x8eb0dc (r29 is another register value that is preserved in what sched_affinity called. r29 reports what r3 was when sched_affinity was called: 0x147d000, which makes sense. Also 808 is in decimal notation.) That sched_affinity code corresponds to (r10 for ts below): sched_affinity(struct thread *td) . . . struct td_sched *ts; . . . ts =3D td_get_sched(td); . . . ipi_cpu(ts->ts_cpu, IPI_PREEMPT); where "show reg r10" reported: 0x147d000, which matched the "show allpcpu" for cpuid=3D2: curthread =3D 0x147d000: pid11 tid 100005 "idle: cpu2" But it should not match that. . . r10 should have been 0x147d328 (0x147d00+0x328) but is reported as 0x147d000 and the behavior reported for ipi_cpu agrees with 0x147d000: 8(r10) then extracts a 0x0147ca20 value from 0x147d008 instead of a 0x0 from 0x147d328 . So something trashed r10 before: 0x535b40 : lwz r3,8(r10) and so effectively replaced ts with the td value. Trashing r3 so that r3+808=3D=3D0x147d000 when 0x535adc executes seems unlikely but is a technical possibility for how r10 ends up trashed. Other notes: Some of the details not shown above are: static __inline __pure2 struct td_sched * td_get_sched(struct thread *td) { =20 return ((struct td_sched *)&td[1]); } and: /* * Thread scheduler specific section. All fields are protected * by the thread lock. */ struct td_sched { struct runq *ts_runq; /* Run-queue we're queued on. */ short ts_flags; /* TSF_* flags. */ int ts_cpu; /* CPU that we have affinity = for. */ int ts_rltick; /* Real last tick, for affinity. = */ int ts_slice; /* Ticks of slice remaining. */ u_int ts_slptime; /* Number of ticks we vol. slept = */ u_int ts_runtime; /* Number of ticks we were = running */ int ts_ltick; /* Last tick that we were = running on */ int ts_ftick; /* First tick that we were = running on */ int ts_ticks; /* Tick count */ #ifdef KTR char ts_name[TS_NAME_LEN]; #endif }; where td_sched is a suffix after a "struct thread", such as the space reserved in: struct thread0_storage { struct thread t0st_thread; uint64_t t0st_sched[10]; }; and checked by: _Static_assert(sizeof(struct thread) + sizeof(struct td_sched) <=3D sizeof(struct thread0_storage), "increase struct thread0_storage.t0st_sched size"); =3D=3D=3D Mark Millard markmi at dsl-only.net