From: Marcel Moolenaar
To: "powerpc@freebsd.org"
Date: Sat, 2 Jun 2012 08:43:21 -0700
Message-ID: <276B630D-417B-4FB1-82C6-676EB31C6275@juniper.net>
Subject: [P2020] Infinite EXC_ISI on executing /sbin/init

All,

On a P2020 system (kernel configured without SMP -- see other email) we lose
forward progress due to a TLB issue. In a nutshell, this is what I'm seeing:

1. The kernel exits to execute the very first instruction of /sbin/init.
2. assumption: the kernel gets a TLB miss exception.
3. assumption: the miss cannot be handled so a fake TLB entry is created to
   trigger an ISI.
4.
   The kernel gets an ISI and calls vm_fault(). The contents of TLB0 with
   respect to the process are:

   125: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 2 mas1 = 0x00020100 mas2(va) = 0x7fffd004 mas3(pa) = 0x01ebe03f
   253: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffd004 mas3(pa) = 0xffff0000
   380: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffc004 mas3(pa) = 0xffff0000
   381: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffd004 mas3(pa) = 0x01ed300f
   508: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffc004 mas3(pa) = 0x01ed400f

   I don't see the fake TLB entry for init's entry point (0x0180000), so I'm
   not sure (3) above happened.
5. mmu_booke_enter() is called, which flushes the TLB (i.e. removes the fake
   entry) and adds the real one to the PMAP's page tables.
6. assumption: the kernel exits from the ISI trap and gets a TLB miss
   exception.
7. normally this can be handled and everything is fine, except what I'm
   seeing is that the kernel gets another ISI -- so it looks like we're back
   at point 3. The TLB contents on the second and subsequent ISI exceptions
   are effectively the same as given at (4) above:

   253: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffd004 mas3(pa) = 0xffff0000
   380: ( ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x00030100 mas2(va) = 0x7fffc004 mas3(pa) = 0xffff0000
   381: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffd004 mas3(pa) = 0x01ed300f
   508: (V ) [AS=0] sz = 0x00001000 tsz = 1 tid = 3 mas1 = 0x80030100 mas2(va) = 0x7fffc004 mas3(pa) = 0x01ed400f

Questions:
1. Why don't I see the fake TLB0 entry for init's entry point?
2. Assuming we're not looking at a TLB miss, what else can cause the ISI?
   The RM states that endianness can be another reason for the ISI, but I
   don't see anything wrong there.

BTW: I already looked at the I-cache synchronization logic and tweaked it.
No change. I also revisited the I-cache & D-cache enable & invalidate code
and tweaked that too. No change.

In short: I'm running out of ideas. Could this be related to the other P2020
issue I described: "[P2020] FreeBSD cannot enable 2nd core"? They're both
pretty weird and together could indicate some hardware problem, right?

Then again: a FreeBSD 6.1-derived version at least boots UP...

--
Marcel Moolenaar
marcelm@juniper.net
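
For anyone cross-checking the TLB0 dumps above by hand, here is a minimal sketch that pulls the fields out of the raw MAS values. The bit masks are an assumption based on the usual e500 MAS1/MAS2/MAS3 layout and should be verified against the core RM. The macro and function names are made up for this example and are not taken from the kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED e500 MAS layout; verify every mask against the core RM. */
#define MAS1_VALID        0x80000000u  /* V: entry is valid */
#define MAS1_TID_MASK     0x00ff0000u  /* translation ID (PID) */
#define MAS1_TID_SHIFT    16
#define MAS1_TSIZE_MASK   0x00000f00u  /* page size = 4^TSIZE KB */
#define MAS1_TSIZE_SHIFT  8
#define MAS2_E            0x00000001u  /* E: little-endian page */
#define MAS3_UX           0x00000020u  /* user execute permission */
#define MAS3_PERM_MASK    0x0000003fu  /* UX SX UW SW UR SR */

static bool mas1_valid(uint32_t mas1)
{
	return (mas1 & MAS1_VALID) != 0;
}

static unsigned mas1_tid(uint32_t mas1)
{
	return (mas1 & MAS1_TID_MASK) >> MAS1_TID_SHIFT;
}

/* Page size in bytes: 1 KB * 4^TSIZE. */
static uint64_t mas1_page_size(uint32_t mas1)
{
	unsigned tsize = (mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT;

	return (uint64_t)1024 << (2 * tsize);
}

static bool mas2_little_endian(uint32_t mas2)
{
	return (mas2 & MAS2_E) != 0;
}

static unsigned mas3_perms(uint32_t mas3)
{
	return mas3 & MAS3_PERM_MASK;
}

static bool mas3_user_exec(uint32_t mas3)
{
	return (mas3 & MAS3_UX) != 0;
}

/* Print one decoded entry; idx is the TLB0 index shown in the dump. */
void tlb_print_entry(int idx, uint32_t mas1, uint32_t mas2, uint32_t mas3)
{
	printf("%d: valid=%d tid=%u size=0x%llx E=%d perms=0x%02x UX=%d\n",
	    idx, mas1_valid(mas1), mas1_tid(mas1),
	    (unsigned long long)mas1_page_size(mas1),
	    mas2_little_endian(mas2), mas3_perms(mas3),
	    mas3_user_exec(mas3));
}
```

Calling tlb_print_entry(381, 0x80030100, 0x7fffd004, 0x01ed300f) reports a valid entry with tid 3 and size 0x1000, which agrees with the tid and sz columns the dump prints for that entry. The E and permission fields are the ones to compare against the ISI causes listed in the RM.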