From owner-freebsd-hackers@freebsd.org Sun Sep 17 02:48:06 2017 Return-Path: Delivered-To: freebsd-hackers@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 5D308E1CD04 for ; Sun, 17 Sep 2017 02:48:06 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: from asp.reflexion.net (outbound-mail-210-66.reflexion.net [208.70.210.66]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 1ECCB6E26B for ; Sun, 17 Sep 2017 02:48:05 +0000 (UTC) (envelope-from markmi@dsl-only.net) Received: (qmail 22879 invoked from network); 17 Sep 2017 02:48:04 -0000 Received: from unknown (HELO rtc-sm-01.app.dca.reflexion.local) (10.81.150.1) by 0 (rfx-qmail) with SMTP; 17 Sep 2017 02:48:04 -0000 Received: by rtc-sm-01.app.dca.reflexion.local (Reflexion email security v8.40.3) with SMTP; Sat, 16 Sep 2017 22:48:04 -0400 (EDT) Received: (qmail 2970 invoked from network); 17 Sep 2017 02:48:04 -0000 Received: from unknown (HELO iron2.pdx.net) (69.64.224.71) by 0 (rfx-qmail) with (AES256-SHA encrypted) SMTP; 17 Sep 2017 02:48:04 -0000 Received: from [192.168.1.109] (c-67-170-167-181.hsd1.or.comcast.net [67.170.167.181]) by iron2.pdx.net (Postfix) with ESMTPSA id B3016EC86EE; Sat, 16 Sep 2017 19:48:03 -0700 (PDT) From: Mark Millard Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: SOLVED (patch): FYI: Pine64+ 2GB (so A64) booting and non-debug vs. debug kernel: "APs not started" for failure cases only: solved with patch Date: Sat, 16 Sep 2017 19:48:03 -0700 References: <1C18FF04-6772-4E9C-88C5-B8D5478C5809@dsl-only.net> <6D63486A-E933-4CC2-9A24-0688BE01A0DA@dsl-only.net> <8E15A747-3413-4537-9ECA-5EDAD1285351@dsl-only.net> <256CF612-1D52-4BCC-981B-E476F6EEC9AB@dsl-only.net> <2FC8A531-5E8F-4765-A1F3-A8D6A6AA0C14@dsl-only.net> To: Emmanuel Vadot , freebsd-arm , freebsd-hackers In-Reply-To: Message-Id: <514378CB-4E18-4BA6-B1BF-AEFFAFB7ADED@dsl-only.net> X-Mailer: Apple Mail (2.3273) X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 17 Sep 2017 02:48:06 -0000 On 2017-Sep-16, at 7:21 PM, Mark Millard wrote: > [As the following does not flow well from the previous > message and stands somewhat on its own in some respects: > I top post this.] >=20 > ffff0000005fb14c adrp x8, ffff000000aaa000 = > ffff0000005fb150 add x8, x8, #0x778 > ffff0000005fb154 adrp x0, ffff0000006c9000 = > ffff0000005fb158 add x0, x0, #0xb8a > ffff0000005fb15c stlr w20, [x8] > ffff0000005fb160 sev >=20 > I ran into the following mention of SEV and making sure > it is appropriately delayed: >=20 >> The mechanism that signals an event to other PEs is IMPLEMENTATION = DEFINED. The PE is not required to guarantee the ordering of this event = with respect to the completion of memory accesses by instructions before = the SEV instruction. Therefore, ARM recommends that software includes a = DSB instruction before any SEV instruction. >>=20 >> The SEVL instruction appears to execute in program order relative to = any subsequent WFE instruction executed on the same PE, without the need = for any explicit insertion of barrier instructions. The following patch has allowed me to boot the pine64+ 2GB with even a non-debug kernel. # svnlite diff /usr/src/sys/arm64/arm64/mp_machdep.c Index: /usr/src/sys/arm64/arm64/mp_machdep.c =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D --- /usr/src/sys/arm64/arm64/mp_machdep.c (revision 323246) +++ /usr/src/sys/arm64/arm64/mp_machdep.c (working copy) @@ -236,7 +236,9 @@ =20 atomic_store_rel_int(&aps_ready, 1); /* Wake up the other CPUs */ - __asm __volatile("sev"); + __asm __volatile( + "dsb ish \n" + "sev \n"); =20 printf("Release APs\n"); I will add this patch to bugzilla 222234. =3D=3D=3D Mark Millard markmi at dsl-only.net On 2017-Sep-16, at 4:08 PM, Mark Millard wrote: > [Adding a couple of numbers that help > interpret what I found as far as what > specifically did not work as expected.] >=20 > On 2017-Sep-16, at 3:17 PM, Mark Millard wrote: >=20 >> A new finding: >>=20 >> When verbose boot messages are enabled >> there is an earlier contrast between when >> booting works overall vs. when it later >> fails: >>=20 >> When it works: >>=20 >> subsystem f000000 >> release_aps(0)... Release APs >> done. >>=20 >> When it fails:=20 >>=20 >> subsystem f000000 >> release_aps(0)... Release APs >> APs not started >> done. >>=20 >> And it well explains why ->pc_curthread >> ends up NULL for secondaries (in particular >> cpu =3D=3D 1), init_secondary had never executed >> the assignments show below:=20 >>=20 >> while (!aps_ready) >> __asm __volatile("wfe"); >>=20 >> /* Initialize curthread */ >> KASSERT(PCPU_GET(idlethread) !=3D NULL, ("no idle thread")); >> pcpup->pc_curthread =3D pcpup->pc_idlethread; >> pcpup->pc_curpcb =3D pcpup->pc_idlethread->td_pcb; >>=20 >> The subsystem messages are from: >>=20 >> static void >> release_aps(void *dummy __unused) >> { =20 >> int i; >>=20 >> /* Only release CPUs if they exist */ >> if (mp_ncpus =3D=3D 1) >> return; >>=20 >> intr_pic_ipi_setup(IPI_AST, "ast", ipi_ast, NULL); >> intr_pic_ipi_setup(IPI_PREEMPT, "preempt", ipi_preempt, NULL); >> intr_pic_ipi_setup(IPI_RENDEZVOUS, "rendezvous", ipi_rendezvous, = NULL); >> intr_pic_ipi_setup(IPI_STOP, "stop", ipi_stop, NULL); >> intr_pic_ipi_setup(IPI_STOP_HARD, "stop hard", ipi_stop, NULL); >> intr_pic_ipi_setup(IPI_HARDCLOCK, "hardclock", ipi_hardclock, = NULL); >>=20 >> atomic_store_rel_int(&aps_ready, 1); >> /* Wake up the other CPUs */ >> __asm __volatile("sev"); >>=20 >> printf("Release APs\n"); >>=20 >> for (i =3D 0; i < 2000; i++) { >> if (smp_started) >> return; >> DELAY(1000); >> } >>=20 >> printf("APs not started\n"); >> } =20 >> SYSINIT(start_aps, SI_SUB_SMP, SI_ORDER_FIRST, release_aps, NULL); >>=20 >>=20 >> init_secondary has an example or two of not using >> atomic_load_acq_int when atomic_store_rel_int is in >> use. One is: >>=20 >> while (!aps_ready) >> __asm __volatile("wfe"); >>=20 >> /* Initialize curthread */ >> KASSERT(PCPU_GET(idlethread) !=3D NULL, ("no idle thread")); >> pcpup->pc_curthread =3D pcpup->pc_idlethread; >> pcpup->pc_curpcb =3D pcpup->pc_idlethread->td_pcb; >>=20 >> where aps_ready was declared via: >>=20 >> /* Set to 1 once we're ready to let the APs out of the pen. */ >> volatile int aps_ready =3D 0; >>=20 >> where release_aps has the use of atomic_store_rel_int: >>=20 >> atomic_store_rel_int(&aps_ready, 1); >> /* Wake up the other CPUs */ >> __asm __volatile("sev"); >>=20 >> There is also in init_secondary: >>=20 >> atomic_add_rel_32(&smp_cpus, 1); >>=20 >> if (smp_cpus =3D=3D mp_ncpus) { >> /* enable IPI's, tlb shootdown, freezes etc */ >> atomic_store_rel_int(&smp_started, 1); >> } >>=20 >> where smp_cpus is accessed without being explicitly >> atomic. mp_ncpus seems to have no atomic use at all. >>=20 >> Where: >>=20 >> /usr/src/sys/sys/smp.h:extern int smp_cpus; >> /usr/src/sys/kern/subr_smp.c:int smp_cpus =3D 1; /* how many cpu's = running */ >>=20 >> So no "volatile", unlike the earlier example. >>=20 >> /usr/src/sys/kern/kern_umtx.c: if (smp_cpus > 1) { >> /usr/src/sys/kern/subr_smp.c:SYSCTL_INT(_kern_smp, OID_AUTO, cpus, = CTLFLAG_RD|CTLFLAG_CAPRD, &smp_cpus, 0, >>=20 >> /usr/src/sys/sys/smp.h:extern int mp_ncpus; >> /usr/src/sys/kern/subr_smp.c:int mp_ncpus; >>=20 >>=20 >> The smp_started is not explicitly accessed as atomic >> in release_aps but in init_secondary has its update >> to 1 via: >>=20 >> mtx_lock_spin(&ap_boot_mtx); >>=20 >> atomic_add_rel_32(&smp_cpus, 1); >>=20 >> if (smp_cpus =3D=3D mp_ncpus) { >> /* enable IPI's, tlb shootdown, freezes etc */ >> atomic_store_rel_int(&smp_started, 1); >> } >>=20 >> mtx_unlock_spin(&ap_boot_mtx); >>=20 >> where: >>=20 >> /usr/src/sys/sys/smp.h:extern volatile int smp_started; >> /usr/src/sys/kern/subr_smp.c:volatile int smp_started; >>=20 >> ("volatile" again for this context.) >>=20 >> I'll also note that for the sparc64 architecture >> there is some code like: >>=20 >> if (__predict_false(atomic_load_acq_int(&smp_started) =3D=3D 0)) >>=20 >> that is explicitly matched to the atomic_store_rel_int >> in its mp_machdep.c . >>=20 >> I do not have enough background aarch64 knowledge to >> know if it is provable that atomic_load_acq_int is not >> needed in some of these cases. >>=20 >> But getting "APs not started" at least sometimes >> suggests an intermittent failure of the code as >> it is. >>=20 >>=20 >> Another difference is lack of explicit initialization >> of smp_started but explicit initialization of aps_ready >> and smp_cpus . >>=20 >>=20 >>=20 >> I have no clue if the boot sequence is supposed to >> handle "APs not started" by reverting to not being >> a symmetric multiprocessing boot or some other >> specific way instead of trying to avoiding use of >> what was not initialized by: >>=20 >> pcpup->pc_curthread =3D pcpup->pc_idlethread; >> pcpup->pc_curpcb =3D pcpup->pc_idlethread->td_pcb; >>=20 >> in init_secondary. >=20 >=20 > Example mp_ncpus and smp_cpus figures from a > failed Pine64+ 2GB boot: >=20 > db> print/x *smp_cpus > 1 > db> print/x *mp_ncpus > 138800000004 >=20 > But that should be a 4 byte width. Showing > some context for reference: >=20 > db> x/bx mp_ncpus-4,4=20 > rebooting: 0 0 0 0 > db> x/bx mp_ncpus,4 > mp_ncpus: 4 0 0 0 > db> x/bx mp_ncpus+4,4=20 > scsi_delay: 88 13 0 0 >=20 > For completeness: >=20 > db> x/bx smp_cpus-4,4 > sysctl___kern_smp_disabled+0x5c: 0 0 0 0 > db> x/bx smp_cpus,4 > smp_cpus: 1 0 0 0 > db> x/bx smp_cpus+4,4=20 > smp_cpus+0x4: 0 0 0 0 >=20 > So smp_cpus was not incremented in memory. This > goes along with no occurances of: >=20 > pcpup->pc_curthread =3D pcpup->pc_idlethread; > pcpup->pc_curpcb =3D pcpup->pc_idlethread->td_pcb; >=20 > updates happening in init_secondary: >=20 > /* Spin until the BSP releases the APs */ > while (!aps_ready) > __asm __volatile("wfe"); >=20 > /* Initialize curthread */ > KASSERT(PCPU_GET(idlethread) !=3D NULL, ("no idle thread")); > pcpup->pc_curthread =3D pcpup->pc_idlethread; > pcpup->pc_curpcb =3D pcpup->pc_idlethread->td_pcb; > . . . > mtx_lock_spin(&ap_boot_mtx); >=20 > atomic_add_rel_32(&smp_cpus, 1); >=20 > if (smp_cpus =3D=3D mp_ncpus) { > /* enable IPI's, tlb shootdown, freezes etc */ > atomic_store_rel_int(&smp_started, 1); > } >=20 > mtx_unlock_spin(&ap_boot_mtx); >=20 > Which seems to imply that the aps_ready > update: >=20 > atomic_store_rel_int(&aps_ready, 1); > /* Wake up the other CPUs */ > __asm __volatile("sev"); >=20 > in release_aps was not seen in the: >=20 > while (!aps_ready) > __asm __volatile("wfe"); >=20 > in init_secondary. >=20 > My guess is that "while (!aps_ready)" needs > to be explicit about its atomic status. > aps_ready is already volatile but apparently > that is not enough for this context to be > reliable. >=20 > The other potential needs for explicit atomics > are for later execution but may be required > overall as well.