Date: Fri, 8 Sep 2017 11:12:06 -0400
From: Scott Long <scottl@samsco.org>
To: john hood <cgull@glup.org>
Cc: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org
Subject: Re: GEOM probes fail on aac with EARLY_AP_STARTUP
Message-ID: <AB09EF90-2392-4D1D-8F22-B6E1EBDD0E45@samsco.org>
In-Reply-To: <f2dceefe-8c2c-1bad-95ab-9dd138c8fcbe@glup.org>
References: <f2dceefe-8c2c-1bad-95ab-9dd138c8fcbe@glup.org>

Hi John,

Great bug report and analysis. I think you’re right, behavior in the
system changed with EARLY_AP_STARTUP and the intrhook is being released
too soon now, before the driver is ready for concurrent access. I’ll
shepherd it into SVN. There’s a similar pattern in most of the non-CAM
drivers, so I’ll review them as well.

Scott

> On Sep 7, 2017, at 7:19 PM, john hood <cgull@glup.org> wrote:
>
> I've got a devel machine here which was failing to boot on our vendored
> FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
> drive and so the root mount failed. This started happening on many but
> not all boots after I upgraded the machine from 9.3.
>
> The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID
> volumes configured on 6 SATA drives.
>
> When booting, it sees the aac0 controller and aacd0
> volume but GEOM does not find any of the partitions on that volume, and the
> initial mount of root on /dev/aacd0p2 fails. aacd0 is available and
> readable, but the expected aacd0p{1,2,3} devices do not exist.
> (However, aacd1 and its partitions/devices are configured normally.)
>
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP. I've reproduced
> the problem on upstream FreeBSD 11.1 and -current. Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.
>
> The kernel calls aac_attach() which uses
> config_intrhook_establish() to run aac_startup() later. When that
> runs, it adds devices via
> aac_add_container()/device_add_child()/bus_generic_attach().
>
> However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
> is set. It is cleared at the end of aac_startup(). It appears that
> GEOM probes call aac_disk_open(), which checks the flag and returns
> error if it is set. On my system the race is that the GEOM probes
> happen before that flag is cleared, possibly because GEOM is tasting
> aacd0 while the aac driver is still attaching aacd1. So the GEOM probes
> fail and the geom nodes never get created. If I boot with the -v flag,
> the kernel boots successfully, I think because the message printing
> takes long enough to delay GEOM probing past aac_startup() completion.
>
> I've attached a patch which resolves the problem on FreeBSD-current (and
> 11.1), would anybody care to adopt it and shepherd it into SVN?
>
> regards,
>
> --John Hood
>
> <aac.diff>_______________________________________________
> freebsd-scsi@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-scsi
> To unsubscribe, send any mail to "freebsd-scsi-unsubscribe@freebsd.org"
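
The race in the quoted report can be sketched in plain userland C. This is a minimal, single-threaded illustration under stated assumptions, not the aac(4) driver code and not the attached aac.diff: every sketch_ name, the EBUSY/ENXIO return values, and the n_early parameter are hypothetical, and the concurrent GEOM taste is modeled as a call made at a chosen point inside startup.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the real aac(4) definitions. */
#define SKETCH_STATE_SUSPEND 0x01
#define SKETCH_MAX_DISKS     4

struct sketch_softc {
    int  state;                     /* SKETCH_STATE_* flags */
    int  ndisks;                    /* disks published so far */
    bool tasted[SKETCH_MAX_DISKS];  /* did the one-shot probe of disk i succeed? */
};

/* Models attach: the controller starts out suspended, nothing published. */
static void sketch_attach(struct sketch_softc *sc)
{
    sc->state = SKETCH_STATE_SUSPEND;
    sc->ndisks = 0;
    for (int i = 0; i < SKETCH_MAX_DISKS; i++)
        sc->tasted[i] = false;
}

/* Models the open path: refuse access while the suspend flag is set.
 * The specific errno values are assumptions made for this sketch. */
static int sketch_disk_open(const struct sketch_softc *sc, int unit)
{
    if (unit < 0 || unit >= sc->ndisks)
        return ENXIO;
    if (sc->state & SKETCH_STATE_SUSPEND)
        return EBUSY;
    return 0;
}

/* Models a one-shot probe ("taste"): a failed open is recorded, not retried. */
static void sketch_taste(struct sketch_softc *sc, int unit)
{
    assert(unit >= 0 && unit < SKETCH_MAX_DISKS);
    sc->tasted[unit] = (sketch_disk_open(sc, unit) == 0);
}

/* Models startup: publish each disk, clear the suspend flag only at the end.
 * The first n_early disks are probed while the flag is still set, standing in
 * for a probe that runs concurrently on another CPU; the rest are probed
 * after the flag has been cleared. */
static void sketch_startup(struct sketch_softc *sc, int ndisks, int n_early)
{
    assert(ndisks >= 0 && ndisks <= SKETCH_MAX_DISKS);
    assert(n_early >= 0 && n_early <= ndisks);

    for (int i = 0; i < ndisks; i++) {
        sc->ndisks = i + 1;         /* disk i is now visible to probes */
        if (i < n_early)
            sketch_taste(sc, i);    /* probe wins the race: flag still set */
    }
    sc->state &= ~SKETCH_STATE_SUSPEND;

    for (int i = n_early; i < ndisks; i++)
        sketch_taste(sc, i);        /* probe runs after the flag is cleared */
}
```

Calling sketch_attach() and then sketch_startup(&sc, 2, 1) leaves tasted[0] false and tasted[1] true, mirroring the report where aacd0 has no partitions while aacd1 is configured normally. With n_early set to 0, both probes succeed, which corresponds to the boots where GEOM probing is delayed past the end of startup.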