Subject: Re: GEOM probes fail on aac with EARLY_AP_STARTUP
From: Scott Long
Date: Fri, 8 Sep 2017 11:12:06 -0400
To: john hood
Cc: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org

Hi John,

Great bug report and analysis. I think you’re right: behavior in the system changed with EARLY_AP_STARTUP, and the intrhook is now being released too soon, before the driver is ready for concurrent access. I’ll shepherd it into SVN. There’s a similar pattern in most of the non-CAM drivers, so I’ll review them as well.

Scott

> On Sep 7, 2017, at 7:19 PM, john hood wrote:
>
> I've got a devel machine here which was failing to boot on our vendored
> FreeBSD 11.1, because GEOM was unable to find the partitions on the boot
> drive and so the root mount failed. This started happening on many but
> not all boots after I upgraded the machine from 9.3.
>
> The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs
> (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, with 2 RAID
> volumes configured on 6 SATA drives.
>
> When booting, it sees the aac0 controller and the aacd0 volume, but GEOM
> does not find any of the partitions on that volume, and the initial
> mount of root on /dev/aacd0p2 fails. aacd0 is available and readable,
> but the expected aacd0p{1,2,3} devices do not exist. (However, aacd1
> and its partitions/devices are configured normally.)
>
> I think it's a race condition between the aac driver and GEOM probing,
> probably newly triggered/exposed by EARLY_AP_STARTUP. I've reproduced
> the problem on upstream FreeBSD 11.1 and -current. Disabling
> EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to
> start correctly. 'boot -v' also causes the kernel to start correctly.
>
> The kernel calls aac_attach(), which uses config_intrhook_establish()
> to run aac_startup() later. When that runs, it adds devices via
> aac_add_container()/device_add_child()/bus_generic_attach().
>
> However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag
> is set. It is cleared at the end of aac_startup(). It appears that
> GEOM probes call aac_disk_open(), which checks the flag and returns an
> error if it is set. On my system the race is that the GEOM probes
> happen before that flag is cleared, possibly because GEOM is tasting
> aacd0 while the aac driver is still attaching aacd1. So the GEOM probes
> fail and the GEOM nodes never get created. If I boot with the -v flag,
> the kernel boots successfully, I think because the message printing
> takes long enough to delay GEOM probing past aac_startup() completion.
>
> I've attached a patch which resolves the problem on FreeBSD-current
> (and 11.1); would anybody care to adopt it and shepherd it into SVN?
>
> regards,
>
> --John Hood
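The ordering problem John describes (the AAC_STATE_SUSPEND flag is still set when GEOM tastes a disk that has already been attached, so aac_disk_open() fails) can be illustrated with a small, single-threaded userspace model. Everything here is hypothetical: model_softc, model_disk_open(), model_add_disk() and the two startup variants only mirror the shape of the aac(4) code, and the "taste immediately on attach" step stands in for the worst-case interleaving that EARLY_AP_STARTUP makes possible. The reordered variant is one possible fix under that assumption, not necessarily what the attached patch does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; not the real aac(4) definitions. */
#define MODEL_STATE_SUSPEND 0x01u  /* plays the role of AAC_STATE_SUSPEND */
#define MODEL_MAX_DISKS     4

struct model_softc {
    unsigned state;                      /* driver state flags */
    int ndisks;                          /* disks (containers) attached so far */
    int taste_results[MODEL_MAX_DISKS];  /* open() result seen by "GEOM" per disk */
};

/* Stands in for aac_disk_open(): refuse to open while suspended. */
int model_disk_open(struct model_softc *sc)
{
    if (sc->state & MODEL_STATE_SUSPEND)
        return ENXIO;
    return 0;
}

/*
 * Attach one disk and let "GEOM" taste it right away.  On an SMP kernel
 * with EARLY_AP_STARTUP another CPU may do this as soon as the child
 * exists, so running it immediately models the worst-case interleaving.
 */
void model_add_disk(struct model_softc *sc)
{
    assert(sc->ndisks < MODEL_MAX_DISKS);
    sc->taste_results[sc->ndisks] = model_disk_open(sc);
    sc->ndisks++;
}

/* Like aac_attach(): mark the controller suspended until startup is done. */
void model_attach(struct model_softc *sc)
{
    sc->state = MODEL_STATE_SUSPEND;
    sc->ndisks = 0;
}

/* Original ordering: children become visible first, flag cleared last. */
void model_startup_late_clear(struct model_softc *sc, int ndisks)
{
    for (int i = 0; i < ndisks; i++)
        model_add_disk(sc);
    sc->state &= ~MODEL_STATE_SUSPEND;
}

/*
 * Assumed alternative ordering: finish making the driver ready, then
 * expose the children that GEOM will taste.
 */
void model_startup_early_clear(struct model_softc *sc, int ndisks)
{
    sc->state &= ~MODEL_STATE_SUSPEND;
    for (int i = 0; i < ndisks; i++)
        model_add_disk(sc);
}
```

With two disks, model_attach() followed by model_startup_late_clear() leaves ENXIO in both taste_results slots, which corresponds to the missing aacd0p* nodes, while model_startup_early_clear() yields 0 for both. A real fix also has to make sure the driver is genuinely safe for concurrent I/O at the point the flag is cleared, which this model does not capture.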