Date: Thu, 7 Sep 2017 19:19:40 -0400
From: john hood <cgull@glup.org>
To: freebsd-scsi@freebsd.org, jhb@freebsd.org, jkim@freebsd.org
Subject: GEOM probes fail on aac with EARLY_AP_STARTUP
Message-ID: <f2dceefe-8c2c-1bad-95ab-9dd138c8fcbe@glup.org>
I've got a devel machine here which was failing to boot on our vendored FreeBSD 11.1, because GEOM was unable to find the partitions on the boot drive and so the root mount failed. This started happening on many but not all boots after I upgraded the machine from 9.3.

The machine is an Intel S25520UR motherboard with 2x Xeon E5620 CPUs (Hyperthreading enabled, so hw.ncpu=16) and an Adaptec 5805, and 2 RAID volumes configured on 6 SATA drives.

When booting, it sees the aac0 controller and aacd0 volume but GEOM does not find any of the partitions on that volume, and the initial mount of root on /dev/aacd0p2 fails. aacd0 is available and readable, but the expected aacd0p{1,2,3} devices do not exist. (However, aacd1 and its partitions/devices are configured normally.)

I think it's a race condition between the aac driver and GEOM probing, probably newly triggered/exposed by EARLY_AP_STARTUP. I've reproduced the problem on upstream FreeBSD 11.1 and -current. Disabling EARLY_AP_STARTUP, or setting kern.smp.disabled=1, causes the kernel to start correctly. 'boot -v' also causes the kernel to start correctly.

The kernel calls aac_attach(), which uses config_intrhook_establish() to run aac_startup() later. When that runs, it adds devices via aac_add_container()/device_add_child()/bus_generic_attach(). However, at the beginning of aac_attach(), an AAC_STATE_SUSPEND flag is set. It is cleared at the end of aac_startup(). It appears that GEOM probes call aac_disk_open(), which checks the flag and returns an error if it is set. On my system the race is that the GEOM probes happen before that flag is cleared, possibly because GEOM is tasting aacd0 while the aac driver is still attaching aacd1. So the GEOM probes fail and the geom nodes never get created.
If I boot with the -v flag, the kernel boots successfully, I think because the message printing takes long enough to delay GEOM probing past aac_startup() completion.

I've attached a patch which resolves the problem on FreeBSD-current (and 11.1). Would anybody care to adopt it and shepherd it into SVN?

regards,

  --John Hood

Attachment (aac.diff):

Only in sys/amd64/compile: AACPROBE
Only in sys/amd64/conf: AACPROBE
Only in sys/amd64/conf: AACPROBE~
diff -u -r sys.orig/dev/aac/aac.c sys/dev/aac/aac.c
--- sys.orig/dev/aac/aac.c	2017-09-05 09:06:26.000000000 -0400
+++ sys/dev/aac/aac.c	2017-09-07 14:27:32.461528000 -0400
@@ -418,9 +418,6 @@
 	sc = (struct aac_softc *)arg;
 	fwprintf(sc, HBA_FLAGS_DBG_FUNCTION_ENTRY_B, "");
 
-	/* disconnect ourselves from the intrhook chain */
-	config_intrhook_disestablish(&sc->aac_ich);
-
 	mtx_lock(&sc->aac_io_lock);
 	aac_alloc_sync_fib(sc, &fib);
 
@@ -437,12 +434,15 @@
 	aac_release_sync_fib(sc);
 	mtx_unlock(&sc->aac_io_lock);
 
+	/* mark the controller up */
+	sc->aac_state &= ~AAC_STATE_SUSPEND;
+
 	/* poke the bus to actually attach the child devices */
 	if (bus_generic_attach(sc->aac_dev))
 		device_printf(sc->aac_dev, "bus_generic_attach failed\n");
 
-	/* mark the controller up */
-	sc->aac_state &= ~AAC_STATE_SUSPEND;
+	/* disconnect ourselves from the intrhook chain */
+	config_intrhook_disestablish(&sc->aac_ich);
 
 	/* enable interrupts now */
 	AAC_UNMASK_INTERRUPTS(sc);
Only in sys/dev/aac: aac.c.orig
Only in sys/dev/aac: aac.c~
Only in sys/dev/aac: aac_disk.c.orig
Only in sys/dev/aac: aac_disk.c~
Only in sys/geom: geom_disk.c.orig
Only in sys/geom: geom_disk.c~