Date: Wed, 26 Feb 2020 18:03:09 +0000
From: bugzilla-noreply@freebsd.org
To: bugs@FreeBSD.org
Subject: [Bug 237463] aacraid(4) doesn't work on powerpc64
Message-ID: <bug-237463-227-ieZCZsaPFg@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-237463-227@https.bugs.freebsd.org/bugzilla/>
References: <bug-237463-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237463

--- Comment #5 from Leandro Lupori <luporl@FreeBSD.org> ---
I've noticed that the AIF interrupts always occur about 5 minutes after a
reboot. Luckily, they occur on Petitboot too, which made it possible to
collect the following information about the remaining issue:

/ # dmesg | tail -20
[   40.494002] sd 1:2:23:0: [sdi] 4096-byte physical blocks
[   40.494004] scsi 1:3:123:0: Enclosure ADAPTEC Smart Adapter 4.02 PQ: 0 ANSI: 5
[   40.495376] sd 1:2:23:0: [sdi] Write Protect is off
[   40.495379] sd 1:2:23:0: [sdi] Mode Sense: 46 00 10 08
[   40.495520] scsi 1:3:123:0: Attached scsi generic sg11 type 13
[   40.498220] sd 1:2:23:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   40.533826] udevd[2649]: inotify_add_watch(6, /dev/dm-8, 10) failed: No such file or directory
[   40.585006] sd 1:2:23:0: [sdi] Attached SCSI disk
[   41.437318] udevd[2688]: inotify_add_watch(6, /dev/dm-11, 10) failed: No such file or directory
[  321.101655] sd 1:2:16:0: [sdb] Synchronizing SCSI cache
[  321.102364] sd 1:2:16:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  334.245061] scsi 1:2:16:0: Direct-Access ATA ST4000NM0115-1YZ SN04 PQ: 0 ANSI: 6
[  334.250710] sd 1:2:16:0: Attached scsi generic sg2 type 0
[  334.260739] sd 1:2:16:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[  334.260742] sd 1:2:16:0: [sdb] 4096-byte physical blocks
[  334.261614] sd 1:2:16:0: [sdb] Write Protect is off
[  334.261616] sd 1:2:16:0: [sdb] Mode Sense: 46 00 10 08
[  334.264430] sd 1:2:16:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[  334.325386]  sdb: sdb1 sdb2 sdb3
[  334.349896] sd 1:2:16:0: [sdb] Attached SCSI disk

/var/petitboot/mnt/dev/sda2/bsd # ./arcconf getlogs 1 event
Controllers found: 1
<ControllerLog controllerID="0" time="Wed Feb 26 16:52:47 2020">
<eventlog>
<event message="Previous Firmware Lockup Detected, Lockup Code=227 Detail=0x00000000" eventTag="1" relativeControllerTime="4" eventClassCode="12" eventSubClassCode="0" eventDetailCode="0"/>
<event message="Cache battery/Super cap is missing" eventTag="2" relativeControllerTime="4" eventClassCode="2" eventSubClassCode="4" eventDetailCode="2"/>
<event message="Encryption Self-Test failed" eventTag="3" relativeControllerTime="4" eventClassCode="2" eventSubClassCode="10" eventDetailCode="0"/>
<event message="Hot-plug drive removed, Port=C0 Box=1 Bay=0 SN=        ZC19RD9E" eventTag="4" relativeControllerTime="335" eventClassCode="1" eventSubClassCode="0" eventDetailCode="0"/>
<event message="Physical drive failure, Port=C0 Box=1 Bay=0 reason=0x14" eventTag="5" relativeControllerTime="335" eventClassCode="4" eventSubClassCode="0" eventDetailCode="0"/>
<event message="Hot-plug drive inserted, Port=C0 Box=1 Bay=0 SN=        ZC19RD9E" eventTag="6" relativeControllerTime="348" eventClassCode="1" eventSubClassCode="0" eventDetailCode="1"/>
<event message="Drive is re-enabled, Port=C0 Box=1 Bay=0" eventTag="7" relativeControllerTime="348" eventClassCode="4" eventSubClassCode="0" eventDetailCode="3"/>
</eventlog>
</ControllerLog>

So, the AIFs are about the drive being removed and then re-inserted after a
few seconds, which explains the "Target Selection Timeout" errors that were
being seen right after the AIF interrupts occurred.

However, further investigation is needed to understand why the drive is
being removed. It could be due to a bad HDD/SAS expander cable, a write
cache issue, or maybe a setup issue with the 2 SAS controllers/cabling on
the machine, or maybe something else.
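
To make the removal/re-insertion timing in the event log easier to check, here is a minimal sketch in Python that pairs each "Hot-plug drive removed" event with the next "Hot-plug drive inserted" event for the same Port/Box/Bay. It is not part of arcconf or the aacraid(4) driver. The function name `hotplug_gaps` and the regular expression are made up for illustration, the sample is trimmed from the log above, and treating `relativeControllerTime` as seconds since controller start is an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Sample trimmed from the arcconf event log quoted above (hot-plug events only).
SAMPLE_LOG = """\
<ControllerLog controllerID="0" time="Wed Feb 26 16:52:47 2020">
<eventlog>
<event message="Hot-plug drive removed, Port=C0 Box=1 Bay=0 SN=        ZC19RD9E" eventTag="4" relativeControllerTime="335" eventClassCode="1" eventSubClassCode="0" eventDetailCode="0"/>
<event message="Physical drive failure, Port=C0 Box=1 Bay=0 reason=0x14" eventTag="5" relativeControllerTime="335" eventClassCode="4" eventSubClassCode="0" eventDetailCode="0"/>
<event message="Hot-plug drive inserted, Port=C0 Box=1 Bay=0 SN=        ZC19RD9E" eventTag="6" relativeControllerTime="348" eventClassCode="1" eventSubClassCode="0" eventDetailCode="1"/>
</eventlog>
</ControllerLog>
"""

# Hypothetical pattern: matches the "Port=.. Box=.. Bay=.." text seen in the messages.
SLOT_RE = re.compile(r"Port=(\S+) Box=(\S+) Bay=(\S+)")


def hotplug_gaps(xml_text):
    """Return (slot, removed_at, inserted_at, gap) for each remove/insert pair.

    Assumes relativeControllerTime is an integer number of seconds.
    """
    root = ET.fromstring(xml_text)
    removed = {}  # slot -> time of the pending removal
    gaps = []
    for event in root.iter("event"):
        message = event.get("message", "")
        match = SLOT_RE.search(message)
        if match is None:
            continue  # event is not tied to a drive slot
        slot = match.groups()
        when = int(event.get("relativeControllerTime"))
        if message.startswith("Hot-plug drive removed"):
            removed[slot] = when
        elif message.startswith("Hot-plug drive inserted") and slot in removed:
            start = removed.pop(slot)
            gaps.append((slot, start, when, when - start))
    return gaps


if __name__ == "__main__":
    for (port, box, bay), start, end, gap in hotplug_gaps(SAMPLE_LOG):
        print(f"Port={port} Box={box} Bay={bay}: "
              f"removed at {start}, inserted at {end}, gap {gap}")
```

For the sample, the gap between the two hot-plug events is 13 time units, which is in line with the roughly 13 seconds between the "Synchronizing SCSI cache" and re-attach messages for sdb in the dmesg output.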