Date:      Wed, 26 Feb 2020 18:03:09 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 237463] aacraid(4) doesn't work on powerpc64
Message-ID:  <bug-237463-227-ieZCZsaPFg@https.bugs.freebsd.org/bugzilla/>
In-Reply-To: <bug-237463-227@https.bugs.freebsd.org/bugzilla/>
References:  <bug-237463-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=237463

--- Comment #5 from Leandro Lupori <luporl@FreeBSD.org> ---
I've noticed that the AIF interrupts always occur about 5 minutes after a
reboot.
Luckily, they occur on Petitboot too, which made it possible to collect the
following information about the remaining issue:


/ # dmesg | tail -20
[   40.494002] sd 1:2:23:0: [sdi] 4096-byte physical blocks
[   40.494004] scsi 1:3:123:0: Enclosure         ADAPTEC  Smart Adapter    4.02 PQ: 0 ANSI: 5
[   40.495376] sd 1:2:23:0: [sdi] Write Protect is off
[   40.495379] sd 1:2:23:0: [sdi] Mode Sense: 46 00 10 08
[   40.495520] scsi 1:3:123:0: Attached scsi generic sg11 type 13
[   40.498220] sd 1:2:23:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[   40.533826] udevd[2649]: inotify_add_watch(6, /dev/dm-8, 10) failed: No such file or directory
[   40.585006] sd 1:2:23:0: [sdi] Attached SCSI disk
[   41.437318] udevd[2688]: inotify_add_watch(6, /dev/dm-11, 10) failed: No such file or directory
[  321.101655] sd 1:2:16:0: [sdb] Synchronizing SCSI cache
[  321.102364] sd 1:2:16:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  334.245061] scsi 1:2:16:0: Direct-Access     ATA      ST4000NM0115-1YZ SN04 PQ: 0 ANSI: 6
[  334.250710] sd 1:2:16:0: Attached scsi generic sg2 type 0
[  334.260739] sd 1:2:16:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[  334.260742] sd 1:2:16:0: [sdb] 4096-byte physical blocks
[  334.261614] sd 1:2:16:0: [sdb] Write Protect is off
[  334.261616] sd 1:2:16:0: [sdb] Mode Sense: 46 00 10 08
[  334.264430] sd 1:2:16:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
[  334.325386]  sdb: sdb1 sdb2 sdb3
[  334.349896] sd 1:2:16:0: [sdb] Attached SCSI disk
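
To put numbers on the timing seen in the dmesg output above, here is a minimal Python sketch. It is only an illustration: the helper names (`parse`, `first_time`) are made up for this example, and the embedded text is a hand-copied excerpt of the log with the wrapped lines joined.

```python
import re

# Excerpt of the dmesg lines above (wrapped lines joined by hand).
DMESG = """\
[   40.585006] sd 1:2:23:0: [sdi] Attached SCSI disk
[  321.101655] sd 1:2:16:0: [sdb] Synchronizing SCSI cache
[  321.102364] sd 1:2:16:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[  334.245061] scsi 1:2:16:0: Direct-Access     ATA      ST4000NM0115-1YZ SN04 PQ: 0 ANSI: 6
[  334.349896] sd 1:2:16:0: [sdb] Attached SCSI disk
"""

# "[  321.102364] message" -> (321.102364, "message")
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(text):
    """Return (seconds_since_boot, message) tuples for timestamped lines."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            entries.append((float(m.group(1)), m.group(2)))
    return entries


def first_time(entries, needle):
    """Timestamp of the first message containing needle, or None."""
    for ts, msg in entries:
        if needle in msg:
            return ts
    return None


entries = parse(DMESG)
lost = first_time(entries, "DID_NO_CONNECT")   # cache sync fails, drive gone
back = first_time(entries, "Direct-Access")    # drive is discovered again

print(f"drive lost at {lost:.1f}s (~{lost / 60:.1f} min after boot)")
print(f"drive rediscovered {back - lost:.1f}s later")
```

With this excerpt, the drive disappears a little over 5 minutes after boot and is rediscovered roughly 13 seconds later.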


/var/petitboot/mnt/dev/sda2/bsd # ./arcconf getlogs 1 event
Controllers found: 1
<ControllerLog controllerID="0" time="Wed Feb 26 16:52:47 2020">
    <eventlog>
        <event message="Previous Firmware Lockup Detected, Lockup Code=227 Detail=0x00000000" eventTag="1" relativeControllerTime="4" eventClassCode="12" eventSubClassCode="0" eventDetailCode="0"/>
        <event message="Cache battery/Super cap is missing" eventTag="2" relativeControllerTime="4" eventClassCode="2" eventSubClassCode="4" eventDetailCode="2"/>
        <event message="Encryption Self-Test failed" eventTag="3" relativeControllerTime="4" eventClassCode="2" eventSubClassCode="10" eventDetailCode="0"/>
        <event message="Hot-plug drive removed, Port=C0 Box=1 Bay=0 SN=            ZC19RD9E" eventTag="4" relativeControllerTime="335" eventClassCode="1" eventSubClassCode="0" eventDetailCode="0"/>
        <event message="Physical drive failure, Port=C0 Box=1 Bay=0 reason=0x14" eventTag="5" relativeControllerTime="335" eventClassCode="4" eventSubClassCode="0" eventDetailCode="0"/>
        <event message="Hot-plug drive inserted, Port=C0 Box=1 Bay=0 SN=            ZC19RD9E" eventTag="6" relativeControllerTime="348" eventClassCode="1" eventSubClassCode="0" eventDetailCode="1"/>
        <event message="Drive is re-enabled, Port=C0 Box=1 Bay=0" eventTag="7" relativeControllerTime="348" eventClassCode="4" eventSubClassCode="0" eventDetailCode="3"/>
    </eventlog>
</ControllerLog>


So, the AIFs are about the drive being removed and then re-inserted after a few
seconds, which explains the "Target Selection Timeout" errors that were being
seen right after the AIF interrupts occurred.
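
For anyone who wants to check other controller logs for the same pattern, the sketch below pairs each "removed" event with the following "inserted" event. It is a rough illustration, not a tool: `hotplug_gaps` is a made-up name, the embedded XML is a trimmed copy of the arcconf output above (serial numbers and class codes dropped), and it assumes `relativeControllerTime` is in seconds, which matches the dmesg timestamps but is not something I have confirmed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the arcconf event log above, with the mail encoding undone.
# Only the attributes used below are kept; the SN field is left out.
LOG = """\
<ControllerLog controllerID="0" time="Wed Feb 26 16:52:47 2020">
    <eventlog>
        <event message="Hot-plug drive removed, Port=C0 Box=1 Bay=0" eventTag="4" relativeControllerTime="335"/>
        <event message="Physical drive failure, Port=C0 Box=1 Bay=0 reason=0x14" eventTag="5" relativeControllerTime="335"/>
        <event message="Hot-plug drive inserted, Port=C0 Box=1 Bay=0" eventTag="6" relativeControllerTime="348"/>
        <event message="Drive is re-enabled, Port=C0 Box=1 Bay=0" eventTag="7" relativeControllerTime="348"/>
    </eventlog>
</ControllerLog>
"""


def hotplug_gaps(xml_text):
    """Pair each 'removed' event with the next 'inserted' event.

    Returns a list of (removed_at, inserted_at) tuples in controller-relative
    time. This simple version does not match events by port/box/bay, so it is
    only meaningful when a single drive is involved, as in the log above.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    removed_at = None
    for ev in root.iter("event"):
        msg = ev.get("message", "")
        t = int(ev.get("relativeControllerTime"))
        if msg.startswith("Hot-plug drive removed"):
            removed_at = t
        elif msg.startswith("Hot-plug drive inserted") and removed_at is not None:
            pairs.append((removed_at, t))
            removed_at = None
    return pairs


for removed, inserted in hotplug_gaps(LOG):
    print(f"removed at {removed}, re-inserted at {inserted} (gap {inserted - removed})")
```

For the log above this reports a single removal at 335 and re-insertion at 348, i.e. a gap of 13, which lines up with the gap between the two dmesg events for sdb.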

However, further investigation is needed to understand why the drive is being
removed. It could be due to a bad HDD/SAS expander cable, a write cache issue,
a setup issue with the 2 SAS controllers/cabling on the machine, or something
else.

-- 
You are receiving this mail because:
You are the assignee for the bug.


