From: mike tancsa
To: Mateusz Guzik
Cc: George Neville-Neil, "netperf-admin@FreeBSD.org", netperf-users@freebsd.org, Paul Holes, Hans Petter Selasky
Subject: Re: zoo back online (was Re: zoo hang)
Date: Sat, 19 Dec 2020 16:57:13 -0500
Message-ID: <7c508e03-7575-b06a-3b14-f8b6e1ed10db@sentex.net>

I was able to do a zpool clear zoobackup; zpool export zoobackup, even
though it threw a few more errors:

(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Solaris: WARNING: Pool 'zoobackup' has encountered an uncorrectable I/O failure and has been suspended.
(da2:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 02 38 00 00 10 00
(da2:umass-sim0:0:0:0): CAM status: SCSI Status Error
(da2:umass-sim0:0:0:0): SCSI status: Check Condition
(da2:umass-sim0:0:0:0): SCSI sense: NOT READY asc:4,1 (Logical unit is in process of becoming ready)
(da2:umass-sim0:0:0:0): Polling device for readiness

I wonder if on Monday we should try upgrading the BIOS first:

BIOS Information
	Vendor: American Megatrends Inc.
	Version: 1.0b
	Release Date: 01/29/2015
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 16 MB
	Characteristics:

System Information
	Manufacturer: Supermicro
	Product Name: SYS-7048R-C1RT4+
	Version: 0123456789
	Serial Number: S16909225402569
	UUID: 00000000-0000-0000-0000-0cc47a1f2fa0
	Wake-up Type: Power Switch
	SKU Number: To be filled by O.E.M.
	Family: To be filled by O.E.M.

Handle 0x0002, DMI type 2, 15 bytes
Base Board Information
	Manufacturer: Supermicro
	Product Name: X10DRC-T4+
	Version: 1.01

The 3.2 BIOS is from 2019, per the release notes at
https://www.supermicro.com/Bios/softfiles/10079/P-X10DRC(-I-LN4-T4_)_BIOS_3_2_release_notes.pdf

On 12/19/2020 3:16 PM, Mateusz Guzik wrote:
> I'm adding hps for USB stack comments.
>
> On 12/19/20, mike tancsa wrote:
>> Hmm, this has happened again. Not sure if it's a bug with the driver, the
>> firmware or both, but after a period of time the USB drive starts to
>> throw errors. This unit was working fine on RELENG12 and we swapped it
>> with another drive too, but same results.
>> The drive is clean:
>>
>> smartctl -a /dev/da2 -T permissive
>>
>> da2 at umass-sim0 bus 0 scbus14 target 0 lun 0
>> da2: Fixed Direct Access SPC-4 SCSI device
>> da2: Serial Number 00000000000000000000
>> da2: 400.000MB/s transfers
>> da2: 3815447MB (7814037168 512 byte sectors)
>> da2: quirks=0xa
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Error 5, Retries exhausted
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Error 5, Retries exhausted
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Error 5, Retries exhausted
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
>> (da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
>> (da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
>> (da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
>> Solaris: WARNING: Pool 'zoobackup' has encountered an uncorrectable I/O failure and has been suspended.
>>
>>
>> On 12/18/2020 10:08 AM, George Neville-Neil wrote:
>>> OK, once we get the backup complete we should probably work on the
>>> rest of the cleanup. Let me know if and how I can help.
>>>
>>> Best,
>>> George
>>>
>>>
>>> On 18 Dec 2020, at 9:14, mike tancsa wrote:
>>>
>>>> Hi George,
>>>>
>>>> I think the boot loader is now fixed, as those features are
>>>> whitelisted. Will start backups once again via zrepl.
>>>>
>>>> ---Mike
>>>>
>>>> On 12/17/2020 1:58 PM, George Neville-Neil wrote:
>>>>> Howdy,
>>>>>
>>>>> How do we want to handle the old tank stuff?
>>>>>
>>>>> Best,
>>>>> George
>>>>>
>>>>>
>>>>> On 15 Dec 2020, at 16:24, mike tancsa wrote:
>>>>>
>>>>>> OK, thanks to Josh P's suggestion, deleting the v2 bookmarks from the
>>>>>> pool allowed us to boot.
>>>>>>
>>>>>> Booted from a temp drive, imported the pool,
>>>>>>
>>>>>> root@zoo-temp:~ # zpool import -R /mnt -f zooroot
>>>>>> root@zoo-temp:~ # zfs list -t bookmark | grep ^z | awk '{print "zfs destroy "$1}'
>>>>>> zfs destroy zooroot#zrepl_CURSOR_G_77296a02a81c78cc_J_push_to_drive
>>>>>> zfs destroy zooroot/ROOT#zrepl_CURSOR_G_e27691751ed1660b_J_push_to_drive
>>>>>> zfs destroy zooroot/ROOT/default#zrepl_CURSOR_G_607fa8e4c7df13b5_J_push_to_drive
>>>>>> zfs destroy zooroot/tmp#zrepl_CURSOR_G_25ae8e2b8723a008_J_push_to_drive
>>>>>> zfs destroy zooroot/usr#zrepl_CURSOR_G_344a884262b3e387_J_push_to_drive
>>>>>> zfs destroy zooroot/usr/home#zrepl_CURSOR_G_2e4087f8f219bd83_J_push_to_drive
>>>>>> zfs destroy zooroot/usr/ports#zrepl_CURSOR_G_fb8384d458dd82b3_J_push_to_drive
>>>>>> zfs destroy zooroot/usr/src#zrepl_CURSOR_G_b867573acd8a57f8_J_push_to_drive
>>>>>> zfs destroy zooroot/var#zrepl_CURSOR_G_ea9efdf01fdf65b5_J_push_to_drive
>>>>>> zfs destroy zooroot/var/audit#zrepl_CURSOR_G_e71132efb0fee45a_J_push_to_drive
>>>>>> zfs destroy zooroot/var/crash#zrepl_CURSOR_G_191c17e9538113f4_J_push_to_drive
>>>>>> zfs destroy zooroot/var/log#zrepl_CURSOR_G_f30668295109ad60_J_push_to_drive
>>>>>> zfs destroy zooroot/var/mail#zrepl_CURSOR_G_7d1eac92237e2603_J_push_to_drive
>>>>>> zfs destroy zooroot/var/tmp#zrepl_CURSOR_G_d593288357e0a319_J_push_to_drive
>>>>>> root@zoo-temp:~ # zfs list -t bookmark | grep ^z | awk '{print "zfs destroy "$1}' | sh
>>>>>> root@zoo-temp:~ #
>>>>>> root@zoo-temp:~ # zpool export zooroot
>>>>>> root@zoo-temp:~ #
>>>>>>
>>>>>> and rebooted and it's up.
>>>>>> Sadly, we will need to come up with another backup system, as
>>>>>> sysutils/zrepl uses bookmarks :(
>>>>>>
>>>>>> ---Mike
>>>>>>
>>>>>> On 12/15/2020 1:46 PM, mike tancsa wrote:
>>>>>>> Looks like the loader does not support v2 bookmarks. I am going to
>>>>>>> get Paul to put in another disk to boot from; mjg will log in and
>>>>>>> either destroy the bookmarks or hack a loader fix that will allow
>>>>>>> the box to boot with this feature. It will be an hour or so, as we
>>>>>>> have an office meeting at 2pm we both have to attend.
>>>>>>>
>>>>>>> ---Mike
>>>>>>>
>>>>>>> On 12/15/2020 1:28 PM, mike tancsa wrote:
>>>>>>>> I am guessing that because I was using zrepl from the ports to do
>>>>>>>> replication / backup to a secondary disk, the use of the bookmark_v2
>>>>>>>> feature is not supported on ZoL? Any way to recover from this?
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/15/2020 1:10 PM, mike tancsa wrote:
>>>>>>>>> OK, but the first problem to deal with :(
>>>>>>>>>
>>>>>>>>> BIOS drive C: is disk0
>>>>>>>>> BIOS drive D: is disk1
>>>>>>>>> ZFS: unsupported feature: com.datto:bookmark_v2
>>>>>>>>> ZFS: pool zooroot is not supported
>>>>>>>>>
>>>>>>>>> Can't find /boot/zfsloader
>>>>>>>>>
>>>>>>>>> Can't find /boot/loader
>>>>>>>>>
>>>>>>>>> Can't find /boot/kernel/kernel
>>>>>>>>>
>>>>>>>>> FreeBSD/x86 boot
>>>>>>>>> Default: /boot/kernel/kernel
>>>>>>>>> boot:
>>>>>>>>>
>>>>>>>>> Can't find /boot/kernel/kernel
>>>>>>>>>
>>>>>>>>> FreeBSD/x86 boot
>>>>>>>>> Default: /boot/kernel/kernel
>>>>>>>>> boot:
>>>>>>>>>
>>>>>>>>> On 12/15/2020 1:02 PM, Mateusz Guzik wrote:
>>>>>>>>>> We need to update to r368649 for a pmap fix regardless of the
>>>>>>>>>> above. I can do the work and make the box ready for the next reboot.
>>>>>>>>>>
>>>>>>>>>> On 12/15/20, mike tancsa wrote:
>>>>>>>>>>> The USB backup disk was throwing errors and I was trying to
>>>>>>>>>>> export the backup pool, and it looks like the box is hung now.
>>>>>>>>>>> I am going to power cycle it.
>>>>>>>>>>>
>>>>>>>>>>> ---Mike
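
A note on the 15 Dec bookmark cleanup: the one-liner there used grep ^z to drop the zfs list header line, which would also have matched bookmarks on any other imported pool whose name starts with z (zoobackup, for instance). Below is a slightly more defensive sketch, not a tested procedure. It assumes every bookmark to remove has a short name starting with zrepl_ (true of all the names listed above, but an assumption about zrepl in general), and the function name zrepl_bookmark_destroy_cmds is hypothetical. It only prints zfs destroy commands, so the list can be reviewed before anything is run.

```sh
#!/bin/sh
# Hypothetical helper, not part of zrepl or ZFS.
# Reads ZFS bookmark names, one per line (for example the output of
# `zfs list -H -o name -t bookmark -r zooroot`), and prints one
# `zfs destroy` command for each bookmark whose short name starts with
# "zrepl_" (an assumption based on the names seen in this thread).
# Nothing is executed here; review the output, then pipe it to sh.
zrepl_bookmark_destroy_cmds() {
    # Split each line on '#': $1 is the dataset, $2 the bookmark's short name.
    # Lines without exactly one '#' (snapshots, datasets, blanks) are skipped.
    # Names are wrapped in single quotes so the printed commands are shell-safe.
    awk -F'#' -v q="'" 'NF == 2 && index($2, "zrepl_") == 1 {
        printf "zfs destroy %s%s%s\n", q, $0, q
    }'
}
```

Possible usage on a pool imported with -R as above (review first, then run, then check):

    zfs list -H -o name -t bookmark -r zooroot | zrepl_bookmark_destroy_cmds
    zfs list -H -o name -t bookmark -r zooroot | zrepl_bookmark_destroy_cmds | sh
    zpool get feature@bookmark_v2 zooroot

Here -H drops the header line and -o name prints only the name column, so no grep or awk column picking is needed, and -r zooroot keeps other pools out of the list. According to the zpool-features(7) description of bookmark_v2, the feature should drop back from "active" to "enabled" once no v2 bookmarks remain, which makes the last command a reasonable check before rebooting onto a loader that does not support it.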