From: mike tancsa (mike@sentex.net)
Date: Sat, 19 Dec 2020 14:58:59 -0500
Subject: Re: zoo back online (was Re: zoo hang)
To: George Neville-Neil
Cc: Mateusz Guzik, netperf-admin@FreeBSD.org, netperf-users@freebsd.org, Paul Holes
Message-ID: <837ce2bc-9731-85b0-c6a5-1b3c7bcadb72@sentex.net>

Hmm,

This has happened again. Not sure if it's a bug with the driver, the
firmware or both, but after a period of time the USB drive starts to
throw errors.  This unit was working fine on RELENG12 and we swapped it
with another drive too, but same results. The drive is clean

smartctl -a /dev/da2 -T permissive

da2 at umass-sim0 bus 0 scbus14 target 0 lun 0
da2: Fixed Direct Access SPC-4 SCSI device
da2: Serial Number 00000000000000000000
da2: 400.000MB/s transfers
da2: 3815447MB (7814037168 512 byte sectors)
da2: quirks=0xa

(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 f6 a5 a8 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Error 5, Retries exhausted
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 04 5c 65 f0 00 00 40 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 40 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Error 5, Retries exhausted
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 0 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 ba 00 20 48 00 00 08 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Error 5, Retries exhausted
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 3 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 2 more tries remain
(da2:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 b4 00 20 28 00 00 18 00
(da2:umass-sim0:0:0:0): CAM status: CCB request completed with an error
(da2:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Solaris: WARNING: Pool 'zoobackup' has encountered an uncorrectable I/O
failure and has been suspended.

On 12/18/2020 10:08 AM, George Neville-Neil wrote:
> OK, once we get the backup complete we should probably work on the
> rest of the cleanup.  Let me know if and how I can help.
>
> Best,
> George
>
>
> On 18 Dec 2020, at 9:14, mike tancsa wrote:
>
>> Hi George,
>>
>>     I think the boot loader is now fixed as those features are
>> whitelisted.  Will start backups once again via zrepl.
>>
>>     ---Mike
>>
>> On 12/17/2020 1:58 PM, George Neville-Neil wrote:
>>> Howdy,
>>>
>>> How do we want to handle the old tank stuff?
>>>
>>> Best,
>>> George
>>>
>>>
>>> On 15 Dec 2020, at 16:24, mike tancsa wrote:
>>>
>>>> OK, thanks to Josh P's suggestion, deleting the v2 bookmarks from the
>>>> pool allowed us to boot.
>>>>
>>>> Booted from a temp drive, imported the pool,
>>>>
>>>> root@zoo-temp:~ # zpool import -R /mnt -f zooroot
>>>> root@zoo-temp:~ # zfs list -t bookmark | grep ^z | awk '{print "zfs destroy "$1}'
>>>> zfs destroy zooroot#zrepl_CURSOR_G_77296a02a81c78cc_J_push_to_drive
>>>> zfs destroy zooroot/ROOT#zrepl_CURSOR_G_e27691751ed1660b_J_push_to_drive
>>>> zfs destroy zooroot/ROOT/default#zrepl_CURSOR_G_607fa8e4c7df13b5_J_push_to_drive
>>>> zfs destroy zooroot/tmp#zrepl_CURSOR_G_25ae8e2b8723a008_J_push_to_drive
>>>> zfs destroy zooroot/usr#zrepl_CURSOR_G_344a884262b3e387_J_push_to_drive
>>>> zfs destroy zooroot/usr/home#zrepl_CURSOR_G_2e4087f8f219bd83_J_push_to_drive
>>>> zfs destroy zooroot/usr/ports#zrepl_CURSOR_G_fb8384d458dd82b3_J_push_to_drive
>>>> zfs destroy zooroot/usr/src#zrepl_CURSOR_G_b867573acd8a57f8_J_push_to_drive
>>>> zfs destroy zooroot/var#zrepl_CURSOR_G_ea9efdf01fdf65b5_J_push_to_drive
>>>> zfs destroy zooroot/var/audit#zrepl_CURSOR_G_e71132efb0fee45a_J_push_to_drive
>>>> zfs destroy zooroot/var/crash#zrepl_CURSOR_G_191c17e9538113f4_J_push_to_drive
>>>> zfs destroy zooroot/var/log#zrepl_CURSOR_G_f30668295109ad60_J_push_to_drive
>>>> zfs destroy zooroot/var/mail#zrepl_CURSOR_G_7d1eac92237e2603_J_push_to_drive
>>>> zfs destroy zooroot/var/tmp#zrepl_CURSOR_G_d593288357e0a319_J_push_to_drive
>>>> root@zoo-temp:~ # zfs list -t bookmark | grep ^z | awk '{print "zfs destroy "$1}' | sh
>>>> root@zoo-temp:~ #
>>>> root@zoo-temp:~ # zpool export zooroot
>>>> root@zoo-temp:~ #
>>>>
>>>> and rebooted and it's up. Sadly, will need to come up with another
>>>> backup system as sysutils/zrepl uses bookmarks :(
>>>>
>>>>     ---Mike
>>>>
>>>> On 12/15/2020 1:46 PM, mike tancsa wrote:
>>>>> Looks like the loader does not support v2 bookmarks.
>>>>> I am going to get Paul to put in another disk to boot from, mjg
>>>>> will log in and either destroy the bookmarks or hack a loader fix
>>>>> that will allow the box to boot with this feature.  Will be an hour
>>>>> or so as we have an office meeting at 2pm we both have to attend.
>>>>>
>>>>>     ---Mike
>>>>>
>>>>> On 12/15/2020 1:28 PM, mike tancsa wrote:
>>>>>> I am guessing because I was using zrepl from the ports to do
>>>>>> replication / backup to a secondary disk, the use of the
>>>>>> bookmark_v2 feature is not supported on ZoL? Any way to recover
>>>>>> from this?
>>>>>>
>>>>>>
>>>>>> On 12/15/2020 1:10 PM, mike tancsa wrote:
>>>>>>> OK, but the first problem to deal with :(
>>>>>>>
>>>>>>> BIOS drive C: is disk0
>>>>>>> BIOS drive D: is disk1
>>>>>>> ZFS: unsupported feature: com.datto:bookmark_v2
>>>>>>> ZFS: pool zooroot is not supported
>>>>>>>
>>>>>>> Can't find /boot/zfsloader
>>>>>>>
>>>>>>> Can't find /boot/loader
>>>>>>>
>>>>>>> Can't find /boot/kernel/kernel
>>>>>>>
>>>>>>> FreeBSD/x86 boot
>>>>>>> Default: /boot/kernel/kernel
>>>>>>> boot:
>>>>>>>
>>>>>>> Can't find /boot/kernel/kernel
>>>>>>>
>>>>>>> FreeBSD/x86 boot
>>>>>>> Default: /boot/kernel/kernel
>>>>>>> boot:
>>>>>>>
>>>>>>> On 12/15/2020 1:02 PM, Mateusz Guzik wrote:
>>>>>>>> We need to update to r368649 for a pmap fix regardless of the
>>>>>>>> above. I can do the work and make the box ready for the next
>>>>>>>> reboot.
>>>>>>>>
>>>>>>>> On 12/15/20, mike tancsa wrote:
>>>>>>>>> The USB backup disk was throwing errors and I was trying to
>>>>>>>>> export the backup pool and it looks like the box is hung now.
>>>>>>>>> I am going to power cycle it
>>>>>>>>>
>>>>>>>>>     ---Mike
>>>>>>>>>
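
For reference when reading the da2 errors at the top of this message:
in a WRITE(10) CDB, byte 0 is the opcode (0x2a), bytes 2-5 hold the
starting logical block address (big-endian) and bytes 7-8 hold the
transfer length in blocks. A minimal sh sketch for decoding the CDBs in
the log follows. The function name decode_write10 is made up for
illustration and is not part of any FreeBSD tool.

```shell
#!/bin/sh
# Decode a WRITE(10) CDB given as ten space-separated hex bytes,
# in the form printed by the CAM error messages.
# decode_write10 is a hypothetical helper, not an existing utility.
decode_write10() {
    # Split the single argument into one positional parameter per byte.
    set -- $1
    if [ "$#" -ne 10 ]; then
        echo "expected 10 CDB bytes, got $#" >&2
        return 1
    fi
    # Byte 0 must be the WRITE(10) opcode.
    case "$1" in
        2a|2A) ;;
        *) echo "opcode $1 is not WRITE(10)" >&2; return 1 ;;
    esac
    # Bytes 2-5: starting LBA, big-endian.
    lba=$(( 0x$3$4$5$6 ))
    # Bytes 7-8: transfer length in blocks, big-endian.
    blocks=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$blocks"
}

# First failing command from the log above.
decode_write10 "2a 00 04 f6 a5 a8 00 00 08 00"
```

For the first failing command this prints lba=83273128 blocks=8, which
with the 512-byte sectors reported for da2 is a 4 KiB write.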