From owner-freebsd-stable@freebsd.org Mon Jul 11 13:47:02 2016
From: Karl Denninger
To: freebsd-stable@freebsd.org
Subject: Re: Not-so stable if you take a CAM error....
Date: Mon, 11 Jul 2016 08:46:33 -0500
Message-ID: <877f5e8e-c1e7-6fb0-6ceb-031ce3e68582@denninger.net>
In-Reply-To: <1468243977.72182.118.camel@freebsd.org>
References: <2b0c454b-c1a0-4b5b-e778-bf0939e90ae1@denninger.net> <6e9c07e1-12a6-a7cd-f775-6b0fe5a706bc@denninger.net> <1468243977.72182.118.camel@freebsd.org>
List-Id: Production branch of FreeBSD source code

On 7/11/2016 08:32, Ian Lepore wrote:
> On Mon, 2016-07-11 at 06:30 -0500, Karl Denninger wrote:
>> On 7/11/2016 02:57, Ronald Klop wrote:
>>> On Mon, 11 Jul 2016 02:54:38 +0200, Karl Denninger wrote:
>>>
>>>> Got a (nasty) surprise this afternoon on my sandbox machine.
>>>>
>>>> I was updating some Raspberry Pi2 machines, which involved taking
>>>> the SD card out, sticking it in an adapter and plugging it into the
>>>> sandbox, then mounting the partition and using rsync.
>>>>
>>>> Unfortunately one of the cards was, unknown to me, bad and returned
>>>> a write error during the update.
>>>>
>>>> The machine panic'd immediately after the CAM write error popped up.
>>>>
>>>> I was quite surprised by this, since (1) the SD card was (of course)
>>>> mounted as a UFS filesystem; it shows up as a CAM device, (2) the
>>>> machine itself is running off a ZFS root on a normal host-adapter
>>>> and thus there is no commingling of the buffer cache and (3) there
>>>> were no images being run from it (can't, wrong architecture!) nor
>>>> any system I/O (e.g. pagefile) going to the SD card.
>>>>
>>>> I certainly understand that under some circumstances (maybe even
>>>> most circumstances) taking a hard I/O error to a system device is
>>>> going to hose you and a panic() is arguably "least astonishment"
>>>> when the price of being wrong might be a corrupted system file or
>>>> worse (e.g. corrupted paged-out RSS, etc.). But I didn't expect a
>>>> panic out of a failed write to a device that is mounted and being
>>>> used purely for data.
>>>>
>>>> I don't have a crash dump but can almost certainly reproduce this
>>>> if it's something that shouldn't happen and thus merits
>>>> investigation.
>>>>
>>> Hi,
>>>
>>> I understand you are surprised by this.
>>> I don't think it is the way it should work.
>>> Is there _any_ debugging information for people to use and try to
>>> help you? Like which FreeBSD version are you running? Which FreeBSD
>>> version was used to create the UFS fs? Does it use softupdates (SU)
>>> or also journaling (SU+J)?
>>> Maybe some output of dmesg? Or type of SD-card and reader. Other
>>> people might have similar problems with similar hardware.
>>>
>>> Regards,
>>> Ronald.
>>>
>> FreeBSD 11.0-BETA1 #0 r302489: Sat Jul 9 10:15:24 CDT 2016
>> karl@NewFS.denninger.net:/usr/obj/usr/src/sys/KSD-SMP
>>
>> and
>>
>> FreeBSD 11.0-BETA1 #0 r302526: Sun Jul 10 10:39:31 CDT 2016
>> karl@NewFS.denninger.net:/pics/CrossBuild/obj/arm.armv6/pics/CrossBuild/src/sys/RPI2
>>
>> Both blew up in the same way when stimulated with the same I/O error.
>>
>> The filesystem in question does have softupdates enabled (the RPI
>> images have it turned on by default) but no journaling. It's not
>> card/reader dependent nor architecture dependent; when it occurred
>> the first time I stuck the card and reader into one of my Pis and
>> attempted to update it there (thinking that perhaps my sandbox
>> machine's USB port was wonky) and it blew up the Pi2 in the exact
>> same way.
>>
>> This isn't (obviously, given both Intel-style and ARM machines being
>> involved) architecture dependent.
>>
>> It's been a good long while since I took an actual hard I/O error
>> that was 'visible' at the OS level (I've had plenty of disks die on
>> ZFS over the last few years but no "double failures" on a mirror or
>> similar, and on my servers I haven't had a UFS-based system for a
>> while). This definitely looks like some sort of regression in the
>> code; I've run FreeBSD for a hell of a long time and have had plenty
>> of instances where disks have failed without having the machine go
>> out from under me.
>>
> Unfortunately, this is "just the way it works".
> A hard IO error while writing to a ufs filesystem with softupdates
> enabled will cause a panic, because the softupdates code doesn't
> handle that sort of failure, and the failure means that filesystem
> integrity is lost. The code has no idea how important the data is to
> the functioning of the system, no basis on which to decide whether to
> panic or not.
>
> -- Ian
>

Here's the backtrace ... sounds like expected behavior, which is not so
good, all in all, for a situation like this. I guess the strategy is to
turn off softupdates before attempting such an update so as not to crash
the host machine if there's a problem with the card.

root@Dbms2:/var/crash # kgdb /boot/kernel/kernel vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: initiate_write_inodeblock_ufs2: already started
cpuid = 14
KDB: stack backtrace:
#0 0xffffffff80b1f357 at kdb_backtrace+0x67
#1 0xffffffff80ad6ec2 at vpanic+0x182
#2 0xffffffff80ad6d33 at panic+0x43
#3 0xffffffff80dc16ad at softdep_disk_io_initiation+0x159d
#4 0xffffffff80de61eb at ffs_geom_strategy+0x13b
#5 0xffffffff80b872f7 at bufwrite+0x267
#6 0xffffffff80b8ac6a at vfs_bio_awrite+0x3ca
#7 0xffffffff80b96b77 at vop_stdfsync+0x277
#8 0xffffffff80983766 at devfs_fsync+0x26
#9 0xffffffff81101f7d at VOP_FSYNC_APV+0x8d
#10 0xffffffff80baf1ae at sched_sync+0x3be
#11 0xffffffff80a8dcb5 at fork_exit+0x85
#12 0xffffffff80f7f85e at fork_trampoline+0xe
Uptime: 27m9s

(kgdb) where
#0 doadump (textdump=) at pcpu.h:221
#1 0xffffffff80ad6949 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:366
#2 0xffffffff80ad6efb in vpanic (fmt=, ap=)
    at /usr/src/sys/kern/kern_shutdown.c:759
#3 0xffffffff80ad6d33 in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:690
#4 0xffffffff80dc16ad in softdep_disk_io_initiation (bp=)
    at /usr/src/sys/ufs/ffs/ffs_softdep.c:10301
#5 0xffffffff80de61eb in ffs_geom_strategy (bo=, bp=) at buf.h:412
#6 0xffffffff80b872f7 in bufwrite (bp=0xfffffe02e8629b30) at buf.h:405
#7 0xffffffff80b8ac6a in vfs_bio_awrite (bp=) at buf.h:393
#8 0xffffffff80b96b77 in vop_stdfsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/kern/vfs_default.c:692
#9 0xffffffff80983766 in devfs_fsync (ap=0xfffffe034f481b68)
    at /usr/src/sys/fs/devfs/devfs_vnops.c:702
#10 0xffffffff81101f7d in VOP_FSYNC_APV (vop=, a=) at vnode_if.c:1331
#11 0xffffffff80baf1ae in sched_sync () at vnode_if.h:549
#12 0xffffffff80a8dcb5 in fork_exit (callout=0xffffffff80baedf0 , arg=0x0,
    frame=0xfffffe034f481c00) at /usr/src/sys/kern/kern_fork.c:1038
#13 0xffffffff80f7f85e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:611
#14 0x0000000000000000 in ?? ()
(kgdb)

FreeBSD 11.0-BETA1 #0 r302439: Fri Jul 8 14:37:27 CDT 2016
karl@Dbms2.denninger.net:/usr/obj/usr/src/sys/GENERIC

The offending code line:

static void
initiate_write_inodeblock_ufs2(inodedep, bp)
        struct inodedep *inodedep;
        struct buf *bp;                 /* The inode block */
{
        struct allocdirect *adp, *lastadp;
        struct ufs2_dinode *dp;
        struct ufs2_dinode *sip;
        struct inoref *inoref;
        struct ufsmount *ump;
        struct fs *fs;
        ufs_lbn_t i;
#ifdef INVARIANTS
        ufs_lbn_t prevlbn = 0;
#endif
        int deplist;

        if (inodedep->id_state & IOSTARTED)
                panic("initiate_write_inodeblock_ufs2: already started");
        inodedep->id_state |= IOSTARTED;

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/