From: Nils Pascal Illenseer
To: freebsd-questions@freebsd.org
Subject: Re: System hangs for several minutes (disk IO related)
Date: Mon, 2 Sep 2013 13:29:29 +0200
Message-Id: <4A7374B9-1940-4380-A306-4804B7C93188@vm.ag>
In-Reply-To: <20130730171938.GA3602@aurora.oekb.co.at>

Hi,

I see similar hangs on one of our Supermicro servers.
We have a ZFS RAID (striped mirror vdevs), and when I use "zfs receive"
to receive snapshots, the whole system hangs at the end for up to ten
minutes, sometimes longer.

Kernel: latest (9.2-RC3)

Adaptec 6805 RAID controller, provides disks for ZFS via JBOD

/var/log/messages and dmesg do not show anything related to the hangs.

I hope this helps to analyze the issue further.

Regards,
Nils Pascal Illenseer

------------------------------ < Cut here > ------------------------------

Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.2-RC3 #0 r254795: Sat Aug 24 20:25:04 UTC 2013
root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: AMD Opteron(tm) Processor 6376 (2300.05-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x600f20 Family = 0x15 Model = 0x2 Stepping = 0
Features=0x178bfbff
Features2=0x3e98320b
AMD Features=0x2e500800
AMD Features2=0x1ebbfff,NodeId,TBM,Topology,,>
Standard Extended Features=0x8
TSC: P-state invariant, performance statistics
real memory = 137438953472 (131072 MB)
avail memory = 133006090240 (126844 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <050713 APIC1654>
FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
FreeBSD/SMP: 1 package(s) x 16 core(s)
...
aacraid0: mem 0xfd800000-0xfdbfffff,0xfd7bf800-0xfd7bffff,0xfd7bf400-0xfd7bf4ff irq 28 at device 0.0 on pci1
aacraid0: Enable Raw I/O
aacraid0: Enable 64-bit array
aacraid0: New comm. interface type1 enabled
aacraid0: Adaptec 6805, aacraid driver 3.1.1-1
aacraidp0 on aacraid0
aacraidp1 on aacraid0
aacraidp2 on aacraid0
aacraidp3 on aacraid0

------------------------------ < Cut here > ------------------------------

On 30.07.2013 at 19:19, Ewald Jenisch wrote:

> Hi,
>
> I'm seeing rather strange behavior on an HP DL585 G5 wrt. disk IO:
>
> When there's any disk io the machine completely freezes, i.e. no
> console input possible, no screen output - complete hang. After some
> minutes the box comes back to normal again - but sure enough with the
> next disk io it freezes again.
>
> To give you a typical example: While a "portsnap fetch extract" was
> running I did a "sync". Normally this should complete in a matter of
> milliseconds to seconds in the worst case - but dig this:
>
> # date;time sync;date
> Tue Jul 30 09:57:38 CEST 2013
> 0.000u 0.311s 9:54.69 0.0% 4+161k 0+1287io 0pf+0w
> Tue Jul 30 10:07:38 CEST 2013
> #
>
> No, this is not a typo - it really took nearly ten minutes (!) for the
> sync to complete. In the meantime - every windows, all activity
> (console, screen-output etc.) is completely blocked. ('portsnap fetch
> extract' was only given as an example here - the lockup occurs
> whenever there is disk io like for example tar, etc).
>
> We're speaking about a machine with decent hardware here, here's an
> excerpt from "dmesg":
>
> ------------------------------ < Cut here > ------------------------------
>
> FreeBSD 9.2-BETA2 #0 r253750: Mon Jul 29 11:07:04 CEST 2013
> root@sniff-rz2:/usr/obj/usr/src/sys/GENERIC amd64
> gcc version 4.2.1 20070831 patched [FreeBSD]
> CPU: Quad-Core AMD Opteron(tm) Processor 8358 SE (2411.16-MHz K8-class CPU)
> Origin = "AuthenticAMD" Id = 0x100f23 Family = 0x10 Model = 0x2 Stepping = 3
> Features=0x178bfbff
> Features2=0x802009
> AMD Features=0xee400800
> AMD Features2=0x7ff
> TSC: P-state invariant
> real memory = 137438953472 (131072 MB)
> avail memory = 132973432832 (126813 MB)
> Event timer "LAPIC" quality 400
> ACPI APIC Table:
> FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs
> ...
> ciss0: port 0x3000-0x30ff mem 0xd9e00000-0xd9efffff,0xd9df0000-0xd9df0fff irq 16 at device 0.0 on pci8
> ciss0: PERFORMANT Transport
> ...
> da0 at ciss0 bus 0 scbus2 target 0 lun 0
> da0: Fixed Direct Access SCSI-5 device
> da0: 135.168MB/s transfers
> da0: Command Queueing enabled
> da0: 139979MB (286677120 512 byte sectors: 255H 32S/T 35132C)
> da0: quirks=0x1
>
> ------------------------------ < Cut here > ------------------------------
>
> Kernel: Latest kernel as of yesterday (9.2Beta)
>
> BIOS: is at the latest level (Support pack as of Spring 2013)
> installed which updated BIOS, iLO etc. Aside from that I reset BIOS to
> default values just to be sure.
>
> SmartArray P400 - Firmware 7.24 (latest)
>
> Harddisks: Two 146GB HDs running in Raid1-mode. Already tried
> hot-swapping the disks - didn't change anything.
>
> Needless to say - no error message etc. in neither dmesg nor
> /var/log/messages :-(
>
> To me it looks like this is some sort of timing problem - but where
> should I start looking?
>
> Thanks much in advance for any help,
> -ewald
>
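
For anyone trying to reproduce this, here is a minimal sketch (not taken
from either report above) for putting numbers on the hangs. It assumes a
POSIX sh, a date(1) that supports +%s, and that the elapsed field printed
by the shell's "time" builtin is colon-separated (minutes:seconds,
possibly with a leading hours field). The function names
elapsed_to_seconds, time_sync and watch_sync are made up for this example.

```sh
#!/bin/sh

# Convert a colon-separated elapsed time such as "9:54.69" to seconds.
# Assumption: fields are [hours:]minutes:seconds.
elapsed_to_seconds() {
    echo "$1" | awk -F: '{
        s = 0
        for (i = 1; i <= NF; i++) s = s * 60 + $i
        printf "%.2f\n", s
    }'
}

# Run sync(8) once and print how long it took, in whole seconds.
time_sync() {
    start=$(date +%s)
    sync
    end=$(date +%s)
    echo $((end - start))
}

# Hypothetical helper: every INTERVAL seconds, time a sync and print a
# timestamped line whenever it took at least THRESHOLD seconds.
# Runs until interrupted.
watch_sync() {
    interval=$1
    threshold=$2
    while :; do
        took=$(time_sync)
        if [ "$took" -ge "$threshold" ]; then
            echo "$(date '+%Y-%m-%d %H:%M:%S') sync took ${took}s"
        fi
        sleep "$interval"
    done
}
```

"elapsed_to_seconds 9:54.69" prints 594.69, i.e. the sync quoted above
took just under ten minutes. Something like "watch_sync 60 5", left
running in the background with its output redirected to a file, would
record when slow syncs happen so they can be lined up with other
activity (zfs receive, portsnap, tar). The script stalls along with the
rest of the system during a hang, but the elapsed time it reports
afterwards should still cover the stall.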