From owner-freebsd-stable@FreeBSD.ORG Mon Oct 12 21:41:14 2009 Return-Path: Delivered-To: freebsd-stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BAA301065672 for ; Mon, 12 Oct 2009 21:41:14 +0000 (UTC) (envelope-from prvs=1536df4ea9=killing@multiplay.co.uk) Received: from mail1.multiplay.co.uk (mail1.multiplay.co.uk [85.236.96.23]) by mx1.freebsd.org (Postfix) with ESMTP id 1DE0B8FC14 for ; Mon, 12 Oct 2009 21:41:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=simple; d=multiplay.co.uk; s=Multiplay; t=1255383070; x=1255987870; q=dns/txt; h=Received: Message-ID:From:To:References:Subject:Date:MIME-Version: Content-Type:Content-Transfer-Encoding; bh=gDSuCgrz5Vxn58wVfvvB1 eplerQxB+0vVZLLc8pnQTY=; b=fHhl9yylF8ZbitwfaNSqUqEp1gPFGzxFAe9Ks GpGdZvtk9plLl2CpiQWMIxLdkQwfAM9yXHZ9fRJJwNsnCoPrJ0o3KTSEjPToGmWm b7DW85RUr+TVo9wtmChVeTUeUsnuB+a5bjp0gMllvSOqbVid7QMtWjl+kTyZQ1WQ IoeI48= X-MDAV-Processed: mail1.multiplay.co.uk, Mon, 12 Oct 2009 22:31:10 +0100 Received: from r2d2 by mail1.multiplay.co.uk (MDaemon PRO v10.0.4) with ESMTP id md50008361045.msg for ; Mon, 12 Oct 2009 22:31:10 +0100 X-Spam-Processed: mail1.multiplay.co.uk, Mon, 12 Oct 2009 22:31:10 +0100 (not processed: message from trusted or authenticated source) X-Authenticated-Sender: Killing@multiplay.co.uk X-MDRemoteIP: 213.123.247.160 X-Return-Path: prvs=1536df4ea9=killing@multiplay.co.uk X-Envelope-From: killing@multiplay.co.uk X-MDaemon-Deliver-To: freebsd-stable@freebsd.org Message-ID: <43727653B60248EEA363AA3A886DC42C@multiplay.co.uk> From: "Steven Hartland" To: "Thomas Backman" , "freebsd-stable" References: Date: Mon, 12 Oct 2009 22:31:04 +0100 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=response Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.5843 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5579 Cc: Subject: Re: Extreme console latency during disk IO (8.0-RC1, previous releases also affected according to others) X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Oct 2009 21:41:14 -0000 We're not running 8 yet but we do have a 7.x box which its under fairly high IO load doing mrtg graphs which has similar behaviour. When typing a command on ssh it will freeze for may seconds. I even went to far as to write a little C app which just prints out the time to screen and even that sees the big delay. Its always been like and I've never managed to get to the bottom of it, there's something in the IO / disk subsystem which can totally lock up the system under high IO load. Regards Steve ----- Original Message ----- From: "Thomas Backman" To: "freebsd-stable" Sent: Monday, October 12, 2009 8:48 PM Subject: Extreme console latency during disk IO (8.0-RC1,previous releases also affected according to others) > I'm copying this over from the freebsd-performance list, as I'm looking for a few more opinions - not on the problems *I* am > having, but rather to check whether the problem is universal or not, and if not, find a possible common factor. > In other words: I want to hear about your experiences, *good or bad*! > > Here's the original thread (not from the beginning, though): > http://lists.freebsd.org/pipermail/freebsd-performance/2009-October/003843.html > > Long story short, my version: when the disk is stressed hard enough, console IO becomes COMPLETELY unbearable. 10+ seconds to > switch between windows in screen(1), running (or even typing) simple commands, etc. This happens both via SSH and the serial > console. > > How to reproduce/test: > 1) time file /etc/* > /dev/null a few times, or something similar that uses the disk; write down a > common/average/median/whatever time. > 2) cat /dev/zero > /uncompressed_fs/filename # please make *sure* it's uncompressed, since ZFS with lzjb/gzip enabled will > squish this into a kilobyte-sized file, thus creating virtually *no* IO. > 3) When cat has been running say 10 seconds, re-time command #1 and do some interactive stuff - run commands, edit files, etc. > > I couldn't actually reproduce the *completely* horrific increase in latency I posted about below just now (I did update my > sources and rebuild, but I'm pretty sure the delta between ~Sep 29 and Oct 6 had no major IO changes in 8-STABLE), and the > "time file /etc/*" test "only" jumped by about 3x (compared to 20-60x+ previously), but it's still bad enough: commands such > as "ls" and "w" take 2-3 seconds to run, as opposed to 0.005s for ls without the added IO... On Linux, the increase in latency > is closer to 4%. A bit better than, oh, 400 times. ;) > > Oh, and again: this post is not a complaint; this is a post asking for your experiences. I know I'm not alone in having these > issues - I just want to know if there are a lot of people that *don't* too, and what could cause them. I can't possibly switch > to FreeBSD in production with this behaviour - and I've been looking forward to doing so for quite a while now. > > Regards, > Thomas > > PS. > > I'll leave my post to the original discussion below. (I don't usually top post, but I don't consider this a reply, more of a > new post with an addition below.) > > On Oct 5, 2009, at 10:45 AM, Thomas Backman wrote: > >> Hey everyone, >> I'm having serious trouble with the same thing, and just found this thread while looking for the correct place to post. Looks >> like I found it. (I wrote most of the post before finding the thread, so some of it will seem a bit odd.) >> >> I run 8.0-RC1/amd64 with ZFS on an A64 3200+ with 2GB RAM and an old 80GB 7200rpm disk. >> >> My problem is that I get completely unacceptable latency on console IO (both via SSH and serial console) when the system is >> performing disk IO. The worst case I've noticed yet was when I tried copying a core dump from a lzjb compressed ZFS file >> system to a gzip-9 compressed one, to compare the file size/compression ratio. screen (1) took at LEAST ten seconds - probably >> a bit more - I'm not exaggerating here - to switch to another window, and an "ls" in an empty directory also about 5-10 >> seconds. >> Doing some silly CPU load with two instances of "yes >/dev/null" (on a single core, remember) doesn't change anything, the >> system remains very responsive. "cat /dev/zero > /uncompressed_fs/..." however produces the extreme slowdown. (On a gzip-1 FS >> it doesn't, since the file ends up extremely small - a kilobyte or so - even after a while, thus performing minimal IO). >> >> I'm thinking about switching to FreeBSD on my beefier "production" system (dual-core amd64, 4GB RAM, 4x1TB disks, compared to >> this one, single-core, 2GB RAM, 80GB disk), but unless I feel assured this won't happen there as well, I'm not so sure >> anymore. I can do any kind of heavy IO/compilation/whatever on that box, currently running Linux, and it's always >> unnoticable. In this case it's impossible *not* to notice that your key input is lagging behind 5-10 seconds... I thought >> multiple times that the box must have panicked. >> I do realize that the hardware isn't the best, especially the disks, but this is far worse than it should be! >> >> Here's some of the testing done in this thread (or at least something like it): >> >> [root@chaos ~]# time file /etc/* >/dev/null >> real 0m1.725s >> user 0m0.993s >> sys 0m0.021s >> [root@chaos ~]# time file /etc/* >/dev/null >> >> real 0m1.008s >> user 0m0.990s >> sys 0m0.015s >> [root@chaos ~]# time file /etc/* >/dev/null >> >> real 0m1.008s >> user 0m0.967s >> sys 0m0.038s >> [root@chaos ~]# time file /etc/* >/dev/null >> >> real 0m1.015s >> user 0m0.998s >> sys 0m0.008s >> >> So, pretty much exactly 1 second every time once the cache is warmed up. Now, let's try it 10 seconds after starting heavy >> disk writing... >> >> [root@chaos ~]# cat /dev/zero > /DELETE_ME & >> (wait for 10 seconds) >> [root@chaos ~]# time file /etc/* >/dev/null >> >> real 0m13.217s >> user 0m1.004s >> sys 0m0.023s >> [root@chaos ~]# time file /etc/* >/dev/null >> >> real 0m24.012s >> user 0m1.020s >> sys 0m0.008s >> >> ... the numbers speak for themselves. FWIW I tried the same on the Linux box: "file" took 0.8 seconds the first time, 0.125s >> subsequent times. While running cat, 0.13-0.21 seconds (0.21 being the first, subsequent runs took 0.13s). The system >> remained completely responsive, which cannot be said for the FreeBSD one! >> >> Any advice? Info below - please ask if you need more! >> >> Regards, >> Thomas >> >> ----------------------------------------------------------------------------- >> >> Basic info: >> >> [root@chaos ~]# uname -a >> FreeBSD chaos.exscape.org 8.0-RC1 FreeBSD 8.0-RC1 #1 r197613M: Tue Sep 29 15:04:44 CEST 2009 >> root@chaos.exscape.org:/usr/obj/usr/ src/sys/DTRACE amd64 >> (KDB/DDB enabled, WITNESS/INVARIANTS *disabled*) >> >> [root@chaos ~]# mount >> tank/root on / (zfs, local, noatime) >> devfs on /dev (devfs, local, multilabel) >> /dev/ad0s1a on /bootdir (ufs, local, soft-updates) >> tank/export on /export (zfs, NFS exported, local, noatime) >> tank/tmp on /tmp (zfs, local, noatime) >> tank/usr on /usr (zfs, local, noatime) >> tank/usr/obj on /usr/obj (zfs, local, noatime) >> tank/usr/ports on /usr/ports (zfs, local, noatime) >> tank/usr/ports/distfiles on /usr/ports/distfiles (zfs, local, noatime) >> tank/usr/src on /usr/src (zfs, local, noatime) >> tank/usr/src_r196905 on /usr/src_r196905 (zfs, local, noatime, read- only) >> tank/var on /var (zfs, local, noatime) >> tank/var/crash on /var/crash (zfs, local, noatime) >> tank/var/log on /var/log (zfs, local, noatime) >> tank/var/tmp on /var/tmp (zfs, local, noatime) >> >> dmesg: >> >> ----------------------------------------------------------------------------- >> Copyright (c) 1992-2009 The FreeBSD Project. >> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 >> The Regents of the University of California. All rights reserved. >> FreeBSD is a registered trademark of The FreeBSD Foundation. >> FreeBSD 8.0-RC1 #1 r197613M: Tue Sep 29 15:04:44 CEST 2009 >> root@chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE >> Timecounter "i8254" frequency 1193182 Hz quality 0 >> CPU: AMD Athlon(tm) 64 Processor 3200+ (2009.27-MHz K8-class CPU) >> Origin = "AuthenticAMD" Id = 0x10ff0 Stepping = 0 >> Features = 0x78bfbff < FPU ,VME ,DE ,PSE ,TSC ,MSR ,PAE ,MCE >> ,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2> >> AMD Features=0xe2500800 >> AMD Features2=0x1 >> real memory = 2147483648 (2048 MB) >> avail memory = 2052362240 (1957 MB) >> ACPI APIC Table: >> ioapic0 irqs 0-23 on motherboard >> kbd1 at kbdmux0 >> acpi0: on motherboard >> acpi0: [ITHREAD] >> acpi0: Power Button (fixed) >> acpi0: reservation of 0, a0000 (3) failed >> acpi0: reservation of 100000, 7fef0000 (3) failed >> Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 >> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0 >> acpi_button0: on acpi0 >> pcib0: port 0xcf8-0xcff on acpi0 >> pci0: on pcib0 >> pci0: at device 0.0 (no driver attached) >> isab0: at device 1.0 on pci0 >> isa0: on isab0 >> pci0: at device 1.1 (no driver attached) >> ohci0: mem 0xfe02f000-0xfe02ffff irq 21 at device 2.0 on pci0 >> ohci0: [ITHREAD] >> usbus0: on ohci0 >> ehci0: mem 0xfe02e000-0xfe02e0ff irq 22 at device 2.1 on pci0 >> ehci0: [ITHREAD] >> usbus1: EHCI version 1.0 >> usbus1: on ehci0 >> atapci0: port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xfb00-0xfb0f at device 6.0 on >> pci0 >> ata0: on atapci0 >> ata0: [ITHREAD] >> ata1: on atapci0 >> ata1: [ITHREAD] >> atapci1: port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xf600-0xf60f mem >> 0xfe02b000-0xfe02bfff irq 23 at device 7.0 on pci0 >> atapci1: [ITHREAD] >> ata2: on atapci1 >> ata2: [ITHREAD] >> ata3: on atapci1 >> ata3: [ITHREAD] >> atapci2: port 0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xf100-0xf10f mem >> 0xfe02a000-0xfe02afff irq 21 at device 8.0 on pci0 >> atapci2: [ITHREAD] >> ata4: on atapci2 >> ata4: [ITHREAD] >> ata5: on atapci2 >> ata5: [ITHREAD] >> pcib1: at device 9.0 on pci0 >> pci1: on pcib1 >> vgapci0: mem 0xfcff8000-0xfcffbfff, 0xfd000000-0xfd7fffff,0xfc000000-0xfc7fffff irq 17 at device 7.0 >> on pci1 >> xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xdf00-0xdf7f mem 0xfcfff000-0xfcfff07f irq 18 at device 9.0 on pci1 >> miibus0: on xl0 >> xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0 >> xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto >> xl0: Ethernet address: 00:50:da:44:c0:4a >> xl0: [ITHREAD] >> nfe0: port 0xf000-0xf007 mem 0xfe029000-0xfe029fff irq 22 at device 10.0 on >> pci0 >> miibus1: on nfe0 >> e1000phy0: PHY 1 on miibus1 >> e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto >> nfe0: Ethernet address: 00:13:d3:a2:aa:0f >> nfe0: [FILTER] >> pcib2: at device 11.0 on pci0 >> pci2: on pcib2 >> pcib3: at device 12.0 on pci0 >> pci3: on pcib3 >> pcib4: at device 13.0 on pci0 >> pci4: on pcib4 >> pcib5: at device 14.0 on pci0 >> pci5: on pcib5 >> amdtemp0: on hostb3 >> acpi_tz0: on acpi0 >> atrtc0: port 0x70-0x73 irq 8 on acpi0 >> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 >> uart0: [FILTER] >> uart0: console (115200,n,8,1) >> atkbdc0: port 0x60,0x64 irq 1 on acpi0 >> atkbd0: irq 1 on atkbdc0 >> kbd0 at atkbd0 >> atkbd0: [GIANT-LOCKED] >> atkbd0: [ITHREAD] >> cpu0: on acpi0 >> powernow0: on cpu0 >> device_attach: powernow0 attach returned 6 >> orm0: at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff, 0xcc000-0xcc7ff on isa0 >> sc0: at flags 0x100 on isa0 >> sc0: VGA <16 virtual consoles, flags=0x300> >> vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 >> ZFS NOTICE: system has less than 4GB and prefetch enable is not set... disabling. >> ZFS filesystem version 13 >> ZFS storage pool version 13 >> Timecounter "TSC" frequency 2009269513 Hz quality 800 >> Timecounters tick every 1.000 msec >> usbus0: 12Mbps Full Speed USB v1.0 >> usbus1: 480Mbps High Speed USB v2.0 >> ad0: 76318MB at ata0-master UDMA100 >> ugen0.1: at usbus0 >> uhub0: on usbus0 >> ugen1.1: at usbus1 >> uhub1: on usbus1 >> ad2: 9768MB at ata1-master UDMA100 >> GEOM: ad2s1: geometry does not match label (255h,63s != 16h,63s). >> Root mount waiting for: usbus1 usbus0 >> uhub0: 10 ports with 10 removable, self powered >> Root mount waiting for: usbus1 >> Root mount waiting for: usbus1 >> Root mount waiting for: usbus1 >> uhub1: 10 ports with 10 removable, self powered >> Trying to mount root from zfs:tank/root >> netsmb_dev: loaded >> >> ----------------------------------------------------------------------------- >> The 80GB disk is used for everything except swap (aka. dump device) for which the 10GB disk is used, exclusively. >> >> loader.conf has nothing of value (just loading a few modules and a vfs.root.mountfrom, plus serial console settings). > > _______________________________________________ > freebsd-stable@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-stable > To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org" > ================================================ This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.