From: Borja Marcos
Date: Thu, 11 Mar 2010 09:54:47 +0100
To: Alexander Leidinger
Cc: freebsd-fs@FreeBSD.org, Stable, FreeBSD@FreeBSD.ORG, Pawel Jakub Dawidek
Subject: Re: Many processes stuck in zfs
Message-Id: <764BD545-B86C-47DC-9004-964EB2216AF0@sarenet.es>
In-Reply-To: <20100311084527.2934034895hvgxaw@webmail.leidinger.net>

On Mar 11, 2010, at 8:45 AM, Alexander Leidinger wrote:

> Quoting Pawel Jakub Dawidek (from Wed, 10 Mar 2010 18:31:43 +0100):
>
> There is a 4th possibility, if you can rule out everything else: bugs
> in the CPU. I stumbled upon this with ZFS (but UFS was exposing the
> problem much faster).
> The problem in my case was that the BIOS was not recognizing the CPU
> and as such was not uploading microcode updates.
>
> Borja, can you confirm that the CPU is correctly announced in FreeBSD
> (just look at "dmesg | grep CPU:" output, if it tells you it is a AMD
> or Intel XXX CPU it is correctly detected by the BIOS)?

A CPU bug? Weird. Very. Let me explain the whole history of this.

We are using ZFS to maintain a couple of servers in an active/passive
arrangement. At 30-second intervals we create a snapshot on the master
server and send it to the slave. Actually I prefer this scheme to
drbd-style arrangements, but that's another story ;)

We started our tests and soon ran into problems: a deadlocked
filesystem. I remember that at one point the deadlock affected UFS as
well, not only ZFS: on a system with both ZFS and UFS, access to the
UFS filesystems was also lost when this happened.

Looking at the times when it happened, it coincided with one or both of
these events: the periodic scripts running (which, among other things,
traverse the whole filesystem) and a backup being made with Bacula.
Either way, there seemed to be a problem: read activity on a dataset on
which I was receiving a snapshot at the same time could lead to a
deadlock. I am sure I have never tried to receive two snapshots
simultaneously; the replicating program guarantees it.

As the servers had to be rolled into production, and such tests with
real servers can be quite time consuming, I set up a couple of FreeBSD
virtual machines, using VMware Fusion (version 2 then, now version 3)
on a MacBook (MacBook 4,1, Intel Core 2 Duo, 2.1 GHz) and tried to
reproduce it.

To reproduce it, I set up a "master" machine, with /usr/src and
/usr/obj on a dataset (pool/src), replicating it at 30-second intervals
to another virtual machine, the slave.
On the slave, I launch "tar" in an infinite loop, so that the contents
of the replicated dataset (pool/src) are copied to another dataset
(pool/thecopy).

With that running (and, remember, with replications at 30-second
intervals, or longer if a replication takes a long time), I run a
make buildworld on the master machine. The destination soon deadlocks.

I have tried fiddling with the virtual machine, for example giving it
a single or dual core CPU, and there's no difference. With dual cores
it *seems* to deadlock earlier, but I'm not sure. For the latest test
results I've posted, I was using a single core CPU.

The original machines on which I detected the problem (which I have
subsequently reproduced on virtual machines running on VMware Fusion)
are Dell PowerEdge 2950s, and this is the CPU description:

Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU L5420 @ 2.50GHz (2496.25-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x1067a  Stepping = 10
  Features=0xbfebfbff
  Features2=0x40ce3bd
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 8589934592 (8192 MB)
avail memory = 8250003456 (7867 MB)
ACPI APIC Table:
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 8 core(s)
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
 cpu4 (AP): APIC ID: 4
 cpu5 (AP): APIC ID: 5
 cpu6 (AP): APIC ID: 6
 cpu7 (AP): APIC ID: 7
ioapic0: Changing APIC ID to 8
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
acpi_hpet0: iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900

The virtual machine (VMware Fusion 3.0.0, MacBook, Mac OS X 10.6.2)
reports this:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU T8100 @ 2.10GHz (2116.62-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xfebfbff
  Features2=0x80082201
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 1153433600 (1100 MB)
avail memory = 1090441216 (1039 MB)
ACPI APIC Table:
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0

To compare with Solaris, I also installed a virtual machine running
Solaris 10 and used it as the target for the replication. The same
test didn't deadlock; it seemed to work like a charm.

Sometimes I've tried to run more than one "tar" job in parallel instead
of just one. It just makes it deadlock earlier, no other difference.

Any more tests I can do?

Borja.
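
For anyone wanting to reproduce this, the setup described above can be
sketched roughly as follows. This is only a minimal Python sketch that
builds the command lines for one replication cycle and for the reader
load on the slave; it is not the actual replicating program, which is
not shown in this message. The snapshot names ("r1", "r2"), the host
name "slave", the use of ssh as transport, and the mount points
/pool/src and /pool/thecopy are all assumptions made for illustration.
Any extra zfs flags the real program may use are unknown and left out.

```python
import shlex


def snapshot_cmd(dataset, name):
    """Command that creates snapshot dataset@name on the master."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def send_cmd(dataset, prev, cur):
    """Command that streams a snapshot.

    With prev=None a full stream is sent; otherwise an incremental
    stream from dataset@prev to dataset@cur (zfs send -i).
    """
    if prev is None:
        return ["zfs", "send", f"{dataset}@{cur}"]
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{cur}"]


def receive_cmd(dataset):
    """Command that receives a stream into the dataset on the slave."""
    return ["zfs", "receive", dataset]


def replication_pipeline(dataset, prev, cur, slave_host):
    """Shell pipeline for one cycle: send on the master, receive on the slave.

    ssh as the transport is an assumption; the message does not say
    how the stream is carried between the machines.
    """
    send = shlex.join(send_cmd(dataset, prev, cur))
    recv = shlex.join(["ssh", slave_host] + receive_cmd(dataset))
    return f"{send} | {recv}"


def slave_copy_pipeline(src_dir, dst_dir):
    """Shell pipeline for the reader load: copy a tree with tar.

    On the slave this would be run in an infinite loop while snapshots
    are being received. The directories are hypothetical mount points.
    """
    pack = shlex.join(["tar", "-C", src_dir, "-cf", "-", "."])
    unpack = shlex.join(["tar", "-C", dst_dir, "-xf", "-"])
    return f"{pack} | {unpack}"


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    print(shlex.join(snapshot_cmd("pool/src", "r2")))
    print(replication_pipeline("pool/src", "r1", "r2", "slave"))
    print(slave_copy_pipeline("/pool/src", "/pool/thecopy"))
```

The sketch only prints the commands, so it can be read or adapted
without touching a pool. In the scenario above, the master would run
the snapshot and send steps every 30 seconds while the slave repeats
the tar copy and the master runs make buildworld.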