From: "Cyrus Rahman"
To: freebsd-geom@freebsd.org
Date: Wed, 1 Oct 2008 05:52:52 -0600
Subject: gjournal deadlock
Message-ID: <9e77bdb50810010452r3bd4a01bs14facb8fa9a97b4a@mail.gmail.com>

I continue to experience deadlocks using gjournal with large files. In a previous message I mentioned that they occur frequently with snapshots. Snapshots are useful, but it is certainly possible to do without them. Lately, however, I have experienced the deadlocks in another context, namely building nanobsd images.

The problem occurs when writing out the image file through the md(4) device. Writing 128MB images causes no trouble, but moving to a 2GB image causes the deadlock every time. In fact, I was only able to succeed by building the image on a non-journaled filesystem.

The deadlock occurs while sleeping on wdrain - here's the ps(1) output of the processes involved in one such event:

   0    51     0   0 -16  0     0    16 wdrain DL   ??    1:24.22 [g_journal switcher]
   0 52022 52018   0 -16  0  4640  1152 wdrain D    ??    0:00.02 newsyslog
1001 52069  1725   0 -16  0  2596   636 wdrain D    p3    0:00.01 sync
   0 51935 51933   0 -16  0  4640  1124 wdrain T    p7    0:00.38 cpio -dump /usr/obj/nanobsd.img
   0 51924     0   0  75  0     0    16 suspfs DL   ??    0:00.12 [md0]

These values are used when deciding to msleep in wdrain:

vfs.hirunningspace: 1048576
vfs.lorunningspace: 524288
vfs.runningbufspace: 1956352

They remain static after the deadlock.

The really unacceptable aspect of this is that if you don't notice the deadlock has occurred, you can continue to work for many hours on other projects, yet none of the changes made to the filesystem after the deadlock will be committed to the disk. So all your work, including any notes about the deadlock, will vanish when you reboot. It's strange seeing all those deleted files reappear and your code revert ten hours to the instant the deadlock occurred. This issue represents a serious danger to anyone using gjournal in a production environment.
Furthermore, the problem affects all gjournaled filesystems, not just the one involved in the observed deadlock - so, for example, your successfully received mail and such will also vanish. I expect what happens is that all the changes after the deadlock pile up in the journals, and so remain visible until the inevitable reboot, at which time they are discarded.
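
To make the wdrain numbers above concrete, here is a minimal Python sketch of how I understand the check to work. It assumes a simplified rule: a writer sleeps on wdrain while vfs.runningbufspace is above vfs.hirunningspace, and sleepers are woken only once it has drained to vfs.lorunningspace or below. The function names are made up for illustration, and this is a model of the rule, not the kernel code.

```python
# Values reported in this message, in bytes.
HIRUNNINGSPACE = 1048576    # vfs.hirunningspace
LORUNNINGSPACE = 524288     # vfs.lorunningspace
RUNNINGBUFSPACE = 1956352   # vfs.runningbufspace, static after the deadlock


def writer_must_sleep(runningbufspace, hirunningspace):
    """Assumed rule: a writer sleeps while in-flight write bytes exceed the high mark."""
    return runningbufspace > hirunningspace


def sleepers_woken(runningbufspace, lorunningspace):
    """Assumed rule: sleepers are woken once in-flight bytes reach the low mark."""
    return runningbufspace <= lorunningspace


def bytes_to_drain(runningbufspace, lorunningspace):
    """In-flight bytes that must complete before sleepers would be woken."""
    return max(0, runningbufspace - lorunningspace)


if __name__ == "__main__":
    print("writer sleeps:", writer_must_sleep(RUNNINGBUFSPACE, HIRUNNINGSPACE))
    print("sleepers woken:", sleepers_woken(RUNNINGBUFSPACE, LORUNNINGSPACE))
    print("bytes to drain:", bytes_to_drain(RUNNINGBUFSPACE, LORUNNINGSPACE))
```

Under these assumptions, with vfs.runningbufspace stuck at 1956352, every new writer goes to sleep and nobody is woken, because the outstanding writes that would bring the count down never complete.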