From: "Cyrus Rahman"
To: freebsd-fs@freebsd.org
Date: Tue, 1 Apr 2008 13:51:27 -0600
Subject: Trouble with snapshots

I'm seeing serious problems with snapshot deadlocks on 7.0-RELEASE. I haven't yet been able to set up a test environment to pin down the precise details, but this much I know: after a snapshot mount sleeps permanently on suspfs, filesystem I/O eventually locks up and a hard reset is required. The problem then cascades until everything ends up waiting on suspfs. Running 'sync' after the mount hangs is a sure way to propagate it.

This happens very often, with roughly a 15% chance per snapshot on the server running 7.0. It's bad enough that using snapshots there isn't realistic.

Other strange things have been observed as well: an entire day's worth of work vanished. After the reset and reboot the filesystems were consistent, but they were in the state they had been in many hours earlier, at the time the snapshot hung. The hung snapshot had been noticed, but everything else seemed to work, so a decision was made to reboot at the end of the day, with disastrous effect! Apart from the hung snapshot, nothing unusual was noticed during the day. My guess is that everything was simply cached (for hours!) and the cache was never flushed.

This is happening on a system set up with journaled UFS filesystems, so that may be part of the problem. The system is running amd64 on an Intel Q6600. The filesystem that has this trouble holds a number of large files of about 500-700 MB. Filesystems with only small files do not seem to have trouble, even though they are bigger and contain more files. I can't think of anything else unique about it.
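To put the roughly 15% per-snapshot hang rate in perspective, here is a minimal sketch of the cumulative risk over a run of snapshots. It assumes, purely for illustration, that each snapshot hangs independently with the same probability; the observed hangs may well not behave that way. The function name and the snapshot counts are hypothetical and not taken from any FreeBSD tool.

```python
def prob_at_least_one_hang(p_per_snapshot: float, n_snapshots: int) -> float:
    """Chance that at least one of n snapshots hangs.

    ASSUMPTION: each snapshot hangs independently with probability
    p_per_snapshot, so P(no hang in n) = (1 - p) ** n.
    """
    if not 0.0 <= p_per_snapshot <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    if n_snapshots < 0:
        raise ValueError("n_snapshots must be non-negative")
    return 1.0 - (1.0 - p_per_snapshot) ** n_snapshots


if __name__ == "__main__":
    # Hypothetical schedules, e.g. one snapshot per day for a week or a month.
    for n in (1, 5, 7, 14, 30):
        print(f"{n:3d} snapshots: {prob_at_least_one_hang(0.15, n):.1%}")
```

Under that independence assumption, a single daily snapshot at 15% per snapshot gives about a two-in-three chance of at least one hang within a week, which is consistent with snapshots being impractical on that server.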