From owner-freebsd-fs@FreeBSD.ORG Sun Mar 4 08:02:54 2012
Date: Sat, 3 Mar 2012 23:35:56 -0800
From: "bsalinux@gmail.com"
To: freebsd-scsi@freebsd.org, freebsd-fs@freebsd.org
Subject: HP SmartArray P400 with 1TB Drives
X-List-Received-Date: Sun, 04 Mar 2012 08:02:54 -0000

Hi,

Are there any concerns about using 8x 1TB SAS drives with an HP P400
controller running RAID 5? I would appreciate any usage reports or
comments if you are running a similar configuration.

Thanks.

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 4 15:12:44 2012
Message-ID: <4F538272.6070002@bsdunix.ch>
Date: Sun, 04 Mar 2012 15:55:46 +0100
From: Thomas
To: freebsd-fs@FreeBSD.org
Subject: FreeBSD 8.2 NFS client crashes with a FreeBSD 9 Server
X-List-Received-Date: Sun, 04 Mar 2012 15:12:44 -0000

Hello

My FreeBSD 9.0-STABLE (February 2nd 2012) NFS server (amd64) does not
work well with FreeBSD 8.2-p6 NFS clients (amd64). All clients are
generating a lot of syslog messages like:

Mar 4 13:29:10 hosting06 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
Mar 4 13:29:14 hosting06 last message repeated 5933 times
Mar 4 13:39:18 hosting01 kernel: <<33>N>LMN:L M:f aiflaeilde d tto coon taccto renmtoatcet rermpotcebi nrd, psctbaitnd ,= st0a,t =po rt0 =, p0
Mar 4 13:39:18 hosting01 kernel: o
Mar 4 13:39:18 hosting01 kernel: rt = 0

After a few hours the NFS client crashes like this:

Mar 4 13:55:33 hosting01 kernel: Fatal trap 12: page fault while in kernel mode
Mar 4 13:55:33 hosting01 kernel: cpuid = 1;
Mar 4 13:55:33 hosting01 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
Mar 4 13:55:33 hosting01 kernel: apic id = 07
Mar 4 13:55:33 hosting01 kernel: fault virtual address = 0x18
Mar 4 13:55:33 hosting01 kernel: fault code = supervisor write data, page not present
Mar 4 13:55:33 hosting01 kernel: instruction pointer = 0x20:0xffffffff807c4b0b
Mar 4 13:55:33 hosting01 kernel:
Mar 4 13:55:33 hosting01 kernel: stack pointer = 0x28:0xffffff84916c1ec0
Mar 4 13:55:33 hosting01 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
Mar 4 13:55:33 hosting01 kernel:
Mar 4 13:55:33 hosting01 kernel: frame pointer = 0x28:0xffffff84916c2070
Mar 4 13:55:33 hosting01 kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Mar 4 13:55:33 hosting01 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Mar 4 13:55:33 hosting01 kernel:
Mar 4 13:55:33 hosting01 kernel: processor eflags = interrupt enabled,

Any ideas how I can fix this? I would appreciate any hint or fix.

About the systems:

FreeBSD 9 NFS server:
rpcbind_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 32"
nfs_reserved_port_only="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
mountd_enable="YES

rpc is running on the server:
root 1509 0.0 0.0 274296 320 ?? Ss Fri11PM 0:00.10 /usr/sbin/rpc.statd
root 1528 0.0 0.0 14264 500 ?? Ss Fri11PM 7:41.35 /usr/sbin/rpcbind

exports:
root@openstorage1:~# more /etc/exports
/rz2pool/export/tmp -maproot=root 10.10.0.2 10.10.0.3 10.10.0.4 10.10.0.5
/var/webs -maproot=root 10.10.0.2 10.10.0.3 10.10.0.4 10.10.0.5

FreeBSD 9 is a ZFS only system.

FreeBSD 8.2-p6 NFS client
rpcbind_enable="YES"
nfs_client_enable="YES"
nfs_reserved_port_only="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"

nfs client fstab:
10.0.0.12:/var/webs /var/webs nfs rw,nfsv3,rdirplus,noatime,hard,intr,tcp,wsize=65536,rsize=65536,bg 0 0
10.0.0.12:/rz2pool/export/tmp /var/tmp nfs rw,nfsv3,rdirplus,noexec,acregmin=30,acregmax=120,noatime,hard,intr,tcp,wsize=65536,rsize=65536,bg 0 0

FreeBSD 8.2-p6 is a UFS only system.

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 4 15:24:47 2012
Date: Sun, 4 Mar 2012 10:24:46 -0500 (EST)
From: Rick Macklem
To: Thomas
Message-ID: <1248805481.323864.1330874686489.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4F538272.6070002@bsdunix.ch>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: FreeBSD 8.2 NFS client crashes with a FreeBSD 9 Server
X-List-Received-Date: Sun, 04 Mar 2012 15:24:47 -0000

Thomas wrote:
> Hello
>
> My FreeBSD 9.0-STABLE (February 2nd 2012) NFS server (amd64) does not
> work well with FreeBSD 8.2-p6 NFS clients (amd64). All clients are
> generating a lot of syslog messages like:
> Mar 4 13:29:10 hosting06 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Mar 4 13:29:14 hosting06 last message repeated 5933 times
> Mar 4 13:39:18 hosting01 kernel: <<33>N>LMN:L M:f aiflaeilde d tto coon taccto renmtoatcet rermpotcebi nrd, psctbaitnd ,= st0a,t =po rt0 =, p0
> Mar 4 13:39:18 hosting01 kernel: o
> Mar 4 13:39:18 hosting01 kernel: rt = 0
>
I'd suggest you upgrade to a newer stable/9 kernel. For a while, the NLM
(rpc.lockd) wouldn't start up because it had a problem contacting the
NSM (rpc.statd), but I don't see that with a Feb. 29 kernel.
(I don't know exactly which commits broke/fixed it, but they are
post-9.0, I'm pretty sure.)

While here, I will mention that if you don't need multiple clients to
see each other's byte range locks (which matters only when they are
sharing/locking the same files concurrently), then using the "nolockd"
mount option to avoid the NLM is a good idea.

rick

>
> After a few hours the NFS client crashes like this:
>
> Mar 4 13:55:33 hosting01 kernel: Fatal trap 12: page fault while in kernel mode
> Mar 4 13:55:33 hosting01 kernel: cpuid = 1;
> Mar 4 13:55:33 hosting01 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Mar 4 13:55:33 hosting01 kernel: apic id = 07
> Mar 4 13:55:33 hosting01 kernel: fault virtual address = 0x18
> Mar 4 13:55:33 hosting01 kernel: fault code = supervisor write data, page not present
> Mar 4 13:55:33 hosting01 kernel: instruction pointer = 0x20:0xffffffff807c4b0b
> Mar 4 13:55:33 hosting01 kernel:
> Mar 4 13:55:33 hosting01 kernel: stack pointer = 0x28:0xffffff84916c1ec0
> Mar 4 13:55:33 hosting01 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Mar 4 13:55:33 hosting01 kernel:
> Mar 4 13:55:33 hosting01 kernel: frame pointer = 0x28:0xffffff84916c2070
> Mar 4 13:55:33 hosting01 kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> Mar 4 13:55:33 hosting01 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> Mar 4 13:55:33 hosting01 kernel:
> Mar 4 13:55:33 hosting01 kernel: processor eflags = interrupt enabled,
>
> Any ideas how I can fix this? I would appreciate any hint or fix.
>
> About the systems:
>
> FreeBSD 9 NFS server:
> rpcbind_enable="YES"
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 32"
> nfs_reserved_port_only="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> mountd_enable="YES
>
> rpc is running on the server:
> root 1509 0.0 0.0 274296 320 ?? Ss Fri11PM 0:00.10 /usr/sbin/rpc.statd
> root 1528 0.0 0.0 14264 500 ?? Ss Fri11PM 7:41.35 /usr/sbin/rpcbind
>
> exports:
> root@openstorage1:~# more /etc/exports
> /rz2pool/export/tmp -maproot=root 10.10.0.2 10.10.0.3 10.10.0.4 10.10.0.5
> /var/webs -maproot=root 10.10.0.2 10.10.0.3 10.10.0.4 10.10.0.5
>
> FreeBSD 9 is a ZFS only system.
>
> FreeBSD 8.2-p6 NFS client
> rpcbind_enable="YES"
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
>
> nfs client fstab:
> 10.0.0.12:/var/webs /var/webs nfs rw,nfsv3,rdirplus,noatime,hard,intr,tcp,wsize=65536,rsize=65536,bg 0 0
> 10.0.0.12:/rz2pool/export/tmp /var/tmp nfs rw,nfsv3,rdirplus,noexec,acregmin=30,acregmax=120,noatime,hard,intr,tcp,wsize=65536,rsize=65536,bg 0 0
>
> FreeBSD 8.2-p6 is a UFS only system.
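The "nolockd" suggestion in the reply above can be sketched as a change to the client's fstab. This is a minimal sketch, assuming the two mounts from the original report and assuming no other client needs to see these locks; nolockd is the mount_nfs(8) option that keeps fcntl(2) locks local to the client instead of forwarding them to the server. It has not been tested against the setup described in this thread.

```text
10.0.0.12:/var/webs /var/webs nfs rw,nfsv3,rdirplus,noatime,hard,intr,tcp,nolockd,wsize=65536,rsize=65536,bg 0 0
10.0.0.12:/rz2pool/export/tmp /var/tmp nfs rw,nfsv3,rdirplus,noexec,acregmin=30,acregmax=120,noatime,hard,intr,tcp,nolockd,wsize=65536,rsize=65536,bg 0 0
```

With locks kept local, two clients that lock the same file will not see each other's locks, so this only fits when the clients do not lock shared files concurrently.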
>
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"

From owner-freebsd-fs@FreeBSD.ORG Sun Mar 4 15:48:07 2012
Date: Sun, 4 Mar 2012 10:48:05 -0500 (EST)
From: Rick Macklem
To: Thomas
Message-ID: <535781444.324493.1330876085926.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <4F538272.6070002@bsdunix.ch>
Cc: freebsd-fs@FreeBSD.org
Subject: Re: FreeBSD 8.2 NFS client crashes with a FreeBSD 9 Server
X-List-Received-Date:
 Sun, 04 Mar 2012 15:48:07 -0000

Thomas wrote:
> Hello
>
> My FreeBSD 9.0-STABLE (February 2nd 2012) NFS server (amd64) does not
> work well with FreeBSD 8.2-p6 NFS clients (amd64). All clients are
> generating a lot of syslog messages like:
> Mar 4 13:29:10 hosting06 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Mar 4 13:29:14 hosting06 last message repeated 5933 times
> Mar 4 13:39:18 hosting01 kernel: <<33>N>LMN:L M:f aiflaeilde d tto coon taccto renmtoatcet rermpotcebi nrd, psctbaitnd ,= st0a,t =po rt0 =, p0
> Mar 4 13:39:18 hosting01 kernel: o
> Mar 4 13:39:18 hosting01 kernel: rt = 0
>
Oh, I realize that I was referring to head/current in the previous
post. Since I don't know exactly which commits broke/fixed it in head,
I don't know whether the fix has been MFC'd to stable/9 yet.

If you try an up-to-date stable/9 kernel, just look to see if rpc.lockd
and rpc.statd are running after you boot it on the server.

rick

>
> After a few hours the NFS client crashes like this:
>
> Mar 4 13:55:33 hosting01 kernel: Fatal trap 12: page fault while in kernel mode
> Mar 4 13:55:33 hosting01 kernel: cpuid = 1;
> Mar 4 13:55:33 hosting01 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Mar 4 13:55:33 hosting01 kernel: apic id = 07
> Mar 4 13:55:33 hosting01 kernel: fault virtual address = 0x18
> Mar 4 13:55:33 hosting01 kernel: fault code = supervisor write data, page not present
> Mar 4 13:55:33 hosting01 kernel: instruction pointer = 0x20:0xffffffff807c4b0b
> Mar 4 13:55:33 hosting01 kernel:
> Mar 4 13:55:33 hosting01 kernel: stack pointer = 0x28:0xffffff84916c1ec0
> Mar 4 13:55:33 hosting01 kernel: NLM: failed to contact remote rpcbind, stat = 0, port = 0
> Mar 4 13:55:33 hosting01 kernel:
> Mar 4 13:55:33 hosting01 kernel: frame pointer = 0x28:0xffffff84916c2070
> Mar 4 13:55:33 hosting01 kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
> Mar 4 13:55:33 hosting01 kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
> Mar 4 13:55:33 hosting01 kernel:
> Mar 4 13:55:33 hosting01 kernel: processor eflags = interrupt enabled,
>
> Any ideas how I can fix this? I would appreciate any hint or fix.
>
> About the systems:
>
> FreeBSD 9 NFS server:
> rpcbind_enable="YES"
> nfs_server_enable="YES"
> nfs_server_flags="-u -t -n 32"
> nfs_reserved_port_only="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
> mountd_enable="YES
>
> rpc is running on the server:
> root 1509 0.0 0.0 274296 320 ?? Ss Fri11PM 0:00.10 /usr/sbin/rpc.statd
> root 1528 0.0 0.0 14264 500 ?? Ss Fri11PM 7:41.35 /usr/sbin/rpcbind
>
> exports:
> root@openstorage1:~# more /etc/exports
> /rz2pool/export/tmp -maproot=root 10.10.0.2 10.10.0.3 10.10.0.4 10.10.0.5
> /var/webs -maproot=root 10.10.0.2 10.10.0.3 10.10.0.4 10.10.0.5
>
> FreeBSD 9 is a ZFS only system.
>
> FreeBSD 8.2-p6 NFS client
> rpcbind_enable="YES"
> nfs_client_enable="YES"
> nfs_reserved_port_only="YES"
> rpc_lockd_enable="YES"
> rpc_statd_enable="YES"
>
> nfs client fstab:
> 10.0.0.12:/var/webs /var/webs nfs rw,nfsv3,rdirplus,noatime,hard,intr,tcp,wsize=65536,rsize=65536,bg 0 0
> 10.0.0.12:/rz2pool/export/tmp /var/tmp nfs rw,nfsv3,rdirplus,noexec,acregmin=30,acregmax=120,noatime,hard,intr,tcp,wsize=65536,rsize=65536,bg 0 0
>
> FreeBSD 8.2-p6 is a UFS only system.
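The check suggested above (are rpc.lockd and rpc.statd running on the server?) can be sketched as a small shell filter. The function name missing_daemons is hypothetical, and the sketch assumes the daemons appear in ps output under /usr/sbin, as they do in the two lines quoted in the original report. Those two lines are fed in here in place of live ps output.

```shell
#!/bin/sh
# Hypothetical helper: read `ps` output on stdin and print each expected
# NFS locking daemon that does not appear in it.
missing_daemons() {
    input=$(cat)
    for d in rpcbind rpc.statd rpc.lockd; do
        printf '%s\n' "$input" | grep -qF -- "/usr/sbin/$d" || echo "$d"
    done
}

# On a live server this would be: ps ax | missing_daemons
# Here the ps lines quoted in the report are used instead.
missing_daemons <<'EOF'
root 1509 0.0 0.0 274296 320 ?? Ss Fri11PM 0:00.10 /usr/sbin/rpc.statd
root 1528 0.0 0.0 14264 500 ?? Ss Fri11PM 7:41.35 /usr/sbin/rpcbind
EOF
# prints: rpc.lockd

# From a client, the registrations could also be checked with something like:
#   rpcinfo -p 10.0.0.12 | grep -E 'nlockmgr|status'
```

For the quoted ps lines the filter reports rpc.lockd as missing, which matches the description of the NLM failing to start on the server.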

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 5 11:07:07 2012
Date: Mon, 5 Mar 2012 11:07:06 GMT
Message-Id: <201203051107.q25B76TE034842@freefall.freebsd.org>
From: FreeBSD bugmaster
To: freebsd-fs@FreeBSD.org
Subject: Current problem reports assigned to freebsd-fs@FreeBSD.org
X-List-Received-Date: Mon, 05 Mar 2012 11:07:07 -0000

Note: to view an individual PR, use:
  http://www.freebsd.org/cgi/query-pr.cgi?pr=(number).

The following is a listing of current problems submitted by FreeBSD users.
These represent problem reports covering all versions including
experimental development code and obsolete releases.

S Tracker       Resp.  Description
--------------------------------------------------------------------------------
o kern/165392   fs     Multiple mkdir/rmdir fails with errno 31
o kern/165087   fs     [unionfs] lock violation in unionfs
o kern/164472   fs     [ufs] fsck -B panics on particular data inconsistency
o kern/164370   fs     [zfs] zfs destroy for snapshot fails on i386 and sparc
o kern/164261   fs     [nullfs] [patch] fix panic with NFS served from NULLFS
o kern/164256   fs     [zfs] device entry for volume is not created after zfs
o kern/164184   fs     [ufs] [panic] Kernel panic with ufs_makeinode
o kern/163801   fs     [md] [request] allow mfsBSD legacy installed in 'swap'
o kern/163770   fs     [zfs] [hang] LOR between zfs&syncer + vnlru leading to
o kern/163501   fs     [nfs] NFS exporting a dir and a subdir in that dir to
o kern/162944   fs     [coda] Coda file system module looks broken in 9.0
o kern/162860   fs     [zfs] Cannot share ZFS filesystem to hosts with a hyph
o kern/162751   fs     [zfs] [panic] kernel panics during file operations
o kern/162591   fs     [nullfs] cross-filesystem nullfs does not work as expe
o kern/162519   fs     [zfs] "zpool import" relies on buggy realpath() behavi
o kern/162362   fs     [snapshots] [panic] ufs with snapshot(s) panics when g
o kern/162083   fs     [zfs] [panic] zfs unmount -f pool
o kern/161968   fs     [zfs] [hang] renaming snapshot with -r including a zvo
o kern/161897   fs     [zfs] [patch] zfs partition probing causing long delay
o kern/161864   fs     [ufs] removing journaling from UFS partition fails on
o bin/161807    fs     [patch] add option for explicitly specifying metadata
o kern/161579   fs     [smbfs] FreeBSD sometimes panics when an smb share is
o kern/161533   fs     [zfs] [panic] zfs receive panic: system ioctl returnin
o kern/161511   fs     [unionfs] Filesystem deadlocks when using multiple uni
o kern/161438   fs     [zfs] [panic] recursed on non-recursive spa_namespace_
o kern/161424   fs     [nullfs] __getcwd() calls fail when used on nullfs mou
o kern/161280   fs     [zfs] Stack overflow in gptzfsboot
o kern/161205   fs     [nfs] [pfsync] [regression] [build] Bug report freebsd
o kern/161169   fs     [zfs] [panic] ZFS causes kernel panic in dbuf_dirty
o kern/161112   fs     [ufs] [lor] filesystem LOR in FreeBSD 9.0-BETA3
o kern/160893   fs     [zfs] [panic] 9.0-BETA2 kernel panic
o kern/160860   fs     [ufs] Random UFS root filesystem corruption with SU+J
o kern/160801   fs     [zfs] zfsboot on 8.2-RELEASE fails to boot from root-o
o kern/160790   fs     [fusefs] [panic] VPUTX: negative ref count with FUSE
o kern/160777   fs     [zfs] [hang] RAID-Z3 causes fatal hang upon scrub/impo
o kern/160706   fs     [zfs] zfs bootloader fails when a non-root vdev exists
o kern/160591   fs     [zfs] Fail to boot on zfs root with degraded raidz2 [r
o kern/160410   fs     [smbfs] [hang] smbfs hangs when transferring large fil
o kern/160283   fs     [zfs] [patch] 'zfs list' does abort in make_dataset_ha
o kern/159930   fs     [ufs] [panic] kernel core
o kern/159663   fs     [socket] [nullfs] sockets don't work though nullfs mou
o kern/159402   fs     [zfs][loader] symlinks cause I/O errors
o kern/159357   fs     [zfs] ZFS MAXNAMELEN macro has confusing name (off-by-
o kern/159356   fs     [zfs] [patch] ZFS NAME_ERR_DISKLIKE check is Solaris-s
o kern/159351   fs     [nfs] [patch] - divide by zero in mountnfs()
o kern/159251   fs     [zfs] [request]: add FLETCHER4 as DEDUP hash option
o kern/159077   fs     [zfs] Can't cd .. with latest zfs version
o kern/159048   fs     [smbfs] smb mount corrupts large files
o kern/159045   fs     [zfs] [hang] ZFS scrub freezes system
o kern/158839   fs     [zfs] ZFS Bootloader Fails if there is a Dead Disk
o kern/158802   fs     amd(8) ICMP storm and unkillable process.
o kern/158231   fs     [nullfs] panic on unmounting nullfs mounted over ufs o
f kern/157929   fs     [nfs] NFS slow read
o kern/157399   fs     [zfs] trouble with: mdconfig force delete && zfs strip
o kern/157179   fs     [zfs] zfs/dbuf.c: panic: solaris assert: arc_buf_remov
o kern/156797   fs     [zfs] [panic] Double panic with FreeBSD 9-CURRENT and
o kern/156781   fs     [zfs] zfs is losing the snapshot directory,
p kern/156545   fs     [ufs] mv could break UFS on SMP systems
o kern/156193   fs     [ufs] [hang] UFS snapshot hangs && deadlocks processes
o kern/156039   fs     [nullfs] [unionfs] nullfs + unionfs do not compose, re
o kern/155615   fs     [zfs] zfs v28 broken on sparc64 -current
o kern/155587   fs     [zfs] [panic] kernel panic with zfs
f kern/155411   fs     [regression] [8.2-release] [tmpfs]: mount: tmpfs : No
o kern/155199   fs     [ext2fs] ext3fs mounted as ext2fs gives I/O errors
o bin/155104    fs     [zfs][patch] use /dev prefix by default when importing
o kern/154930   fs     [zfs] cannot delete/unlink file from full volume -> EN
o kern/154828   fs     [msdosfs] Unable to create directories on external USB
o kern/154491   fs     [smbfs] smb_co_lock: recursive lock for object 1
p kern/154228   fs     [md] md getting stuck in wdrain state
o kern/153996   fs     [zfs] zfs root mount error while kernel is not located
o kern/153753   fs     [zfs] ZFS v15 - grammatical error when attempting to u
o kern/153716   fs     [zfs] zpool scrub time remaining is incorrect
o kern/153695   fs     [patch] [zfs] Booting from zpool created on 4k-sector
o kern/153680   fs     [xfs] 8.1 failing to mount XFS partitions
o kern/153520   fs     [zfs] Boot from GPT ZFS root on HP BL460c G1 unstable
o kern/153418   fs     [zfs] [panic] Kernel Panic occurred writing to zfs vol
o kern/153351   fs     [zfs] locking directories/files in ZFS
o bin/153258    fs     [patch][zfs] creating ZVOLs requires `refreservation'
s kern/153173   fs     [zfs] booting from a gzip-compressed dataset doesn't w
o kern/153126   fs     [zfs] vdev failure, zpool=peegel type=vdev.too_small
o kern/152022   fs     [nfs] nfs service hangs with linux client [regression]
o kern/151942   fs     [zfs] panic during ls(1) zfs snapshot directory
o kern/151905   fs     [zfs] page fault under load in /sbin/zfs
o bin/151713    fs     [patch] Bug in growfs(8) with respect to 32-bit overfl
o kern/151648   fs     [zfs] disk wait bug
o kern/151629   fs     [fs] [patch] Skip empty directory entries during name
o kern/151330   fs     [zfs] will unshare all zfs filesystem after execute a
o kern/151326   fs     [nfs] nfs exports fail if netgroups contain duplicate
o kern/151251   fs     [ufs] Can not create files on filesystem with heavy us
o kern/151226   fs     [zfs] can't delete zfs snapshot
o kern/151111   fs     [zfs] vnodes leakage during zfs unmount
o kern/150503   fs     [zfs] ZFS disks are UNAVAIL and corrupted after reboot
o kern/150501   fs     [zfs] ZFS vdev failure vdev.bad_label on amd64
o kern/150390   fs     [zfs] zfs deadlock when arcmsr reports drive faulted
o kern/150336   fs     [nfs] mountd/nfsd became confused; refused to reload n
o kern/149208   fs     mksnap_ffs(8) hang/deadlock
o kern/149173   fs     [patch] [zfs] make OpenSolaris installa
o kern/149015   fs     [zfs] [patch] misc fixes for ZFS code to build on Glib
o kern/149014   fs     [zfs] [patch] declarations in ZFS libraries/utilities
o kern/149013   fs     [zfs] [patch] make ZFS makefiles use the libraries fro
o kern/148504   fs     [zfs] ZFS' zpool does not allow replacing drives to be
o kern/148490   fs     [zfs]: zpool attach - resilver bidirectionally, and re
o kern/148368   fs     [zfs] ZFS hanging forever on 8.1-PRERELEASE
o kern/148138   fs     [zfs] zfs raidz pool commands freeze
o kern/147903   fs     [zfs] [panic] Kernel panics on faulty zfs device
o kern/147881   fs     [zfs] [patch] ZFS "sharenfs" doesn't allow different "
o kern/147560   fs     [zfs] [boot] Booting 8.1-PRERELEASE raidz system take
o kern/147420   fs     [ufs] [panic] ufs_dirbad, nullfs, jail panic (corrupt
o kern/146941   fs     [zfs] [panic] Kernel Double Fault - Happens constantly
o kern/146786   fs     [zfs] zpool import hangs with checksum errors
o kern/146708   fs     [ufs] [panic] Kernel panic in softdep_disk_write_compl
o kern/146528   fs     [zfs] Severe memory leak in ZFS on i386
o kern/146502   fs     [nfs] FreeBSD 8 NFS Client Connection to Server
s kern/145712   fs     [zfs] cannot offline two drives in a raidz2 configurat
o kern/145411   fs     [xfs] [panic] Kernel panics shortly after mounting an
f bin/145309    fs     bsdlabel: Editing disk label invalidates the whole dev
o kern/145272   fs     [zfs] [panic] Panic during boot when accessing zfs on
o kern/145246   fs     [ufs] dirhash in 7.3 gratuitously frees hashes when it
o kern/145238   fs     [zfs] [panic] kernel panic on zpool clear tank
o kern/145229   fs     [zfs] Vast differences in ZFS ARC behavior between 8.0
o kern/145189   fs     [nfs] nfsd performs abysmally under load
o kern/144929   fs     [ufs] [lor] vfs_bio.c + ufs_dirhash.c
p kern/144447   fs     [zfs] sharenfs fsunshare() & fsshare_main() non functi
o kern/144416   fs     [panic] Kernel panic on online filesystem optimization
s kern/144415   fs     [zfs] [panic] kernel panics on boot after zfs crash
o kern/144234   fs     [zfs] Cannot boot machine with recent gptzfsboot code
o kern/143825   fs     [nfs] [panic] Kernel panic on NFS client
o bin/143572    fs     [zfs] zpool(1): [patch] The verbose output from iostat
o kern/143212   fs     [nfs] NFSv4 client strange work ...
o kern/143184   fs     [zfs] [lor] zfs/bufwait LOR
o kern/142878   fs     [zfs] [vfs] lock order reversal
o kern/142597   fs     [ext2fs] ext2fs does not work on filesystems with real
o kern/142489   fs     [zfs] [lor] allproc/zfs LOR
o kern/142466   fs     Update 7.2 -> 8.0 on Raid 1 ends with screwed raid [re
o kern/142306   fs     [zfs] [panic] ZFS drive (from OSX Leopard) causes two
o kern/142068   fs     [ufs] BSD labels are got deleted spontaneously
o kern/141897   fs     [msdosfs] [panic] Kernel panic. msdofs: file name leng
o kern/141463   fs     [nfs] [panic] Frequent kernel panics after upgrade fro
o kern/141305   fs     [zfs] FreeBSD ZFS+sendfile severe performance issues (
o kern/141091   fs     [patch] [nullfs] fix panics with DIAGNOSTIC enabled
o kern/141086   fs     [nfs] [panic] panic("nfs: bioread, not dir") on FreeBS
o kern/141010   fs     [zfs] "zfs scrub" fails when backed by files in UFS2
o kern/140888   fs     [zfs] boot fail from zfs root while the pool resilveri
o kern/140661   fs     [zfs] [patch] /boot/loader fails to work on a GPT/ZFS-
o kern/140640   fs     [zfs] snapshot crash
o kern/140068   fs     [smbfs] [patch] smbfs does not allow semicolon in file
o kern/139725   fs     [zfs] zdb(1) dumps core on i386 when examining zpool c
o kern/139715   fs     [zfs] vfs.numvnodes leak on busy zfs
p bin/139651    fs     [nfs] mount(8): read-only remount of NFS volume does n
o kern/139597   fs     [patch] [tmpfs] tmpfs initializes va_gen but doesn't u
o kern/139564   fs     [zfs] [panic] 8.0-RC1 - Fatal trap 12 at end of shutdo
o kern/139407   fs     [smbfs] [panic] smb mount causes system crash if remot
o kern/138662   fs     [panic] ffs_blkfree: freeing free block
o kern/138421   fs     [ufs] [patch] remove UFS label limitations
o kern/138202   fs     mount_msdosfs(1) see only 2Gb
o kern/136968   fs     [ufs] [lor] ufs/bufwait/ufs (open)
o kern/136945   fs     [ufs] [lor] filedesc structure/ufs (poll)
o kern/136944   fs     [ffs] [lor] bufwait/snaplk (fsync)
o kern/136873   fs     [ntfs] Missing directories/files on NTFS volume
o kern/136865   fs     [nfs] [patch] NFS exports atomic and on-the-fly atomic
p kern/136470   fs     [nfs] Cannot mount / in read-only, over NFS
o kern/135546   fs     [zfs] zfs.ko module doesn't ignore zpool.cache filenam
o kern/135469   fs     [ufs] [panic] kernel crash on md operation in ufs_dirb
o kern/135050   fs     [zfs] ZFS clears/hides disk errors on reboot
o kern/134491   fs     [zfs] Hot spares are rather cold...
o kern/133676   fs     [smbfs] [panic] umount -f'ing a vnode-based memory dis
o kern/132960   fs     [ufs] [panic] panic:ffs_blkfree: freeing free frag
o kern/132397   fs     reboot causes filesystem corruption (failure to sync b
o kern/132331   fs     [ufs] [lor] LOR ufs and syncer
o kern/132237   fs     [msdosfs] msdosfs has problems to read MSDOS Floppy
o kern/132145   fs     [panic] File System Hard Crashes
o kern/131441   fs     [unionfs] [nullfs] unionfs and/or nullfs not combineab
o kern/131360   fs     [nfs] poor scaling behavior of the NFS server under lo
o kern/131342   fs     [nfs] mounting/unmounting of disks causes NFS to fail
o bin/131341    fs     makefs: error "Bad file descriptor" on the mount poin
o kern/130920   fs     [msdosfs] cp(1) takes 100% CPU time while copying file
o kern/130210   fs     [nullfs] Error by check nullfs
o kern/129760   fs     [nfs] after 'umount -f' of a stale NFS share FreeBSD l
o kern/129488   fs     [smbfs] Kernel "bug" when using smbfs in smbfs_smb.c:
o kern/129231   fs     [ufs] [patch] New UFS mount (norandom) option - mostly
o kern/129152   fs     [panic] non-userfriendly panic when trying to mount(8)
o kern/127787   fs     [lor] [ufs] Three LORs: vfslock/devfs/vfslock, ufs/vfs
o bin/127270    fs     fsck_msdosfs(8) may crash if BytesPerSec is zero
o kern/127029   fs     [panic] mount(8): trying to mount a write protected zi
o kern/126287   fs     [ufs] [panic] Kernel panics while mounting an UFS file
o kern/125895   fs     [ffs] [panic] kernel: panic: ffs_blkfree: freeing free
s kern/125738   fs     [zfs] [request] SHA256 acceleration in ZFS
o kern/123939   fs     [msdosfs] corrupts new files
f sparc/123566  fs     [zfs] zpool import issue: EOVERFLOW
o kern/122380   fs     [ffs] ffs_valloc:dup alloc (Soekris 4801/7.0/USB Flash
o bin/122172    fs     [fs]: amd(8) automount daemon dies on 6.3-STABLE i386,
o bin/121898    fs     [nullfs] pwd(1)/getcwd(2) fails with Permission denied
o bin/121072    fs     [smbfs] mount_smbfs(8) cannot normally convert the cha
o kern/120483   fs     [ntfs] [patch] NTFS filesystem locking changes
o kern/120482   fs     [ntfs] [patch] Sync style changes between NetBSD
and F o kern/118912 fs [2tb] disk sizing/geometry problem with large array o kern/118713 fs [minidump] [patch] Display media size required for a k o bin/118249 fs [ufs] mv(1): moving a directory changes its mtime o kern/118126 fs [nfs] [patch] Poor NFS server write performance o kern/118107 fs [ntfs] [panic] Kernel panic when accessing a file at N o kern/117954 fs [ufs] dirhash on very large directories blocks the mac o bin/117315 fs [smbfs] mount_smbfs(8) and related options can't mount o kern/117158 fs [zfs] zpool scrub causes panic if geli vdevs detach on o bin/116980 fs [msdosfs] [patch] mount_msdosfs(8) resets some flags f o conf/116931 fs lack of fsck_cd9660 prevents mounting iso images with o kern/116583 fs [ffs] [hang] System freezes for short time when using o bin/115361 fs [zfs] mount(8) gets into a state where it won't set/un o kern/114955 fs [cd9660] [patch] [request] support for mask,dirmask,ui o kern/114847 fs [ntfs] [patch] [request] dirmask support for NTFS ala o kern/114676 fs [ufs] snapshot creation panics: snapacct_ufs2: bad blo o bin/114468 fs [patch] [request] add -d option to umount(8) to detach o kern/113852 fs [smbfs] smbfs does not properly implement DFS referral o bin/113838 fs [patch] [request] mount(8): add support for relative p o bin/113049 fs [patch] [request] make quot(8) use getopt(3) and show o kern/112658 fs [smbfs] [patch] smbfs and caching problems (resolves b o kern/111843 fs [msdosfs] Long Names of files are incorrectly created o kern/111782 fs [ufs] dump(8) fails horribly for large filesystems s bin/111146 fs [2tb] fsck(8) fails on 6T filesystem o kern/109024 fs [msdosfs] [iconv] mount_msdosfs: msdosfs_iconv: Operat o kern/109010 fs [msdosfs] can't mv directory within fat32 file system o bin/107829 fs [2TB] fdisk(8): invalid boundary checking in fdisk / w o kern/106107 fs [ufs] left-over fsck_snapshot after unfinished backgro o kern/104406 fs [ufs] Processes get stuck in "ufs" state under persist o kern/104133 fs [ext2fs] 
EXT2FS module corrupts EXT2/3 filesystems o kern/103035 fs [ntfs] Directories in NTFS mounted disc images appear o kern/101324 fs [smbfs] smbfs sometimes not case sensitive when it's s o kern/99290 fs [ntfs] mount_ntfs ignorant of cluster sizes s bin/97498 fs [request] newfs(8) has no option to clear the first 12 o kern/97377 fs [ntfs] [patch] syntax cleanup for ntfs_ihash.c o kern/95222 fs [cd9660] File sections on ISO9660 level 3 CDs ignored o kern/94849 fs [ufs] rename on UFS filesystem is not atomic o bin/94810 fs fsck(8) incorrectly reports 'file system marked clean' o kern/94769 fs [ufs] Multiple file deletions on multi-snapshotted fil o kern/94733 fs [smbfs] smbfs may cause double unlock o kern/93942 fs [vfs] [patch] panic: ufs_dirbad: bad dir (patch from D o kern/92272 fs [ffs] [hang] Filling a filesystem while creating a sna o kern/91134 fs [smbfs] [patch] Preserve access and modification time a kern/90815 fs [smbfs] [patch] SMBFS with character conversions somet o kern/88657 fs [smbfs] windows client hang when browsing a samba shar o kern/88555 fs [panic] ffs_blkfree: freeing free frag on AMD 64 o kern/88266 fs [smbfs] smbfs does not implement UIO_NOCOPY and sendfi o bin/87966 fs [patch] newfs(8): introduce -A flag for newfs to enabl o kern/87859 fs [smbfs] System reboot while umount smbfs. o kern/86587 fs [msdosfs] rm -r /PATH fails with lots of small files o bin/85494 fs fsck_ffs: unchecked use of cg_inosused macro etc. 
o kern/80088 fs [smbfs] Incorrect file time setting on NTFS mounted vi o bin/74779 fs Background-fsck checks one filesystem twice and omits o kern/73484 fs [ntfs] Kernel panic when doing `ls` from the client si o bin/73019 fs [ufs] fsck_ufs(8) cannot alloc 607016868 bytes for ino o kern/71774 fs [ntfs] NTFS cannot "see" files on a WinXP filesystem o bin/70600 fs fsck(8) throws files away when it can't grow lost+foun o kern/68978 fs [panic] [ufs] crashes with failing hard disk, loose po o kern/65920 fs [nwfs] Mounted Netware filesystem behaves strange o kern/65901 fs [smbfs] [patch] smbfs fails fsx write/truncate-down/tr o kern/61503 fs [smbfs] mount_smbfs does not work as non-root o kern/55617 fs [smbfs] Accessing an nsmb-mounted drive via a smb expo o kern/51685 fs [hang] Unbounded inode allocation causes kernel to loc o kern/51583 fs [nullfs] [patch] allow to work with devices and socket o kern/36566 fs [smbfs] System reboot with dead smb mount and umount o bin/27687 fs fsck(8) wrapper is not properly passing options to fsc o kern/18874 fs [2TB] 32bit NFS servers export wrong negative values t 261 problems total. 
From owner-freebsd-fs@FreeBSD.ORG Mon Mar 5 15:54:21 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id DE8FF1065670 for ; Mon, 5 Mar 2012 15:54:21 +0000 (UTC) (envelope-from maxim.konovalov@gmail.com) Received: from mp2.macomnet.net (ipv6.irc.int.ru [IPv6:2a02:28:1:2::1b:2]) by mx1.freebsd.org (Postfix) with ESMTP id 4C32D8FC25 for ; Mon, 5 Mar 2012 15:54:21 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by mp2.macomnet.net (8.14.5/8.14.5) with ESMTP id q25Fs3ns045848; Mon, 5 Mar 2012 19:54:04 +0400 (MSK) (envelope-from maxim.konovalov@gmail.com) Date: Mon, 5 Mar 2012 19:54:03 +0400 (MSK) From: Maxim Konovalov To: Kirk McKusick In-Reply-To: <201112212312.pBLNCj2a054427@chez.mckusick.com> Message-ID: References: <201112212312.pBLNCj2a054427@chez.mckusick.com> User-Agent: Alpine 2.00 (BSF 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Cc: freebsd-fs@freebsd.org, Dieter BSD Subject: Re: Maximum blocksize for FFS? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Mar 2012 15:54:22 -0000 Hi Kirk, > > On Tue, Dec 13, 2011 at 7:18 PM, Kirk McKusick wrote: > > > The default blocksize in FreeBSD 9.0 is 32K/4K. We have been > > > running with this size in -current for a almost a year with no > > > reported problems. > > > > Hi, > > > > There is a reported problem: > > The number of inode was divided by two with FreeBSD 9.0 (PR > > bin/162659) and this create some problems because "the number of > > fragments per inode (NFPI) was not adapted to the new default block > > size" (Bruce Evans'explanation [1]). 
> >
> > Regards,
> >
> > Olivier
> >
> > [1] http://lists.freebsd.org/pipermail/freebsd-bugs/2011-December/046713.html
>
> Thanks for bringing your report to my attention. I have applied the
> suggested change (reducing NFPI from 4 to 2) so as to keep the default
> number of inodes for a 32K/4K filesystem the same as were created on
> a 16K/2K filesystem.
>

Does it make sense to commit the following diff as well (as Bruce
pointed out the comment is not true anymore)?

Index: param.h
===================================================================
--- param.h (revision 231431)
+++ param.h (working copy)
@@ -241,9 +241,6 @@
  * make it too big the kernel will not be able to optimally use
  * the KVM memory reserved for the buffer cache and will wind
  * up with too-few buffers.
- *
- * The default is 16384, roughly 2x the block size used by a
- * normal UFS filesystem.
  */
 #define MAXBSIZE 65536 /* must be power of 2 */
 #define BKVASIZE 16384 /* must be power of 2 */
%%%

--
Maxim Konovalov

From owner-freebsd-fs@FreeBSD.ORG Mon Mar 5 17:42:32 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 6ED1D106566C for ; Mon, 5 Mar 2012 17:42:32 +0000 (UTC) (envelope-from davide.damico@contactlab.com)
Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id E6AE18FC15 for ; Mon, 5 Mar 2012 17:42:31 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha1; d=contactlab.com; s=s768; c=relaxed/relaxed; q=dns/txt; i=@contactlab.com; t=1330969343; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=eRUx/9P3EXjZcFMsG+J2ZGa3nn0=; b=IHP28u0G99rlPH1cQfMzQDco8I8A3gyr939LJ+7BAv9uXBDsGI6oMVS3YdLPj0n7 XavO0WeZld6jYs3y0mBhWEQLoQqgJsOriTlMobe0PP+QlQJWEin3I8CU6ii/yUZ/;
Received: from [213.92.90.12] ([213.92.90.12:27892] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.2.3.43244 r(43244))
with ESMTP id 20/13-28515-FFAF45F4; Mon, 05 Mar 2012 18:42:23 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S4bvT-000Edu-Hp for freebsd-fs@freebsd.org; Mon, 05 Mar 2012 18:42:23 +0100 Received: (qmail 56286 invoked by uid 89); 5 Mar 2012 17:42:23 -0000 Received: from fw1.contactlab.it (HELO imac-casadamico.local) (213.92.90.4) by mx3-master.housing.tomato.lan with SMTP; 5 Mar 2012 17:42:23 -0000 Message-ID: <4F54FAFD.9070101@contactlab.com> Date: Mon, 05 Mar 2012 18:42:21 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: kpneal@pobox.com References: <20120305143609.GA76430@neutralgood.org> In-Reply-To: <20120305143609.GA76430@neutralgood.org> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 8.2-p5 and Perc6/i X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 05 Mar 2012 17:42:32 -0000 Il 05/03/12 15:36, kpneal@pobox.com ha scritto: > On Sat, Mar 03, 2012 at 10:00:21AM +0100, Davide D'Amico wrote: >> Hi all, >> I've a couple of dell r410 servers (smtp1 and smtp2) in production with >> the same hw config: > >> Mar 2 22:29:59.056 smtp2 mfi0: COMMAND 0xffffff80009aa7f8 TIMEOUT >> AFTER 1784 SECONDS > I've got an R610 running 8.2-RELEASE with the same card. I was seeing the > same errors until I changed BIOS settings. The settings need to be for > "high performance" and all the fancy power saving settings like the C1E > states need to be disabled. > > I'm sorry I can't be more specific. The machine in question is running in > a datacenter so I can't check. 
Many thanks. After turning off "Turbo mode", "C1E" and "C-State" under
BIOS -> Processor, everything has been fine.

Thanks,
d.

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 6 19:51:59 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B2A9E106564A; Tue, 6 Mar 2012 19:51:59 +0000 (UTC) (envelope-from luke@hybrid-logic.co.uk)
Received: from hybrid-sites.com (ns226322.hybrid-sites.com [176.31.229.137]) by mx1.freebsd.org (Postfix) with ESMTP id 655098FC15; Tue, 6 Mar 2012 19:51:58 +0000 (UTC)
Received: from [127.0.0.1] (helo=ewes) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S4zpG-0007hn-Pt; Tue, 06 Mar 2012 19:13:36 +0000
Received: from [176.31.225.127] (helo=ewes by ns226322.hybrid-sites.com with esmtp (Hybrid Web Cluster distributed mail proxy) (envelope-from ); Tue, 06 Mar 2012 19:13:34 -0000
Received: from [193.37.225.212] (helo=[10.0.126.148] by ns225413.hybrid-sites.com with esmtp (Hybrid Web Cluster distributed mail proxy) (envelope-from ); Tue, 06 Mar 2012 19:13:34 -0000
From: Luke Marsden
To: freebsd-stable@freebsd.org, freebsd-fs@freebsd.org
Content-Type: text/plain; charset="UTF-8"
Date: Tue, 06 Mar 2012 19:13:23 +0000
Message-ID: <1331061203.2218.38.camel@pow>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2
Content-Transfer-Encoding: 7bit
X-Spam-bar: +
Cc: team@hybrid-logic.co.uk
Subject: FreeBSD 8.2 - active plus inactive memory leak!?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Tue, 06 Mar 2012 19:51:59 -0000

Hi all,

I'm having some trouble with some production 8.2-RELEASE servers where
the 'Active' and 'Inact' memory values reported by top don't seem to
correspond with the processes which are running on the machine.
I have two near-identical machines (with slightly different workloads);
on one, let's call it A, active + inactive is small (6.5G) and on the
other (B) active + inactive is large (13.6G), even though they have
almost identical sums-of-resident memory (8.3G on A and 9.3G on B).

The only difference is that A has a smaller number of quite long-running
processes (it's hosting a small number of busy sites) and B has a larger
number of more frequently killed/recycled processes (it's hosting a
larger number of quiet sites, so the FastCGI processes get killed and
restarted frequently). Notably B has many more ZFS filesystems mounted
than A (around 4,000 versus 100). The machines are otherwise under
similar amounts of load. I hoped that the community could please help
me understand what's going on with respect to the worryingly large
amount of active + inactive memory on B.

Both machines are ZFS-on-root with FreeBSD 8.2-RELEASE with uptimes
around 5-6 days. I have recently reduced the ARC cache on both machines
since my previous thread [1] and Wired memory usage is now stable at 6G
on A and 7G on B with an arc_max of 4G on both machines.

Neither of the machines have any swap in use:

    Swap: 10G Total, 10G Free

My current (probably quite simplistic) understanding of the FreeBSD
virtual memory system is that, for each process as reported by top:

  * Size corresponds to the total size of all the text pages for the
    process (those belonging to code in the binary itself and linked
    libraries) plus data pages (including stack and malloc()'d but
    not-yet-written-to memory segments).
  * Resident corresponds to a subset of the pages above: those pages
    which actually occupy physical/core memory. Notably, pages may
    appear in size but not in resident: for example, read-only text
    pages from libraries which have not been used yet, or memory which
    has been malloc()'d but not yet written to.

My understanding for the values for the system as a whole (at the top in
'top') is as follows:

  * Active / inactive memory is the same thing: resident memory from
    processes in use. Being in the inactive as opposed to active
    list simply indicates that the pages in question are less
    recently used and therefore more likely to get swapped out if
    the machine comes under memory pressure.
  * Wired is mostly kernel memory.
  * Cache is freed memory which the kernel has decided to keep in
    case it corresponds to a useful page in future; it can be cheaply
    evicted into the free list.
  * Free memory is actually not being used for anything.

It seems that pages which occur in the active + inactive lists must
occur in the resident memory of one or more processes ("or more" since
processes can share pages in e.g. read-only shared libs or COW forked
address space). Conversely, if a page *does not* occur in the resident
memory of any process, it must not occupy any space in the active +
inactive lists.

Therefore the active + inactive memory should always be less than or
equal to the sum of the resident memory of all the processes on the
system, right?

But it's not. So, I wrote a very simple Python script to add up the
resident memory values in the output from 'top' and, on machine A:

    Mem: 3388M Active, 3209M Inact, 6066M Wired, 196K Cache, 11G Free
    There were 246 processes totalling 8271 MB resident memory

Whereas on machine B:

    Mem: 11G Active, 2598M Inact, 7177M Wired, 733M Cache, 1619M Free
    There were 441 processes totalling 9297 MB resident memory

Now, on machine A:

    3388M active + 3209M inactive - 8271M sum-of-resident = -1674M

I can attribute this negative value to shared libraries between the
running processes (which the sum-of-res is double-counting but active +
inactive is not).
But on machine B: 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M I'm struggling to explain how, when there are only 9.2G (worst case, discounting shared pages) of resident processes, the system is using 11G + 2598M = 13.8G of memory! This "missing memory" is scary, because it seems to be increasing over time, and eventually when the system runs out of free memory, I'm certain it will crash in the same way described in my previous thread [1]. Is my understanding of the virtual memory system badly broken - in which case please educate me ;-) or is there a real problem here? If so how can I dig deeper to help uncover/fix it? Best Regards, Luke Marsden [1] lists.freebsd.org/pipermail/freebsd-fs/2012-February/013775.html [2] https://gist.github.com/1988153 -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Tue Mar 6 20:34:18 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 07ED0106564A for ; Tue, 6 Mar 2012 20:34:18 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-lpp01m010-f54.google.com (mail-lpp01m010-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 76C838FC08 for ; Tue, 6 Mar 2012 20:34:16 +0000 (UTC) Received: by lagv3 with SMTP id v3so9136009lag.13 for ; Tue, 06 Mar 2012 12:34:16 -0800 (PST) Received-SPF: pass (google.com: domain of asmrookie@gmail.com designates 10.112.10.41 as permitted sender) client-ip=10.112.10.41; Authentication-Results: mr.google.com; spf=pass (google.com: domain of asmrookie@gmail.com designates 10.112.10.41 as permitted sender) smtp.mail=asmrookie@gmail.com; dkim=pass header.i=asmrookie@gmail.com Received: from mr.google.com ([10.112.10.41]) by 10.112.10.41 with SMTP id f9mr12062464lbb.8.1331066056187 (num_hops = 1); Tue, 06 Mar 2012 12:34:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:content-type; bh=7c7TxVuV5fQtrLqX+Yf9wNa/vC+OA9Q90X91vtKsw1s=; b=bQdjm5OowhZvCjztMjC8s2l0MOuApKc5SJoTDArDfypqQ3tRV0ZLkkNUqq4VFTrT7G dv4fhPfJRamgKZDohHwSI2lPSKbQqr8tb4To1c4cVuUn4tT3c1sMouQEFmzq2EtIm1Ts ciM27GbSyYwT8enYvgRzQZvBQlQeod9AytP4Fio4iPFAZt2JeA9uLCOKUJIZUO5Cdhwn 3dECUksMIS6kGZXFOu2oR70sKTGyj5Pcf9V4zxe8wfqYzATMk4BM6BaO7uyaxks07Opj ck/TwqoX8Y84w5trhzqgcQIplTetWpgMUEkhNJjWSu9SmzGF9dDcaQ4uc1sGmg/00TYg 9bsQ== MIME-Version: 1.0 Received: by 10.112.10.41 with SMTP id f9mr9858767lbb.8.1331064459688; Tue, 06 Mar 2012 12:07:39 -0800 (PST) Sender: asmrookie@gmail.com Received: by 10.112.41.5 with HTTP; Tue, 6 Mar 2012 12:07:39 -0800 (PST) In-Reply-To: <201203062001.q26K1Q7R055245@svn.freebsd.org> References: <201203062001.q26K1Q7R055245@svn.freebsd.org> Date: Tue, 6 Mar 2012 20:07:39 +0000 X-Google-Sender-Auth: kp6NxKA7YxMGX8_xFkoKKM8tvuA Message-ID: From: Attilio Rao To: FreeBSD Arch , "freebsd-current@freebsd.org" , FreeBSD FS Content-Type: text/plain; charset=UTF-8 Cc: Subject: Re: svn commit: r232619 - in head: . sys/amd64/conf sys/arm/conf sys/i386/conf sys/ia64/conf sys/mips/conf sys/pc98/conf sys/powerpc/conf sys/sparc64/conf X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Mar 2012 20:34:18 -0000 2012/3/6, Attilio Rao : > Author: attilio > Date: Tue Mar 6 20:01:25 2012 > New Revision: 232619 > URL: http://svn.freebsd.org/changeset/base/232619 > > Log: > Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported > platforms. > This will make every attempt to mount a non-mpsafe filesystem to the > kernel forbidden, unless it is expressely compiled with > VFS_ALLOW_NONMPSAFE option. 
This is just a gentle reminder to point you to the "official" page:
http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS

and to mention that the time for removing non-mpsafe filesystems is
approaching. In 6 months we will disconnect the non-mpsafe filesystems
from the tree and remove the whole non-mpsafe handling infrastructure
in the VFS and the buffer cache, so please think about stepping up and
converting your favourite filesystem.

Thanks,
Attilio

--
Peace can only be achieved by understanding - A. Einstein

From owner-freebsd-fs@FreeBSD.ORG Tue Mar 6 21:03:06 2012
Return-Path:
Delivered-To: freebsd-fs@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 68522106566B; Tue, 6 Mar 2012 21:03:06 +0000 (UTC) (envelope-from gpalmer@freebsd.org)
Received: from noop.in-addr.com (mail.in-addr.com [IPv6:2001:470:8:162::1]) by mx1.freebsd.org (Postfix) with ESMTP id 304618FC0C; Tue, 6 Mar 2012 21:03:06 +0000 (UTC)
Received: from gjp by noop.in-addr.com with local (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S51Ww-00011k-7w; Tue, 06 Mar 2012 16:02:46 -0500
Date: Tue, 6 Mar 2012 16:02:46 -0500
From: Gary Palmer
To: Attilio Rao
Message-ID: <20120306210246.GA80168@in-addr.com>
References: <201203062001.q26K1Q7R055245@svn.freebsd.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
X-SA-Exim-Connect-IP:
X-SA-Exim-Mail-From: gpalmer@freebsd.org
X-SA-Exim-Scanned: No (on noop.in-addr.com); SAEximRunCond expanded to false
Cc: FreeBSD FS , FreeBSD Arch , "freebsd-current@freebsd.org"
Subject: Re: svn commit: r232619 - in head: .
sys/amd64/conf sys/arm/conf sys/i386/conf sys/ia64/conf sys/mips/conf sys/pc98/conf sys/powerpc/conf sys/sparc64/conf X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 06 Mar 2012 21:03:06 -0000 On Tue, Mar 06, 2012 at 08:07:39PM +0000, Attilio Rao wrote: > 2012/3/6, Attilio Rao : > > Author: attilio > > Date: Tue Mar 6 20:01:25 2012 > > New Revision: 232619 > > URL: http://svn.freebsd.org/changeset/base/232619 > > > > Log: > > Disable the option VFS_ALLOW_NONMPSAFE by default on all the supported > > platforms. > > This will make every attempt to mount a non-mpsafe filesystem to the > > kernel forbidden, unless it is expressely compiled with > > VFS_ALLOW_NONMPSAFE option. > > This is just a gentle reminder in order to point you further to the > "official" page: > http://wiki.freebsd.org/NONMPSAFE_DEORBIT_VFS > > and to mention that the time for removing non-mpsafe filesystem is approaching. > In 6 months we will disconnect from the tree the non-mpsafe > filesystems and will remove the whole non-mpsafe handling > infrastructure in the VFS and the buffer cache, thus please think > about stepping up and convert your favourite filesystem. 
Given that the wiki page still has: > Technical notes on locking Filesystems > > TBD it would be useful to people who aren't filesystem locking experts to have a little more information on the wiki Thanks, Gary From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 00:36:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id B7CE01065678; Wed, 7 Mar 2012 00:36:27 +0000 (UTC) (envelope-from luke@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns226322.hybrid-sites.com [176.31.229.137]) by mx1.freebsd.org (Postfix) with ESMTP id 741958FC22; Wed, 7 Mar 2012 00:36:26 +0000 (UTC) Received: from [127.0.0.1] (helo=ewes) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S54rf-000JaQ-8o; Wed, 07 Mar 2012 00:36:24 +0000 Received: from [78.105.122.99] (helo=[192.168.1.23] by ns226322.hybrid-sites.com with esmtp (Hybrid Web Cluster distributed mail proxy) (envelope-from ); Wed, 07 Mar 2012 00:36:23 -0000 From: Luke Marsden To: Chuck Swiger In-Reply-To: <4F569DFF.8040807@mac.com> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> Content-Type: text/plain; charset="UTF-8" Date: Wed, 07 Mar 2012 00:36:21 +0000 Message-ID: <1331080581.2589.28.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: + Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk, freebsd-stable@freebsd.org, freebsd-questions@freebsd.org Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 00:36:27 -0000 Thanks for your email, Chuck. > > Conversely, if a page *does not* occur in the resident > > memory of any process, it must not occupy any space in the active + > > inactive lists. 
>
> Hmm...if a process gets swapped out entirely, the pages for it will be moved
> to the cache list, flushed, and then reused as soon as the disk I/O completes.
> But there is a window where the process can be marked as swapped out (and
> considered no longer resident), but still has some of its pages in physical
> memory.

There's no swapping happening on these machines (intentionally so,
because as soon as we hit swap everything goes tits up), so this window
doesn't concern me.

I'm trying to confirm that, on a system with no pages swapped out, the
following is a true statement:

    a page is accounted for in active + inactive if and only if it
    corresponds to one or more of the pages accounted for in the
    resident memory lists of all the processes on the system (as per
    the output of 'top' and 'ps')

> > Therefore the active + inactive memory should always be less than or
> > equal to the sum of the resident memory of all the processes on the
> > system, right?
>
> No. If you've got a lot of process pages shared (ie, a webserver with lots of
> httpd children, or a database pulling in a large common shmem area), then your
> process resident sizes can be very large compared to the system-wide
> active+inactive count.

But that's what I'm saying...

    sum(process resident sizes) >= active + inactive

Or as I said it above, equivalently:

    active + inactive <= sum(process resident sizes)

The data I've got from this system, and what's killing us, shows the
opposite: active + inactive > sum(process resident sizes) - by over 5GB
now and growing, which is what keeps causing these machines to crash.

In particular:

    Mem: 13G Active, 1129M Inact, 7543M Wired, 120M Cache, 1553M Free

But the total sum of resident memories is 9457M (according to summing
the output from ps or top).

    13G + 1129M = 14441M (active + inact) > 9457M (sum of res)

That's 4984M out, and that's almost enough to push us over the edge.

If my understanding of VM is correct, I don't see how this can happen.
But it's happening, and it's causing real trouble here because our free memory keeps hitting zero and then we swap-spiral. What can I do to investigate this discrepancy? Are there some tools that I can use to debug the memory allocated in "active" to find out where it's going, if not to resident process memory? Thanks, Luke -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 01:02:07 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 6AA291065672 for ; Wed, 7 Mar 2012 01:02:07 +0000 (UTC) (envelope-from freebsd@damnhippie.dyndns.org) Received: from qmta08.emeryville.ca.mail.comcast.net (qmta08.emeryville.ca.mail.comcast.net [76.96.30.80]) by mx1.freebsd.org (Postfix) with ESMTP id 4B8F58FC0A for ; Wed, 7 Mar 2012 01:02:06 +0000 (UTC) Received: from omta22.emeryville.ca.mail.comcast.net ([76.96.30.89]) by qmta08.emeryville.ca.mail.comcast.net with comcast id iQgw1i0071vN32cA8Qoruq; Wed, 07 Mar 2012 00:48:51 +0000 Received: from damnhippie.dyndns.org ([24.8.232.202]) by omta22.emeryville.ca.mail.comcast.net with comcast id iQoq1i0084NgCEG8iQoq9H; Wed, 07 Mar 2012 00:48:51 +0000 Received: from [172.22.42.240] (revolution.hippie.lan [172.22.42.240]) by damnhippie.dyndns.org (8.14.3/8.14.3) with ESMTP id q270mlJM015471; Tue, 6 Mar 2012 17:48:48 -0700 (MST) (envelope-from freebsd@damnhippie.dyndns.org) From: Ian Lepore To: Luke Marsden In-Reply-To: <1331061203.2218.38.camel@pow> References: <1331061203.2218.38.camel@pow> Content-Type: text/plain; charset="us-ascii" Date: Tue, 06 Mar 2012 17:48:47 -0700 Message-ID: <1331081327.32194.19.camel@revolution.hippie.lan> Mime-Version: 1.0 X-Mailer: Evolution 2.32.1 FreeBSD GNOME Team Port Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk, freebsd-stable@freebsd.org Subject: Re: FreeBSD 8.2 - active plus inactive 
memory leak!?
X-BeenThere: freebsd-fs@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Filesystems
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 07 Mar 2012 01:02:07 -0000

On Tue, 2012-03-06 at 19:13 +0000, Luke Marsden wrote:
> Hi all,
>
> I'm having some trouble with some production 8.2-RELEASE servers where
> the 'Active' and 'Inact' memory values reported by top don't seem to
> correspond with the processes which are running on the machine. I have
> two near-identical machines (with slightly different workloads); on one,
> let's call it A, active + inactive is small (6.5G) and on the other (B)
> active + inactive is large (13.6G), even though they have almost identical
> sums-of-resident memory (8.3G on A and 9.3G on B).
>
> The only difference is that A has a smaller number of quite long-running
> processes (it's hosting a small number of busy sites) and B has a larger
> number of more frequently killed/recycled processes (it's hosting a
> larger number of quiet sites, so the FastCGI processes get killed and
> restarted frequently). Notably B has many more ZFS filesystems mounted
> than A (around 4,000 versus 100). The machines are otherwise under
> similar amounts of load. I hoped that the community could please help
> me understand what's going on with respect to the worryingly large
> amount of active + inactive memory on B.
>
> Both machines are ZFS-on-root with FreeBSD 8.2-RELEASE with uptimes
> around 5-6 days. I have recently reduced the ARC cache on both machines
> since my previous thread [1] and Wired memory usage is now stable at 6G
> on A and 7G on B with an arc_max of 4G on both machines.
> > Neither of the machines has any swap in use: > > Swap: 10G Total, 10G Free > > My current (probably quite simplistic) understanding of the FreeBSD > virtual memory system is that, for each process as reported by top: > > * Size corresponds to the total size of all the text pages for the > process (those belonging to code in the binary itself and linked > libraries) plus data pages (including stack and malloc()'d but > not-yet-written-to memory segments). > * Resident corresponds to a subset of the pages above: those pages > which actually occupy physical/core memory. Notably pages may > appear in size but not appear in resident for read-only text > pages from libraries which have not been used yet or which have > been malloc()'d but not yet written-to. > > My understanding for the values for the system as a whole (at the top in > 'top') is as follows: > > * Active / inactive memory is the same thing: resident memory from > processes in use. Being in the inactive as opposed to active > list simply indicates that the pages in question are less > recently used and therefore more likely to get swapped out if > the machine comes under memory pressure. > * Wired is mostly kernel memory. > * Cache is freed memory which the kernel has decided to keep in > case it corresponds to a useful page in future; it can be cheaply > evicted into the free list. > * Free memory is actually not being used for anything. > > It seems that pages which occur in the active + inactive lists must > occur in the resident memory of one or more processes ("or more" since > processes can share pages in e.g. read-only shared libs or COW forked > address space). Conversely, if a page *does not* occur in the resident > memory of any process, it must not occupy any space in the active + > inactive lists. > > Therefore the active + inactive memory should always be less than or > equal to the sum of the resident memory of all the processes on the > system, right? > > But it's not.
So, I wrote a very simple Python script to add up the > resident memory values in the output from 'top' and, on machine A: > > Mem: 3388M Active, 3209M Inact, 6066M Wired, 196K Cache, 11G > Free > There were 246 processes totalling 8271 MB resident memory > > Whereas on machine B: > > Mem: 11G Active, 2598M Inact, 7177M Wired, 733M Cache, 1619M > Free > There were 441 processes totalling 9297 MB resident memory > > Now, on machine A: > > 3388M active + 3209M inactive - 8271M sum-of-resident = -1674M > > I can attribute this negative value to shared libraries between the > running processes (which the sum-of-res is double-counting but active + > inactive is not). But on machine B: > > 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M > > I'm struggling to explain how, when there are only 9.2G (worst case, > discounting shared pages) of resident processes, the system is using 11G > + 2598M = 13.8G of memory! > > This "missing memory" is scary, because it seems to be increasing over > time, and eventually when the system runs out of free memory, I'm > certain it will crash in the same way described in my previous thread > [1]. > > Is my understanding of the virtual memory system badly broken - in which > case please educate me ;-) or is there a real problem here? If so how > can I dig deeper to help uncover/fix it? > > Best Regards, > Luke Marsden > > [1] lists.freebsd.org/pipermail/freebsd-fs/2012-February/013775.html > [2] https://gist.github.com/1988153 > In my experience, the bulk of the memory in the inactive category is cached disk blocks, at least for ufs (I think zfs does things differently). On this desktop machine I have 12G physical and typically have roughly 11G inactive, and I can unmount one particular filesystem where most of my work is done and instantly I have almost no inactive and roughly 11G free. 
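The sum-of-resident comparison quoted above can be reproduced roughly as follows. This is only a minimal sketch and not the script from the gist [2]: it assumes the input is the text output of `ps -axo pid,rss` with RSS reported in kilobytes, and the names `sum_resident_mb` and `gap_mb` are made up for illustration.

```python
def sum_resident_mb(ps_output):
    """Return (process_count, total_resident_mb) for `ps -axo pid,rss` style text.

    Assumption: two whitespace-separated columns, RSS in kilobytes.
    """
    total_kb = 0
    count = 0
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not "<pid> <rss>".
        if len(fields) != 2 or not fields[1].isdigit():
            continue
        total_kb += int(fields[1])
        count += 1
    return count, total_kb / 1024.0


def gap_mb(active_mb, inactive_mb, resident_mb):
    """Active + inactive minus sum-of-resident, all in megabytes."""
    return active_mb + inactive_mb - resident_mb


if __name__ == "__main__":
    # Figures taken from the quoted message (machine A, then machine B).
    print(gap_mb(3388, 3209, 8271))
    print(gap_mb(11264, 2598, 9297))
```

A negative gap is consistent with shared pages being counted once per process in the resident sum; a positive gap means some active or inactive pages are not mapped by any process.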
-- Ian From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 08:23:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 01024106564A; Wed, 7 Mar 2012 08:23:56 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 63BCE8FC16; Wed, 7 Mar 2012 08:23:54 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q278NcHi027854; Wed, 7 Mar 2012 10:23:38 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q278NcNj086714; Wed, 7 Mar 2012 10:23:38 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q278NcDP086713; Wed, 7 Mar 2012 10:23:38 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 7 Mar 2012 10:23:38 +0200 From: Konstantin Belousov To: Luke Marsden Message-ID: <20120307082338.GD75778@deviant.kiev.zoral.com.ua> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> <1331080581.2589.28.camel@pow> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="6fxres9eYHxJ4LJN" Content-Disposition: inline In-Reply-To: <1331080581.2589.28.camel@pow> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, PLING_QUERY autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, Chuck Swiger , 
freebsd-stable@freebsd.org, freebsd-questions@freebsd.org, team@hybrid-logic.co.uk Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 08:23:56 -0000 --6fxres9eYHxJ4LJN Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote: > Thanks for your email, Chuck. > > > > Conversely, if a page *does not* occur in the resident > > > memory of any process, it must not occupy any space in the active + > > > inactive lists. > > > > Hmm...if a process gets swapped out entirely, the pages for it will be moved > > to the cache list, flushed, and then reused as soon as the disk I/O completes. > > But there is a window where the process can be marked as swapped out (and > > considered no longer resident), but still has some of its pages in physical > > memory. > > There's no swapping happening on these machines (intentionally so, > because as soon as we hit swap everything goes tits up), so this window > doesn't concern me. > > I'm trying to confirm that, on a system with no pages swapped out, that > the following is a true statement: > > a page is accounted for in active + inactive if and only if it > corresponds to one or more of the pages accounted for in the > resident memory lists of all the processes on the system (as per > the output of 'top' and 'ps') No. The pages belonging to vnode vm object can be active or inactive or cached but not mapped into any process address space.
--6fxres9eYHxJ4LJN Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk9XGwkACgkQC3+MBN1Mb4gEEgCfeS6aA0sX9T+NgXGhplLSE3DA 7xEAnRS1EdCLMcsOI8u3ADhCURXYNhyh =kFBk -----END PGP SIGNATURE----- --6fxres9eYHxJ4LJN-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 09:26:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 82C7B1065672; Wed, 7 Mar 2012 09:26:15 +0000 (UTC) (envelope-from luke@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns226322.hybrid-sites.com [176.31.229.137]) by mx1.freebsd.org (Postfix) with ESMTP id 3C7A08FC1B; Wed, 7 Mar 2012 09:26:14 +0000 (UTC) Received: from [127.0.0.1] (helo=ewes) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S5D8K-0006JQ-Ra; Wed, 07 Mar 2012 09:26:13 +0000 Received: from [78.105.122.99] (helo=[192.168.1.23] by ns226322.hybrid-sites.com with esmtp (Hybrid Web Cluster distributed mail proxy) (envelope-from ); Wed, 07 Mar 2012 09:26:08 -0000 From: Luke Marsden To: Konstantin Belousov In-Reply-To: <20120307082338.GD75778@deviant.kiev.zoral.com.ua> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> <1331080581.2589.28.camel@pow> <20120307082338.GD75778@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="UTF-8" Date: Wed, 07 Mar 2012 09:26:06 +0000 Message-ID: <1331112366.2589.51.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: + Cc: freebsd-fs@freebsd.org, Ian Lepore , team@hybrid-logic.co.uk, freebsd-stable@freebsd.org, freebsd-questions@freebsd.org Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 09:26:15 -0000 On Wed, 2012-03-07 at 10:23 +0200, Konstantin Belousov wrote: > On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote: > > I'm trying to confirm that, on a system with no pages swapped out, that > > the following is a true statement: > > > > a page is accounted for in active + inactive if and only if it > > corresponds to one or more of the pages accounted for in the > > resident memory lists of all the processes on the system (as per > > the output of 'top' and 'ps') > No. > > The pages belonging to vnode vm object can be active or inactive or cached > but not mapped into any process address space. Thank you, Konstantin. Does the number of vnodes we've got open on this machine (272011) fully explain away the memory gap? Memory gap: 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M Active vnodes: vfs.numvnodes: 272011 That gives a lower bound of 17.18 KB per vnode (or higher if we take into account shared libs, etc); that seems a bit high for a vnode vm object, doesn't it? If that doesn't fully explain it, what else might be chewing through active memory? Also, when are vnodes freed? This system does have some tuning... kern.maxfiles: 1000000 vm.pmap.pv_entry_max: 73296250 Could that be contributing to so much active + inactive memory (5GB+ more than expected), or do PV entries live in wired (i.e. kernel) memory? On Tue, 2012-03-06 at 17:48 -0700, Ian Lepore wrote: > In my experience, the bulk of the memory in the inactive category is > cached disk blocks, at least for ufs (I think zfs does things > differently).
On this desktop machine I have 12G physical and > typically have roughly 11G inactive, and I can unmount one particular > filesystem where most of my work is done and instantly I have almost > no inactive and roughly 11G free. Okay, so this could be UFS disk cache, except the system is ZFS-on-root with no UFS filesystems active or mounted. Can I confirm that no double-caching of ZFS data is happening in active + inactive (+ cache) memory? Thanks, Luke -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 09:31:26 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 0588D106566C for ; Wed, 7 Mar 2012 09:31:26 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 6E0118FC16 for ; Wed, 7 Mar 2012 09:31:25 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q279VAmK033221; Wed, 7 Mar 2012 11:31:10 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q279VAGd087094; Wed, 7 Mar 2012 11:31:10 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q279V9lN087093; Wed, 7 Mar 2012 11:31:09 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 7 Mar 2012 11:31:09 +0200 From: Konstantin Belousov To: Luke Marsden Message-ID: <20120307093109.GF75778@deviant.kiev.zoral.com.ua> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> <1331080581.2589.28.camel@pow>
<20120307082338.GD75778@deviant.kiev.zoral.com.ua> <1331112366.2589.51.camel@pow> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Rr/zTE2kOLnfAECX" Content-Disposition: inline In-Reply-To: <1331112366.2589.51.camel@pow> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, PLING_QUERY autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, Ian Lepore , team@hybrid-logic.co.uk Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 09:31:26 -0000 --Rr/zTE2kOLnfAECX Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Mar 07, 2012 at 09:26:06AM +0000, Luke Marsden wrote: > On Wed, 2012-03-07 at 10:23 +0200, Konstantin Belousov wrote: > > On Wed, Mar 07, 2012 at 12:36:21AM +0000, Luke Marsden wrote: > > > I'm trying to confirm that, on a system with no pages swapped out, that > > > the following is a true statement: > > > > > > a page is accounted for in active + inactive if and only if it > > > corresponds to one or more of the pages accounted for in the > > > resident memory lists of all the processes on the system (as per > > > the output of 'top' and 'ps') > > No. > > > > The pages belonging to vnode vm object can be active or inactive or cached > > but not mapped into any process address space. > > Thank you, Konstantin. Does the number of vnodes we've got open on this > machine (272011) fully explain away the memory gap?
> > Memory gap: > 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M > > Active vnodes: > vfs.numvnodes: 272011 > > That gives a lower bound at 17.18Kb per vnode (or higher if we take into > account shared libs, etc); that seems a bit high for a vnode vm object > doesn't it? Vnode vm object keeps the set of pages belonging to the vnode. There is nothing bad (or good) there. > > If that doesn't fully explain it, what else might be chewing through > active memory? > > Also, when are vnodes freed? > > This system does have some tuning... > kern.maxfiles: 1000000 > vm.pmap.pv_entry_max: 73296250 > > Could that be contributing to so much active + inactive memory (5GB+ > more than expected), or do PV entries live in wired e.g. kernel memory? pv entries are accounted as wired memory. > > > On Tue, 2012-03-06 at 17:48 -0700, Ian Lepore wrote: > > In my experience, the bulk of the memory in the inactive category is > > cached disk blocks, at least for ufs (I think zfs does things > > differently). On this desktop machine I have 12G physical and > > typically have roughly 11G inactive, and I can unmount one particular > > filesystem where most of my work is done and instantly I have almost > > no inactive and roughly 11G free. > > Okay, so this could be UFS disk cache, except the system is ZFS-on-root > with no UFS filesystems active or mounted. Can I confirm that no > double-caching of ZFS data is happening in active + inactive (+ cache) > memory? ZFS double-buffers the mmaped files.
--Rr/zTE2kOLnfAECX Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk9XKt0ACgkQC3+MBN1Mb4jA/ACg86XbRffmpRAUBECh0y9DGiz5 GNgAoLXNzE8YTJ/lX70JieLwe0CDm9UQ =2dBb -----END PGP SIGNATURE----- --Rr/zTE2kOLnfAECX-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 09:53:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BBE7A1065673 for ; Wed, 7 Mar 2012 09:53:17 +0000 (UTC) (envelope-from luke@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns226322.hybrid-sites.com [176.31.229.137]) by mx1.freebsd.org (Postfix) with ESMTP id 7758C8FC18 for ; Wed, 7 Mar 2012 09:53:16 +0000 (UTC) Received: from [127.0.0.1] (helo=ewes) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S5DYU-0004li-Uy; Wed, 07 Mar 2012 09:53:15 +0000 Received: from [78.105.122.99] (helo=[192.168.1.23] by ns226322.hybrid-sites.com with esmtp (Hybrid Web Cluster distributed mail proxy) (envelope-from ); Wed, 07 Mar 2012 09:53:10 -0000 From: Luke Marsden To: Konstantin Belousov In-Reply-To: <20120307093109.GF75778@deviant.kiev.zoral.com.ua> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> <1331080581.2589.28.camel@pow> <20120307082338.GD75778@deviant.kiev.zoral.com.ua> <1331112366.2589.51.camel@pow> <20120307093109.GF75778@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="UTF-8" Date: Wed, 07 Mar 2012 09:53:08 +0000 Message-ID: <1331113988.2589.64.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: + Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 09:53:17 -0000 On Wed, 2012-03-07 at 11:31 +0200, Konstantin Belousov wrote: > > > > > > The pages belonging to vnode vm object can be active or inactive or cached > > > but not mapped into any process address space. > > > > Thank you, Konstantin. Does the number of vnodes we've got open on this > > machine (272011) fully explain away the memory gap? > > > > Memory gap: > > 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M > > > > Active vnodes: > > vfs.numvnodes: 272011 > > > > That gives a lower bound at 17.18Kb per vnode (or higher if we take into > > account shared libs, etc); that seems a bit high for a vnode vm object > > doesn't it? > Vnode vm object keeps the set of pages belonging to the vnode. There is > nothing bad (or good) there. Thanks. My question is, as an estimate, how large should I expect these vnode objects to be, in terms of the active + inactive memory they consume? I'm trying to explain 5GB+ of memory which has "gone missing" on this system. Active memory usage is currently at 13G (and inactive at 1G) even though the sum of the resident memory sizes in the output of 'ps' comes to only 8557MB. Can 5779M of memory be explained by 272011 vnode entries? > > Okay, so this could be UFS disk cache, except the system is ZFS-on-root > > with no UFS filesystems active or mounted. Can I confirm that no > > double-caching of ZFS data is happening in active + inactive (+ cache) > > memory? > > ZFS double-buffers the mmaped files. The only mmap on this system, to my knowledge, is done in Apache's scoreboard, which is relatively small and doesn't explain the 5G discrepancy.
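For reference, the per-vnode averages behind the figures in this message can be recomputed in a few lines. This is just a sketch of the arithmetic: the helper name `per_vnode_kb` is hypothetical, megabytes are taken as 1024 KB, and the result is only an average, since nothing in the thread says the pages are spread evenly across vnodes.

```python
def per_vnode_kb(gap_mb, numvnodes):
    """Average unexplained memory per vnode, in kilobytes (1 MB = 1024 KB)."""
    return gap_mb * 1024.0 / numvnodes


if __name__ == "__main__":
    numvnodes = 272011  # vfs.numvnodes as quoted in the thread
    # Earlier gap: 11264M active + 2598M inactive - 9297M resident = 4565M
    print(round(per_vnode_kb(4565, numvnodes), 2))
    # Current gap: 13G active + 1G inactive - 8557M resident = 5779M
    print(round(per_vnode_kb(5779, numvnodes), 2))
```

With 4 KB pages the first figure works out to a little over four pages per vnode on average.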
Thanks, Luke -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 10:06:05 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8AF6810656A6 for ; Wed, 7 Mar 2012 10:06:05 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 010558FC1C for ; Wed, 7 Mar 2012 10:06:04 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q27A5vaW036402; Wed, 7 Mar 2012 12:05:57 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q27A5vr4087265; Wed, 7 Mar 2012 12:05:57 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q27A5vGQ087264; Wed, 7 Mar 2012 12:05:57 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Wed, 7 Mar 2012 12:05:57 +0200 From: Konstantin Belousov To: Luke Marsden Message-ID: <20120307100557.GG75778@deviant.kiev.zoral.com.ua> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> <1331080581.2589.28.camel@pow> <20120307082338.GD75778@deviant.kiev.zoral.com.ua> <1331112366.2589.51.camel@pow> <20120307093109.GF75778@deviant.kiev.zoral.com.ua> <1331113988.2589.64.camel@pow> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="CEiQS9p18X4Il5Lz" Content-Disposition: inline In-Reply-To: <1331113988.2589.64.camel@pow> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: 
Clean X-Spam-Status: No, score=-3.3 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, PLING_QUERY autolearn=no version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 10:06:05 -0000 --CEiQS9p18X4Il5Lz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Mar 07, 2012 at 09:53:08AM +0000, Luke Marsden wrote: > On Wed, 2012-03-07 at 11:31 +0200, Konstantin Belousov wrote: > > > > > > > > The pages belonging to vnode vm object can be active or inactive or cached > > > > but not mapped into any process address space. > > > > > > Thank you, Konstantin. Does the number of vnodes we've got open on this > > > machine (272011) fully explain away the memory gap? > > > > > > Memory gap: > > > 11264M active + 2598M inactive - 9297M sum-of-resident = 4565M > > > > > > Active vnodes: > > > vfs.numvnodes: 272011 > > > > > > That gives a lower bound at 17.18Kb per vnode (or higher if we take into > > > account shared libs, etc); that seems a bit high for a vnode vm object > > > doesn't it? > > Vnode vm object keeps the set of pages belonging to the vnode. There is > > nothing bad (or good) there. > > Thanks. My question is, as an estimate, how large should I expect these > vnode objects to be, in terms of the active + inactive memory they > consume? > > I'm trying to explain 5GB+ of memory which has "gone missing" on this > system. Active memory usage is currently at 13G (and inactive at 1G) > even though only the sum of the resident memory sizes in the output of > 'ps' comes only to 8557MB.
> > Can 5779M of memory be explained by 272011 vnodes entries? It can be explained by whatever count of vnodes. These are cached vnode pages. > > > > Okay, so this could be UFS disk cache, except the system is ZFS-on-root > > > with no UFS filesystems active or mounted. Can I confirm that no > > > double-caching of ZFS data is happening in active + inactive (+ cache) > > > memory? > > > > ZFS double-buffers the mmaped files. > > The only mmap on this system, to my knowledge, is done in Apache's > scoreboard, which is relatively small and doesn't explain the 5G > discrepancy. Any executed binary is mmaped, as well as shared libraries. --CEiQS9p18X4Il5Lz Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk9XMwUACgkQC3+MBN1Mb4hiEgCeNDQOrgZHzGxmIlDS8l/HpUFd xz0AnRfPWm/yxVPstuyLt0L0UiZEMoL3 =pPfD -----END PGP SIGNATURE----- --CEiQS9p18X4Il5Lz-- From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 10:49:24 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id BDC90106566C for ; Wed, 7 Mar 2012 10:49:24 +0000 (UTC) (envelope-from luke@hybrid-logic.co.uk) Received: from hybrid-sites.com (ns226322.hybrid-sites.com [176.31.229.137]) by mx1.freebsd.org (Postfix) with ESMTP id 75FA88FC12 for ; Wed, 7 Mar 2012 10:49:23 +0000 (UTC) Received: from [127.0.0.1] (helo=ewes) by hybrid-sites.com with esmtp (Exim 4.72 (FreeBSD)) (envelope-from ) id 1S5EQl-0003fi-Si; Wed, 07 Mar 2012 10:49:21 +0000 Received: from [78.105.122.99] (helo=[192.168.1.23] by ns226322.hybrid-sites.com with esmtp (Hybrid Web Cluster distributed mail proxy) (envelope-from ); Wed, 07 Mar 2012 10:49:15 -0000 From: Luke Marsden To: Konstantin Belousov In-Reply-To: <20120307100557.GG75778@deviant.kiev.zoral.com.ua> References: <1331061203.2218.38.camel@pow> <4F569DFF.8040807@mac.com> <1331080581.2589.28.camel@pow>
<20120307082338.GD75778@deviant.kiev.zoral.com.ua> <1331112366.2589.51.camel@pow> <20120307093109.GF75778@deviant.kiev.zoral.com.ua> <1331113988.2589.64.camel@pow> <20120307100557.GG75778@deviant.kiev.zoral.com.ua> Content-Type: text/plain; charset="UTF-8" Date: Wed, 07 Mar 2012 10:49:13 +0000 Message-ID: <1331117353.2589.88.camel@pow> Mime-Version: 1.0 X-Mailer: Evolution 2.32.2 Content-Transfer-Encoding: 7bit X-Spam-bar: + Cc: freebsd-fs@freebsd.org, team@hybrid-logic.co.uk Subject: Re: FreeBSD 8.2 - active plus inactive memory leak!? X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 10:49:24 -0000 On Wed, 2012-03-07 at 12:05 +0200, Konstantin Belousov wrote: > > I'm trying to explain 5GB+ of memory which has "gone missing" on this > > system. Active memory usage is currently at 13G (and inactive at 1G) > > even though only the sum of the resident memory sizes in the output of > > 'ps' comes only to 8557MB. > > > > Can 5779M of memory be explained by 272011 vnodes entries? > It can be explained why whatever count of vnodes. This is cached vnode > pages. Right, actual disk data pages associated with open files (vnodes) are being cached in active/inactive memory? So if I open a large file, read some data from it, and close it (the refcount on the vnode reaches zero), and there is free memory on the system, then those data pages will go both into the ARC cache (maybe) and into cached vnode pages? Or does the system only cache the vnode pages if they are *not* cached in the ARC? I understand that the vnode page cache and the ARC cache are contending for system memory, is this correct? But perhaps the vnode page cache is better at evicting cache when the system starts seeing VM pressure? I have limited the ARC cache, is there a way to limit how much active/inactive memory is used for cached vnode pages? (Why? 
Because the system is more stable when it has plenty of free memory, see lists.freebsd.org/pipermail/freebsd-fs/2012-February/013775.html). Or, do I not need to worry, because unlike the ARC cache, eviction of cached vnode pages will occur before the system starts swapping/paging? Cached vnode pages will never be swapped/paged to disk, right? (That seems like it would be foolish, because it would take just as long to reconstruct the vnode from the data on disk as it would to page it back in.) If all of the above is correct, then I can be confident that, even with low Free values of memory and large values of Active and Inact due to cached vnode pages, the system will be stable as long as it has a limited ARC cache. Right? Thank you for your help! > > > > Okay, so this could be UFS disk cache, except the system is ZFS-on-root > > > > with no UFS filesystems active or mounted. Can I confirm that no > > > > double-caching of ZFS data is happening in active + inactive (+ cache) > > > > memory? > > > > > > ZFS double-buffers the mmaped files. > > > > The only mmap on this system, to my knowledge, is done in Apache's > > scoreboard, which is relatively small and doesn't explain the 5G > > discrepancy. > Any executed binary is mmaped, as well as shared libraries. That should be okay, since the total size of the binaries and shared libraries across the system is small compared to the amount of system memory: 360M /usr/jails/phpapache/usr/local/lib/ 41M /usr/jails/phpapache/usr/local/bin/ And all application jails have nullfs mounts to this basejail, and so are sharing the mmap'ed pages (I have confirmed this by inspecting /proc//map for processes running in two different jails).
Thanks, Luke -- CTO, Hybrid Logic +447791750420 | +1-415-449-1165 | www.hybrid-cluster.com From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 14:14:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id C31C710656A5 for ; Wed, 7 Mar 2012 14:14:31 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id 240BC8FC1C for ; Wed, 7 Mar 2012 14:14:30 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; d=contactlab.com; s=s768; c=relaxed/relaxed; q=dns/txt; i=@contactlab.com; t=1331129662; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=2bewcdvVug3OkeKxJhUEZJR2Dd0=; b=UFDzKdNyYFmpXr9V2Kjf/pZxevi3IoGonKppj7TUbLBaJscccZ/kWW8sWp4GnTCy oHdz3WqFcqVGN7B5wla4ml50jFkLzNayAjjuuAo0TxR9loiyhQqfVQxWVP9OY8iv; Received: from [213.92.90.12] ([213.92.90.12:54145] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.2.3.43244 r(43244)) with ESMTP id 2B/53-28515-E3D675F4; Wed, 07 Mar 2012 15:14:22 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S5HdG-000O9r-8W for freebsd-fs@freebsd.org; Wed, 07 Mar 2012 15:14:22 +0100 Received: (qmail 92865 invoked by uid 89); 7 Mar 2012 14:14:22 -0000 Received: from fast.tomato.it (HELO davepro.local) (62.101.64.91) by mx3-master.housing.tomato.lan with SMTP; 7 Mar 2012 14:14:22 -0000 Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes To: freebsd-fs@freebsd.org Date: Wed, 07 Mar 2012 15:14:21 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Davide D'Amico" Organization: ContactLab Message-ID: User-Agent: Opera Mail/11.61 (MacIntel) Subject: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 14:14:31 -0000

Hi, I have a Dell R210 server with two SATA drives:

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6

Previously (on 8.x), when I didn't use hardware RAID, I would install FreeBSD on one drive (e.g. ad4), boot from it, and then run:

# sysctl kern.geom.debugflags=16
# gmirror label -v -b round-robin data ad4
# gmirror insert data ad6
// Modification to /etc/fstab
// reboot

How can I accomplish the same task with 9.0-RELEASE?

Thanks,
-- d.
From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 19:07:11 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 82D2E106564A; Wed, 7 Mar 2012 19:07:11 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 42B5F8FC19; Wed, 7 Mar 2012 19:07:11 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id CE7A846B8E; Wed, 7 Mar 2012 14:07:05 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 400F0B911; Wed, 7 Mar 2012 14:07:05 -0500 (EST) From: John Baldwin To: fs@freebsd.org Date: Wed, 7 Mar 2012 13:18:07 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) MIME-Version: 1.0 Content-Type: Text/Plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Message-Id: <201203071318.08241.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Wed, 07 Mar 2012 14:07:05 -0500 (EST) Cc: pho@freebsd.org, kib@freebsd.org Subject: close() of an flock'd file is not atomic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 19:07:11 -0000 So I ran into this problem at work. Suppose you have a process that opens a read-write file descriptor with O_EXLOCK (so it has an flock()). It then writes out a binary into that file. Another process wants to execve() the file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and will call execve() once it has locked the file. In theory, what should happen is that the second process should wait until the first process has finished and called close(). 
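For readers following along, the intended hand-off described above can be sketched in user space. This is only an illustration under stated assumptions: the function name locked_handoff and the HANDOFF_MAX limit are hypothetical, it uses flock(2) on an already-open descriptor instead of O_EXLOCK/O_SHLOCK, and the second process reads the file back instead of calling execve(). It shows the expected blocking behaviour only, and does not reproduce the kernel race discussed in this message.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/file.h>
#include <sys/wait.h>
#include <unistd.h>

#define HANDOFF_MAX 256	/* hypothetical limit on the payload size */

/*
 * Writer takes an exclusive lock, reader waits on a shared lock.
 * Returns 1 if the reader saw the complete payload, 0 if it saw
 * anything else, and -1 if the writer side failed.
 */
int
locked_handoff(const char *path, const char *payload)
{
	size_t len;
	pid_t pid;
	int fd, status;

	assert(path != NULL && payload != NULL);
	len = strlen(payload);
	assert(len <= HANDOFF_MAX);

	/* Writer: create the file and lock it before the reader starts. */
	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);
	if (flock(fd, LOCK_EX) != 0) {
		close(fd);
		return (-1);
	}

	pid = fork();
	if (pid < 0) {
		close(fd);
		return (-1);
	}
	if (pid == 0) {
		char buf[HANDOFF_MAX];
		ssize_t n;
		int rfd;

		/* The inherited descriptor shares the writer's lock; drop it. */
		close(fd);
		rfd = open(path, O_RDONLY);
		/* Blocks here until the writer's close() releases LOCK_EX. */
		if (rfd < 0 || flock(rfd, LOCK_SH) != 0)
			_exit(2);
		n = read(rfd, buf, sizeof(buf));
		_exit(n == (ssize_t)len && memcmp(buf, payload, len) == 0 ?
		    0 : 1);
	}

	/* Writer: fill in the file; close() is what releases the lock. */
	if (write(fd, payload, len) != (ssize_t)len) {
		close(fd);
		waitpid(pid, &status, 0);
		return (-1);
	}
	close(fd);

	if (waitpid(pid, &status, 0) != pid)
		return (-1);
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0);
}
```

Calling locked_handoff("/tmp/some-scratch-file", "hello") should return 1, because the reader cannot get its shared lock until the writer has written the payload and closed the file.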
In practice what happens is that I occasionally see the second process fail with ETXTBSY.

The bug is that vn_closefile() does the VOP_ADVLOCK() to unlock the file separately from the call to vn_close() which drops the writecount. Thus, the second process can do an open() and flock() of the file and subsequently call execve() after the first process has done the VOP_ADVLOCK(), but before it calls into vn_close(). In fact, since vn_close() requires a write lock on the vnode, this turns out to not be too hard to reproduce at all. Below is a simple test program that reproduces this constantly. To use, copy /bin/test to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run ./flock_close_race /tmp/foo.

The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock until after vn_close() executes. However, even with that fix applied, my test case still fails. Now it is because open() with a given lock flag is non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child' process has the fd locked, the writecount can still be bumped. One gross hack would be to defer the bump of the writecount to the caller of vn_open() if the caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus it doesn't actually work. I ended up moving acquiring the lock into vn_open_cred(). The current patch I'm testing has both of these approaches, but the first one is #if 0'd out, and the second is #if 1'd.

http://www.freebsd.org/~jhb/patches/flock_open_close.patch

/* Headers required by the code below (the names were stripped from the archived copy). */
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void
usage(void)
{

	fprintf(stderr, "Usage: flock_close_race [args]\n");
	exit(1);
}

/* Writer side: repeatedly open the binary read-write with an exclusive lock. */
static void
child(const char *binary)
{
	int fd;

	/* Exit as soon as our parent exits. */
	while (getppid() != 1) {
		fd = open(binary, O_RDWR | O_EXLOCK);
		if (fd < 0)
			err(1, "can't open %s", binary);
		close(fd);
	}
	exit(0);
}

/* Exec side: take a shared lock, then exec the binary. */
static void
exec_child(char **av)
{
	int fd;

	/* The descriptor (and its shared lock) stays open across execv(). */
	fd = open(av[0], O_RDONLY | O_SHLOCK);
	execv(av[0], av);
	err(127, "execv");
}

int
main(int ac, char **av)
{
	struct stat sb;
	pid_t pid;

	if (ac < 2)
		usage();
	if (stat(av[1], &sb) != 0)
		err(1, "stat(%s)", av[1]);
	if (!S_ISREG(sb.st_mode))
		errx(1, "%s not an executable", av[1]);

	/* One long-running writer... */
	pid = fork();
	if (pid < 0)
		err(1, "fork");
	if (pid == 0)
		child(av[1]);

	/* ...racing against an endless series of exec attempts. */
	for (;;) {
		pid = fork();
		if (pid < 0)
			err(1, "fork");
		if (pid == 0)
			exec_child(av + 1);
		wait(NULL);
	}
	return (0);
}

--
John Baldwin

From owner-freebsd-fs@FreeBSD.ORG Wed Mar 7 23:02:21 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B6668106566B for ; Wed, 7 Mar 2012 23:02:21 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 7477F8FC15 for ; Wed, 7 Mar 2012 23:02:21 +0000 (UTC) Received: by vcmm1 with SMTP id m1so5765650vcm.13 for ; Wed, 07 Mar 2012 15:02:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=AUBjVkk63mLEgDQ8OCPJdyWUbqPeloX5fdJSJruKYng=; b=bKZjiIEWR/zvkR1WU0Yh/T5ElNWgjz7QNs4Gl/lA6E4OZqBN+54Og6Zvhgq0pYUoid ACcMT/uVTnlhpzy1R1SvR+XZJqip7d08NNKGkagb/AQdOF51erTQJAPi6XtRVNTEm5BT 4TO8dNq+7z+nzs5sEZRcK05Mv1itPh+T9PYe2/4hgsC07U05Jyk+4QL+nca2X/VcwMS9 MixBVN072E98giUzyAYMZ2RW+zqGG+aQhoyHvtMdnGs2XOth3xHpFFZehjEnq1inhogP h1wuXRVCkjsYXmcYGg5jV3R4qtAzQf1MCgjkcVn+l7bosHOI9V2LyAABCAW00y+kqK/O q0oQ== MIME-Version: 1.0 Received: by 10.52.93.77 with SMTP id cs13mr6269347vdb.71.1331161340803; Wed, 07 Mar 2012 15:02:20 -0800 (PST) Received: by 10.220.178.74 with HTTP; Wed, 7 Mar 2012 15:02:20
-0800 (PST) In-Reply-To: References: Date: Wed, 7 Mar 2012 15:02:20 -0800 Message-ID: From: Freddie Cash To: "Davide D'Amico" Content-Type: text/plain; charset=UTF-8 Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 07 Mar 2012 23:02:21 -0000 On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico wrote: > Hi, I have a server DELL R210 with two sata drives: > ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > ada0: ATA-8 SATA 2.x device > ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada0: Command Queueing enabled > ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) > ada0: Previously was known as ad4 > ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > ada1: ATA-8 SATA 2.x device > ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > ada1: Command Queueing enabled > ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) > ada1: Previously was known as ad6 > > Previously (8.x) - when I didn't use a hw raid - I used to install freebsd > in a drive (i.e. ad4), boot from it and then: > > # sysctl kern.geom.debugflags=16 > # gmirror label -v -b round-robin data ad4 > # gmirror insert data ad6 > // Modification to /etc/fstab > // reboot > > How could accomplish to the same task with 9.0-RELEASE? 
http://people.freebsd.org/~rse/mirror/ -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 04:58:15 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8F1021065672 for ; Thu, 8 Mar 2012 04:58:15 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id DEFCD8FC13 for ; Thu, 8 Mar 2012 04:58:14 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; d=contactlab.com; s=s768; c=relaxed/relaxed; q=dns/txt; i=@contactlab.com; t=1331182692; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=/sIhZGt4JzZjHnLvIb76LgNBqBI=; b=noVd2lK4XOkSHEfAPfAepqMZsBr3z6QrYfXYihZlcPexacYApUoD5onfsw/O5DW1 Gm67SoKdjnCj+wwmSkTTar43Y3LBusMIbQKHPtYIqdVSB9ZDt1mnT06NNCOIkoFk; Received: from [213.92.90.12] ([213.92.90.12:23755] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.2.3.43244 r(43244)) with ESMTP id 9C/F5-28515-46C385F4; Thu, 08 Mar 2012 05:58:12 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S5VQa-000OQJ-TZ for freebsd-fs@freebsd.org; Thu, 08 Mar 2012 05:58:13 +0100 Received: (qmail 93883 invoked by uid 89); 8 Mar 2012 04:58:12 -0000 Received: from dynamic-adsl-94-36-132-49.clienti.tiscali.it (HELO imac-casadamico.local) (94.36.132.49) by mx3-master.housing.tomato.lan with SMTP; 8 Mar 2012 04:58:12 -0000 Message-ID: <4F583C63.60809@contactlab.com> Date: Thu, 08 Mar 2012 05:58:11 +0100 From: Davide D'Amico User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: Freddie Cash References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 9 and 
gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 04:58:15 -0000

On 08/03/12 00:02, Freddie Cash wrote:
> On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico
> wrote:
>> Hi, I have a server DELL R210 with two sata drives:
>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
>> ada0: ATA-8 SATA 2.x device
>> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>> ada0: Command Queueing enabled
>> ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C)
>> ada0: Previously was known as ad4
>> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
>> ada1: ATA-8 SATA 2.x device
>> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
>> ada1: Command Queueing enabled
>> ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C)
>> ada1: Previously was known as ad6
>>
>> Previously (8.x) - when I didn't use a hw raid - I used to install freebsd
>> in a drive (i.e. ad4), boot from it and then:
>>
>> # sysctl kern.geom.debugflags=16
>> # gmirror label -v -b round-robin data ad4
>> # gmirror insert data ad6
>> // Modification to /etc/fstab
>> // reboot
>>
>> How could accomplish to the same task with 9.0-RELEASE?
> http://people.freebsd.org/~rse/mirror/
>

Hi Freddie, and thanks for your link.

I followed that procedure up to 9.0-RELEASE, but the new installer uses GPT as the default partitioning scheme, which seems incompatible with gmirror.

I noticed in the handbook (http://www.freebsd.org/doc/handbook/geom-mirror.html):
"The following procedure is also incompatible with the default installation settings of FreeBSD 9.X which use the new GPT partition scheme. GEOM will overwrite GPT metadata, causing data loss and possibly an unbootable system."

So I think the old procedure is no longer usable.

Thanks,
d.
From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 06:02:38 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id D2EAA106566B; Thu, 8 Mar 2012 06:02:38 +0000 (UTC) (envelope-from wollman@hergotha.csail.mit.edu) Received: from hergotha.csail.mit.edu (wollman-1-pt.tunnel.tserv4.nyc4.ipv6.he.net [IPv6:2001:470:1f06:ccb::2]) by mx1.freebsd.org (Postfix) with ESMTP id 73E3F8FC08; Thu, 8 Mar 2012 06:02:38 +0000 (UTC) Received: from hergotha.csail.mit.edu (localhost [127.0.0.1]) by hergotha.csail.mit.edu (8.14.4/8.14.4) with ESMTP id q2862b19064512; Thu, 8 Mar 2012 01:02:37 -0500 (EST) (envelope-from wollman@hergotha.csail.mit.edu) Received: (from wollman@localhost) by hergotha.csail.mit.edu (8.14.4/8.14.4/Submit) id q2862blY064509; Thu, 8 Mar 2012 01:02:37 -0500 (EST) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <20312.19325.130822.520853@hergotha.csail.mit.edu> Date: Thu, 8 Mar 2012 01:02:37 -0500 From: Garrett Wollman To: freebsd-fs@freebsd.org, freebsd-stable@freebsd.org X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.7 (hergotha.csail.mit.edu [127.0.0.1]); Thu, 08 Mar 2012 01:02:37 -0500 (EST) X-Spam-Status: No, score=-1.0 required=5.0 tests=ALL_TRUSTED autolearn=disabled version=3.3.2 X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on hergotha.csail.mit.edu Cc: Subject: Deadlock (?) with ZFS, NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 06:02:38 -0000 This is unfortunately a very difficult issue to report (particularly because I don't have console access to the machine until I get into the office to reboot it). 
My server was happily serving NFS on top of a huge ZFS pool when it ground to a halt -- but only partially. All of the nfsd threads got stuck in ZFS, and there were a large number of pending I/Os, but there was nothing apparently wrong with the storage system itself. (At a minimum, smartctl could talk to the drives, so CAM and the mps driver were working enough to get commands to them.) ssh logins worked fine, but anything that required writing to that zpool (such as zpool scrub, sync, and reboot) would get stuck somewhere in ZFS. zpool status and zfs-stats reported no issues; netstat -p tcp reported many NFS connections with large unhandled receive buffers (owing to the nfsds being unable to complete the request they were working on). Nothing in the kernel message buffer to indicate a problem. Eventually, the machine stopped responding to network requests as well, although for a while after sshd stopped working, it still responded to pings. Here's a snapshot of top(1). Note that the zfskern{txg_thread_enter} thread is getting some CPU, although I can't tell if it's making any progress or just spinning. zfskern{l2arc_feed_thread} would occasionally get some CPU as well, but appeared to do nothing. 
  PID USERNAME  PRI NICE   SIZE    RES STATE   C    TIME    WCPU COMMAND
   11 root      155 ki31     0K   128K CPU6    6   45.7H 100.00% idle{idle: cpu6}
   11 root      155 ki31     0K   128K CPU3    3   45.7H 100.00% idle{idle: cpu3}
   11 root      155 ki31     0K   128K CPU5    5   44.9H 100.00% idle{idle: cpu5}
   11 root      155 ki31     0K   128K CPU7    7   44.7H 100.00% idle{idle: cpu7}
   11 root      155 ki31     0K   128K CPU2    2   44.3H 100.00% idle{idle: cpu2}
   11 root      155 ki31     0K   128K CPU1    1   44.1H 100.00% idle{idle: cpu1}
   11 root      155 ki31     0K   128K RUN     4   43.7H 100.00% idle{idle: cpu4}
   11 root      155 ki31     0K   128K CPU0    0   43.1H  99.46% idle{idle: cpu0}
    5 root       -8    -     0K   128K zio->i  5   67:25   0.98% zfskern{txg_thread_enter}
   12 root      -92    -     0K   800K WAIT    7  297:15   0.00% intr{irq264: ix0:que }
    0 root      -16    0     0K  6144K -       1  297:11   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       2  297:09   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       3  297:07   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       7  296:58   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       6  296:57   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       5  296:54   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       0  296:53   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       4  296:40   0.00% kernel{zio_write_issue_}
    5 root       -8    -     0K   128K l2arc_  4  220:18   0.00% zfskern{l2arc_feed_threa}
   12 root      -92    -     0K   800K WAIT    2  163:56   0.00% intr{irq259: ix0:que }
   13 root       -8    -     0K    48K -       2   93:35   0.00% geom{g_down}
   12 root      -92    -     0K   800K WAIT    3   85:26   0.00% intr{irq260: ix0:que }
    0 root      -92    0     0K  6144K -       0   77:53   0.00% kernel{ix0 que}
    8 root      -16    -     0K    16K ipmire  5   74:03   0.00% ipmi1: kcs
 1815 root       20    0 10052K   456K zfs     4   72:07   0.00% nfsd{nfsd: master}
 1815 root       20    0 10052K   456K zfs     0   71:51   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K tx->tx  5   71:45   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     2   71:43   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     1   71:31   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     0   71:25   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     6   71:23   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     1   71:18   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     4   71:15   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     1   71:13   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     7   71:10   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     0   71:10   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     6   71:07   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     2   71:02   0.00% nfsd{nfsd: service}
 1815 root       20    0 10052K   456K zfs     5   70:58   0.00% nfsd{nfsd: service}
   12 root      -68    -     0K   800K WAIT    2   70:50   0.00% intr{swi2: cambio}
 1815 root       20    0 10052K   456K zfs     7   70:33   0.00% nfsd{nfsd: service}
    0 root      -16    0     0K  6144K -       3   67:16   0.00% kernel{zio_write_intr_7}
    0 root      -16    0     0K  6144K -       3   67:14   0.00% kernel{zio_write_intr_4}
    0 root      -16    0     0K  6144K -       7   67:13   0.00% kernel{zio_write_intr_0}
    0 root      -16    0     0K  6144K -       4   67:13   0.00% kernel{zio_write_intr_3}
    0 root      -16    0     0K  6144K -       5   67:12   0.00% kernel{zio_write_intr_6}
    0 root      -16    0     0K  6144K -       6   67:11   0.00% kernel{zio_write_intr_1}
    0 root      -16    0     0K  6144K -       2   67:11   0.00% kernel{zio_write_intr_5}
    0 root      -16    0     0K  6144K -       1   67:10   0.00% kernel{zio_write_intr_2}
    0 root      -16    0     0K  6144K -       6   63:38   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       5   63:37   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       2   63:36   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       3   63:32   0.00% kernel{zio_write_issue_}
    0 root      -16    0     0K  6144K -       0   63:31   0.00% kernel{zio_write_issue_}
   13 root       -8    -     0K    48K -       6   62:49   0.00% geom{g_up}
   12 root      -88    -     0K   800K WAIT    7   52:19   0.00% intr{irq266: mps0}
    0 root      -92    0     0K  6144K -       0   46:25   0.00% kernel{ix0 que}
   12 root      -92    -     0K   800K WAIT    5   42:43   0.00% intr{irq262: ix0:que }

This is a 9.0-RELEASE system with the mps driver backported from 9-stable. Hourly and daily snapshots were enabled. It had been working extremely well up to this point, and we were looking at possibly replacing our existing NFS servers with this architecture.
-GAWollman From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 07:37:29 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id E4CF91065673 for ; Thu, 8 Mar 2012 07:37:29 +0000 (UTC) (envelope-from peter.maloney@brockmann-consult.de) Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.17.10]) by mx1.freebsd.org (Postfix) with ESMTP id 727818FC17 for ; Thu, 8 Mar 2012 07:37:29 +0000 (UTC) Received: from [10.3.0.26] ([141.4.215.32]) by mrelayeu.kundenserver.de (node=mreu3) with ESMTP (Nemesis) id 0MLVSQ-1S66941kYM-000Yu4; Thu, 08 Mar 2012 08:37:28 +0100 Message-ID: <4F5861B7.7010201@brockmann-consult.de> Date: Thu, 08 Mar 2012 08:37:27 +0100 From: Peter Maloney User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: <20312.19325.130822.520853@hergotha.csail.mit.edu> In-Reply-To: <20312.19325.130822.520853@hergotha.csail.mit.edu> X-Provags-ID: V02:K0:kiMqujtanIV4/DSgQjbO95fDDs3qflMyqgSBQbxcsK4 cCjiqtT3JyfSGiiSVuX3RJ2LXNFp4S/pUbZsPGS240m2IhtgD0 H9qf+LcM5XZVDhoxZz04EGbv5Kp3eOX+HGlDsn2OcnyC9IhwiN pZVRFIh1wdo1wE6qcJJdWpgqxOsEAy+h3M9xffWWMB8fMdT2Dx wkghedAhLb5Ct4LlXa17JTNBGVYLZwfRJrdb3nl/wED/Uz+dlt iWt4KlvX4XpAOCeU7OUvYdVWMCmgAnfP8zzYPU2jQ9mKEKH0wA akGAsgnJXfY+9pFUKba/3dJzXjYgOpaGq+JVmUTT35R7RCqmvo NS0mORmHv2sj1KkkRYPOSWsbjbAwCQL6ZUU/FfDVo Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Re: Deadlock (?) with ZFS, NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 07:37:30 -0000 On 03/08/2012 07:02 AM, Garrett Wollman wrote: > This is a 9.0-RELEASE system with the mps driver backported from > 9-stable. 
Hourly and daily snapshots were enabled. It had been
> working extremely well up to this point, and we were looking at
> possibly replacing our existing NFS servers with this architecture.

On one system (I haven't tried it lately), a single dataset hangs if a Linux *client* mounts the ZFS dataset and runs:

cd /mount/point
ls .zfs/snapshot

So try that and see if it reproduces the problem. Setting snapdir=hidden doesn't prevent access to the directory; it only hides it from "ls -a" output.

The problem did not occur back when I had few or no snapshots. It also doesn't happen on the replicated backup server, which has all the same software, data and snapshots.

So far, my hack solution is to mount /var/empty on top of every .zfs directory on the client side. Another idea is to never export a whole dataset from its root, because the root is the only place that contains the .zfs directory (apart from any subdatasets inside it, which have their own).

--
--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str.
2 21502 Geesthacht Germany Tel: +49 4152 889 300 Fax: +49 4152 889 333 E-mail: peter.maloney@brockmann-consult.de Internet: http://www.brockmann-consult.de -------------------------------------------- From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 11:35:57 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 81034106564A for ; Thu, 8 Mar 2012 11:35:57 +0000 (UTC) (envelope-from martin@lispworks.com) Received: from lwfs1-cam.cam.lispworks.com (mail.lispworks.com [193.34.186.230]) by mx1.freebsd.org (Postfix) with ESMTP id EE9A68FC13 for ; Thu, 8 Mar 2012 11:35:56 +0000 (UTC) Received: from higson.cam.lispworks.com (higson [192.168.1.7]) by lwfs1-cam.cam.lispworks.com (8.14.3/8.14.3) with ESMTP id q28BZmGh077839; Thu, 8 Mar 2012 11:35:48 GMT (envelope-from martin@lispworks.com) Received: from higson.cam.lispworks.com (localhost.localdomain [127.0.0.1]) by higson.cam.lispworks.com (8.14.4) id q28BZmTn027239; Thu, 8 Mar 2012 11:35:48 GMT Received: (from martin@localhost) by higson.cam.lispworks.com (8.14.4/8.14.4/Submit) id q28BZmVJ027236; Thu, 8 Mar 2012 11:35:48 GMT Date: Thu, 8 Mar 2012 11:35:48 GMT Message-Id: <201203081135.q28BZmVJ027236@higson.cam.lispworks.com> From: Martin Simmons To: freebsd-fs@freebsd.org In-reply-to: <4F583C63.60809@contactlab.com> (davide.damico@contactlab.com) References: <4F583C63.60809@contactlab.com> Subject: Re: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 11:35:57 -0000 >>>>> On Thu, 08 Mar 2012 05:58:11 +0100, Davide D'Amico said: > > Il 08/03/12 00:02, Freddie Cash ha scritto: > > On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico > > wrote: > >> Hi, I have a server DELL R210 with two sata drives: > >> ada0 at ahcich0 bus 0 scbus0 
target 0 lun 0
> >> ada0: ATA-8 SATA 2.x device
> >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >> ada0: Command Queueing enabled
> >> ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C)
> >> ada0: Previously was known as ad4
> >> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> >> ada1: ATA-8 SATA 2.x device
> >> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> >> ada1: Command Queueing enabled
> >> ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C)
> >> ada1: Previously was known as ad6
> >>
> >> Previously (8.x) - when I didn't use a hw raid - I used to install freebsd
> >> in a drive (i.e. ad4), boot from it and then:
> >>
> >> # sysctl kern.geom.debugflags=16
> >> # gmirror label -v -b round-robin data ad4
> >> # gmirror insert data ad6
> >> // Modification to /etc/fstab
> >> // reboot
> >>
> >> How could accomplish to the same task with 9.0-RELEASE?
> > http://people.freebsd.org/~rse/mirror/
> >
> Hi Freddie, and thanks for your link.
>
> I followed that procedure until 9.0-RELEASE but the new installer uses
> GPT as the default partition schema, which seems incompatible with gmirror.
>
> I noticed in the handbook
> (http://www.freebsd.org/doc/handbook/geom-mirror.html):
> "The following procedure is also incompatible with the default
> installation settings of FreeBSD 9.X which use the new GPT partition
> scheme. GEOM will overwrite GPT metadata, causing data loss and possibly
> an unbootable system."
>
> So I think that the old procedure is no more useful.

I think the problem with the old procedure is that it mirrors the whole disk. This fails because both gmirror and GPT store data in the last few blocks of the disk.

If you create gmirror(s) at the GPT partition level instead, then it should be safe.
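The overlap can be checked with a little arithmetic. The sketch below is illustrative only and rests on assumptions: 512-byte sectors, a backup GPT made of one header sector plus a 32-sector entry table (128 entries of 128 bytes), and gmirror metadata occupying the last sector of the provider it is labelled on. The struct and function names are hypothetical, as is the partition layout used in the usage note.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range of sectors (LBAs), counted from the start of the raw disk. */
struct span {
	uint64_t first;
	uint64_t last;
};

/*
 * gmirror metadata: the last sector of the provider it is labelled on.
 * The provider may be a whole disk (start 0) or a single partition.
 */
struct span
gmirror_metadata(uint64_t provider_start, uint64_t provider_sectors)
{
	uint64_t last;

	assert(provider_sectors > 0);
	last = provider_start + provider_sectors - 1;
	return ((struct span){ last, last });
}

/*
 * Backup GPT at the end of the disk. ASSUMPTION: 32 sectors of
 * partition entries followed by the backup header in the last sector.
 */
struct span
gpt_backup(uint64_t disk_sectors)
{

	assert(disk_sectors > 33);
	return ((struct span){ disk_sectors - 33, disk_sectors - 1 });
}

/* Returns 1 if the two inclusive ranges share at least one sector. */
int
spans_overlap(struct span a, struct span b)
{

	return (a.first <= b.last && b.first <= a.last);
}
```

For the 488281250-sector disks in this thread, gmirror_metadata(0, 488281250) lands on sector 488281249, which is inside gpt_backup(488281250), so a whole-disk mirror collides with the backup GPT. A mirror labelled on a partition that ends before the backup GPT (for example a hypothetical partition starting at sector 40 with 488281177 sectors) keeps its metadata at sector 488281216 and does not overlap.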
__Martin From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 12:10:42 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 652FD1065675 for ; Thu, 8 Mar 2012 12:10:42 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id A0E938FC18 for ; Thu, 8 Mar 2012 12:10:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; d=contactlab.com; s=s768; c=relaxed/relaxed; q=dns/txt; i=@contactlab.com; t=1331208640; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=dgXEWg2tZNl1dMTN4TzPe2loD3M=; b=aV1igVNnbt7guLYWn7Cs+jwcrXRTb5/m4kAHSRRG6WP95fFy73B/I1fG2bC68vKY FLozqaZhMCw9cgaoS+4PU11wjWaTGUvsPvM0Ugnzjzbc+m4TvuYJoTnmy3KbSP4b; Received: from [213.92.90.12] ([213.92.90.12:11876] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.2.3.43244 r(43244)) with ESMTP id 39/25-28515-0C1A85F4; Thu, 08 Mar 2012 13:10:40 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S5cB5-000G30-UD for freebsd-fs@freebsd.org; Thu, 08 Mar 2012 13:10:40 +0100 Received: (qmail 61688 invoked by uid 89); 8 Mar 2012 12:10:39 -0000 Received: from fast.tomato.it (HELO davepro.local) (62.101.64.91) by mx3-master.housing.tomato.lan with SMTP; 8 Mar 2012 12:10:39 -0000 Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: <20120308120032.C9FF11065672@hub.freebsd.org> Date: Thu, 08 Mar 2012 13:10:38 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Davide D'Amico" Organization: ContactLab Message-ID: In-Reply-To: <20120308120032.C9FF11065672@hub.freebsd.org> User-Agent: Opera Mail/11.61 (MacIntel) Subject: Re: freebsd-fs Digest, Vol 455, Issue 4 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 
2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 12:10:42 -0000 In data 08 marzo 2012 alle ore 13:00:32, ha scritto: > ------------------------------ > Message: 7 > Date: Thu, 8 Mar 2012 11:35:48 GMT > From: Martin Simmons > Subject: Re: FreeBSD 9 and gmirror /geom raid > To: freebsd-fs@freebsd.org > Message-ID: <201203081135.q28BZmVJ027236@higson.cam.lispworks.com> > >>>>>> On Thu, 08 Mar 2012 05:58:11 +0100, Davide D'Amico said: >> >> Il 08/03/12 00:02, Freddie Cash ha scritto: >> > On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico >> > wrote: >> >> Hi, I have a server DELL R210 with two sata drives: >> >> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 >> >> ada0: ATA-8 SATA 2.x device >> >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >> >> ada0: Command Queueing enabled >> >> ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) >> >> ada0: Previously was known as ad4 >> >> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 >> >> ada1: ATA-8 SATA 2.x device >> >> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >> >> ada1: Command Queueing enabled >> >> ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) >> >> ada1: Previously was known as ad6 >> >> >> >> Previously (8.x) - when I didn't use a hw raid - I used to install >> freebsd >> >> in a drive (i.e. ad4), boot from it and then: >> >> >> >> # sysctl kern.geom.debugflags=16 >> >> # gmirror label -v -b round-robin data ad4 >> >> # gmirror insert data ad6 >> >> // Modification to /etc/fstab >> >> // reboot >> >> >> >> How could accomplish to the same task with 9.0-RELEASE? >> > http://people.freebsd.org/~rse/mirror/ >> > >> Hi Freddie, and thanks for your link. >> >> I followed that procedure until 9.0-RELEASE but the new installer uses >> GPT as the default partition schema, which seems incompatible with >> gmirror. 
>> >> I noticed in the handbook >> (http://www.freebsd.org/doc/handbook/geom-mirror.html): >> "The following procedure is also incompatible with the default >> installation settings of FreeBSD 9.X which use the new GPT partition >> scheme. GEOM will overwrite GPT metadata, causing data loss and possibly >> an unbootable system." >> >> So I think that the old procedure is no more useful. > I think the problem with the old procedure is that it mirrors the whole > disk. > This fails because both gmirror and GPT store data in the last few > blocks of > the disk. > If you create gmirror(s) at the GPT partition level instead, then it > should be > safe. Yes, indeed but I would continue to mirror the entire disk, rather than single partitions. d. From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 13:13:56 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 21F4D106564A for ; Thu, 8 Mar 2012 13:13:56 +0000 (UTC) (envelope-from rainer@ultra-secure.de) Received: from mail.ultra-secure.de (mail.ultra-secure.de [78.47.114.122]) by mx1.freebsd.org (Postfix) with ESMTP id 708C58FC13 for ; Thu, 8 Mar 2012 13:13:54 +0000 (UTC) Received: (qmail 31583 invoked by uid 89); 8 Mar 2012 13:13:48 -0000 Received: by simscan 1.4.0 ppid: 31575, pid: 31580, t: 0.5370s scanners: attach: 1.4.0 clamav: 0.97.3/m:54/d:14611 Received: from unknown (HELO suse2.iptech.internal) (rainer@ultra-secure.de@212.71.117.70) by mail.ultra-secure.de with ESMTPA; 8 Mar 2012 13:13:48 -0000 Date: Thu, 8 Mar 2012 14:13:47 +0100 From: Rainer Duffner To: Martin Simmons Message-ID: <20120308141347.50b06ea5@suse2.iptech.internal> In-Reply-To: <201203081135.q28BZmVJ027236@higson.cam.lispworks.com> References: <4F583C63.60809@contactlab.com> <201203081135.q28BZmVJ027236@higson.cam.lispworks.com> X-Mailer: Claws Mail 3.7.10 (GTK+ 2.22.1; x86_64-unknown-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; 
charset=US-ASCII Content-Transfer-Encoding: 7bit Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 13:13:56 -0000 On Thu, 8 Mar 2012 11:35:48 GMT, Martin Simmons wrote: > >>>>> On Thu, 08 Mar 2012 05:58:11 +0100, Davide D'Amico said: > > > > On 08/03/12 00:02, Freddie Cash wrote: > > > On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico > > > wrote: > > >> Hi, I have a server DELL R210 with two sata drives: > > >> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 > > >> ada0: ATA-8 SATA 2.x device > > >> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > > >> ada0: Command Queueing enabled > > >> ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) > > >> ada0: Previously was known as ad4 > > >> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 > > >> ada1: ATA-8 SATA 2.x device > > >> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) > > >> ada1: Command Queueing enabled > > >> ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) > > >> ada1: Previously was known as ad6 > > >> > > >> Previously (8.x) - when I didn't use a hw raid - I used to > > >> install freebsd in a drive (i.e. ad4), boot from it and then: > > >> > > >> # sysctl kern.geom.debugflags=16 > > >> # gmirror label -v -b round-robin data ad4 > > >> # gmirror insert data ad6 > > >> // Modification to /etc/fstab > > >> // reboot > > >> > > >> How could accomplish to the same task with 9.0-RELEASE? > > > http://people.freebsd.org/~rse/mirror/ > > > > > Hi Freddie, and thanks for your link. > > > > I followed that procedure until 9.0-RELEASE but the new installer > > uses GPT as the default partition schema, which seems incompatible > > with gmirror. 
> > > > I noticed in the handbook > > (http://www.freebsd.org/doc/handbook/geom-mirror.html): > > "The following procedure is also incompatible with the default > > installation settings of FreeBSD 9.X which use the new GPT partition > > scheme. GEOM will overwrite GPT metadata, causing data loss and > > possibly an unbootable system." > > > > So I think that the old procedure is no more useful. > > I think the problem with the old procedure is that it mirrors the > whole disk. This fails because both gmirror and GPT store data in the > last few blocks of the disk. > > If you create gmirror(s) at the GPT partition level instead, then it > should be safe. maybe like this: http://blather.michaelwlucas.com/archives/1071 Haven't tried it, but I may need it at some point... From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 15:50:04 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E3C0A1065676 for ; Thu, 8 Mar 2012 15:50:04 +0000 (UTC) (envelope-from xi@borderworlds.dk) Received: from kazon.borderworlds.dk (kazon.borderworlds.dk [IPv6:2a01:4f8:101:4201::1:1]) by mx1.freebsd.org (Postfix) with ESMTP id 780838FC0A for ; Thu, 8 Mar 2012 15:50:04 +0000 (UTC) Received: from borg.borderworlds.dk (localhost [127.0.0.1]) by kazon.borderworlds.dk (Postfix) with ESMTP id 2175C5C3B for ; Thu, 8 Mar 2012 16:49:56 +0100 (CET) Message-ID: <4F58D523.2070100@borderworlds.dk> Date: Thu, 08 Mar 2012 16:49:55 +0100 From: Christian Laursen Organization: The Border Worlds User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.2) Gecko/20120302 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-15; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: 
, List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 15:50:05 -0000 On 03/07/12 15:14, Davide D'Amico wrote: > Hi, I have a server DELL R210 with two sata drives: > ada0: ATA-8 SATA 2.x device > ada1: ATA-8 SATA 2.x device > > Previously (8.x) - when I didn't use a hw raid - I used to install > freebsd in a drive (i.e. ad4), boot from it and then: > > # sysctl kern.geom.debugflags=16 > # gmirror label -v -b round-robin data ad4 > # gmirror insert data ad6 > // Modification to /etc/fstab > // reboot > > How could accomplish to the same task with 9.0-RELEASE? I wrote a short howto about doing that. By following that you get a gmirror covering the whole disks with GPT partitioning inside. http://borderworlds.dk/notes/gmirror.html -- Christian Laursen From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 16:01:39 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 33DA21065670 for ; Thu, 8 Mar 2012 16:01:39 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.mail.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id DDD408FC0C for ; Thu, 8 Mar 2012 16:01:38 +0000 (UTC) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AqAEAF7XWE+DaFvO/2dsb2JhbABDFoUfsHCCCgEBAQQBAQEgKyALGxIGAgINGQIpAQkYDgYIBwQBGgIEh2kLqCWSLYEviGQVhTCBFgSIUopLgiiQGIMBgT4 X-IronPort-AV: E=Sophos;i="4.73,552,1325480400"; d="scan'208";a="159585448" Received: from erie.cs.uoguelph.ca (HELO zcs3.mail.uoguelph.ca) ([131.104.91.206]) by esa-annu-pri.mail.uoguelph.ca with ESMTP; 08 Mar 2012 11:01:32 -0500 Received: from zcs3.mail.uoguelph.ca (localhost.localdomain [127.0.0.1]) by zcs3.mail.uoguelph.ca (Postfix) with ESMTP id 6CC00B3F67; Thu, 8 Mar 2012 11:01:32 -0500 (EST) Date: Thu, 8 Mar 2012 11:01:32 -0500 (EST) From: Rick Macklem To: Peter Maloney Message-ID: 
<1116960020.624724.1331222492424.JavaMail.root@erie.cs.uoguelph.ca> In-Reply-To: <4F5861B7.7010201@brockmann-consult.de> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.91.202] X-Mailer: Zimbra 6.0.10_GA_2692 (ZimbraWebClient - FF3.0 (Win)/6.0.10_GA_2692) Cc: freebsd-fs@freebsd.org Subject: Re: Deadlock (?) with ZFS, NFS X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 16:01:39 -0000 Peter Maloney wrote: > On 03/08/2012 07:02 AM, Garrett Wollman wrote: > > This is a 9.0-RELEASE system with the mps driver backported from > > 9-stable. Hourly and daily snapshots were enabled. It had been > > working extremely well up to this point, and we were looking at > > possibly replacing our existing NFS servers with this architecture. > On one system, (haven't tried it lately), it will hang a single > dataset > if a Linux *client* mounts the zfs dataset and does: > > cd /mount/point > ls .zfs/snapshot > > So try that and see if it reproduces the problem. > > Setting snapdir=hidden doesn't prevent accessing it, only hides it > from > "ls -a" output. > > The problem did not occur back when I had few or no snapshots. It also > doesn't happen on the replicated backup server, with all the same > software, data and snapshots. > > So far, my hack solution is to mount /var/empty on top of every .zfs > directory on the client side. Another idea is to never export a whole > dataset from the root of it, because that is the only place that > contains the .zfs directory, other than if you have subdatasets inside > that one. > There was a patch specifically for readdirplus, where it avoids doing a VFS_VGET() when EOPNOTSUPP is replied by VFS_VGET(). This apparently happens for ZFS snapshots. The patch went into head as r220507 almost a year ago (Apr. 
9, 2011), so Garrett will have it, but you might not? Also, note that, if all your clients are FreeBSD and none of the mounts specify the "rdirplus" mount option, the patch isn't relevant, since the clients will never do a ReaddirPlus RPC. Although I am highly doubtful that it would be the cause of the above hang, you should apply this patch, which went into head recently: http://people.freebsd.org/~rmacklem/nfsd-enoent.patch Without it, Lookup RPCs are being done with ni_topdir uninitialized. For NFSv4 mounts to a UFS volume, it resulted in spurious ENOENT replies to Lookup. Since the ZFS code doesn't appear to use ni_topdir, I really doubt it would cause a hang, but strange things can occur when variables aren't properly initialized and the patch should be safe to use. Good luck with it. I don't know anything about ZFS, so I can't really help much, rick > > > -- > > -------------------------------------------- > Peter Maloney > Brockmann Consult > Max-Planck-Str. 2 > 21502 Geesthacht > Germany > Tel: +49 4152 889 300 > Fax: +49 4152 889 333 > E-mail: peter.maloney@brockmann-consult.de > Internet: http://www.brockmann-consult.de > -------------------------------------------- > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 16:27:31 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2E250106564A for ; Thu, 8 Mar 2012 16:27:31 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-vw0-f54.google.com (mail-vw0-f54.google.com [209.85.212.54]) by mx1.freebsd.org (Postfix) with ESMTP id D302D8FC0C for ; Thu, 8 Mar 2012 16:27:30 +0000 (UTC) Received: by vbmv11 with SMTP id v11so631210vbm.13 for ; Thu, 08 Mar 2012 08:27:30 -0800 (PST) 
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=cLDaETpyIdKKY2Dmh9kPtCM6hEASqDlNI3l+9krApfY=; b=Okit5DvBJUjYdKeyFNhKikXsoro1p/vWFnCg++8TWce7SuEyPhkfbKzgzch+nDGpKa lNmYSv0z0Hkc0eLnYEcdKsWQoKr3Esd+I9wEG/bwCkTcQe3TFJePzbJX5HRoG9h2+mxE g1jLJThykjFkh8X/rk+ew+bszo3ZZR3iKNyO73tmj4gtz5dXbJQF6yMF313j63rpToFp ef0SHvjq5R7DsRgqeDEGpEhONWD6awplMrFgbhlDKYbq9iwZeyKhGyc7SJ71+tEmz0pM WzPtg9gsFB6as4dQ4Zf06UKHJmCiBdQug+Qj8cCoCBb4uIszU+wUIwL2iHF7eBb6zNca 2FNg== MIME-Version: 1.0 Received: by 10.52.25.107 with SMTP id b11mr10969768vdg.37.1331224050220; Thu, 08 Mar 2012 08:27:30 -0800 (PST) Received: by 10.220.178.74 with HTTP; Thu, 8 Mar 2012 08:27:30 -0800 (PST) In-Reply-To: <4F583C63.60809@contactlab.com> References: <4F583C63.60809@contactlab.com> Date: Thu, 8 Mar 2012 08:27:30 -0800 Message-ID: From: Freddie Cash To: "Davide D'Amico" Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 16:27:31 -0000 On Wed, Mar 7, 2012 at 8:58 PM, Davide D'Amico wrote: > On 08/03/12 00:02, Freddie Cash wrote: >> >> On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico >> wrote: >>> >>> Hi, I have a server DELL R210 with two sata drives: >>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 >>> ada0: ATA-8 SATA 2.x device >>> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >>> ada0: Command Queueing enabled >>> ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) >>> ada0: Previously was known as ad4 >>> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 >>> ada1: ATA-8 SATA 2.x device >>> ada1: 300.000MB/s transfers 
(SATA 2.x, UDMA6, PIO 8192bytes) >>> ada1: Command Queueing enabled >>> ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) >>> ada1: Previously was known as ad6 >>> >>> Previously (8.x) - when I didn't use a hw raid - I used to install >>> freebsd >>> in a drive (i.e. ad4), boot from it and then: >>> >>> # sysctl kern.geom.debugflags=16 >>> # gmirror label -v -b round-robin data ad4 >>> # gmirror insert data ad6 >>> // Modification to /etc/fstab >>> // reboot >>> >>> How could accomplish to the same task with 9.0-RELEASE? >> >> http://people.freebsd.org/~rse/mirror/ >> > Hi Freddie, and thanks for your link. > > I followed that procedure until 9.0-RELEASE but the new installer uses GPT > as the default partition schema, which seems incompatible with gmirror. > > I noticed in the handbook > (http://www.freebsd.org/doc/handbook/geom-mirror.html): > "The following procedure is also incompatible with the default installation > settings of FreeBSD 9.X which use the new GPT partition scheme. GEOM will > overwrite GPT metadata, causing data loss and possibly an unbootable system." > > So I think that the old procedure is no more useful. There's nothing preventing you from using MBR partitioning with BSD labels, just like in previous releases. Especially since you only have 250 GB harddrives (GPT is only required for disks over 2 TB). If your primary goal is to have mirrored disks, then your best setup is to gmirror the disks, MBR partition the gm0 device, then BSD label the slices. There's nothing carved in stone anywhere that requires you to take the defaults as gospel. 
:) -- Freddie Cash fjwcash@gmail.com From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 16:34:06 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8159A106564A for ; Thu, 8 Mar 2012 16:34:06 +0000 (UTC) (envelope-from davide.damico@contactlab.com) Received: from mail2.shared.smtp.contactlab.it (mail2.shared.smtp.contactlab.it [93.94.37.7]) by mx1.freebsd.org (Postfix) with ESMTP id C56968FC15 for ; Thu, 8 Mar 2012 16:34:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; d=contactlab.com; s=s768; c=relaxed/relaxed; q=dns/txt; i=@contactlab.com; t=1331224438; h=From:Subject:Date:To:MIME-Version:Content-Type; bh=X9PzwD8Va99n7CFa0Ri3z5VkN/Y=; b=k2oGDXYdDvqwPPkSqzgbdZfDB08HYoAhZe7kEy7125oBmTrh2iyCR7QmtJS6iBEm p/6CNDNt+VMdW2WuCAxog5Ibu3CrUo90y/c/LUend0QWtC+h+HvR2kXQ7peLVjCp; Received: from [213.92.90.12] ([213.92.90.12:63970] helo=mail3.tomato.it) by t.contactlab.it (envelope-from ) (ecelerity 3.2.3.43244 r(43244)) with ESMTP id FE/C7-28515-57FD85F4; Thu, 08 Mar 2012 17:33:58 +0100 Received: from mx3-master.housing.tomato.lan ([172.16.7.55]) by mail3.tomato.it with smtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S5gHt-0001tq-S7 for freebsd-fs@freebsd.org; Thu, 08 Mar 2012 17:33:57 +0100 Received: (qmail 7302 invoked by uid 89); 8 Mar 2012 16:33:57 -0000 Received: from fast.tomato.it (HELO davepro.local) (62.101.64.91) by mx3-master.housing.tomato.lan with SMTP; 8 Mar 2012 16:33:57 -0000 Content-Type: text/plain; charset=iso-8859-15; format=flowed; delsp=yes To: "Freddie Cash" References: <4F583C63.60809@contactlab.com> Date: Thu, 08 Mar 2012 17:33:56 +0100 MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: "Davide D'Amico" Organization: ContactLab Message-ID: In-Reply-To: User-Agent: Opera Mail/11.61 (MacIntel) Cc: freebsd-fs@freebsd.org Subject: Re: FreeBSD 9 and gmirror /geom raid X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 
Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 16:34:06 -0000 On 08 March 2012 at 17:27:30, Freddie Cash wrote: > On Wed, Mar 7, 2012 at 8:58 PM, Davide D'Amico > wrote: >> On 08/03/12 00:02, Freddie Cash wrote: >>> >>> On Wed, Mar 7, 2012 at 6:14 AM, Davide D'Amico >>> wrote: >>>> >>>> Hi, I have a server DELL R210 with two sata drives: >>>> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0 >>>> ada0: ATA-8 SATA 2.x device >>>> ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >>>> ada0: Command Queueing enabled >>>> ada0: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) >>>> ada0: Previously was known as ad4 >>>> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 >>>> ada1: ATA-8 SATA 2.x device >>>> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes) >>>> ada1: Command Queueing enabled >>>> ada1: 238418MB (488281250 512 byte sectors: 16H 63S/T 16383C) >>>> ada1: Previously was known as ad6 >>>> >>>> Previously (8.x) - when I didn't use a hw raid - I used to install >>>> freebsd >>>> in a drive (i.e. ad4), boot from it and then: >>>> >>>> # sysctl kern.geom.debugflags=16 >>>> # gmirror label -v -b round-robin data ad4 >>>> # gmirror insert data ad6 >>>> // Modification to /etc/fstab >>>> // reboot >>>> >>>> How could accomplish to the same task with 9.0-RELEASE? >>> >>> http://people.freebsd.org/~rse/mirror/ >>> >> Hi Freddie, and thanks for your link. >> >> I followed that procedure until 9.0-RELEASE but the new installer uses >> GPT >> as the default partition schema, which seems incompatible with gmirror. >> >> I noticed in the handbook >> (http://www.freebsd.org/doc/handbook/geom-mirror.html): >> "The following procedure is also incompatible with the default >> installation >> settings of FreeBSD 9.X which use the new GPT partition scheme. 
GEOM will >> overwrite GPT metadata, causing data loss and possibly an unbootable >> system." >> >> So I think that the old procedure is no more useful. > > There's nothing preventing you from using MBR partitioning with BSD > labels, just like in previous releases. Especially since you only > have 250 GB harddrives (GPT is only required for disks over 2 TB). > > If your primary goal is to have mirrored disks, then your best setup > is to gmirror the disks, MBR partition the gm0 device, then BSD label > the slices. > > There's nothing carved in stone anywhere that requires you to take the > defaults as gospel. :) Ok, thanks. d. From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 16:43:17 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 7DC401065672 for ; Thu, 8 Mar 2012 16:43:17 +0000 (UTC) (envelope-from drue@therub.org) Received: from mail-iy0-f182.google.com (mail-iy0-f182.google.com [209.85.210.182]) by mx1.freebsd.org (Postfix) with ESMTP id 4BC3B8FC19 for ; Thu, 8 Mar 2012 16:43:17 +0000 (UTC) Received: by iahk25 with SMTP id k25so1204964iah.13 for ; Thu, 08 Mar 2012 08:43:17 -0800 (PST) Received: by 10.50.184.199 with SMTP id ew7mr7599732igc.37.1331223585819; Thu, 08 Mar 2012 08:19:45 -0800 (PST) Received: from therub.org (173-8-105-230-Minnesota.hfc.comcastbusiness.net. 
[173.8.105.230]) by mx.google.com with ESMTPS id mk10sm14875354igc.4.2012.03.08.08.19.43 (version=TLSv1/SSLv3 cipher=OTHER); Thu, 08 Mar 2012 08:19:44 -0800 (PST) Date: Thu, 8 Mar 2012 10:19:41 -0600 From: Dan Rue To: freebsd-fs@freebsd.org Message-ID: <20120308161940.GA71851@therub.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) X-Gm-Message-State: ALoCoQkqmVI3P00rv+vR0GOCSeSwqnBirKhkZ0/TkR4cPejzGDEQ2i3ZlchDnUE37/hXVsrFHWIg Subject: ZFS and mdconfig -t vnode - Unexpected behavior X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 16:43:17 -0000 I have discovered an unexpected behavior when using ZFS against a vnode. When running mdconfig -d against a vnode, there is a long delay before the blocks are fully written to the backing store file. Consider the following test script: #!/bin/sh rm foo dd if=/dev/zero of=foo bs=4096 count=0 seek=1955584 du -k foo md=`mdconfig -a -t vnode -f foo` dd if=/dev/random of=/dev/${md} bs=4096 count=100 mdconfig -d -u $md while [ 1 ]; do fsync foo du -k foo blocks=`du -k foo | awk '{print $1; }'` if [ $blocks -gt 100 ]; then exit fi sleep 1 done A good run looks like this (UFS): # time sh t.sh 0+0 records in 0+0 records out 0 bytes transferred in 0.000017 secs (0 bytes/sec) 48 foo 100+0 records in 100+0 records out 409600 bytes transferred in 0.179925 secs (2276505 bytes/sec) 464 foo real 0m0.203s user 0m0.001s sys 0m0.018s A bad run looks like this (ZFS): # time sh t.sh 0+0 records in 0+0 records out 0 bytes transferred in 0.000015 secs (0 bytes/sec) 1 foo 100+0 records in 100+0 records out 409600 bytes transferred in 0.089111 secs (4596522 bytes/sec) 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 1 foo 
1 foo 1 foo 1 foo 515 foo real 0m27.370s user 0m0.009s Under ZFS, it can take as long as 30 seconds before the block size as reported by du -k has been updated. The fsync appears to be a noop. Under UFS, du -k shows the correct block size every time, immediately after mdconfig -d. This is the expected behavior. This has been tested against ZFS on FreeBSD 8.1, 8.2, and 9 stable, in several different environments. Are there any ZFS tunables that could be related to this? What could be the cause of this behavior? Thanks, drue From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 17:50:52 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id BC197106566B for ; Thu, 8 Mar 2012 17:50:52 +0000 (UTC) (envelope-from tevans.uk@googlemail.com) Received: from mail-vx0-f182.google.com (mail-vx0-f182.google.com [209.85.220.182]) by mx1.freebsd.org (Postfix) with ESMTP id 745898FC0C for ; Thu, 8 Mar 2012 17:50:52 +0000 (UTC) Received: by vcmm1 with SMTP id m1so730761vcm.13 for ; Thu, 08 Mar 2012 09:50:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding; bh=IanDw0v+VwiK0xMtP4SdTaN3Bbw89Qls+BaUTDPdRDs=; b=F/60HKCkCiaDTbQ2ulkmA9yDIAN+M7tYGrpG4Ds6sxrBRI0tzGwEhJtMIffIuBTCs1 mcsJ05rDDVcfW2S7hQJozIiRJCIp6azzm0y2MFSALueClTRlEXWoCyuNspiwo1404G28 fuER0G3sdsN75SKJiauZno1DEizProqJIv+iAUTGQD4TXdj6SPzvspXB+8BxLzZMT2j2 Y82tlCfP0GBblC4AjIlglhLYRh8b/iIqX4T1S+64txPXw7IgbrEwA9310hVLvKjpt/Vu iQJ+ahC9ahYF/ThYV2q5C9GMmHPJWZVhAonWe9z7cxgG/2t7iUa2fApyOsuKD8VMGttb gMdQ== MIME-Version: 1.0 Received: by 10.52.93.138 with SMTP id cu10mr11370723vdb.86.1331229051687; Thu, 08 Mar 2012 09:50:51 -0800 (PST) Received: by 10.52.110.100 with HTTP; Thu, 8 Mar 2012 09:50:51 -0800 (PST) In-Reply-To: <20120308161940.GA71851@therub.org> References: 
<20120308161940.GA71851@therub.org> Date: Thu, 8 Mar 2012 17:50:51 +0000 Message-ID: From: Tom Evans To: Dan Rue Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: freebsd-fs@freebsd.org Subject: Re: ZFS and mdconfig -t vnode - Unexpected behavior X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 17:50:52 -0000 On Thu, Mar 8, 2012 at 4:19 PM, Dan Rue wrote: > I have discovered an unexpected behavior when using ZFS against a vnode. > When running mdconfig -d against a vnode, there is a long delay before > the blocks are fully written to the backing store file. Consider the > following test script: > > […] > > Are there any ZFS tunables that could be related to this? What could be > the cause of this behavior? > The tunable vfs.zfs.txg.timeout (in seconds) controls how bursty ZFS is. I run with a high value (30), so my disks don't get constantly hit, but if you lowered it, data will be written much more frequently. 
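For anyone following along, here is a minimal sketch of how to check which value is in effect. It assumes the tunable is exposed through sysctl on the release in use, and the value 5 in the comment is purely illustrative, not a recommendation from this thread.

```sh
#!/bin/sh
# Read the current ZFS transaction group timeout (seconds).
# Assumption: on systems without ZFS loaded, the sysctl does not exist,
# so fall back to a placeholder instead of failing.
txg_timeout=$(sysctl -n vfs.zfs.txg.timeout 2>/dev/null) || txg_timeout="unavailable"
echo "vfs.zfs.txg.timeout: ${txg_timeout}"

# To try a lower value at boot (illustrative value only), a line like the
# following would go into /boot/loader.conf:
#   vfs.zfs.txg.timeout="5"
```

A lower timeout trades fewer, larger write bursts for more frequent, smaller ones, so it is worth measuring before and after with the test script from the original post.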
Cheers Tom From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 20:39:10 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 980691065675; Thu, 8 Mar 2012 20:39:10 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 7044F8FC14; Thu, 8 Mar 2012 20:39:10 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id 2933446B95; Thu, 8 Mar 2012 15:39:10 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 88355B965; Thu, 8 Mar 2012 15:39:09 -0500 (EST) From: John Baldwin To: freebsd-fs@freebsd.org Date: Thu, 8 Mar 2012 15:39:07 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <201203071318.08241.jhb@freebsd.org> In-Reply-To: <201203071318.08241.jhb@freebsd.org> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Message-Id: <201203081539.07711.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Thu, 08 Mar 2012 15:39:09 -0500 (EST) Cc: pho@freebsd.org, kib@freebsd.org, fs@freebsd.org Subject: Re: close() of an flock'd file is not atomic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 20:39:10 -0000 On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote: > So I ran into this problem at work. Suppose you have a process that opens a > read-write file descriptor with O_EXLOCK (so it has an flock()). It then > writes out a binary into that file. 
Another process wants to execve() the > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and > will call execve() once it has locked the file. In theory, what should happen > is that the second process should wait until the first process has finished > and called close(). In practice what happens is that I occasionally see the > second process fail with ETXTBSY. > > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file > separately from the call to vn_close() which drops the writecount. Thus, the > second process can do an open() and flock() of the file and subsequently call > execve() after the first process has done the VOP_ADVLOCK(), but before it > calls into vn_close(). In fact, since vn_close() requires a write lock on the > vnode, this turns out to not be too hard to reproduce at all. Below is a > simple test program that reproduces this constantly. To use, copy /bin/test > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run > ./flock_close_race /tmp/foo. > > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock > until after vn_close() executes. However, even with that fix applied, my test > case still fails. Now it is because open() with a given lock flag is > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child' > process has the fd locked, the writecount can still be bumped. One gross hack > would be to defer the bump of the writecount to the caller of vn_open() if the > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus > it doesn't actually work. I ended up moving acquiring the lock into > vn_open_cred(). The current patch I'm testing has both of these approaches, > but the first one is #if 0'd out, and the second is #if 1'd. 
> > http://www.freebsd.org/~jhb/patches/flock_open_close.patch Based on some feedback from Konstantin, I've fixed some issues in the failure path handling for VOP_ADVLOCK(). I've also removed the #if 0'd code mentioned above, so the patch is now the actual change that I'm testing. So far it handles both my workload at work and my test program without any issues. -- John Baldwin 
From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 22:35:25 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2D4E21065673 for ; Thu, 8 Mar 2012 22:35:25 +0000 (UTC) (envelope-from bfriesen@simple.dallas.tx.us) Received: from blade.simplesystems.org (blade.simplesystems.org [65.66.246.74]) by mx1.freebsd.org (Postfix) with ESMTP id E94358FC08 for ; Thu, 8 Mar 2012 22:35:24 +0000 (UTC) Received: from freddy.simplesystems.org (freddy.simplesystems.org [65.66.246.65]) by blade.simplesystems.org (8.14.4+Sun/8.14.4) with ESMTP id q28MZ0pX007783; Thu, 8 Mar 2012 16:35:00 -0600 (CST) Date: Thu, 8 Mar 2012 16:35:00 -0600 (CST) From: Bob Friesenhahn X-X-Sender: bfriesen@freddy.simplesystems.org To: Dan Rue In-Reply-To: <20120308161940.GA71851@therub.org> Message-ID: References: <20120308161940.GA71851@therub.org> User-Agent: Alpine 2.01 (GSO 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.2.2 (blade.simplesystems.org [65.66.246.90]); Thu, 08 Mar 2012 16:35:01 -0600 (CST) Cc: freebsd-fs@freebsd.org Subject: Re: ZFS and mdconfig -t vnode - Unexpected behavior X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 22:35:25 -0000 On Thu, 8 Mar 2012, Dan Rue wrote: > > Under ZFS, it can take as long as 30 seconds before the block size as > reported by du -k has been updated. The fsync appears to be a noop. Zfs under Solaris has the same behavior. This data is only assured to be available after the current zfs TXG has been synced, which may take a long time. It also becomes available after the 'sync' system call has completed (because the current TXG is flushed). > Are there any ZFS tunables that could be related to this? What could be > the cause of this behavior? You could adjust the tunings for zfs transaction groups but this will decrease system performance and increase pool fragmentation. Bob -- Bob Friesenhahn bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 22:39:24 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 48D35106564A; Thu, 8 Mar 2012 22:39:24 +0000 (UTC) (envelope-from gnn@FreeBSD.org) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 1968F8FC13; Thu, 8 Mar 2012 22:39:23 +0000 (UTC) Received: from [209.249.190.124] (helo=[10.2.212.229]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1S5kl5-0008Gq-P3; Thu, 08 Mar 2012 16:20:23 -0500 From: George Neville-Neil Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable Date: Thu, 8 Mar 2012 16:20:24 -0500 Message-Id: To: fs@freebsd.org Mime-Version: 1.0 (Apple Message framework v1257) X-Mailer: Apple Mail (2.1257) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - 
freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - FreeBSD.org Cc: current@freebsd.org Subject: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 22:39:24 -0000 Howdy, I've taken the GSoC work done with the FUSE kernel module, and created a patch against HEAD which I have now subjected to testing using tools/regression/fsx. The patch is here: http://people.freebsd.org/~gnn/head-fuse-1.diff I would like to commit this patch in the next few days, so, please, if you care about this take a look and get back to me. Thanks, George From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 22:39:24 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id AF9781065675; Thu, 8 Mar 2012 22:39:24 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id 3E8DA8FC14; Thu, 8 Mar 2012 22:39:23 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q28MdKbW082679; Fri, 9 Mar 2012 00:39:20 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q28MdJqR097363; Fri, 9 Mar 2012 00:39:19 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q28MdJHL097362; Fri, 9 Mar 2012 00:39:19 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 9 Mar 2012 00:39:19 +0200
From: Konstantin Belousov To: John Baldwin Message-ID: <20120308223919.GU75778@deviant.kiev.zoral.com.ua> References: <201203071318.08241.jhb@freebsd.org> <201203081539.07711.jhb@freebsd.org> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="DmaKRSoEg59HPfHz" Content-Disposition: inline In-Reply-To: <201203081539.07711.jhb@freebsd.org> User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: freebsd-fs@freebsd.org, pho@freebsd.org, fs@freebsd.org Subject: Re: close() of an flock'd file is not atomic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 22:39:24 -0000 --DmaKRSoEg59HPfHz Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 08, 2012 at 03:39:07PM -0500, John Baldwin wrote: > On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote: > > So I ran into this problem at work. Suppose you have a process that opens a > > read-write file descriptor with O_EXLOCK (so it has an flock()). It then > > writes out a binary into that file. Another process wants to execve() the > > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and > > will call execve() once it has locked the file. In theory, what should happen > > is that the second process should wait until the first process has finished > > and called close(). In practice what happens is that I occasionally see the > > second process fail with ETXTBUSY.
> > > > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file > > separately from the call to vn_close() which drops the writecount. Thus, the > > second process can do an open() and flock() of the file and subsequently call > > execve() after the first process has done the VOP_ADVLOCK(), but before it > > calls into vn_close(). In fact, since vn_close() requires a write lock on the > > vnode, this turns out to not be too hard to reproduce at all. Below is a > > simple test program that reproduces this constantly. To use, copy /bin/test > > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run > > ./flock_close_race /tmp/foo. > > > > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock > > until after vn_close() executes. However, even with that fix applied, my test > > case still fails. Now it is because open() with a given lock flag is > > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount > > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child' > > process has the fd locked, the writecount can still be bumped. One gross hack > > would be to defer the bump of the writecount to the caller of vn_open() if the > > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus > > it doesn't actually work. I ended up moving acquiring the lock into > > vn_open_cred(). The current patch I'm testing has both of these approaches, > > but the first one is #if 0'd out, and the second is #if 1'd. > > > > http://www.freebsd.org/~jhb/patches/flock_open_close.patch > > Based on some feedback from Konstantin, I've fixed some issues in the failure > path handling for VOP_ADVLOCK(). I've also removed the #if 0'd code mentioned > above, so the patch is now the actual change that I'm testing.
So far it > handles both my workload at work and my test program without any issues. I think a comment is needed for a reason to call vn_writechk() second time. Could you, please, point me, where the FHASLOCK is set for O_EXLOCK | O_SHLOCK case in the patched kernel ? --DmaKRSoEg59HPfHz Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk9ZNRUACgkQC3+MBN1Mb4g4iQCggp5r8WSezZNIVwxLO9/gp5v0 ZywAoPUpSUbOWJYiX/8EEcLzfhQEgQL7 =aoVo -----END PGP SIGNATURE----- --DmaKRSoEg59HPfHz-- From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 22:54:12 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 74D1A1065670; Thu, 8 Mar 2012 22:54:12 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id DEF638FC0A; Thu, 8 Mar 2012 22:54:11 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q28Ms7xT084087; Fri, 9 Mar 2012 00:54:07 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q28Ms7a4097442; Fri, 9 Mar 2012 00:54:07 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q28Ms71r097441; Fri, 9 Mar 2012 00:54:07 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 9 Mar 2012 00:54:07 +0200 From: Konstantin Belousov To: George Neville-Neil Message-ID: <20120308225407.GV75778@deviant.kiev.zoral.com.ua> References: Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1;
protocol="application/pgp-signature"; boundary="xY23VHacxCa7rROP" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 22:54:12 -0000 --xY23VHacxCa7rROP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 08, 2012 at 04:20:24PM -0500, George Neville-Neil wrote: > Howdy, > > I've taken the GSoC work done with the FUSE kernel module, and created > a patch against HEAD which I have now subjected to testing using > tools/regression/fsx. > > The patch is here: http://people.freebsd.org/~gnn/head-fuse-1.diff > > I would like to commit this patch in the next few days, so, please, if > you care about this take a look and get back to me. I just took a very quick look, and the code has all usual bugs. E.g., the filesystem is marked mpsafe, while insmntque() is performed before new vnode is initialized. The fuse was known to cause random kernel memory corruption, were the issues identified and fixed ? Who is going to maintain the code ? I once objected strongly for throwing the fuse into svn without first fixing bugs, and having a maintainer.
--xY23VHacxCa7rROP Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk9ZOI4ACgkQC3+MBN1Mb4iBvQCbBiWZWhTI8JrheJIDf48ZrEnF YG0Anjsa8jmoWzSeLEkbeRJFQVyA6azP =Ur65 -----END PGP SIGNATURE----- --xY23VHacxCa7rROP--
From owner-freebsd-fs@FreeBSD.ORG Thu Mar 8 23:54:24 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 80598106566C; Thu, 8 Mar 2012 23:54:24 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from mail.zoral.com.ua (mx0.zoral.com.ua [91.193.166.200]) by mx1.freebsd.org (Postfix) with ESMTP id DAA4A8FC17; Thu, 8 Mar 2012 23:54:23 +0000 (UTC) Received: from skuns.kiev.zoral.com.ua (localhost [127.0.0.1]) by mail.zoral.com.ua (8.14.2/8.14.2) with ESMTP id q28NsKDo089807; Fri, 9 Mar 2012 01:54:20 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: from deviant.kiev.zoral.com.ua (kostik@localhost [127.0.0.1]) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5) with ESMTP id q28NsJU9097787; Fri, 9 Mar 2012 01:54:19 +0200 (EET) (envelope-from kostikbel@gmail.com) Received: (from kostik@localhost) by deviant.kiev.zoral.com.ua (8.14.5/8.14.5/Submit) id q28NsJQW097786; Fri, 9 Mar 2012 01:54:19 +0200 (EET) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: deviant.kiev.zoral.com.ua: kostik set sender to kostikbel@gmail.com using -f Date: Fri, 9 Mar 2012 01:54:19 +0200 From: Konstantin Belousov To: Adrian Chadd Message-ID: <20120308235419.GX75778@deviant.kiev.zoral.com.ua> References: <20120308225407.GV75778@deviant.kiev.zoral.com.ua> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="+kSsdclgAHMo6YVP" Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.4.2.3i X-Virus-Scanned: clamav-milter 0.95.2 at skuns.kiev.zoral.com.ua X-Virus-Status: Clean X-Spam-Status: No, score=-4.0
required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.2.5 X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on skuns.kiev.zoral.com.ua Cc: current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 08 Mar 2012 23:54:24 -0000 --+kSsdclgAHMo6YVP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Thu, Mar 08, 2012 at 03:43:42PM -0800, Adrian Chadd wrote: > Hi, > > Is there any reason why we shouldn't throw this into contrib/ and > treating it like a vendor thing? > Or is this explicitly not being treated as a vendor FS? No idea. This should be a decision of the person who imports the filesystem. > > Other than that, sure, let's get it into the tree and then thrash it > until it's stable.. No, please fix at least the dreadful bugs before committing. We have enough dead and buggy filesystems in the tree already. The problem with fusefs code is the lack of maintainer, and not the location of the code.
--+kSsdclgAHMo6YVP Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (FreeBSD) iEYEARECAAYFAk9ZRqsACgkQC3+MBN1Mb4g6lgCgnu1X77uFOKvQZTy5Gfe2UZ1k 3wcAoOZOFuYbvX9l4XE5/pcpwFovYW8y =Ohar -----END PGP SIGNATURE----- --+kSsdclgAHMo6YVP-- From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 00:10:07 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id B55DA106564A; Fri, 9 Mar 2012 00:10:07 +0000 (UTC) (envelope-from adrian.chadd@gmail.com) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id 7EC648FC14; Fri, 9 Mar 2012 00:10:07 +0000 (UTC) Received: by dald2 with SMTP id d2so1068508dal.13 for ; Thu, 08 Mar 2012 16:10:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=tHLxYbDOaA/H3vABgNCrXyPdDgt8ntKw/ZJLcWDtOjI=; b=iCbAvC5hChYD1YLuc/lSLHbPsnM7ELQIaXQy5AF8B4uJmi7jDEXlRuojtyaDvYj2ui Xl2iv1elOnwa84+Mcd7UgkJwLRWXDYuRaQ+tAxhBFVf+gbooAWpsHxsvUfLGJTjJyf3b dH0NLscvvdE7KuSiDIiSHaBCj9RvTFIL6Dp7CyZLA8aemmEEmOsTSNKpI63EyVgY1rZ/ B7Lawa2CbPjfNxjaasurTG8Z2lLJehkBmbQ7F3lHoPHOggj3YgXmtIr41MpH6OIpF0aL iH+HUzFAeiRFUMh/tELXQ371JMZ8MDj25VmYK29CBEtP9iY6C52QE7eaeqdRicQUh+iA C4pw== MIME-Version: 1.0 Received: by 10.68.232.162 with SMTP id tp2mr879599pbc.165.1331250222553; Thu, 08 Mar 2012 15:43:42 -0800 (PST) Sender: adrian.chadd@gmail.com Received: by 10.142.72.11 with HTTP; Thu, 8 Mar 2012 15:43:42 -0800 (PST) In-Reply-To: <20120308225407.GV75778@deviant.kiev.zoral.com.ua> References: <20120308225407.GV75778@deviant.kiev.zoral.com.ua> Date: Thu, 8 Mar 2012 15:43:42 -0800 X-Google-Sender-Auth: GwwJ6CnBROi3nP3g7f9l736A7RQ Message-ID: From: Adrian Chadd To: Konstantin Belousov Content-Type: text/plain; 
charset=ISO-8859-1 Cc: current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 00:10:07 -0000 Hi, Is there any reason why we shouldn't throw this into contrib/ and treating it like a vendor thing? Or is this explicitly not being treated as a vendor FS? Other than that, sure, let's get it into the tree and then thrash it until it's stable.. adrian From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 10:13:27 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 2BA71106566B; Fri, 9 Mar 2012 10:13:27 +0000 (UTC) (envelope-from cochard@gmail.com) Received: from mail-pz0-f54.google.com (mail-pz0-f54.google.com [209.85.210.54]) by mx1.freebsd.org (Postfix) with ESMTP id EF1978FC16; Fri, 9 Mar 2012 10:13:26 +0000 (UTC) Received: by dald2 with SMTP id d2so1485234dal.13 for ; Fri, 09 Mar 2012 02:13:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:sender:from:date:x-google-sender-auth:message-id :subject:to:content-type; bh=4oE0VYu1eJ1tpGsN3EVYUL6cBYlhQmzZPceC+ewwB6o=; b=blcTeUcqrWVferAFUqlrhXsIOQ0Z0yBT+Rl1PWc2GlV5lLVSnU2Wm5ExU+uKXzXcyp Ufyfk8i9ihOq4w1ayJpAsvejLLagvThj5Yrr8fSHw/5MzowOjFNNEiO5VOorKWmYsrgB FyJPjjoRGTetfoYO8tmA+RMicQWk3Chvt8hj7g//3NHHv0fCe6Lap+0CA13r6zqg67pb vuXW2QbpvSA5m2eb1A23ZUfxFuTlo0xYXm4e8yFxI2/+lkAwltAIVcXxIWV19X4QyUzD 1wdzeKxReCt/v/4Kc65AzsS4JLwdm6X4azf/8jljFt6JpWMVPnZ3PdpDQzcGHTKUJfgE a2fQ== Received: by 10.68.239.195 with SMTP id vu3mr3522338pbc.49.1331286284295; Fri, 09 Mar 2012 01:44:44 -0800 (PST) MIME-Version: 1.0 Sender: cochard@gmail.com Received: by 10.143.58.4 with HTTP; Fri, 9 Mar 2012 01:44:24 -0800 (PST) From: 
=?ISO-8859-1?Q?Olivier_Cochard=2DLabb=E9?= Date: Fri, 9 Mar 2012 10:44:24 +0100 X-Google-Sender-Auth: pntLUA-N5Usc016rAObpnvFfHeI Message-ID: To: freebsd-current@freebsd.org, freebsd-fs@freebsd.org Content-Type: text/plain; charset=ISO-8859-1 Cc: Subject: growfs remove ufs/label and can't reset it with tunefs X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 10:13:27 -0000 Hi all, once growfs has been run on a partition that had a UFS label, the label is removed and it is no longer possible to re-set it with tunefs. Here is how to reproduce (tested on 8.3 and 9.0):
mdconfig -a -t malloc -s 10MB
gpart create -s mbr /dev/md0
gpart add -t freebsd -s 5MB /dev/md0
newfs -L THELABEL /dev/md0s1
glabel status | grep THELABEL
=> Label is present, now we resize the slice:
gpart resize -i 1 /dev/md0
glabel status | grep THELABEL
=> Label is still present, now we growfs the slice:
growfs /dev/md0s1
glabel status | grep THELABEL
=> UFS label disappears! OK, I will try to re-set it:
tunefs -L THELABEL /dev/md0s1
glabel status | grep THELABEL
=> Still no label!?!
Should I create a PR about this problem?
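For anyone who wants to retry this, the steps above could be collected into one script, roughly as sketched below. This is only a sketch: the RUN switch, the run and check_label helpers, and the assumption that the memory disk comes up as md0 are illustrative additions and not part of the commands actually run for this report. By default it only prints the commands.

```sh
#!/bin/sh
# Sketch only: wraps the reproduction steps from this report.
# Assumptions (illustrative, not from the report): the RUN switch,
# the helper names, and that the memory disk appears as /dev/md0.
# Without RUN=1 the commands are printed, not executed.
LABEL=THELABEL
DISK=/dev/md0

# Echo a command, and execute it only when RUN=1 (root, FreeBSD).
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    fi
}

# Report whether glabel currently lists the label; $1 names the step.
check_label() {
    echo "+ glabel status | grep $LABEL    # $1"
    if [ "${RUN:-0}" = 1 ]; then
        if glabel status | grep -q "$LABEL"; then
            echo "label present ($1)"
        else
            echo "label MISSING ($1)"
        fi
    fi
}

run mdconfig -a -t malloc -s 10MB
run gpart create -s mbr "$DISK"
run gpart add -t freebsd -s 5MB "$DISK"
run newfs -L "$LABEL" "${DISK}s1"
check_label "after newfs"
run gpart resize -i 1 "$DISK"
check_label "after gpart resize"
run growfs "${DISK}s1"
check_label "after growfs"
run tunefs -L "$LABEL" "${DISK}s1"
check_label "after tunefs"
```

With RUN=1 set, growfs may still ask for confirmation interactively, as in the manual steps, and the memory disk is left attached afterwards.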
Regards, Olivier From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 11:18:47 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 725381065670; Fri, 9 Mar 2012 11:18:47 +0000 (UTC) (envelope-from gperez@entel.upc.edu) Received: from dash.upc.es (dash.upc.es [147.83.2.50]) by mx1.freebsd.org (Postfix) with ESMTP id D28628FC13; Fri, 9 Mar 2012 11:18:46 +0000 (UTC) Received: from ackerman2.upc.es (ackerman2.upc.es [147.83.2.244]) by dash.upc.es (8.14.1/8.13.1) with ESMTP id q299mm1Z002832 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Fri, 9 Mar 2012 10:48:48 +0100 Received: from portgus.lan ([147.83.40.234]) (authenticated bits=0) by ackerman2.upc.es (8.14.4/8.14.4) with ESMTP id q299mmsK027440 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Fri, 9 Mar 2012 10:48:48 +0100 Message-ID: <4F59D1DD.9020708@entel.upc.edu> Date: Fri, 09 Mar 2012 10:48:13 +0100 From: =?ISO-8859-1?Q?Gustau_P=E9rez?= User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:10.0.2) Gecko/20120226 Thunderbird/10.0.2 MIME-Version: 1.0 To: George Neville-Neil References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.70 on 147.83.2.244 X-Mail-Scanned: Criba 2.0 + Clamd X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-3.0 (dash.upc.es [147.83.2.50]); Fri, 09 Mar 2012 10:48:49 +0100 (CET) Cc: current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 11:18:47 -0000 On 08/03/2012 22:20, George Neville-Neil wrote: > Howdy, > > I've taken the GSoC work done with the FUSE kernel module, and created a patch against HEAD > which I have now subjected to testing using tools/regression/fsx. > > The patch is here: http://people.freebsd.org/~gnn/head-fuse-1.diff > > I would like to commit this patch in the next few days, so, please, if you care > about this take a look and get back to me. > > Thanks, > George When this GSoC was going on, I asked Hans Petter Selasky (the mentor) and Ilya to try the code, because I thought the project would be very useful to me (mostly on the server side, there are a few distributed/parallel filesystems using fuse). The code was not finished at the time the GSoC ended. So it does work with some filesystems, but not with some others. Is this the last version Ilya released for the GSoC?
From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 16:52:14 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id E1DAF106566C; Fri, 9 Mar 2012 16:52:14 +0000 (UTC) (envelope-from hselasky@c2i.net) Received: from swip.net (mailfe01.c2i.net [212.247.154.2]) by mx1.freebsd.org (Postfix) with ESMTP id 10DB68FC13; Fri, 9 Mar 2012 16:52:13 +0000 (UTC) X-T2-Spam-Status: No, hits=-1.0 required=5.0 tests=ALL_TRUSTED Received: from [176.74.212.201] (account mc467741@c2i.net HELO laptop002.hselasky.homeunix.org) by mailfe01.swip.net (CommuniGate Pro SMTP 5.4.2) with ESMTPA id 251205459; Fri, 09 Mar 2012 17:42:03 +0100 From: Hans Petter Selasky To: freebsd-current@freebsd.org Date: Fri, 9 Mar 2012 17:40:27 +0100 User-Agent: KMail/1.13.5 (FreeBSD/8.3-PRERELEASE; KDE/4.4.5; amd64; ; ) References: <20120308235419.GX75778@deviant.kiev.zoral.com.ua> In-Reply-To: <20120308235419.GX75778@deviant.kiev.zoral.com.ua> X-Face: 'mmZ:T{)),Oru^0c+/}w'`gU1$ubmG?lp!=R4Wy\ELYo2)@'UZ24N@d2+AyewRX}mAm; Yp |U[@, _z/([?1bCfM{_"B<.J>mICJCHAzzGHI{y7{%JVz%R~yJHIji`y>Y}k1C4TfysrsUI -%GU9V5]iUZF&nRn9mJ'?&>O MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201203091740.27913.hselasky@c2i.net> Cc: Adrian Chadd , current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 16:52:15 -0000 On Friday 09 March 2012 00:54:19 Konstantin Belousov wrote: > On Thu, Mar 08, 2012 at 03:43:42PM -0800, Adrian Chadd wrote: > > Hi, > > Last version is here: https://code.google.com/p/google-summer-of-code-2011-freebsd/downloads/list --HPS From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 18:31:08 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 9E035106564A; Fri, 9 Mar 2012 18:31:08 +0000 (UTC) (envelope-from gnn@freebsd.org) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 705CB8FC14; Fri, 9 Mar 2012 18:31:08 +0000 (UTC) Received: from [209.249.190.124] (helo=[10.2.212.229]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1S64ap-0007OW-V2; Fri, 09 Mar 2012 13:31:07 -0500 Mime-Version: 1.0 (Apple Message framework v1257) Content-Type: text/plain; charset=us-ascii From: George Neville-Neil In-Reply-To: <20120308225407.GV75778@deviant.kiev.zoral.com.ua> Date: Fri, 9 Mar 2012 13:31:14 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <8CDE3851-6C32-41ED-8CD0-70EC1BA245C5@freebsd.org> References: <20120308225407.GV75778@deviant.kiev.zoral.com.ua> To: Konstantin Belousov X-Mailer: Apple Mail (2.1257) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - freebsd.org Cc: current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 18:31:08 -0000 On Mar 8, 2012, at 17:54 , Konstantin Belousov wrote: > I just took a very quick look, and the code has all usual bugs. E.g., the > filesystem is marked mpsafe, while insmntque() is performed before new > vnode is initialized. > > The fuse was known to cause random kernel memory corruption, were the issues > identified and fixed ? > They are being identified and fixed as we speak. I fixed a couple yesterday. > Who is going to maintain the code ? I once objected strongly for throwing > the fuse into svn without first fixing bugs, and having a maintainer. I'm planning to maintain the code. As bugs arise I will take care of them. I've been using fsx to seek them out. Best, George From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 18:31:58 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 4A50E1065689; Fri, 9 Mar 2012 18:31:58 +0000 (UTC) (envelope-from gnn@freebsd.org) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 1BF838FC18; Fri, 9 Mar 2012 18:31:58 +0000 (UTC) Received: from [209.249.190.124] (helo=[10.2.212.229]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1S64bd-0007OW-98; Fri, 09 Mar 2012 13:31:57 -0500 Mime-Version: 1.0 (Apple Message framework v1257) Content-Type: text/plain; charset=iso-8859-1 From: George Neville-Neil In-Reply-To: <4F59D1DD.9020708@entel.upc.edu> Date: Fri, 9 Mar 2012 13:32:03 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: References: <4F59D1DD.9020708@entel.upc.edu> To: =?iso-8859-1?Q?Gustau_P=E9rez?= X-Mailer: Apple Mail (2.1257) X-AntiAbuse: This header was added to track abuse, 
please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - freebsd.org Cc: current@freebsd.org, fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 18:31:58 -0000 On Mar 9, 2012, at 04:48 , Gustau Pérez wrote: > On 08/03/2012 22:20, George Neville-Neil wrote: >> Howdy, >> >> I've taken the GSoC work done with the FUSE kernel module, and created a patch against HEAD >> which I have now subjected to testing using tools/regression/fsx. >> >> The patch is here: http://people.freebsd.org/~gnn/head-fuse-1.diff >> >> I would like to commit this patch in the next few days, so, please, if you care >> about this take a look and get back to me. >> >> Thanks, >> George >> >> _______________________________________________ >> freebsd-current@freebsd.org mailing list >> http://lists.freebsd.org/mailman/listinfo/freebsd-current >> To unsubscribe, send any mail to "freebsd-current-unsubscribe@freebsd.org" > > When this GSoC was going on, I asked Hans Petter Selasky (the mentor) and Ilya to try the code, because I thought the project would be very useful to me (mostly on the server side, since there are a few distributed/parallel filesystems using FUSE). > > The code was not finished at the time the GSoC ended, so it works with some filesystems but not with others. > > Is this the last version Ilya released for the GSoC? Yes, with fixes. 
It's based off of here: https://github.com/glk/fuse-freebsd Best, George From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 18:44:38 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id D1109106564A for ; Fri, 9 Mar 2012 18:44:38 +0000 (UTC) (envelope-from feld@feld.me) Received: from feld.me (unknown [IPv6:2607:f4e0:100:300::2]) by mx1.freebsd.org (Postfix) with ESMTP id A74968FC15 for ; Fri, 9 Mar 2012 18:44:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=feld.me; s=blargle; h=In-Reply-To:Message-Id:From:Mime-Version:Date:References:Subject:To:Content-Type; bh=46RAtN62+Edb1K4qKrdDm27L12AOHDvS5uHq14KhG3E=; b=hVdhVcxiyOD+0uHvR3LolCFO0Jr6JM+TbmqLhD3sg4tdXpcC5K6Ipz1zK5dXNXMlm1ecG+/3UfeUwg2VC8S/P63rCwJSO8wQOjoZOA+T0K6rXLQQQiUIwRonD5COlapv; Received: from localhost ([127.0.0.1] helo=mwi1.coffeenet.org) by feld.me with esmtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1S64ns-0007Tu-HD for freebsd-fs@freebsd.org; Fri, 09 Mar 2012 12:44:37 -0600 Received: from feld@feld.me by mwi1.coffeenet.org (Archiveopteryx 3.1.4) with esmtpsa id 1331318669-34990-34989/5/9; Fri, 9 Mar 2012 18:44:29 +0000 Content-Type: text/plain; charset=utf-8; format=flowed; delsp=yes To: freebsd-fs@freebsd.org References: Date: Fri, 9 Mar 2012 12:44:29 -0600 Mime-Version: 1.0 From: Mark Felder Message-Id: In-Reply-To: User-Agent: Opera Mail/11.62 (FreeBSD) X-SA-Score: -1.5 Subject: Re: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 18:44:38 -0000 The true test for me is "can you run mp3fs without causing kernel panics now?" as I'm told that's why it's not in ports anymore. 
From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 19:30:00 2012 Return-Path: Delivered-To: fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id A5BF21065674; Fri, 9 Mar 2012 19:30:00 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from cyrus.watson.org (cyrus.watson.org [65.122.17.42]) by mx1.freebsd.org (Postfix) with ESMTP id 58AFA8FC0A; Fri, 9 Mar 2012 19:30:00 +0000 (UTC) Received: from bigwig.baldwin.cx (bigwig.baldwin.cx [96.47.65.170]) by cyrus.watson.org (Postfix) with ESMTPSA id B44A846B2C; Fri, 9 Mar 2012 14:29:53 -0500 (EST) Received: from jhbbsd.localnet (unknown [209.249.190.124]) by bigwig.baldwin.cx (Postfix) with ESMTPSA id 05598B940; Fri, 9 Mar 2012 14:29:53 -0500 (EST) From: John Baldwin To: Konstantin Belousov Date: Fri, 9 Mar 2012 10:59:29 -0500 User-Agent: KMail/1.13.5 (FreeBSD/8.2-CBSD-20110714-p10; KDE/4.5.5; amd64; ; ) References: <201203071318.08241.jhb@freebsd.org> <201203081539.07711.jhb@freebsd.org> <20120308223919.GU75778@deviant.kiev.zoral.com.ua> In-Reply-To: <20120308223919.GU75778@deviant.kiev.zoral.com.ua> MIME-Version: 1.0 Content-Type: Text/Plain; charset="iso-8859-15" Content-Transfer-Encoding: 7bit Message-Id: <201203091059.29342.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.7 (bigwig.baldwin.cx); Fri, 09 Mar 2012 14:29:53 -0500 (EST) Cc: freebsd-fs@freebsd.org, pho@freebsd.org, fs@freebsd.org Subject: Re: close() of an flock'd file is not atomic X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 19:30:00 -0000 On Thursday, March 08, 2012 5:39:19 pm Konstantin Belousov wrote: > On Thu, Mar 08, 2012 at 03:39:07PM -0500, John Baldwin wrote: > > On Wednesday, March 07, 2012 1:18:07 pm John Baldwin wrote: > > > So I ran into this problem at work. 
Suppose you have a process that opens a > > > read-write file descriptor with O_EXLOCK (so it has an flock()). It then > > > writes out a binary into that file. Another process wants to execve() the > > > file when it is ready, so it opens the file with O_EXLOCK (or O_SHLOCK), and > > > will call execve() once it has locked the file. In theory, what should happen > > > is that the second process should wait until the first process has finished > > > and called close(). In practice what happens is that I occasionally see the > > > second process fail with ETXTBSY. > > > > > > The bug is that the vn_closefile() does the VOP_ADVLOCK() to unlock the file > > > separately from the call to vn_close() which drops the writecount. Thus, the > > > second process can do an open() and flock() of the file and subsequently call > > > execve() after the first process has done the VOP_ADVLOCK(), but before it > > > calls into vn_close(). In fact, since vn_close() requires a write lock on the > > > vnode, this turns out to not be too hard to reproduce at all. Below is a > > > simple test program that reproduces this constantly. To use, copy /bin/test > > > to some other file (e.g. /tmp/foo) and make it writable (chmod a+w), then run > > > ./flock_close_race /tmp/foo. > > > > > > The "fix" I came up with is to defer calling VOP_ADVLOCK() to release the lock > > > until after vn_close() executes. However, even with that fix applied, my test > > > case still fails. Now it is because open() with a given lock flag is > > > non-atomic in that the open(O_RDWR) will call vn_open() and bump v_writecount > > > before it blocks on the lock due to O_EXLOCK, so even though the 'exec_child' > > > process has the fd locked, the writecount can still be bumped. One gross hack > > > would be to defer the bump of the writecount to the caller of vn_open() if the > > > caller passes in O_EXLOCK or O_SHLOCK, but that's a really gross kludge, plus > > > it doesn't actually work. 
I ended up moving acquiring the lock into > > > vn_open_cred(). The current patch I'm testing has both of these approaches, > > > but the first one is #if 0'd out, and the second is #if 1'd. > > > > > > http://www.freebsd.org/~jhb/patches/flock_open_close.patch > > > > Based on some feedback from Konstantin, I've fixed some issues in the failure > > path handling for VOP_ADVLOCK(). I've also removed the #if 0'd code mentioned > > above, so the patch is now the actual change that I'm testing. So far it > > handles both my workload at work and my test program without any issues. > > I think a comment is needed for a reason to call vn_writechk() second time. Fixed. > Could you, please, point me, where the FHASLOCK is set for O_EXLOCK | O_SHLOCK > case in the patched kernel ? It wasn't. :( I wonder how this was even working since close shouldn't have been unlocking. I'll need to do some more testing. BTW, I ran into fhopen() and found that I would need to put all this same logic into that, so I've split the common code from fhopen() and vn_open_cred() into a new vn_open_vnode(). I think in general it improves both sets of code. I'll update the patch once I've done some more testing. 
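[Illustration, not part of the original message] The handoff that the thread expects to work can be sketched in user space. This is only a rough Python sketch under stated assumptions, not the flock_close_race test program mentioned above (which is not reproduced here). It assumes a POSIX system with fork() and flock(). The function name locked_handoff is hypothetical, flock() after open() stands in for O_EXLOCK/O_SHLOCK, and a plain read stands in for execve(), so it shows the intended protocol rather than exercising the kernel race between VOP_ADVLOCK() and vn_close().

```python
import fcntl
import os
import tempfile
import time


def locked_handoff(payload: bytes) -> bytes:
    """Return what a reader sees after waiting on the writer's flock().

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp()
    os.close(fd)

    # Writer side: take the exclusive lock before the reader exists.
    # (Stand-in for open(path, O_RDWR | O_EXLOCK) on FreeBSD.)
    wfd = os.open(path, os.O_RDWR)
    fcntl.flock(wfd, fcntl.LOCK_EX)

    rpipe, wpipe = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Reader (child process).
        status = 1
        try:
            os.close(rpipe)
            # Closing the inherited copy does not drop the lock while the
            # parent still holds its descriptor for the same open file.
            os.close(wfd)
            rfd = os.open(path, os.O_RDONLY)
            # Blocks until the writer's exclusive lock is released.
            fcntl.flock(rfd, fcntl.LOCK_SH)
            os.write(wpipe, os.read(rfd, len(payload) + 1))
            status = 0
        finally:
            os._exit(status)

    # Writer (parent process).
    os.close(wpipe)
    try:
        half = len(payload) // 2
        os.write(wfd, payload[:half])
        time.sleep(0.2)  # give the reader time to block on the lock
        os.write(wfd, payload[half:])
        # The lock goes away once every descriptor for this open file
        # has been closed.
        os.close(wfd)

        chunks = []
        while True:
            chunk = os.read(rpipe, 4096)
            if not chunk:
                break
            chunks.append(chunk)
        os.waitpid(pid, 0)
        return b"".join(chunks)
    finally:
        os.close(rpipe)
        os.unlink(path)
```

If the locking works as intended, the reader never observes a half-written file: locked_handoff(data) should return exactly data. The report above concerns the stricter case where the reader calls execve(), which also depends on the vnode's write count and not only on the advisory lock.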
-- John Baldwin From owner-freebsd-fs@FreeBSD.ORG Fri Mar 9 22:53:36 2012 Return-Path: Delivered-To: freebsd-fs@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 41C3D1065675 for ; Fri, 9 Mar 2012 22:53:36 +0000 (UTC) (envelope-from gnn@neville-neil.com) Received: from vps.hungerhost.com (vps.hungerhost.com [216.38.53.176]) by mx1.freebsd.org (Postfix) with ESMTP id 12B188FC0A for ; Fri, 9 Mar 2012 22:53:35 +0000 (UTC) Received: from [209.249.190.124] (helo=[10.2.212.229]) by vps.hungerhost.com with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.69) (envelope-from ) id 1S65Zy-0003Zi-PL; Fri, 09 Mar 2012 14:34:18 -0500 Mime-Version: 1.0 (Apple Message framework v1257) Content-Type: text/plain; charset=us-ascii From: George Neville-Neil In-Reply-To: Date: Fri, 9 Mar 2012 14:34:25 -0500 Content-Transfer-Encoding: quoted-printable Message-Id: <0826905E-32FE-4F44-923F-220D7A11E65C@neville-neil.com> References: To: Mark Felder X-Mailer: Apple Mail (2.1257) X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - vps.hungerhost.com X-AntiAbuse: Original Domain - freebsd.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - neville-neil.com Cc: freebsd-fs@freebsd.org Subject: Re: RFC: FUSE kernel module for the kernel... X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 09 Mar 2012 22:53:36 -0000 On Mar 9, 2012, at 13:44 , Mark Felder wrote: > The true test for me is "can you run mp3fs without causing kernel panics now?" as I'm told that's why it's not in ports anymore. At the moment I'm only using the FUSE example, but I have run glusterfs on it, and also sshfs. Best, George