From: Quartz
Date: Sun, 05 Jul 2015 11:12:44 -0400
To: FreeBSD FS
Subject: A question about ZFS built-in SMB

Assuming the following:

- A server running FreeBSD 10.1

- A ZFS pool with no restrictions on how it can be set up

- Clients running Windows XP/Vista/7/8

- The need for a "public share" with two main directories, which we'll call 'stuff' and 'dropbox'. Anonymous guest users have read/write access to 'dropbox', and read-only access to 'stuff' as well as being restricted in which files and directories they can even see there. Admin-class users have full permissions and visibility to both directories.

Is installing Samba still a requirement, or is ZFS's built-in SMB sharing complete and robust enough now to be able to handle everything natively? (Alternatively, is SMB itself even still a requirement or are there other options these days (that don't require installing software or custom configs on the clients))?


From: Felipe Monteiro de Carvalho
Date: Sun, 5 Jul 2015 17:24:14 +0200
To: Julian Elischer
Cc: "freebsd-fs@freebsd.org"
Subject: Re: Uberblock location

On Fri, Jul 3, 2015 at 6:09 PM, Julian Elischer wrote:
> which? the kernel code or the bootblock code?
> I believe the bootblock version /usr/src/sys/boot/zfs from would be easier
> to start with.

Yes, I'm basing this on the bootblock code.

> define "path".. in a way that is OS independent and meaningful when running
> in a bios environment (bootblocks).

Yes, it's not meaningful for the boot, but my point was not that the boot code doesn't handle the info, but instead that the partition does not contain this info, which would be very useful for a user program.

Surely the zfs pool handling tools must store this info somewhere (probably somewhere in /etc) if the partition itself doesn't contain it.

> it should be from the base of the partition containing the filesystem but I
> feel you are
> probably already doing this or you probably wouldn't have got this far.

Yes, I meant partition start.
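The question of where on-disk offsets are measured from can be made concrete with a little arithmetic. What follows is a minimal C sketch, assuming the vdev label layout as I understand it from the ZFS header vdev_impl.h: four 256 KiB labels (two at the front of the partition, two at the end), a 3.5 MiB boot area after the front pair, the config nvlist 16 KiB into each label and the uberblock ring 128 KiB into it. The constant and function names are hypothetical stand-ins, not the real ZFS macros, so check the values against your source tree. Under these assumptions a block pointer (DVA) offset is relative to VDEV_LABEL_START_SIZE, i.e. 4 MiB into the partition, which matches Felipe's finding.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Layout constants as I understand them from sys/vdev_impl.h.
 * The names are illustrative, not the real ZFS macros.
 */
#define LABEL_SIZE       (256ULL << 10) /* one vdev label: 256 KiB */
#define NUM_LABELS       4U             /* two at the front, two at the end */
#define BOOT_SIZE        (7ULL << 19)   /* boot area after labels 0/1: 3.5 MiB */
#define LABEL_START_SIZE (2 * LABEL_SIZE + BOOT_SIZE) /* 4 MiB */
#define NVLIST_OFFSET    (16ULL << 10)  /* config nvlist, after 2 x 8 KiB pads */
#define UB_RING_OFFSET   (128ULL << 10) /* uberblock ring within a label */
#define UB_RING_SIZE     (128ULL << 10)
#define UB_MIN_SHIFT     10U            /* uberblock slots are at least 1 KiB */

/* Byte offset of label l (0..3) on a partition of psize bytes. The usable
 * size is first rounded down to a multiple of the label size. */
uint64_t
label_offset(uint64_t psize, unsigned l)
{
	assert(l < NUM_LABELS);
	psize &= ~(LABEL_SIZE - 1);
	assert(psize >= NUM_LABELS * LABEL_SIZE);
	return (l * LABEL_SIZE +
	    (l < NUM_LABELS / 2 ? 0 : psize - NUM_LABELS * LABEL_SIZE));
}

/* Start of the config nvlist in label l (the part 'zdb -l' decodes). */
uint64_t
label_nvlist_offset(uint64_t psize, unsigned l)
{
	return (label_offset(psize, l) + NVLIST_OFFSET);
}

/* Byte offset of uberblock slot n in label l. The slot size is assumed
 * to be max(2^ashift, 1 KiB); n wraps around the 128 KiB ring. */
uint64_t
uberblock_offset(uint64_t psize, unsigned l, unsigned ashift, uint64_t n)
{
	unsigned shift = ashift > UB_MIN_SHIFT ? ashift : UB_MIN_SHIFT;
	uint64_t slots = UB_RING_SIZE >> shift;

	assert(slots > 0);
	return (label_offset(psize, l) + UB_RING_OFFSET + ((n % slots) << shift));
}

/* Physical offset on a single-disk or mirror vdev for a DVA offset already
 * converted to bytes (the on-disk field counts 512-byte sectors). RAID-Z
 * would first spread the offset across its child disks. */
uint64_t
dva_to_physical(uint64_t dva_offset_bytes)
{
	return (dva_offset_bytes + LABEL_START_SIZE);
}
```

For a 1 GiB partition this puts label 2 at 512 KiB and label 3 at 256 KiB before the end, and maps a DVA offset of 0 to byte 0x400000 of the partition. My understanding is that the uberblock for txg T goes into slot T mod (number of slots), but slot sizing for large ashift values differs between ZFS versions, so treat uberblock_offset as version-dependent.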
Anyway, I think I figured it out myself; it looks like the offsets are relative to VDEV_LABEL_START_SIZE.

thanks,
--
Felipe Monteiro de Carvalho


From: Freddie Cash
Date: Sun, 5 Jul 2015 09:03:09 -0700
To: FreeBSD Filesystems
Subject: Fwd: Re: A question about ZFS built-in SMB

Forgot to include the list in the reply.

---------- Forwarded message ----------
From: "Freddie Cash"
Date: Jul 5, 2015 8:55 AM
Subject: Re: A question about ZFS built-in SMB
To: "Quartz"
Cc:

On Jul 5, 2015 8:13 AM, "Quartz" wrote:
>
> Assuming the following:
>
> - A server running FreeBSD 10.1
>
> - A ZFS pool with no restrictions on how it can be set up
>
> - Clients running Windows XP/Vista/7/8
>
> - The need for a "public share" with two main directories, which we'll call 'stuff' and 'dropbox'. Anonymous guest users have read/write access to 'dropbox', and read-only access to 'stuff' as well as being restricted in which files and directories they can even see there. Admin-class users have full permissions and visibility to both directories.
>
>
>
> Is installing Samba still a requirement, or is ZFS's built-in SMB sharing complete and robust enough now to be able to handle everything natively? (Alternatively, is SMB itself even still a requirement or are there other options these days (that don't require installing software or custom configs on the clients))?

SMB support is only built in on Solaris derivatives. You need Samba on everything else.


From: Matt Churchyard
Date: Sun, 5 Jul 2015 17:03:04 +0000
To: Felipe Monteiro de Carvalho
CC: Julian Elischer, "freebsd-fs@freebsd.org"
Subject: Re: Uberblock location

> On 5 Jul 2015, at 16:24, Felipe Monteiro de Carvalho wrote:
>
>> On Fri, Jul 3, 2015 at 6:09 PM, Julian Elischer wrote:
>> which? the kernel code or the bootblock code?
>> I believe the bootblock version /usr/src/sys/boot/zfs from would be easier
>> to start with.
>
> Yes, I'm basing this on the bootblock code.
>
>> define "path".. in a way that is OS independent and meaningful when running
>> in a bios environment (bootblocks).
>
> Yes, it's not meaningful for the boot, but my point was not that the
> boot code doesn't handle the info, but instead that the partition does
> not contain this info, which would be very useful for a user program.
>
> Surely the zfs pool handling tools must store this info somewhere
> (probably somewhere in /etc) if the partition itself doesn't contain
> it.

All the information is on disk. There's a cache of the imported pool config on the system, but it's not required. Each disk stores the details, including device path, for each disk in the same vdev. If you'd created a mirror, both disks would have the details for each other. As a stripe, you have 2 vdevs, each containing only its own config. All vdevs do know the total number of vdevs, though. So if an entire vdev is missing, ZFS knows it's missing, but can't tell which disks were involved.

I assume you're aware of 'zdb -l device', which outputs exactly what ZFS config is stored in the label on a disk. Your code is already outputting this information, but the zdb command is useful to neatly display exactly what is on each disk.

>
>> it should be from the base of the partition containing the filesystem but I
>> feel you are
>> probably already doing this or you probably wouldn't have got this far.
>
> Yes, I meant partition start.
>
> Anyway, I think I figured it out myself; it looks like the offsets
> are relative to VDEV_LABEL_START_SIZE.
>
> thanks,
> --
> Felipe Monteiro de Carvalho
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"


From: bugzilla-noreply@FreeBSD.org
Date: Sun, 05 Jul 2015 21:00:40 +0000
To: freebsd-fs@FreeBSD.org
Subject: Problem reports for freebsd-fs@FreeBSD.org that need special attention

To view an individual PR, use:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=(Bug Id).

The following is a listing of current problems submitted by FreeBSD users, which need special attention. These represent problem reports covering all versions including experimental development code and obsolete releases.

Status      |    Bug Id | Description
------------+-----------+---------------------------------------------------
Open        |    136470 | [nfs] Cannot mount / in read-only, over NFS
Open        |    139651 | [nfs] mount(8): read-only remount of NFS volume d
Open        |    144447 | [zfs] sharenfs fsunshare() & fsshare_main() non f

3 problems total for which you should take action.


From: Ahmed Kamal
Date: Sun, 5 Jul 2015 23:01:06 +0200
To: Rick Macklem
Cc: Julian Elischer, freebsd-fs@freebsd.org, Xin LI
Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!)

Hi folks,

Just a quick update. I did not test Xin's patches yet .. What I did so far is to increase the tcp highwater tunable and increase nfsd threads to 60. Today (a working day) I noticed I only got one bad sequence error message! Check this:

# grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c
   1 messages:Jul5
  39 messages.1:Jun28
  15 messages.1:Jun29
   4 messages.1:Jun30
   9 messages.1:Jul1
  23 messages.1:Jul2
   1 messages.1:Jul4
   1 messages.2:Jun28

So there seems to be an improvement! Not sure if the Linux nfs4 client is able to somehow recover from those bad-sequence situations or not .. I did get some user complaints that running "ls -l" is sometimes slow and takes a couple of seconds to finish.

One final question .. Do you folks think nfs4.1 is more reliable in general than nfs4 ..
I've always only used nfs3 (I guess it can't work here with /home/* being separate zfs filesystems) .. So should I go through the pain of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do you expect the protocol to be more solid ? I know it's a fluffy question, just give me your thoughts. Thanks a lot!

On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem wrote:
> Ahmed Kamal wrote:
> > PS: Today (after adjusting tcp.highwater) I didn't get any screaming reports from users about hung vnc sessions. So maybe just maybe, linux clients are able to somehow recover from this bad sequence messages. I could still see the bad sequence error message in logs though
> >
> > Why isn't the highwater tunable set to something better by default ? I mean this server is certainly not under a high or unusual load (it's only 40 PCs mounting from it)
> >
> > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal <email.ahmedkamal@googlemail.com> wrote:
> >
> > > Thanks all .. I understand now we're doing the "right thing" .. Although if mounting keeps wedging, I will have to solve it somehow! Either using Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1.
> > >
> > > Regarding Xin's patch, is it possible to build the patched nfsd code, as a kernel module ? I'm looking to minimize my delta to upstream.
> >
> Yes, you can build the nfsd as a module. If your kernel config does not include "options NFSD" the module will get loaded/used. It is also possible to replace the module without rebooting, but you need to kill off the nfsd daemon, then kldunload nfsd.ko and replace nfsd.ko with the new one. (In /boot/.)
>
> > > Also would adopting Xin's patch and hiding it behind a kern.nfs.allow_linux_broken_client be an option (I'm probably not the last person on earth to hit this) ?
>
> If it fixes your problem, I think this is reasonable.
> I'm also hoping that someone who works on the Linux client reports if/when this was changed.
>
> rick
>
> > > Thanks a lot for all the help!
> > >
> > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem wrote:
> > >
> > >> Ahmed Kamal wrote:
> > >> > Appreciating the fruitful discussion! Can someone please explain to me, what would happen in the current situation (linux client doing this skip-by-1 thing, and freebsd not doing it) ? What is the effect of that?
> > >> Well, as you've seen, the Linux client doesn't function correctly against the FreeBSD server (and probably others that don't support this "skip-by-1" case).
> > >>
> > >> > What do users see? Any chances of data loss?
> > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the Linux client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy observing it.
> > >>
> > >> >
> > >> > Also, I find it strange that netapp have acknowledged this is a bug on their side, which has been fixed since then!
> > >> Yea, I think Netapp screwed up. For some reason their server allowed this, then was fixed to not allow it and then someone decided that was broken and reversed it.
> > >>
> > >> > I also find it strange that I'm the first to hit this :) Is no one running nfs4 yet!
> > >> >
> > >> Well, it seems to be slowly catching on. I suspect that the Linux client mounting a Netapp is the most common use of it. Since it appears that they flip-flopped w.r.t. whose bug this is, it has probably persisted.
> > >>
> > >> It may turn out that the Linux client has been fixed or it may turn out that most servers allowed this "skip-by-1" even though David Noveck (one of the main authors of the protocol) seems to agree with me that it should not be allowed.
> > >>
> > >> It is possible that others have bumped into this, but it wasn't isolated (I wouldn't have guessed it, so it was good you pointed to the RedHat discussion) and they worked around it by reverting to NFSv3 or similar.
> > >> The protocol is rather complex in this area and changed completely for NFSv4.1, so many have also probably moved onto NFSv4.1 where this won't be an issue. (NFSv4.1 uses sessions to provide exactly once RPC semantics and doesn't use these seqid fields.)
> > >>
> > >> This is all just mho, rick
> > >>
> > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem wrote:
> > >> >
> > >> > > Julian Elischer wrote:
> > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote:
> > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. Please let me know if Xin Li's patch resolves your problem, even though I don't believe it is correct except for the UINT32_MAX case. Good luck with it, rick
> > >> > > > and please keep us all in the loop as to what they say!
> > >> > > >
> > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1 in a number field that has a bit of slack at wrap time (probably due to some ambiguity in the original spec).
> > >> > > >
> > >> > > Actually, since N is the lock op already done, N + 1 is the next lock operation in order. Since lock ops need to be strictly ordered, allowing N + 2 (which means N + 2 would be done before N + 1) makes no sense.
> > >> > >
> > >> > > I think the author of the RFC meant that N + 2 or greater fails, but it was poorly worded.
> > >> > >
> > >> > > I will pass along whatever I get from nfsv4@ietf.org.
> > >> > > (There is an archive of it somewhere, but I can't remember where. ;-)
> > >> > >
> > >> > > rick
> > >> > >
> > >> >
> > >>
> > >
> >


From: Gary Palmer
Date: Sun, 5 Jul 2015 22:03:06 +0100
To: Quartz
Cc: FreeBSD FS
Subject: Re: A question about ZFS built-in SMB

On Sun, Jul 05, 2015 at 11:12:44AM -0400, Quartz wrote:
> Assuming the following:
>
> - A server running FreeBSD 10.1
>
> - A ZFS pool with no restrictions on how it can be set up
>
> - Clients running Windows XP/Vista/7/8
>
> - The need for a "public share" with two main directories, which we'll
> call 'stuff' and 'dropbox'. Anonymous guest users have read/write access
> to 'dropbox', and read-only access to 'stuff' as well as being
> restricted in which files and directories they can even see there.
> Admin-class users have full permissions and visibility to both directories.
>
>
>
> Is installing Samba still a requirement, or is ZFS's built-in SMB
> sharing complete and robust enough now to be able to handle everything
> natively? (Alternatively, is SMB itself even still a requirement or are
> there other options these days (that don't require installing software
> or custom configs on the clients))?

The sharesmb option to zfs does not work on FreeBSD. You need to use Samba.

From "man zfs":

     sharesmb=on | off | opts
         The sharesmb property currently has no effect on FreeBSD.

As far as I am aware, without 3rd party software, Windows only supports SMB/CIFS. Note: I try to keep as far away from Windows as possible, so that may be wrong.

Regards,
Gary


From: bugzilla-noreply@freebsd.org
Date: Sun, 05 Jul 2015 21:41:29 +0000
To: freebsd-fs@FreeBSD.org
Subject: [Bug 161424] [nullfs] __getcwd() calls fail when used on nullfs mount

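The comment below prints r, errno and dir from a small test program (/tmp/a.out) whose source is not included in the report. A plausible reconstruction is sketched here, assuming the program simply calls getcwd(3) (which, as far as I know, tries the __getcwd() system call first on FreeBSD) and prints the result; report_cwd is a hypothetical name. errno is only meaningful when getcwd returns NULL, so the sketch clears it first and saves it right after the call, because printf() itself may change errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical reconstruction of the reporter's /tmp/a.out: call
 * getcwd(3), then print the returned pointer, errno and the path.
 */
char *
report_cwd(char *buf, size_t len)
{
	char *r;
	int saved;

	assert(buf != NULL && len > 0);
	errno = 0;			/* don't report a stale errno */
	r = getcwd(buf, len);
	saved = errno;			/* printf() below may clobber errno */
	printf("r = %p, errno = %d\ndir = %s\n", (void *)r, saved,
	    r != NULL ? buf : "");
	errno = saved;			/* let the caller see getcwd()'s errno */
	return (r);
}
```

To try it, call report_cwd(buf, sizeof(buf)) with a PATH_MAX-sized buffer from a small main(), then run that from inside the nullfs mount, first as root and then after su to an unprivileged user, as in the report. An EACCES (errno 13) failure only in the second case would match what is described.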
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=161424 berend@pobox.com changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |berend@pobox.com --- Comment #8 from berend@pobox.com --- Still happens on latest 10.1 To repeat you need to be an unprivileged user. As root I get: r = 0x7fffffffe690, errno = 2 dir = /home/berend/sites/staging.xplainhosting.com As user berend I get: r = 0x0, errno = 13 dir = My setup: zpool name u1. Mounted as: /u1/home on /home (nullfs, local) Users reside in a postgresql which is made available via pam_pgsql and nss-pgsql.conf files. The latter appears to be inconsequential, because when a user is simply listed in /etc/passwd get the same error. For example this user: www:*:80:80::0:0:World Wide Web Owner:/nonexistent:/bin/sh I get: $ pwd /home/test # su www $ /tmp/a.out r = 0x0, errno = 13 dir = $ whoami www Hope this helps. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Sun Jul 5 22:30:05 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B89A3992582 for ; Sun, 5 Jul 2015 22:30:05 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 52BC61E53; Sun, 5 Jul 2015 22:30:04 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2C7BABRrplV/61jaINWBoNmYAaDGbozgWQKhS1KAoFXEgEBAQEBAQGBCoQjAQEBAwEBAQEgKyALEAIBCA4KAgINGQICJwEJJgIECAcEARwEh3kDCggNsS+Pbg2FYAEBAQcBAQEBAR2BIYoqgk2BVhACAQUIAQ40B4JogUMFjBmHfIRiglmBXYQKRINRiwCEKoNbAiaCDByBbyIxB39BgQQBAQE X-IronPort-AV: E=Sophos;i="5.15,411,1432612800"; d="scan'208";a="222063314" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) 
([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 05 Jul 2015 18:28:55 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id A447715F542; Sun, 5 Jul 2015 18:28:55 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id DyGR0CgU2CTx; Sun, 5 Jul 2015 18:28:54 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 6191115F54D; Sun, 5 Jul 2015 18:28:54 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id jz4sox0meuvF; Sun, 5 Jul 2015 18:28:54 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 3F17815F542; Sun, 5 Jul 2015 18:28:54 -0400 (EDT) Date: Sun, 5 Jul 2015 18:28:53 -0400 (EDT) From: Rick Macklem To: Ahmed Kamal Cc: Julian Elischer , freebsd-fs@freebsd.org, Xin LI Message-ID: <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: Linux NFSv4 clients are getting (bad sequence-id error!) 
Thread-Index: rcZ265AjBv92fGCCMrwVSGc0qtHu/A== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sun, 05 Jul 2015 22:30:05 -0000 Ahmed Kamal wrote: > Hi folks, > > Just a quick update. I did not test Xin's patches yet .. What I did so far > is to increase the tcp highwater tunable and increase nfsd threads to 60. > Today (a working day) I noticed I only got one bad sequence error message! > Check this: > > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c > 1 messages:Jul5 > 39 messages.1:Jun28 > 15 messages.1:Jun29 > 4 messages.1:Jun30 > 9 messages.1:Jul1 > 23 messages.1:Jul2 > 1 messages.1:Jul4 > 1 messages.2:Jun28 > > So there seems to be an improvement! Not sure if the Linux nfs4 client is > able to somehow recover from those bad-sequence situations or not .. I did > get some user complaints that running "ls -l" is sometimes slow and takes a > couple of seconds to finish. > > One final question .. Do you folks think nfs4.1 is more reliable in general > than nfs4 .. I've always only used nfs3 (I guess it can't work here with > /home/* being separate zfs filesystems) .. So should I go through the pain > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do you > expect the protocol to be more solid ? I know it's a fluffy question, just > give me your thoughts. Thanks a lot! > All I can say is that the "bad seqid" errors should not occur, since NFSv4.1 doesn't use the seqid#s to order RPCs. Also I would say that a correctly implemented NFSv4.1 protocol should function "more correctly" since all RPCs are performed "exactly once". (How much effect this will have in practice, I can't say.) On the other hand, NFSv4.1 is a newer protocol (with an RFC of over 500 pages), so it is hard to say how mature the implementations are. I think only testing will give you the answer.
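In NFSv4.0 each open/lock-owner's requests are ordered by seqid: if N is the seqid of the last request the server has already processed, a request carrying N is a retransmission, N + 1 is the next request in order, and anything else is answered with NFS4ERR_BAD_SEQID. The following is a simplified sketch of that rule, not the FreeBSD nfsd code. `seqid_check()`, the result names and the `allow_skip` flag (a stand-in for the sysctl-controlled "seqid + 2" acceptance discussed in this thread) are all hypothetical, and the sketch assumes plain mod 2^32 wraparound rather than modelling the UINT32_MAX case separately.

```c
#include <assert.h>	/* used by the checks that exercise this sketch */
#include <stdbool.h>
#include <stdint.h>

enum seqid_result {
	SEQID_OK,	/* next request in order: process it */
	SEQID_REPLAY,	/* same seqid as the last request: resend cached reply */
	SEQID_BAD	/* anything else: reply NFS4ERR_BAD_SEQID */
};

/*
 * Hypothetical check of the seqid 'got' against 'last', the seqid of the
 * last request already processed for this open/lock-owner.  Unsigned
 * 32-bit arithmetic wraps mod 2^32.  'allow_skip' models the proposed
 * sysctl that would also accept last + 2 (the "skip-by-1" case).
 */
static enum seqid_result
seqid_check(uint32_t last, uint32_t got, bool allow_skip)
{
	if (got == last)
		return (SEQID_REPLAY);
	if (got == (uint32_t)(last + 1))
		return (SEQID_OK);
	if (allow_skip && got == (uint32_t)(last + 2))
		return (SEQID_OK);
	return (SEQID_BAD);
}
```

With `allow_skip` false, `seqid_check(5, 7, false)` yields `SEQID_BAD`, which corresponds to the Linux "skip-by-1" requests behind the errors above; with the flag set, the same request is accepted.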
I would suggest that you test Xin Li's patch that allows the "seqid + 2" case and see if that makes the "bad seqid" errors go away. (Even though I think this would indicate a client bug, adding this in a way that it can be enabled via a sysctl seems reasonable.) Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, rick > > > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem wrote: > > > Ahmed Kamal wrote: > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming > > > reports from users about hung vnc sessions. So maybe just maybe, linux > > > clients are able to somehow recover from this bad sequence messages. I > > > could still see the bad sequence error message in logs though > > > > > > Why isn't the highwater tunable set to something better by default ? I > > mean > > > this server is certainly not under a high or unusual load (it's only 40 > > PCs > > > mounting from it) > > > > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal < > > email.ahmedkamal@googlemail.com > > > > wrote: > > > > > > > Thanks all .. I understand now we're doing the "right thing" .. > > Although > > > > if mounting keeps wedging, I will have to solve it somehow! Either > > using > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > > > > > > > > Regarding Xin's patch, is it possible to build the patched nfsd code, > > as a > > > > kernel module ? I'm looking to minimize my delta to upstream. > > > > > > Yes, you can build the nfsd as a module. If your kernel config does not > > include > > "options NFSD" the module will get loaded/used. It is also possible to > > replace > > the module without rebooting, but you need to kill off the nfsd daemon then > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In > > /boot/.) > > > > > > Also would adopting Xin's patch and hiding it behind a > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not the > > last > > > > person on earth to hit this) ?
> > > > > > If it fixes your problem, I think this is reasonable. > > I'm also hoping that someone that works on the Linux client reports > > if/when this > > was changed. > > > > rick > > > > > > Thanks a lot for all the help! > > > > > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem > > > > wrote: > > > > > > > >> Ahmed Kamal wrote: > > > >> > Appreciating the fruitful discussion! Can someone please explain to > > me, > > > >> > what would happen in the current situation (linux client doing this > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect of > > that? > > > >> Well, as you've seen, the Linux client doesn't function correctly > > against > > > >> the FreeBSD server (and probably others that don't support this > > > >> "skip-by-1" > > > >> case). > > > >> > > > >> > What do users see? Any chances of data loss? > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what the > > Linux > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the guy > > > >> observing > > > >> it. > > > >> > > > >> > > > > >> > Also, I find it strange that netapp have acknowledged this is a bug > > on > > > >> > their side, which has been fixed since then! > > > >> Yea, I think Netapp screwed up. For some reason their server allowed > > this, > > > >> then was fixed to not allow it and then someone decided that was > > broken > > > >> and > > > >> reversed it. > > > >> > > > >> > I also find it strange that I'm the first to hit this :) Is no one > > > >> running > > > >> > nfs4 yet! > > > >> > > > > >> Well, it seems to be slowly catching on. I suspect that the Linux > > client > > > >> mounting a Netapp is the most common use of it. Since it appears that > > they > > > >> flip flopped w.r.t. whose bug this is, it has probably persisted.
> > > >> > > > >> It may turn out that the Linux client has been fixed or it may turn > > out > > > >> that most servers allowed this "skip-by-1" even though David Noveck > > (one > > > >> of the main authors of the protocol) seems to agree with me that it > > should > > > >> not be allowed. > > > >> > > > >> It is possible that others have bumped into this, but it wasn't > > isolated > > > >> (I wouldn't have guessed it, so it was good you pointed to the RedHat > > > >> discussion) > > > >> and they worked around it by reverting to NFSv3 or similar. > > > >> The protocol is rather complex in this area and changed completely for > > > >> NFSv4.1, > > > >> so many have also probably moved onto NFSv4.1 where this won't be an > > > >> issue. > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and > > doesn't > > > >> use > > > >> these seqid fields.) > > > >> > > > >> This is all just mho, rick > > > >> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem > > > >> wrote: > > > >> > > > > >> > > Julian Elischer wrote: > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they say. > > Please > > > >> > > > > let me know if Xin Li's patch resolves your problem, even > > though I > > > >> > > > > don't believe it is correct except for the UINT32_MAX case. > > Good > > > >> > > > > luck with it, rick > > > >> > > > and please keep us all in the loop as to what they say! > > > >> > > > > > > >> > > > the general N+2 bit sounds like bullshit to me.. it's always N+1 > > in a > > > >> > > > number field that has a > > > >> > > > bit of slack at wrap time (probably due to some ambiguity in the > > > >> > > > original spec). > > > >> > > > > > > >> > > Actually, since N is the lock op already done, N + 1 is the next > > lock > > > >> > > operation in order.
Since lock ops need to be strictly ordered, > > > >> allowing > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no > > sense. > > > >> > > > > > >> > > I think the author of the RFC meant that N + 2 or greater fails, > > but > > > >> it > > > >> > > was poorly worded. > > > >> > > > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There is > > an > > > >> archive > > > >> > > of it somewhere, but I can't remember where.;-) > > > >> > > > > > >> > > rick > > > >> > > _______________________________________________ > > > >> > > freebsd-fs@freebsd.org mailing list > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > >> > > To unsubscribe, send any mail to " > > freebsd-fs-unsubscribe@freebsd.org" > > > >> > > > > > >> > > > > >> > > > > > > > > > > > > > > From owner-freebsd-fs@freebsd.org Mon Jul 6 03:05:39 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2968998EC05 for ; Mon, 6 Jul 2015 03:05:39 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x231.google.com (mail-wi0-x231.google.com [IPv6:2a00:1450:400c:c05::231]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B2E351D3D; Mon, 6 Jul 2015 03:05:38 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wifm2 with SMTP id m2so16891959wif.1; Sun, 05 Jul 2015 20:05:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=YZ3+XgzANHejIquJdREkdlK+VMxG8PMdZTjy2YnYvUk=; b=wFByKVPP/+6+YrPPH1gMmA6mmKasBxGkYYp6ibOqyCH9JfQzRzitOLmWgiP9hdygDU ZDOhvlKj+fstWuEHRqZGPzYSNxKndys/a6EP+YSpTtP7BpgxT1XqRENGPCV21TAKy5SR 
13pYOChcHkcro/90J6RAg0Xzs6nkPylpLh0GsDIbqA9x4JFUWGrxJAYT9W5WFy8gXiO0 99IMeKrzr6+iMbs9+EZzJ43WswulezLyI3BRv6UW1jDpuo0rP6zvSHOX4B/s7FvrlrK8 7UzEDi8C8318/7HAoyv7qwmOXPPtboiN17+3JvSxYS3F8HNZvYI9AfMVdZUMxryaapCC 1D8g== X-Received: by 10.180.83.40 with SMTP id n8mr48772872wiy.57.1436151936892; Sun, 05 Jul 2015 20:05:36 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by mx.google.com with ESMTPSA id pd7sm25571851wjb.27.2015.07.05.20.05.34 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Jul 2015 20:05:35 -0700 (PDT) From: Mateusz Guzik To: freebsd-fs@freebsd.org Cc: kib@freesd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: [PATCH 0/2] slightly cheaper file lookups + an idea Date: Mon, 6 Jul 2015 05:05:30 +0200 Message-Id: <1436151932-12514-1-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 03:05:39 -0000 From: Mateusz Guzik First, 2 simple patches which imho should go in regardless of whether the idea below is accepted. The first one shuffles code around to make the second one easier, which removes a requirement to hold filedesc lock imposed by auditing. With that out of the way: filedesc lock is held so that fd_cdir and fd_rdir vnodes can be obtained for lookup purposes. In effect namei unnecessarily competes with code manipulating file descriptor table. With patches mentioned here at the very least we can decompose the filedesc lock into 2: one to protect fd_ cdir/jdir/rdir and the other one for the rest. However, I believe we should go a step further and split struct filedesc instead. Specifically, we can move aforementioned vnodes to a copy-on-write structure managed similarly to thread credentials. After such an action there is no lock to take during lookups.
Further, since vnodes are guaranteed to be stable we don't have to vref+vrele any of them, apart from the case where the vnode in question is going to be returned. This will make chdir more expensive, but file lookups are way more frequent so it should be worth it. Thoughts? Mateusz Guzik (2): vfs: avoid spurious vref+vrele for absolute lookups audit: utilize vnode pointer found by namei instead of looking it up again sys/kern/vfs_lookup.c | 129 ++++++++++++++++++++---------------- sys/security/audit/audit.h | 14 ++++ sys/security/audit/audit_arg.c | 36 ++++++++++ sys/security/audit/audit_bsm_klib.c | 82 +++++++++++++++-------- sys/security/audit/audit_private.h | 2 + 5 files changed, 178 insertions(+), 85 deletions(-) -- 2.4.5 From owner-freebsd-fs@freebsd.org Mon Jul 6 03:05:40 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C20A498EC0A for ; Mon, 6 Jul 2015 03:05:40 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x235.google.com (mail-wg0-x235.google.com [IPv6:2a00:1450:400c:c00::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5D3161D57; Mon, 6 Jul 2015 03:05:40 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgjx7 with SMTP id x7so128112343wgj.2; Sun, 05 Jul 2015 20:05:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=DOvdJngvdWAg8DefsXCXhr8EkNIum23GeXfO+EN45BE=; b=vYbybprWU5ITRYoEyvKMZ1ThKcL04Qj1Y4yhWasIvhjm9YWbx03xFj3vhvwMQ4cFhR DBL1RzBQM+TRvnjIpTNKw6B/kDPjEIO4Dn3qGtz7Tv3V/t4+AhLK3pbgh7StVsUtfWlA sCRuO/fmbAV0RI5JEWabTg9KvtyQcWXG9O5fEvLz+eI18tcOUAXprDADJDUFbDbmLU2D 
AYaKlLbz4afpDZJC2alGQifGv9CiI4unxlLBxTdyaKD2ZHNebK6uJOgZl5uLnb9JZX0O e49EMeAiJEo8Hj1xKHqRWOuFXrJuO9Hk6csy5ADWpWPv65AmDvOCGH/ryOsSOhj/8CaJ ZG8Q== X-Received: by 10.194.58.69 with SMTP id o5mr91761227wjq.22.1436151938754; Sun, 05 Jul 2015 20:05:38 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by mx.google.com with ESMTPSA id pd7sm25571851wjb.27.2015.07.05.20.05.36 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Jul 2015 20:05:37 -0700 (PDT) From: Mateusz Guzik To: freebsd-fs@freebsd.org Cc: kib@freesd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: [PATCH 1/2] vfs: avoid spurious vref/vrele for absolute lookups Date: Mon, 6 Jul 2015 05:05:31 +0200 Message-Id: <1436151932-12514-2-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436151932-12514-1-git-send-email-mjguzik@gmail.com> References: <1436151932-12514-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 03:05:40 -0000 From: Mateusz Guzik namei used to vref fd_cdir, which was immediately vrele'd on entry to the loop. Simplify error handling and remove type checking for ni_startdir vnode. It is only set by nfs which does the check on its own. Assert the correct type instead.
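The new `namei_handle_root()` helper in this patch can be modelled in userspace to show the reference-count effect: an absolute path now takes exactly one reference, on the root directory, instead of referencing `fd_cdir` only to release it again. This is a hedged sketch, not kernel code; `struct mock_vnode`, `struct mock_nameidata` and `handle_root()` are hypothetical stand-ins for the kernel's vnode, `struct nameidata` and `VREF()`, and `ENOTCAPABLE` is replaced by `EPERM` on systems other than FreeBSD (an assumption made only so the sketch compiles there).

```c
#include <assert.h>	/* used by the checks that exercise this sketch */
#include <errno.h>
#include <stddef.h>

#ifndef ENOTCAPABLE
#define ENOTCAPABLE EPERM	/* assumption: stand-in outside FreeBSD */
#endif

/* Hypothetical stand-in for a vnode; only the use count is modelled. */
struct mock_vnode {
	int refcnt;
};

/* Hypothetical subset of struct nameidata / struct componentname. */
struct mock_nameidata {
	const char *nameptr;		/* like cn_nameptr */
	size_t pathlen;			/* like ni_pathlen */
	int strictrelative;		/* like ni_strictrelative */
	struct mock_vnode *rootdir;	/* like ni_rootdir */
};

/*
 * Model of namei_handle_root(): refuse absolute paths for strictly
 * relative (Capsicum) lookups; otherwise skip the leading slashes and
 * start at the root directory with one new reference.
 */
static int
handle_root(struct mock_nameidata *nd, struct mock_vnode **dpp)
{
	if (nd->strictrelative != 0)
		return (ENOTCAPABLE);
	while (*nd->nameptr == '/') {
		nd->nameptr++;
		nd->pathlen--;
	}
	*dpp = nd->rootdir;
	(*dpp)->refcnt++;	/* VREF(*dpp) */
	return (0);
}
```

Because the only reference taken is on the directory the lookup actually starts from, the error path has at most one reference to drop.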
--- sys/kern/vfs_lookup.c | 92 ++++++++++++++++++++++++++++----------------------- 1 file changed, 51 insertions(+), 41 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index 5dc07dc..c5218ec 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -109,6 +109,27 @@ namei_cleanup_cnp(struct componentname *cnp) #endif } +static int +namei_handle_root(struct nameidata *ndp, struct vnode **dpp) +{ + struct componentname *cnp = &ndp->ni_cnd; + + if (ndp->ni_strictrelative != 0) { +#ifdef KTRACE + if (KTRPOINT(curthread, KTR_CAPFAIL)) + ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); +#endif + return (ENOTCAPABLE); + } + while (*(cnp->cn_nameptr) == '/') { + cnp->cn_nameptr++; + ndp->ni_pathlen--; + } + *dpp = ndp->ni_rootdir; + VREF(*dpp); + return (0); +} + /* * Convert a pathname into a pointer to a locked vnode. * @@ -148,6 +169,8 @@ namei(struct nameidata *ndp) ("namei: nameiop contaminated with flags")); KASSERT((cnp->cn_flags & OPMASK) == 0, ("namei: flags contaminated with nameiops")); + if (ndp->ni_startdir != NULL) + MPASS(ndp->ni_startdir->v_type == VDIR); if (!lookup_shared) cnp->cn_flags &= ~LOCKSHARED; fdp = p->p_fd; @@ -220,12 +243,16 @@ namei(struct nameidata *ndp) if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); - dp = NULL; - if (cnp->cn_pnbuf[0] != '/') { + cnp->cn_nameptr = cnp->cn_pnbuf; + if (cnp->cn_pnbuf[0] == '/') { + error = namei_handle_root(ndp, &dp); + } else { if (ndp->ni_startdir != NULL) { dp = ndp->ni_startdir; - error = 0; - } else if (ndp->ni_dirfd != AT_FDCWD) { + } else if (ndp->ni_dirfd == AT_FDCWD) { + dp = fdp->fd_cdir; + VREF(dp); + } else { cap_rights_t rights; rights = ndp->ni_rightsneeded; @@ -251,51 +278,22 @@ namei(struct nameidata *ndp) ndp->ni_strictrelative = 1; } #endif - } - if (error != 0 || dp != NULL) { - FILEDESC_SUNLOCK(fdp); - if (error == 0 && dp->v_type != VDIR) { - vrele(dp); + if (error == 0 && dp->v_type != VDIR) error = ENOTDIR; - } - } - if (error) 
{ - namei_cleanup_cnp(cnp); - return (error); } } - if (dp == NULL) { - dp = fdp->fd_cdir; - VREF(dp); - FILEDESC_SUNLOCK(fdp); - if (ndp->ni_startdir != NULL) + FILEDESC_SUNLOCK(fdp); + if (error != 0) { + if (dp != NULL) + vrele(dp); + if (ndp->ni_startdir != NULL && dp != ndp->ni_startdir) vrele(ndp->ni_startdir); + namei_cleanup_cnp(cnp); + return (error); } SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, cnp->cn_flags, 0, 0); for (;;) { - /* - * Check if root directory should replace current directory. - * Done at start of translation and after symbolic link. - */ - cnp->cn_nameptr = cnp->cn_pnbuf; - if (*(cnp->cn_nameptr) == '/') { - vrele(dp); - if (ndp->ni_strictrelative != 0) { -#ifdef KTRACE - if (KTRPOINT(curthread, KTR_CAPFAIL)) - ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); -#endif - namei_cleanup_cnp(cnp); - return (ENOTCAPABLE); - } - while (*(cnp->cn_nameptr) == '/') { - cnp->cn_nameptr++; - ndp->ni_pathlen--; - } - dp = ndp->ni_rootdir; - VREF(dp); - } ndp->ni_startdir = dp; error = lookup(ndp); if (error) { @@ -370,6 +368,18 @@ namei(struct nameidata *ndp) ndp->ni_pathlen += linklen; vput(ndp->ni_vp); dp = ndp->ni_dvp; + /* + * Check if root directory should replace current directory. 
+ */ + cnp->cn_nameptr = cnp->cn_pnbuf; + if (*(cnp->cn_nameptr) == '/') { + vrele(dp); + error = namei_handle_root(ndp, &dp); + if (error != 0) { + namei_cleanup_cnp(cnp); + return (error); + } + } } namei_cleanup_cnp(cnp); vput(ndp->ni_vp); -- 2.4.5 From owner-freebsd-fs@freebsd.org Mon Jul 6 03:05:42 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B7F5A98EC26 for ; Mon, 6 Jul 2015 03:05:42 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x236.google.com (mail-wi0-x236.google.com [IPv6:2a00:1450:400c:c05::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 481F01D6B; Mon, 6 Jul 2015 03:05:42 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wibdq8 with SMTP id dq8so140451177wib.1; Sun, 05 Jul 2015 20:05:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=JaoGmXOJ1W3fx7OyHZz+9qTJhQN+C4r5PJADw74qGJc=; b=M/W/C728dg2C4Qw4RTzGQjdkk1/4/g837ppT12jPiy8shWZKPo1FxhNTljEjKhrba4 DTjVeDLsDOkNwHCBifQHKAsdAIBWBoJE4qxzYCQoUCHu/29kEc3h+wWATBr1AAzOkHVj y71i1tJsNzsaKoIQwLo/T9Cpub514M8bgbIYysyVHcp8uIrkgGYse0NeW0pVpuC/xk1H qOIPVj9nYeA8JANwVeHDnM9/H8HqZiz4XfGJrSZSvD2oPJ7vJ4f68i0pj1KUJZ56k7/o WYzge19gk/3nqMxx3B3cE3V7b+H3y59I1ALfe4qSIOHiaimssv2Z2R9aKeG4FlyYFvTA wjQA== X-Received: by 10.194.22.105 with SMTP id c9mr45960608wjf.120.1436151940475; Sun, 05 Jul 2015 20:05:40 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. 
[89.102.11.63]) by mx.google.com with ESMTPSA id pd7sm25571851wjb.27.2015.07.05.20.05.38 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Jul 2015 20:05:39 -0700 (PDT) From: Mateusz Guzik To: freebsd-fs@freebsd.org Cc: kib@freesd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: [PATCH 2/2] audit: utilize vnode pointer found by namei instead of looking it up again Date: Mon, 6 Jul 2015 05:05:32 +0200 Message-Id: <1436151932-12514-3-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436151932-12514-1-git-send-email-mjguzik@gmail.com> References: <1436151932-12514-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 03:05:42 -0000 From: Mateusz Guzik With the file descriptor translated only once, the code no longer imposes the need to hold filedesc lock, previously needed to make sure namei and audit translation return the same vnode. 
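Why the lock used to be needed can be sketched in a few lines: if namei and the audit code each translate the same descriptor on their own, another thread's dup2() or close() between the two translations can make them see different directories, so both had to run under one hold of the filedesc lock. Translating once and passing the pointer along removes that window. The sketch below is hypothetical throughout (`struct fdtable`, `fd_translate()`, and the `race_hook` callback that simulates a concurrent dup2()), models no locking at all, and is not the kernel code.

```c
#include <assert.h>	/* used by the checks that exercise this sketch */
#include <stddef.h>

#define NFDS 4

/* Hypothetical stand-in for a directory vnode. */
struct dirobj {
	const char *name;
};

/* Hypothetical descriptor table: fd -> directory, no locking modelled. */
struct fdtable {
	struct dirobj *slot[NFDS];
};

/* Callback run between the two steps, simulating another thread. */
typedef void (*race_hook)(struct fdtable *);

struct result {
	struct dirobj *lookup_dir;	/* where the path lookup started */
	struct dirobj *audit_dir;	/* what the audit record describes */
};

static struct dirobj replacement = { "replacement" };

/* Simulated concurrent dup2(): descriptor 3 now names another directory. */
static void
simulate_dup2(struct fdtable *t)
{
	t->slot[3] = &replacement;
}

static struct dirobj *
fd_translate(const struct fdtable *t, int fd)
{
	return ((fd >= 0 && fd < NFDS) ? t->slot[fd] : NULL);
}

/* Old shape: lookup and audit each translate the descriptor. */
static struct result
translate_twice(struct fdtable *t, int fd, race_hook hook)
{
	struct result r;

	r.lookup_dir = fd_translate(t, fd);
	if (hook != NULL)
		hook(t);
	r.audit_dir = fd_translate(t, fd);
	return (r);
}

/* New shape: translate once and hand the same pointer to audit. */
static struct result
translate_once(struct fdtable *t, int fd, race_hook hook)
{
	struct result r;

	r.lookup_dir = fd_translate(t, fd);
	if (hook != NULL)
		hook(t);
	r.audit_dir = r.lookup_dir;
	return (r);
}
```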
--- sys/kern/vfs_lookup.c | 41 +++++++++++-------- sys/security/audit/audit.h | 14 +++++++ sys/security/audit/audit_arg.c | 36 ++++++++++++++++ sys/security/audit/audit_bsm_klib.c | 82 ++++++++++++++++++++++++------------- sys/security/audit/audit_private.h | 2 + 5 files changed, 129 insertions(+), 46 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index c5218ec..4fbe18b 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) struct vnode *dp; /* the directory we are searching */ struct iovec aiov; /* uio for reading symbolic links */ struct uio auio; - int error, linklen; + int error, needcapcheck, linklen; struct componentname *cnp = &ndp->ni_cnd; struct thread *td = cnp->cn_thread; struct proc *p = td->td_proc; @@ -235,14 +235,7 @@ namei(struct nameidata *ndp) ndp->ni_rootdir = fdp->fd_rdir; ndp->ni_topdir = fdp->fd_jdir; - /* - * If we are auditing the kernel pathname, save the user pathname. - */ - if (cnp->cn_flags & AUDITVNODE1) - AUDIT_ARG_UPATH1(td, ndp->ni_dirfd, cnp->cn_pnbuf); - if (cnp->cn_flags & AUDITVNODE2) - AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); - + needcapcheck = 0; cnp->cn_nameptr = cnp->cn_pnbuf; if (cnp->cn_pnbuf[0] == '/') { error = namei_handle_root(ndp, &dp); @@ -253,17 +246,33 @@ namei(struct nameidata *ndp) dp = fdp->fd_cdir; VREF(dp); } else { - cap_rights_t rights; - - rights = ndp->ni_rightsneeded; - cap_rights_set(&rights, CAP_LOOKUP); + needcapcheck = 1; if (cnp->cn_flags & AUDITVNODE1) AUDIT_ARG_ATFD1(ndp->ni_dirfd); if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_ATFD2(ndp->ni_dirfd); error = fgetvp_rights(td, ndp->ni_dirfd, - &rights, &ndp->ni_filecaps, &dp); + NULL, &ndp->ni_filecaps, &dp); + if (error == 0 && dp->v_type != VDIR) + error = ENOTDIR; + } + } + /* + * If we are auditing the kernel pathname, save the user pathname. 
+ */ + if (cnp->cn_flags & AUDITVNODE1) + AUDIT_ARG_UPATH1_VP(td, dp, cnp->cn_pnbuf); + if (cnp->cn_flags & AUDITVNODE2) + AUDIT_ARG_UPATH2_VP(td, dp, cnp->cn_pnbuf); + FILEDESC_SUNLOCK(fdp); + if (error == 0 && needcapcheck) { + cap_rights_t rights; + + rights = ndp->ni_rightsneeded; + cap_rights_set(&rights, CAP_LOOKUP); + + error = cap_check(&ndp->ni_filecaps.fc_rights, &rights); #ifdef CAPABILITIES /* * If file descriptor doesn't have all rights, @@ -278,11 +287,7 @@ namei(struct nameidata *ndp) ndp->ni_strictrelative = 1; } #endif - if (error == 0 && dp->v_type != VDIR) - error = ENOTDIR; - } } - FILEDESC_SUNLOCK(fdp); if (error != 0) { if (dp != NULL) vrele(dp); diff --git a/sys/security/audit/audit.h b/sys/security/audit/audit.h index 559d571..2f6420f 100644 --- a/sys/security/audit/audit.h +++ b/sys/security/audit/audit.h @@ -101,6 +101,10 @@ void audit_arg_auditinfo(struct auditinfo *au_info); void audit_arg_auditinfo_addr(struct auditinfo_addr *au_info); void audit_arg_upath1(struct thread *td, int dirfd, char *upath); void audit_arg_upath2(struct thread *td, int dirfd, char *upath); +void audit_arg_upath1_vp(struct thread *td, struct vnode *dirvp, + char *upath); +void audit_arg_upath2_vp(struct thread *td, struct vnode *dirvp, + char *upath); void audit_arg_vnode1(struct vnode *vp); void audit_arg_vnode2(struct vnode *vp); void audit_arg_text(char *text); @@ -297,6 +301,16 @@ void audit_thread_free(struct thread *td); audit_arg_upath2((td), (dirfd), (upath)); \ } while (0) +#define AUDIT_ARG_UPATH1_VP(td, dirvp, upath) do { \ + if (AUDITING_TD(curthread)) \ + audit_arg_upath1_vp((td), (dirvp), (upath)); \ +} while (0) + +#define AUDIT_ARG_UPATH2_VP(td, dirvp, upath) do { \ + if (AUDITING_TD(curthread)) \ + audit_arg_upath2_vp((td), (dirvp), (upath)); \ +} while (0) + #define AUDIT_ARG_VALUE(value) do { \ if (AUDITING_TD(curthread)) \ audit_arg_value((value)); \ diff --git a/sys/security/audit/audit_arg.c b/sys/security/audit/audit_arg.c index 
c006b90..c019bad 100644 --- a/sys/security/audit/audit_arg.c +++ b/sys/security/audit/audit_arg.c @@ -719,6 +719,16 @@ audit_arg_upath(struct thread *td, int dirfd, char *upath, char **pathp) audit_canon_path(td, dirfd, upath, *pathp); } +static void +audit_arg_upath_vp(struct thread *td, struct vnode *dirvp, char *upath, + char **pathp) +{ + + if (*pathp == NULL) + *pathp = malloc(MAXPATHLEN, M_AUDITPATH, M_WAITOK); + audit_canon_path_vp(td, dirvp, upath, *pathp); +} + void audit_arg_upath1(struct thread *td, int dirfd, char *upath) { @@ -745,6 +755,32 @@ audit_arg_upath2(struct thread *td, int dirfd, char *upath) ARG_SET_VALID(ar, ARG_UPATH2); } +void +audit_arg_upath1_vp(struct thread *td, struct vnode *dirvp, char *upath) +{ + struct kaudit_record *ar; + + ar = currecord(); + if (ar == NULL) + return; + + audit_arg_upath_vp(td, dirvp, upath, &ar->k_ar.ar_arg_upath1); + ARG_SET_VALID(ar, ARG_UPATH1); +} + +void +audit_arg_upath2_vp(struct thread *td, struct vnode *dirvp, char *upath) +{ + struct kaudit_record *ar; + + ar = currecord(); + if (ar == NULL) + return; + + audit_arg_upath_vp(td, dirvp, upath, &ar->k_ar.ar_arg_upath2); + ARG_SET_VALID(ar, ARG_UPATH2); +} + /* * Function to save the path and vnode attr information into the audit * record. diff --git a/sys/security/audit/audit_bsm_klib.c b/sys/security/audit/audit_bsm_klib.c index b687a15..7e8dac5 100644 --- a/sys/security/audit/audit_bsm_klib.c +++ b/sys/security/audit/audit_bsm_klib.c @@ -461,23 +461,19 @@ auditon_command_event(int cmd) * but this results in a volfs name written to the audit log. So we will * leave the filename starting with '/' in the audit log in this case. 
*/ -void -audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) +static void +audit_canon_path_common(struct thread *td, struct vnode *dirvp, + char *path, char *cpath) { struct vnode *cvnp, *rvnp; - char *rbuf, *fbuf, *copy; struct filedesc *fdp; + char *rbuf, *fbuf, *copy; struct sbuf sbf; - cap_rights_t rights; int error, needslash; - WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", - __func__, __FILE__, __LINE__); - copy = path; - rvnp = cvnp = NULL; + cvnp = rvnp = NULL; fdp = td->td_proc->p_fd; - FILEDESC_SLOCK(fdp); /* * Make sure that we handle the chroot(2) case. If there is an * alternate root directory, prepend it to the audited pathname. @@ -492,22 +488,7 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) * path. */ if (*path != '/') { - if (dirfd == AT_FDCWD) { - cvnp = fdp->fd_cdir; - vhold(cvnp); - } else { - /* XXX: fgetvp() that vhold()s vnode instead of vref()ing it would be better */ - error = fgetvp(td, dirfd, cap_rights_init(&rights), &cvnp); - if (error) { - FILEDESC_SUNLOCK(fdp); - cpath[0] = '\0'; - if (rvnp != NULL) - vdrop(rvnp); - return; - } - vhold(cvnp); - vrele(cvnp); - } + cvnp = dirvp; needslash = (fdp->fd_rdir != cvnp); } else { needslash = 1; @@ -536,8 +517,6 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) vdrop(rvnp); if (error) { cpath[0] = '\0'; - if (cvnp != NULL) - vdrop(cvnp); return; } (void) sbuf_cat(&sbf, rbuf); @@ -545,7 +524,6 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) } if (cvnp != NULL) { error = vn_fullpath(td, cvnp, &rbuf, &fbuf); - vdrop(cvnp); if (error) { cpath[0] = '\0'; return; @@ -571,3 +549,51 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) } sbuf_finish(&sbf); } + +void +audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) +{ + struct vnode *dirvp; + struct filedesc *fdp; + cap_rights_t rights; + + WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, 
"%s: at %s:%d", + __func__, __FILE__, __LINE__); + + dirvp = NULL; + fdp = td->td_proc->p_fd; + FILEDESC_SLOCK(fdp); + if (*path != '/') { + if (dirfd == AT_FDCWD) { + dirvp = fdp->fd_cdir; + vhold(dirvp); + } else { + /* XXX: fgetvp() that vhold()s vnode instead of vref()ing it would be better */ + if (fgetvp(td, dirfd, cap_rights_init(&rights), &dirvp) != 0) { + FILEDESC_SUNLOCK(fdp); + cpath[0] = '\0'; + return; + } + vhold(dirvp); + vrele(dirvp); + } + } + + audit_canon_path_common(td, dirvp, path, cpath); + if (dirvp != NULL) + vdrop(dirvp); +} + +void +audit_canon_path_vp(struct thread *td, struct vnode *dirvp, char *path, char *cpath) +{ + + WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", + __func__, __FILE__, __LINE__); + + if (dirvp == NULL) + return; + + FILEDESC_SLOCK(td->td_proc->p_fd); + audit_canon_path_common(td, dirvp, path, cpath); +} diff --git a/sys/security/audit/audit_private.h b/sys/security/audit/audit_private.h index b5c373a..7ecf3a6 100644 --- a/sys/security/audit/audit_private.h +++ b/sys/security/audit/audit_private.h @@ -394,6 +394,8 @@ au_event_t audit_msgctl_to_event(int cmd); au_event_t audit_semctl_to_event(int cmr); void audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath); +void audit_canon_path_vp(struct thread *td, struct vnode *dirvp, + char *path, char *cpath); au_event_t auditon_command_event(int cmd); /* -- 2.4.5 From owner-freebsd-fs@freebsd.org Mon Jul 6 03:07:20 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4560698ECB5 for ; Mon, 6 Jul 2015 03:07:20 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x230.google.com (mail-wg0-x230.google.com [IPv6:2a00:1450:400c:c00::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified 
OK)) by mx1.freebsd.org (Postfix) with ESMTPS id DBD56100E; Mon, 6 Jul 2015 03:07:19 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wguu7 with SMTP id u7so128415041wgu.3; Sun, 05 Jul 2015 20:07:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=YZ3+XgzANHejIquJdREkdlK+VMxG8PMdZTjy2YnYvUk=; b=qwa4R4ZD8upduQUCcXjads3+vYNvkhbzQcXeYXWiEFD5Yf8QFy2D+NUocWhHiFlP1g v/4W+9SMCOt/TwbGjA5u7btZ0IdBhUbjHTpJB0PTU3JzoNckTbvAfykVuEwqm333Hdw/ G+47l2WVgcAVkEF6C1ThkV6UPcezdHikWo/BTszxVe2T/+8nLXLj3zjFKaXk20tnHw+6 KRRlCJiPH5SccfTASr13u6+H+uqlXqXPsVdCWKEmiAFWDgjOzAmTTdXFm8mejA6XMgc3 CoMBZ0lqDShD19DPgvfDKk3mJP9SWmCibjB0fFXHU8S5Xm/VOZSwRdejjLgOvNF+NYGl Saqw== X-Received: by 10.194.78.110 with SMTP id a14mr96317113wjx.87.1436152038249; Sun, 05 Jul 2015 20:07:18 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by mx.google.com with ESMTPSA id df1sm44689075wib.12.2015.07.05.20.07.16 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Jul 2015 20:07:17 -0700 (PDT) From: Mateusz Guzik To: freebsd-fs@freebsd.org Cc: kib@freebsd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: [PATCH 0/2] slightly cheaper file lookups + an idea Date: Mon, 6 Jul 2015 05:07:13 +0200 Message-Id: <1436152035-12564-1-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 03:07:20 -0000 From: Mateusz Guzik First, 2 simple patches which imho should go in regardless of whether the idea below is accepted. The first one shuffles code around to make the second one easier, which removes a requirement to hold filedesc lock imposed by auditing. 
With that out of the way: filedesc lock is held so that fd_cdir and fd_rdir vnodes can be obtained for lookup purposes. In effect namei unnecessarily competes with code manipulating the file descriptor table. With the patches mentioned here, at the very least we can decompose the filedesc lock into two: one to protect fd_cdir/fd_jdir/fd_rdir and the other one for the rest. However, I believe we should go a step further and split struct filedesc instead. Specifically, we can move the aforementioned vnodes to a copy-on-write structure managed similarly to thread credentials. After such an action there is no lock to take during lookups. Further, since vnodes are guaranteed to be stable, we don't have to vref+vrele any of them, apart from the case where the vnode in question is going to be returned. This will make chdir more expensive, but file lookups are way more frequent so it should be worth it. Thoughts? Mateusz Guzik (2): vfs: avoid spurious vref+vrele for absolute lookups audit: utilize vnode pointer found by namei instead of looking it up again sys/kern/vfs_lookup.c | 129 ++++++++++++++++++++---------------- sys/security/audit/audit.h | 14 ++++ sys/security/audit/audit_arg.c | 36 ++++++++++ sys/security/audit/audit_bsm_klib.c | 82 +++++++++++++++-------- sys/security/audit/audit_private.h | 2 + 5 files changed, 178 insertions(+), 85 deletions(-) -- 2.4.5 From owner-freebsd-fs@freebsd.org Mon Jul 6 03:07:21 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F0ACE98ECBF for ; Mon, 6 Jul 2015 03:07:21 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x22d.google.com (mail-wi0-x22d.google.com [IPv6:2a00:1450:400c:c05::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id
8B33E1014; Mon, 6 Jul 2015 03:07:21 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wifm2 with SMTP id m2so16911188wif.1; Sun, 05 Jul 2015 20:07:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=DOvdJngvdWAg8DefsXCXhr8EkNIum23GeXfO+EN45BE=; b=O5czrBY9/DkivfdguWQB5eDIp2shWZCCTdsg1ZcADLiKB70bpFsLS/SIyMr/nT9x09 FY0dwNRrLvfSWkWOrDb89WZzpV/gVNK0RvAzwDBar2w8frzF6GI1tKnRR4KaVoIxIPiL hcUEpkORKilIx0QxsGa1EYrVF2dbqwdLIsGPsBSekZDrWV5xeVxkKZsIIc0GlwwKRFGZ MXo8RIIWiVptpkA+GUeBU65RbN0mf8ZFst1xylxnHk0kMUACoHl0tJlb6uTi/6zZFU/X LwfsZHvq9WxeNs7UGfvYJ7epO9whJgvYaZbIDHZnccTlOWSnG7jZJ7Jm1uPvy5muSCN0 xgyA== X-Received: by 10.194.123.4 with SMTP id lw4mr89000333wjb.94.1436152039939; Sun, 05 Jul 2015 20:07:19 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by mx.google.com with ESMTPSA id df1sm44689075wib.12.2015.07.05.20.07.18 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Jul 2015 20:07:19 -0700 (PDT) From: Mateusz Guzik To: freebsd-fs@freebsd.org Cc: kib@freebsd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: [PATCH 1/2] vfs: avoid spurious vref/vrele for absolute lookups Date: Mon, 6 Jul 2015 05:07:14 +0200 Message-Id: <1436152035-12564-2-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436152035-12564-1-git-send-email-mjguzik@gmail.com> References: <1436152035-12564-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 03:07:22 -0000 From: Mateusz Guzik namei used to vref fd_cdir, which for absolute paths was immediately vrele'd on entry to the loop. Simplify error handling and remove type checking for the ni_startdir vnode. It is only set by NFS, which does the check on its own.
Assert the correct type instead. --- sys/kern/vfs_lookup.c | 92 ++++++++++++++++++++++++++++----------------------- 1 file changed, 51 insertions(+), 41 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index 5dc07dc..c5218ec 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -109,6 +109,27 @@ namei_cleanup_cnp(struct componentname *cnp) #endif } +static int +namei_handle_root(struct nameidata *ndp, struct vnode **dpp) +{ + struct componentname *cnp = &ndp->ni_cnd; + + if (ndp->ni_strictrelative != 0) { +#ifdef KTRACE + if (KTRPOINT(curthread, KTR_CAPFAIL)) + ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); +#endif + return (ENOTCAPABLE); + } + while (*(cnp->cn_nameptr) == '/') { + cnp->cn_nameptr++; + ndp->ni_pathlen--; + } + *dpp = ndp->ni_rootdir; + VREF(*dpp); + return (0); +} + /* * Convert a pathname into a pointer to a locked vnode. * @@ -148,6 +169,8 @@ namei(struct nameidata *ndp) ("namei: nameiop contaminated with flags")); KASSERT((cnp->cn_flags & OPMASK) == 0, ("namei: flags contaminated with nameiops")); + if (ndp->ni_startdir != NULL) + MPASS(ndp->ni_startdir->v_type == VDIR); if (!lookup_shared) cnp->cn_flags &= ~LOCKSHARED; fdp = p->p_fd; @@ -220,12 +243,16 @@ namei(struct nameidata *ndp) if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); - dp = NULL; - if (cnp->cn_pnbuf[0] != '/') { + cnp->cn_nameptr = cnp->cn_pnbuf; + if (cnp->cn_pnbuf[0] == '/') { + error = namei_handle_root(ndp, &dp); + } else { if (ndp->ni_startdir != NULL) { dp = ndp->ni_startdir; - error = 0; - } else if (ndp->ni_dirfd != AT_FDCWD) { + } else if (ndp->ni_dirfd == AT_FDCWD) { + dp = fdp->fd_cdir; + VREF(dp); + } else { cap_rights_t rights; rights = ndp->ni_rightsneeded; @@ -251,51 +278,22 @@ namei(struct nameidata *ndp) ndp->ni_strictrelative = 1; } #endif - } - if (error != 0 || dp != NULL) { - FILEDESC_SUNLOCK(fdp); - if (error == 0 && dp->v_type != VDIR) { - vrele(dp); + if (error == 0 && dp->v_type != VDIR) 
error = ENOTDIR; - } - } - if (error) { - namei_cleanup_cnp(cnp); - return (error); } } - if (dp == NULL) { - dp = fdp->fd_cdir; - VREF(dp); - FILEDESC_SUNLOCK(fdp); - if (ndp->ni_startdir != NULL) + FILEDESC_SUNLOCK(fdp); + if (error != 0) { + if (dp != NULL) + vrele(dp); + if (ndp->ni_startdir != NULL && dp != ndp->ni_startdir) vrele(ndp->ni_startdir); + namei_cleanup_cnp(cnp); + return (error); } SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, cnp->cn_flags, 0, 0); for (;;) { - /* - * Check if root directory should replace current directory. - * Done at start of translation and after symbolic link. - */ - cnp->cn_nameptr = cnp->cn_pnbuf; - if (*(cnp->cn_nameptr) == '/') { - vrele(dp); - if (ndp->ni_strictrelative != 0) { -#ifdef KTRACE - if (KTRPOINT(curthread, KTR_CAPFAIL)) - ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); -#endif - namei_cleanup_cnp(cnp); - return (ENOTCAPABLE); - } - while (*(cnp->cn_nameptr) == '/') { - cnp->cn_nameptr++; - ndp->ni_pathlen--; - } - dp = ndp->ni_rootdir; - VREF(dp); - } ndp->ni_startdir = dp; error = lookup(ndp); if (error) { @@ -370,6 +368,18 @@ namei(struct nameidata *ndp) ndp->ni_pathlen += linklen; vput(ndp->ni_vp); dp = ndp->ni_dvp; + /* + * Check if root directory should replace current directory. 
+ */ + cnp->cn_nameptr = cnp->cn_pnbuf; + if (*(cnp->cn_nameptr) == '/') { + vrele(dp); + error = namei_handle_root(ndp, &dp); + if (error != 0) { + namei_cleanup_cnp(cnp); + return (error); + } + } } namei_cleanup_cnp(cnp); vput(ndp->ni_vp); -- 2.4.5 From owner-freebsd-fs@freebsd.org Mon Jul 6 03:07:23 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3E33198ECD4 for ; Mon, 6 Jul 2015 03:07:23 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x22f.google.com (mail-wg0-x22f.google.com [IPv6:2a00:1450:400c:c00::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CA9761029; Mon, 6 Jul 2015 03:07:22 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgck11 with SMTP id k11so128307795wgc.0; Sun, 05 Jul 2015 20:07:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=JaoGmXOJ1W3fx7OyHZz+9qTJhQN+C4r5PJADw74qGJc=; b=NKKtTE2+MZa51vUJVFkWsnNp5vF44Xgp3SHPHB1xXQn7Wtpm9+Oz9NkPkPUFI7qQP5 7WvqEjPm6emNOrSampGfFz4HYB64dMAzoP8KAZ/Ys7Y2vK+5tRYL5ZxGoflRSVn3Suun 7IyBkYTeJsFG9k0YDWljnhDat7RU3vlyLdbD5riDkKPs/2rC2jxgv04ROWDToC2pixsl R/2U+TZH1xGfOQYOMdmAy/RueXjPoUIETxIKjkPCdTSz2urMjggkNsQ/BDXoeqMPtvKp o8vfzkeuuKc0GCKYJ5q61Nt6NrROltUQdQig3iju3jdqWCkoktUALmRKJmbWmbBOqwuc uKsQ== X-Received: by 10.194.203.138 with SMTP id kq10mr94808917wjc.124.1436152041321; Sun, 05 Jul 2015 20:07:21 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. 
[89.102.11.63]) by mx.google.com with ESMTPSA id df1sm44689075wib.12.2015.07.05.20.07.19 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 05 Jul 2015 20:07:20 -0700 (PDT) From: Mateusz Guzik To: freebsd-fs@freebsd.org Cc: kib@freebsd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: [PATCH 2/2] audit: utilize vnode pointer found by namei instead of looking it up again Date: Mon, 6 Jul 2015 05:07:15 +0200 Message-Id: <1436152035-12564-3-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436152035-12564-1-git-send-email-mjguzik@gmail.com> References: <1436152035-12564-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 03:07:23 -0000 From: Mateusz Guzik With the file descriptor translated only once, the code no longer needs to hold the filedesc lock, which was previously required to make sure namei and the audit translation return the same vnode.
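Why translating the descriptor only once matters can be shown with a deterministic userspace model, in which a hook stands in for another thread changing the descriptor table between two translations. Everything here (struct dir_obj, struct fd_table, fd_translate(), audit_canon(), the race hook) is hypothetical and far simpler than fgetvp_rights() and audit_canon_path(). In particular, it ignores the reference the kernel must keep on the vnode while audit uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model: a "vnode" is reduced to the path it names. */
struct dir_obj {
	const char *path;
};

#define NFDS 4

/* Hypothetical descriptor table; no locking, so changes are visible at once. */
struct fd_table {
	struct dir_obj *files[NFDS];
};

/* Simulates another thread doing dup2()/close() at an unlucky moment. */
typedef void (*race_hook_t)(struct fd_table *);

static struct dir_obj *
fd_translate(const struct fd_table *fdt, int fd)
{
	if (fd < 0 || fd >= NFDS)
		return (NULL);
	return (fdt->files[fd]);
}

/* Audit side: build "dir/rel" from whatever directory it is given. */
static void
audit_canon(const struct dir_obj *dir, const char *rel, char *out, size_t len)
{
	snprintf(out, len, "%s/%s", dir != NULL ? dir->path : "?", rel);
}

/* Before: lookup and audit each translate dirfd on their own. */
static const struct dir_obj *
lookup_translate_twice(struct fd_table *fdt, int dirfd, const char *rel,
    race_hook_t race, char *abuf, size_t len)
{
	const struct dir_obj *dp = fd_translate(fdt, dirfd);	/* lookup's copy */

	if (race != NULL)
		race(fdt);
	audit_canon(fd_translate(fdt, dirfd), rel, abuf, len);	/* audit's copy */
	return (dp);
}

/* After: translate once and hand the same object to audit. */
static const struct dir_obj *
lookup_translate_once(struct fd_table *fdt, int dirfd, const char *rel,
    race_hook_t race, char *abuf, size_t len)
{
	const struct dir_obj *dp = fd_translate(fdt, dirfd);

	if (race != NULL)
		race(fdt);
	audit_canon(dp, rel, abuf, len);
	return (dp);
}

/* Example race: descriptor 3 gets pointed at a different directory. */
static struct dir_obj tmp_dir = { "/tmp" };

static void
race_swap_fd3(struct fd_table *fdt)
{
	fdt->files[3] = &tmp_dir;
}
```

If descriptor 3 is repointed from /home/user to /tmp between the two translations, lookup_translate_twice() resolves against /home/user while the audit record says /tmp/a.txt. lookup_translate_once() records /home/user/a.txt, matching the directory actually used. Without the single translation, the only way to rule this out is to hold a lock across both steps, which is the role the filedesc lock played.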
--- sys/kern/vfs_lookup.c | 41 +++++++++++-------- sys/security/audit/audit.h | 14 +++++++ sys/security/audit/audit_arg.c | 36 ++++++++++++++++ sys/security/audit/audit_bsm_klib.c | 82 ++++++++++++++++++++++++------------- sys/security/audit/audit_private.h | 2 + 5 files changed, 129 insertions(+), 46 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index c5218ec..4fbe18b 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) struct vnode *dp; /* the directory we are searching */ struct iovec aiov; /* uio for reading symbolic links */ struct uio auio; - int error, linklen; + int error, needcapcheck, linklen; struct componentname *cnp = &ndp->ni_cnd; struct thread *td = cnp->cn_thread; struct proc *p = td->td_proc; @@ -235,14 +235,7 @@ namei(struct nameidata *ndp) ndp->ni_rootdir = fdp->fd_rdir; ndp->ni_topdir = fdp->fd_jdir; - /* - * If we are auditing the kernel pathname, save the user pathname. - */ - if (cnp->cn_flags & AUDITVNODE1) - AUDIT_ARG_UPATH1(td, ndp->ni_dirfd, cnp->cn_pnbuf); - if (cnp->cn_flags & AUDITVNODE2) - AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); - + needcapcheck = 0; cnp->cn_nameptr = cnp->cn_pnbuf; if (cnp->cn_pnbuf[0] == '/') { error = namei_handle_root(ndp, &dp); @@ -253,17 +246,33 @@ namei(struct nameidata *ndp) dp = fdp->fd_cdir; VREF(dp); } else { - cap_rights_t rights; - - rights = ndp->ni_rightsneeded; - cap_rights_set(&rights, CAP_LOOKUP); + needcapcheck = 1; if (cnp->cn_flags & AUDITVNODE1) AUDIT_ARG_ATFD1(ndp->ni_dirfd); if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_ATFD2(ndp->ni_dirfd); error = fgetvp_rights(td, ndp->ni_dirfd, - &rights, &ndp->ni_filecaps, &dp); + NULL, &ndp->ni_filecaps, &dp); + if (error == 0 && dp->v_type != VDIR) + error = ENOTDIR; + } + } + /* + * If we are auditing the kernel pathname, save the user pathname. 
+ */ + if (cnp->cn_flags & AUDITVNODE1) + AUDIT_ARG_UPATH1_VP(td, dp, cnp->cn_pnbuf); + if (cnp->cn_flags & AUDITVNODE2) + AUDIT_ARG_UPATH2_VP(td, dp, cnp->cn_pnbuf); + FILEDESC_SUNLOCK(fdp); + if (error == 0 && needcapcheck) { + cap_rights_t rights; + + rights = ndp->ni_rightsneeded; + cap_rights_set(&rights, CAP_LOOKUP); + + error = cap_check(&ndp->ni_filecaps.fc_rights, &rights); #ifdef CAPABILITIES /* * If file descriptor doesn't have all rights, @@ -278,11 +287,7 @@ namei(struct nameidata *ndp) ndp->ni_strictrelative = 1; } #endif - if (error == 0 && dp->v_type != VDIR) - error = ENOTDIR; - } } - FILEDESC_SUNLOCK(fdp); if (error != 0) { if (dp != NULL) vrele(dp); diff --git a/sys/security/audit/audit.h b/sys/security/audit/audit.h index 559d571..2f6420f 100644 --- a/sys/security/audit/audit.h +++ b/sys/security/audit/audit.h @@ -101,6 +101,10 @@ void audit_arg_auditinfo(struct auditinfo *au_info); void audit_arg_auditinfo_addr(struct auditinfo_addr *au_info); void audit_arg_upath1(struct thread *td, int dirfd, char *upath); void audit_arg_upath2(struct thread *td, int dirfd, char *upath); +void audit_arg_upath1_vp(struct thread *td, struct vnode *dirvp, + char *upath); +void audit_arg_upath2_vp(struct thread *td, struct vnode *dirvp, + char *upath); void audit_arg_vnode1(struct vnode *vp); void audit_arg_vnode2(struct vnode *vp); void audit_arg_text(char *text); @@ -297,6 +301,16 @@ void audit_thread_free(struct thread *td); audit_arg_upath2((td), (dirfd), (upath)); \ } while (0) +#define AUDIT_ARG_UPATH1_VP(td, dirvp, upath) do { \ + if (AUDITING_TD(curthread)) \ + audit_arg_upath1_vp((td), (dirvp), (upath)); \ +} while (0) + +#define AUDIT_ARG_UPATH2_VP(td, dirvp, upath) do { \ + if (AUDITING_TD(curthread)) \ + audit_arg_upath2_vp((td), (dirvp), (upath)); \ +} while (0) + #define AUDIT_ARG_VALUE(value) do { \ if (AUDITING_TD(curthread)) \ audit_arg_value((value)); \ diff --git a/sys/security/audit/audit_arg.c b/sys/security/audit/audit_arg.c index 
c006b90..c019bad 100644 --- a/sys/security/audit/audit_arg.c +++ b/sys/security/audit/audit_arg.c @@ -719,6 +719,16 @@ audit_arg_upath(struct thread *td, int dirfd, char *upath, char **pathp) audit_canon_path(td, dirfd, upath, *pathp); } +static void +audit_arg_upath_vp(struct thread *td, struct vnode *dirvp, char *upath, + char **pathp) +{ + + if (*pathp == NULL) + *pathp = malloc(MAXPATHLEN, M_AUDITPATH, M_WAITOK); + audit_canon_path_vp(td, dirvp, upath, *pathp); +} + void audit_arg_upath1(struct thread *td, int dirfd, char *upath) { @@ -745,6 +755,32 @@ audit_arg_upath2(struct thread *td, int dirfd, char *upath) ARG_SET_VALID(ar, ARG_UPATH2); } +void +audit_arg_upath1_vp(struct thread *td, struct vnode *dirvp, char *upath) +{ + struct kaudit_record *ar; + + ar = currecord(); + if (ar == NULL) + return; + + audit_arg_upath_vp(td, dirvp, upath, &ar->k_ar.ar_arg_upath1); + ARG_SET_VALID(ar, ARG_UPATH1); +} + +void +audit_arg_upath2_vp(struct thread *td, struct vnode *dirvp, char *upath) +{ + struct kaudit_record *ar; + + ar = currecord(); + if (ar == NULL) + return; + + audit_arg_upath_vp(td, dirvp, upath, &ar->k_ar.ar_arg_upath2); + ARG_SET_VALID(ar, ARG_UPATH2); +} + /* * Function to save the path and vnode attr information into the audit * record. diff --git a/sys/security/audit/audit_bsm_klib.c b/sys/security/audit/audit_bsm_klib.c index b687a15..7e8dac5 100644 --- a/sys/security/audit/audit_bsm_klib.c +++ b/sys/security/audit/audit_bsm_klib.c @@ -461,23 +461,19 @@ auditon_command_event(int cmd) * but this results in a volfs name written to the audit log. So we will * leave the filename starting with '/' in the audit log in this case. 
*/ -void -audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) +static void +audit_canon_path_common(struct thread *td, struct vnode *dirvp, + char *path, char *cpath) { struct vnode *cvnp, *rvnp; - char *rbuf, *fbuf, *copy; struct filedesc *fdp; + char *rbuf, *fbuf, *copy; struct sbuf sbf; - cap_rights_t rights; int error, needslash; - WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", - __func__, __FILE__, __LINE__); - copy = path; - rvnp = cvnp = NULL; + cvnp = rvnp = NULL; fdp = td->td_proc->p_fd; - FILEDESC_SLOCK(fdp); /* * Make sure that we handle the chroot(2) case. If there is an * alternate root directory, prepend it to the audited pathname. @@ -492,22 +488,7 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) * path. */ if (*path != '/') { - if (dirfd == AT_FDCWD) { - cvnp = fdp->fd_cdir; - vhold(cvnp); - } else { - /* XXX: fgetvp() that vhold()s vnode instead of vref()ing it would be better */ - error = fgetvp(td, dirfd, cap_rights_init(&rights), &cvnp); - if (error) { - FILEDESC_SUNLOCK(fdp); - cpath[0] = '\0'; - if (rvnp != NULL) - vdrop(rvnp); - return; - } - vhold(cvnp); - vrele(cvnp); - } + cvnp = dirvp; needslash = (fdp->fd_rdir != cvnp); } else { needslash = 1; @@ -536,8 +517,6 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) vdrop(rvnp); if (error) { cpath[0] = '\0'; - if (cvnp != NULL) - vdrop(cvnp); return; } (void) sbuf_cat(&sbf, rbuf); @@ -545,7 +524,6 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) } if (cvnp != NULL) { error = vn_fullpath(td, cvnp, &rbuf, &fbuf); - vdrop(cvnp); if (error) { cpath[0] = '\0'; return; @@ -571,3 +549,51 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) } sbuf_finish(&sbf); } + +void +audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) +{ + struct vnode *dirvp; + struct filedesc *fdp; + cap_rights_t rights; + + WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, 
"%s: at %s:%d", + __func__, __FILE__, __LINE__); + + dirvp = NULL; + fdp = td->td_proc->p_fd; + FILEDESC_SLOCK(fdp); + if (*path != '/') { + if (dirfd == AT_FDCWD) { + dirvp = fdp->fd_cdir; + vhold(dirvp); + } else { + /* XXX: fgetvp() that vhold()s vnode instead of vref()ing it would be better */ + if (fgetvp(td, dirfd, cap_rights_init(&rights), &dirvp) != 0) { + FILEDESC_SUNLOCK(fdp); + cpath[0] = '\0'; + return; + } + vhold(dirvp); + vrele(dirvp); + } + } + + audit_canon_path_common(td, dirvp, path, cpath); + if (dirvp != NULL) + vdrop(dirvp); +} + +void +audit_canon_path_vp(struct thread *td, struct vnode *dirvp, char *path, char *cpath) +{ + + WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", + __func__, __FILE__, __LINE__); + + if (dirvp == NULL) + return; + + FILEDESC_SLOCK(td->td_proc->p_fd); + audit_canon_path_common(td, dirvp, path, cpath); +} diff --git a/sys/security/audit/audit_private.h b/sys/security/audit/audit_private.h index b5c373a..7ecf3a6 100644 --- a/sys/security/audit/audit_private.h +++ b/sys/security/audit/audit_private.h @@ -394,6 +394,8 @@ au_event_t audit_msgctl_to_event(int cmd); au_event_t audit_semctl_to_event(int cmr); void audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath); +void audit_canon_path_vp(struct thread *td, struct vnode *dirvp, + char *path, char *cpath); au_event_t auditon_command_event(int cmd); /* -- 2.4.5 From owner-freebsd-fs@freebsd.org Mon Jul 6 04:48:50 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 209F49920B4 for ; Mon, 6 Jul 2015 04:48:50 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 014A73B0F for ; Mon, 6 
Jul 2015 04:48:49 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id F3E813F6FA for ; Mon, 6 Jul 2015 00:48:47 -0400 (EDT) Message-ID: <559A08AF.9050809@sneakertech.com> Date: Mon, 06 Jul 2015 00:48:47 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD FS Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> In-Reply-To: <20150705210306.GA1048@in-addr.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 04:48:50 -0000 On 2015-07-05 12:03 PM, Freddie Cash wrote: > SMB support is only built-in on Solaris derivatives. You need Samba on > everything else. On 2015-07-05 5:03 PM, Gary Palmer wrote: > The sharesmb option to zfs does not work on FreeBSD. You need to use > Samba. Ok wait, it IS implemented on the Linux versions of ZFS. I thought the FreeBSD version of ZFS superseded all the features of the Linux port? 
From owner-freebsd-fs@freebsd.org Mon Jul 6 04:57:58 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 90C0F9922CF for ; Mon, 6 Jul 2015 04:57:58 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-ob0-x232.google.com (mail-ob0-x232.google.com [IPv6:2607:f8b0:4003:c01::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 5C59D3EE8 for ; Mon, 6 Jul 2015 04:57:58 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: by obdbs4 with SMTP id bs4so99713638obd.3 for ; Sun, 05 Jul 2015 21:57:57 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=Yzi9uoPm7AFka9yJjlzGStys1sSARrLrcEK2EBCY6Ms=; b=mGJ+eENvvhkWiTzVso6ZNXHHJnSoOpErgWoY92ZG+wk3FanHdEHaiUhMC4T9WVxAvF DZ7deNhhUqly78rSsXfKPLqs1gpiF0NvvbB/4LAYASw9Ayxwmu1i2KJ8TfZge8RA5mOc w7zfUdSHJoF4dWxgk+c/azqz101l2Dzre3jKlwiit04G/iqLt+lPhjpYQNY6JGtiLToY xOjGJT2R0atDza8wQkNvogsI+I1lajlw4f5Xvy+585qNA5S9oWhOgSZDZdl8mXupr1Ur MNRzSgGqHgDU8EK3qsby6P7jOdKrJtmT9bbBW5DrSEs0bHD2ozQC+3t74lt5igYx/FqG WzEg== MIME-Version: 1.0 X-Received: by 10.202.232.67 with SMTP id f64mr22900782oih.63.1436158677280; Sun, 05 Jul 2015 21:57:57 -0700 (PDT) Received: by 10.76.81.100 with HTTP; Sun, 5 Jul 2015 21:57:57 -0700 (PDT) Received: by 10.76.81.100 with HTTP; Sun, 5 Jul 2015 21:57:57 -0700 (PDT) In-Reply-To: <559A08AF.9050809@sneakertech.com> References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> Date: Sun, 5 Jul 2015 21:57:57 -0700 Message-ID: Subject: Re: A question about ZFS built-in SMB From: Freddie Cash To: Quartz Cc: FreeBSD Filesystems Content-Type: 
text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 04:57:58 -0000 On Jul 5, 2015 9:49 PM, "Quartz" wrote: > > On 2015-07-05 12:03 PM, Freddie Cash wrote: >> >> SMB support is only built-in on Solaris derivatives. You need Samba on >> everything else. > > > On 2015-07-05 5:03 PM, Gary Palmer wrote: >> >> The sharesmb option to zfs does not work on FreeBSD. You need to use >> Samba. > > > > Ok wait, it IS implemented on the Linux versions of ZFS. I thought the FreeBSD version of ZFS superseded all the features of the Linux port? No, actually, it isn't. :) It works in a similar manner to sharenfs on FreeBSD. You still require a separate NFS server installed, and all it does is copy the info to an exports file. Similar for sharesmb. You still need Samba installed on Linux. All the property does is add the filesystem to a separate smb config file (or something like that; never actually used it on Linux). You still require the NFS and SMB packages installed for your distro. Same as you would for any other FS on Linux. Personally, I don't see the use for either of the share properties in zfs. Why treat it differently from the other FSes on the system? Just edit exports/smb.conf as per normal.
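The division of labor described here (the property only produces configuration, while a separately installed server does the actual serving) can be illustrated with a sketch that turns dataset mountpoints and sharenfs values into exports(5)-style lines. This is a hypothetical simplification, not the libzfs code. struct share_entry and format_exports() are invented for the example, the value "on" is assumed to mean "no extra options", and nothing is written to a real exports file or signalled to mountd.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical (mountpoint, sharenfs value) pair for one dataset. */
struct share_entry {
	const char *mountpoint;
	const char *sharenfs;	/* "off", "on", or exports(5) options */
};

/*
 * Appends one exports(5)-style line per shared dataset to buf and returns
 * how many lines were produced, or -1 if buf is too small.
 * Simplification: "on" means no extra options; any other value except
 * "off" is copied verbatim as the option string.
 */
static int
format_exports(char *buf, size_t len, const struct share_entry *ents, size_t n)
{
	size_t used = 0;
	int lines = 0;

	if (len == 0)
		return (-1);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		const struct share_entry *e = &ents[i];
		int rc;

		if (strcmp(e->sharenfs, "off") == 0)
			continue;	/* not shared: nothing to emit */
		if (strcmp(e->sharenfs, "on") == 0)
			rc = snprintf(buf + used, len - used, "%s\n",
			    e->mountpoint);
		else
			rc = snprintf(buf + used, len - used, "%s %s\n",
			    e->mountpoint, e->sharenfs);
		if (rc < 0 || (size_t)rc >= len - used)
			return (-1);	/* output would be truncated */
		used += (size_t)rc;
		lines++;
	}
	return (lines);
}
```

Even with such lines in place, nothing is exported until an NFS server reads them, which is the same point made above about sharesmb needing Samba.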
From owner-freebsd-fs@freebsd.org Mon Jul 6 05:40:45 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 062F3994A1B for ; Mon, 6 Jul 2015 05:40:45 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id D98CF38D6 for ; Mon, 6 Jul 2015 05:40:44 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 97BB23F6FC for ; Mon, 6 Jul 2015 01:40:43 -0400 (EDT) Message-ID: <559A14DB.3080905@sneakertech.com> Date: Mon, 06 Jul 2015 01:40:43 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD Filesystems Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 05:40:45 -0000 > No, actually, it isn't. :) It works in a similar manner to sharenfs on > FreeBSD. You still require a separate NFS server installed, and ask it > does it copy the info to an exports file. > > Similar for sharesmb. You still require Samba being installed on Linux. > All the property does is add the filesystem to a separate smb config > file (or something like that; never actually used it on Linux). 
> > You still require the NFS and SMB packages installed for your distro. > Same as you would for any other FS on Linux. So I'm a little confused here. On Linux, the property is active and usable but only creates the share, meaning you still need the server software to host it. On FreeBSD, the property doesn't work at all, and you need the server software to do everything......? From owner-freebsd-fs@freebsd.org Mon Jul 6 05:46:55 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CBC8B994C64 for ; Mon, 6 Jul 2015 05:46:55 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id AB0803CA0 for ; Mon, 6 Jul 2015 05:46:55 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id D7C2E3F701 for ; Mon, 6 Jul 2015 01:46:54 -0400 (EDT) Message-ID: <559A164E.70200@sneakertech.com> Date: Mon, 06 Jul 2015 01:46:54 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD Filesystems Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 05:46:55 -0000 > > Ok wait, it IS implemented on the
Linux versions of ZFS. I thought > the FreeBSD version of ZFS superseded all the features of the Linux port? > > No, actually, it isn't. :) And what I meant was, I know that the sharesmb properties are both implemented and functional under Linux ZFS (although I don't know how much they can do), so "SMB support is only built-in on Solaris derivatives" is inaccurate (unless you mean "full-stack-no-samba" is only on Solaris). From owner-freebsd-fs@freebsd.org Mon Jul 6 13:52:07 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6D136A0CD for ; Mon, 6 Jul 2015 13:52:07 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: from mail-ie0-f182.google.com (mail-ie0-f182.google.com [209.85.223.182]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3D5C21A3B for ; Mon, 6 Jul 2015 13:52:06 +0000 (UTC) (envelope-from lkateley@kateley.com) Received: by iecvh10 with SMTP id vh10so113647732iec.3 for ; Mon, 06 Jul 2015 06:52:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:message-id:date:from:user-agent:mime-version:to :subject:references:in-reply-to:content-type :content-transfer-encoding; bh=uYb3hkBhGk0TKdJP7wggU+P30URV6BHNm3fPkeZ9qi0=; b=G3ANVOni6MS8CYqxZzjJzO2tlO1kHYDngLc2cPjlaFVRsic0iqwruMJVpazrBr0YUl vJpC/L45K+bMbBebA/qZyAHAcqOEFnAh9buHKtj0+rnWXg/MeaAhvP/V6hK+q6JRQBJd /B4YXKwYXckqVLXtnVMhKOP2o1LbqDtTNeJXTnHD1LL2udVir/OGAkZm92eV0gJM4DCi N0WIFl9w2MB1f5BWQGBVdqtHXa9AOEbbegi+8db18yZlwuKHJNwbdMu30L2bsIMH64Wg HQsYwwk6/W/ccbs8jxdxfeQS4Ejhup1SrrUcFcjG8q3DKrGwXnE+lfNtmA1WkjFIy4s2 ctoA== X-Gm-Message-State: ALoCoQkrKpx3jBYTJXZaVa+GCx2ag2/BHaNYVCB2D6JJuBrI93S0wLLNShBZnRnlUZfK01StelXo X-Received: by 10.107.25.15 
with SMTP id 15mr75218311ioz.11.1436190720305; Mon, 06 Jul 2015 06:52:00 -0700 (PDT) Received: from kateleycoimac.local ([63.231.252.189]) by mx.google.com with ESMTPSA id a139sm12493800ioa.14.2015.07.06.06.51.58 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 06 Jul 2015 06:51:59 -0700 (PDT) Message-ID: <559A87FE.70309@kateley.com> Date: Mon, 06 Jul 2015 08:51:58 -0500 From: Linda Kateley User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> In-Reply-To: <559A14DB.3080905@sneakertech.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 13:52:07 -0000 All OpenZFS implementations come from basically the same source. The communities that use it can make decisions on how it operates on their platform. The Solaris and OpenSolaris distros have sharesmb and an in-kernel CIFS server. Maybe try OmniOS. There is always some additional config: creating users, setting permissions, etc. My company helps people walk through the differences between the various distros and recommends a platform. Feel free to contact me directly. I also have an OpenZFS bootcamp video series that shows the similarities. http://kateleyco.com/?page_id=783 Also, you might want to try something like FreeNAS, which has everything you need to turn FreeBSD into a NAS platform. There are also many commercial software packages that do these things for you and include support, e.g. CloudByte, OSNEXUS, Nexenta, Syneto...
linda On 7/6/15 12:40 AM, Quartz wrote: >> No, actually, it isn't. :) It works in a similar manner to sharenfs on >> FreeBSD. You still require a separate NFS server installed, and all it >> does is copy the info to an exports file. >> >> Similar for sharesmb. You still require Samba being installed on Linux. >> All the property does is add the filesystem to a separate smb config >> file (or something like that; never actually used it on Linux). >> >> You still require the NFS and SMB packages installed for your distro. >> Same as you would for any other FS on Linux. > > > So I'm a little confused here. > > On Linux, the property is active and usable but only creates the > share, meaning you still need the server software to host it. On > FreeBSD, the property doesn't work at all, and you need the server > software to do everything......? > > > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" -- Linda Kateley Kateley Company Skype ID-kateleyco http://kateleyco.com From owner-freebsd-fs@freebsd.org Mon Jul 6 14:19:16 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8296A5A4 for ; Mon, 6 Jul 2015 14:19:16 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: from mail-ob0-x230.google.com (mail-ob0-x230.google.com [IPv6:2607:f8b0:4003:c01::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7C0A61877 for ; Mon, 6 Jul 2015 14:19:16 +0000 (UTC) (envelope-from fjwcash@gmail.com) Received: by obdbs4 with SMTP id bs4so107894978obd.3 for ; Mon, 06 Jul 2015 07:19:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256;
c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=/pLsQb6pbBNH+KzNTveIlgL+tg9nxtB+GgvXpVR6KeU=; b=RzeMOt9CJ5XLynsRVNomaKfdfDh86FNFLUjIq3n2Nz5WfLKCs//88pM9giOvk7ucy9 XJkOmp6sTl7oYTxWHTjRxRf6aNQx3bqWRp2FATzXJPJ2oB2kxziqTwnh7MCywzjq4S4D cdQXNblBlKWi86Z0bgk2QyuvPwyaEY5FIEzwWFRxLiOVAUCL1mUx3EDg7ApBYr1F/WsG N1rwdCXR+HPPF4UB5m3akRdOb4de+Dkrduly3KDLpCxiSbRyRxUT0ZwylRJu+Uw/pR5F BRD+H6YxGI8Grf+7pBqphU0OBzq+Z5kvMZZ+2M47yY6hbnSQ8Dsi8PkgjD9IJJWD3yOE rUpw== MIME-Version: 1.0 X-Received: by 10.202.63.212 with SMTP id m203mr45925692oia.35.1436192355818; Mon, 06 Jul 2015 07:19:15 -0700 (PDT) Received: by 10.76.81.100 with HTTP; Mon, 6 Jul 2015 07:19:15 -0700 (PDT) Received: by 10.76.81.100 with HTTP; Mon, 6 Jul 2015 07:19:15 -0700 (PDT) In-Reply-To: <559A14DB.3080905@sneakertech.com> References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> Date: Mon, 6 Jul 2015 07:19:15 -0700 Message-ID: Subject: Re: A question about ZFS built-in SMB From: Freddie Cash To: Quartz Cc: FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 14:19:16 -0000 On Jul 5, 2015 10:40 PM, "Quartz" wrote: >> >> No, actually, it isn't. :) It works in a similar manner to sharenfs on >> FreeBSD. You still require a separate NFS server installed, and all it >> does is copy the info to an exports file. >> >> Similar for sharesmb. You still require Samba being installed on Linux. >> All the property does is add the filesystem to a separate smb config >> file (or something like that; never actually used it on Linux).
>> >> You still require the NFS and SMB packages installed for your distro. >> Same as you would for any other FS on Linux. > > > > So I'm a little confused here. > > On Linux, the property is active and usable but only creates the share, meaning you still need the server software to host it. On FreeBSD, the property doesn't work at all, and you need the server software to do everything...... Correct. On Solaris derivatives, you only need the OS installed and ZFS configured in order to share filesystems via the share{nfs|smb} properties. They have in-kernel NFS and CIFS servers. No 3rd party software is required; the zfs system does everything. On Linux, you need the OS, ZFS, an NFS server, and Samba installed in order to share filesystems. You can use the share{nfs|smb} properties to configure the shares, but it's 3rd party software (external to zfs) that actually does the sharing. On FreeBSD, you need the OS, ZFS, an NFS server, and Samba installed in order to share filesystems. You can use the sharenfs property to configure an nfs share, but it's the 3rd party nfs server that actually does the sharing. And you need to do everything manually via samba to share filesystems, as the sharesmb property isn't supported. Or, you can just ignore the share{nfs|smb} properties and do everything manually, the way you would for any filesystem.
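To make the three cases above easier to compare, here is a small Python sketch that encodes the summary as data. The platform keys, class and function names are hypothetical, and the values only paraphrase this thread as of July 2015; they are not drawn from ZFS source code or documentation, and later releases may behave differently.

```python
# Sketch: the thread's summary of ZFS share{nfs|smb} support, as data.
# Values paraphrase the July 2015 list discussion; they are not taken
# from ZFS source code and may not match later releases.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ShareSupport:
    sharenfs_configures_share: bool  # setting sharenfs creates an NFS export
    sharesmb_configures_share: bool  # setting sharesmb creates an SMB share
    servers_built_in: bool           # NFS/CIFS servers ship in-kernel with the OS


# Hypothetical platform keys chosen for this example.
SUPPORT: Dict[str, ShareSupport] = {
    "solaris-derivative": ShareSupport(True, True, True),
    "linux": ShareSupport(True, True, False),
    "freebsd": ShareSupport(True, False, False),
}


def extra_packages(platform: str) -> List[str]:
    """Servers the thread says must be installed alongside ZFS."""
    if SUPPORT[platform].servers_built_in:
        return []
    return ["NFS server", "Samba"]


def smb_configured_by_hand(platform: str) -> bool:
    """True when SMB shares have to be set up manually in Samba."""
    return not SUPPORT[platform].sharesmb_configures_share
```

For the original question (FreeBSD 10.1), `extra_packages("freebsd")` and `smb_configured_by_hand("freebsd")` capture the conclusion: Samba is still required, and its shares are configured outside ZFS.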
From owner-freebsd-fs@freebsd.org Mon Jul 6 16:56:18 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B5C77AEEC for ; Mon, 6 Jul 2015 16:56:18 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 8C6A812E9 for ; Mon, 6 Jul 2015 16:56:18 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id B32B23F740 for ; Mon, 6 Jul 2015 12:56:10 -0400 (EDT) Message-ID: <559AB32A.7070702@sneakertech.com> Date: Mon, 06 Jul 2015 12:56:10 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> <559A87FE.70309@kateley.com> In-Reply-To: <559A87FE.70309@kateley.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 16:56:18 -0000 > All open-zfs come from basically the same source. The communities that > use can make decisions on how it operates on their platform. Oh ok. 
I was under the impression that Linux ZFS was basically a hack/port of the FreeBSD version due to licensing issues, and the FreeBSD version was itself a port of the illumos version. From owner-freebsd-fs@freebsd.org Mon Jul 6 17:01:58 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A6A83BEA5 for ; Mon, 6 Jul 2015 17:01:58 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 687F01A8A for ; Mon, 6 Jul 2015 17:01:58 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 9BEB33F733 for ; Mon, 6 Jul 2015 13:01:57 -0400 (EDT) Message-ID: <559AB480.8090608@sneakertech.com> Date: Mon, 06 Jul 2015 13:01:52 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: FreeBSD Filesystems Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 17:01:58 -0000 > On Linux [...] You can use the share{nfs|smb} properties to > configure the shares, but it's 3rd party software (external to zfs) that > actually does the sharing. > > On FreeBSD [...] 
You can use the sharenfs property to > configure an nfs share, but it's the 3rd party nfs server that actually > does the sharing. And you need to do everything manually via samba to > share filesystems as the sharesmb property isn't supported. That clears it up, thanks. > Or, you can just ignore the share{nfs|smb} properties and do everything > manually, the way you would for any filesystem. That's what we've always been doing; I was just wondering/hoping that maybe there was a simpler way now. Oh well. From owner-freebsd-fs@freebsd.org Mon Jul 6 17:04:13 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EB494D9C7 for ; Mon, 6 Jul 2015 17:04:13 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: from mail-la0-x236.google.com (mail-la0-x236.google.com [IPv6:2a00:1450:4010:c03::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id CA2A61B2E for ; Mon, 6 Jul 2015 17:04:12 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: by labgy5 with SMTP id gy5so10894688lab.2 for ; Mon, 06 Jul 2015 10:04:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=/v6Ho3vyIGhf0SxzX8SJKkmidAbBSUQDeI5AcMuj3lQ=; b=ITl49u15GaBod/9nG01BiPPhPtGT6k+PEy3lWZ9fWVeDqbIB1ya2YI+GDfYbBFkphs //EGhmE8fnPURrqggmW8dgODPZw1DAaptI9Nu5FYY5NiPkQBFGnXxWjfSaXq7HXeriI4 JBdQbZjpNw20v7T4VWTRbVEw1IZpGHlheacBCLVF+wGt9pVftQShB7aju9eyrOaxqLFF aDlBO97edNLikNWCEKAc5GnyCBUn3JdlZ9L3YzkJy6IsmdsahCj4eJ9D2MEWYkoz6uJn eCE0lVrRGFNy8VvhMGrSVJpfoKZD3QpmShLPoZeTmwc1o2hgmu+zuwQzKVNy8KOFL1pd zD+Q== MIME-Version: 1.0 X-Received: by 10.112.164.35 with SMTP id yn3mr44094865lbb.91.1436202250702; Mon, 06
Jul 2015 10:04:10 -0700 (PDT) Received: by 10.152.219.35 with HTTP; Mon, 6 Jul 2015 10:04:10 -0700 (PDT) In-Reply-To: <559AB32A.7070702@sneakertech.com> References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> <559A87FE.70309@kateley.com> <559AB32A.7070702@sneakertech.com> Date: Mon, 6 Jul 2015 20:04:10 +0300 Message-ID: Subject: Re: A question about ZFS built-in SMB From: Kimmo Paasiala To: Quartz Cc: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 17:04:13 -0000 On Mon, Jul 6, 2015 at 7:56 PM, Quartz wrote: >> All open-zfs come from basically the same source. The communities that >> use can make decisions on how it operates on their platform. > > > Oh ok. I was under the impression that Linux ZFS was basically a hack/port > of the FreeBSD version due to licensing issues, and the FreeBSD version was > itself a port of the illumos version. > > > > http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue There is no real licensing issue since ZFS can be compiled into a loadable kernel module(s) on Linux (just as it is used in FreeBSD anyway) and that avoids the clash between the two mutually incompatible licenses. 
-Kimmo From owner-freebsd-fs@freebsd.org Mon Jul 6 17:59:02 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C4245994DE0 for ; Mon, 6 Jul 2015 17:59:02 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from douhisi.pair.com (douhisi.pair.com [209.68.5.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9DBA512EA for ; Mon, 6 Jul 2015 17:59:02 +0000 (UTC) (envelope-from quartz@sneakertech.com) Received: from [10.2.2.1] (pool-173-48-121-235.bstnma.fios.verizon.net [173.48.121.235]) by douhisi.pair.com (Postfix) with ESMTPSA id 76BFE3F718 for ; Mon, 6 Jul 2015 13:59:00 -0400 (EDT) Message-ID: <559AC1E4.6050906@sneakertech.com> Date: Mon, 06 Jul 2015 13:59:00 -0400 From: Quartz User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2 MIME-Version: 1.0 To: freebsd-fs Subject: Re: A question about ZFS built-in SMB References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> <559A87FE.70309@kateley.com> <559AB32A.7070702@sneakertech.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 17:59:02 -0000 >> Oh ok. I was under the impression that Linux ZFS was basically a hack/port >> of the FreeBSD version due to licensing issues, and the FreeBSD version was >> itself a port of the illumos version. 
>> > > http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue > > There is no real licensing issue since ZFS can be compiled into a > loadable kernel module(s) on Linux (just as it is used in FreeBSD > anyway) and that avoids the clash between the two mutually > incompatible licenses. Did this change recently? The last time I looked was a few years ago, and ZFS on Linux was still largely an awkward hack because of what they had to do to work around the licensing issue. I heavily investigated the Debian/kFreeBSD project before just going with mainline FreeBSD. From owner-freebsd-fs@freebsd.org Mon Jul 6 18:52:28 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 72425A6E5 for ; Mon, 6 Jul 2015 18:52:28 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from mail.physics.umn.edu (smtp.spa.umn.edu [128.101.220.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 4C5561F72 for ; Mon, 6 Jul 2015 18:52:27 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from spa-sysadm-01.spa.umn.edu ([134.84.199.8]) by mail.physics.umn.edu with esmtpsa (TLSv1:AES128-SHA:128) (Exim 4.77 (FreeBSD)) (envelope-from ) id 1ZCBV6-0005ib-4N; Mon, 06 Jul 2015 13:52:20 -0500 Message-ID: <559ACE63.7060409@physics.umn.edu> Date: Mon, 06 Jul 2015 13:52:19 -0500 From: Graham Allan User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0 MIME-Version: 1.0 To: Rick Macklem CC: freebsd-fs@freebsd.org Subject: Re: Strange NFS problem implicating nfsuserd?
References: <55946FFE.8070402@physics.umn.edu> <972685551.2776991.1435795831472.JavaMail.zimbra@uoguelph.ca> <55948916.4080405@physics.umn.edu> <1203156989.2786078.1435799642755.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <1203156989.2786078.1435799642755.JavaMail.zimbra@uoguelph.ca> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 18:52:28 -0000 On 7/1/2015 8:14 PM, Rick Macklem wrote: > Graham Allan wrote: >> >> I was always able to get a failure within 10-60 minutes or so, so having >> the nfsuserd cache timeout at 600 minutes seems like it should eliminate >> any intermittent id lookup issues. >> > I'll take another look at nfsuserd.c. Maybe it does something stupid like > getting the length of the argument wrong (trailing blank or null or something > like that, that doesn't show up when it is printed out). All I can think of > is a subtle bug in nfsuserd.c when the argument is specified. > >> I guess I could try... >> (1) rpcdebug on the linux client, though I'm not sure which flags to >> enable to log idmapping issues. >> (2) watch nfsuserd with truss and look for different behaviors. >> (3) capture NFS traffic, examine with wireshark >> > I'd try #3 if I were you and see if the owner and owner_group names look > right. > > I'll post if I find anything in nfsuserd.c, rick Thanks for indulging me Rick. As you might have expected though, it's time for me to follow up with my mea culpa that my problem identification was entirely wrong. 
I knew none of it made sense, but perhaps it's fate that I need to post something embarrassingly wrong to find the true cause :-) The reason things became stable when I altered the nfsuserd flags is that I also stopped our configuration management system on the affected systems so they wouldn't get reverted during testing. And of course that was doing something else which was responsible. We've had a lot of workstation movement over the last few months, with machines being moved to new buildings and new IP addresses, though their hostnames remain the same. To try to address this, a periodic reload of mountd was added: the list of permitted hostnames is in /etc/netgroup, and it seems that mountd doesn't pick up changed DNS values in the netgroup without a HUP. I guess I never thought that reloading mountd could cause I/O disruption, but the man page does of course allude to this when discussing the "-S" flag. I've used lots of types of Unix for a long time; I never thought I needed to read the mountd man page! For now I have simply stopped doing any reloads, but I could probably start using that flag instead...
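The periodic reload described above amounts to sending mountd a SIGHUP. A minimal Python sketch, assuming FreeBSD's default pidfile location `/var/run/mountd.pid` (not stated in this thread); the helper names are hypothetical.

```python
# Hypothetical helpers: send SIGHUP to a running mountd so it re-reads
# its exports and netgroup data. The pidfile path is an assumption
# (FreeBSD's default); adjust it if your system differs.
import os
import signal

MOUNTD_PIDFILE = "/var/run/mountd.pid"  # assumption: FreeBSD default


def parse_pid(text: str) -> int:
    """Parse pidfile contents (a bare PID, optional newline) into an int."""
    value = text.strip()
    if not value.isdigit():
        raise ValueError(f"malformed pidfile contents: {text!r}")
    return int(value)


def reload_mountd(pidfile: str = MOUNTD_PIDFILE) -> int:
    """Ask mountd to reload; returns the PID that was signalled.

    Without mountd's -S flag, clients can see errors while the exports
    list is being reloaded, which matches the I/O disruption above.
    """
    with open(pidfile) as f:
        pid = parse_pid(f.read())
    os.kill(pid, signal.SIGHUP)
    return pid
```

If periodic reloads are kept, the -S flag would normally be added to mountd's startup flags, e.g. `mountd_flags="-r -S"` in `/etc/rc.conf`; check mountd(8) on your release for exactly what -S does there.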
Graham From owner-freebsd-fs@freebsd.org Mon Jul 6 19:19:58 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BDF3AAAC6 for ; Mon, 6 Jul 2015 19:19:58 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from mail.physics.umn.edu (smtp.spa.umn.edu [128.101.220.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 9B8381D3F for ; Mon, 6 Jul 2015 19:19:57 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from peevish.spa.umn.edu ([128.101.220.230]) by mail.physics.umn.edu with esmtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1ZCBvo-0007VX-Vi for freebsd-fs@freebsd.org; Mon, 06 Jul 2015 14:19:56 -0500 Received: by peevish.spa.umn.edu (Postfix, from userid 5000) id E310B136; Mon, 6 Jul 2015 14:19:56 -0500 (CDT) Date: Mon, 6 Jul 2015 14:19:56 -0500 From: Graham Allan To: freebsd-fs@freebsd.org Subject: increasing MAXNFSDCNT Message-ID: <20150706191956.GL12772@physics.umn.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.20 (2009-12-10) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 19:19:58 -0000 Regarding the maximum number of nfs server threads, is increasing the "#define MAXNFSDCNT 256" in nfsd.c the only thing needed to try higher values? My FreeBSD nfs servers are still just at 128 threads, but I was experimenting with one of our linux servers which became happy at 512 threads (really ~400 in use under load). The load profile is pretty similar, so it might be reasonable to expect ending up around the same value. 
Graham -- From owner-freebsd-fs@freebsd.org Mon Jul 6 20:27:51 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4D5CC9955D0 for ; Mon, 6 Jul 2015 20:27:51 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 10DB01DE9 for ; Mon, 6 Jul 2015 20:27:50 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2BrAwCK45pV/61jaINcDoNYYAaDGbo+CYFkCoUtSgKBcxQBAQEBAQEBgQqEIwEBAQMBAQEBICsgCwULAgEIGAICDRkCAicBCSYCBAgHBAEcBIgFCA20dpZiAQEBAQEFAQEBAQEBARuBIYoqhDQBAQUXNAeCaIFDBZQVhGKENoROllYCJmOCWloiMQeBBjqBBAEBAQ X-IronPort-AV: E=Sophos;i="5.15,417,1432612800"; d="scan'208";a="222204096" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 06 Jul 2015 16:27:49 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id AC20515F542; Mon, 6 Jul 2015 16:27:49 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id MblbZzgULswQ; Mon, 6 Jul 2015 16:27:49 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 5AE3715F55D; Mon, 6 Jul 2015 16:27:49 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 4TIqvQioyIsF; Mon, 6 Jul 2015 16:27:49 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 42C4915F542; Mon, 6 Jul 2015 16:27:49 -0400 (EDT) Date: Mon, 6 Jul 2015 
16:27:49 -0400 (EDT) From: Rick Macklem To: Graham Allan Cc: freebsd-fs@freebsd.org Message-ID: <1025579120.4972878.1436214469123.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <20150706191956.GL12772@physics.umn.edu> References: <20150706191956.GL12772@physics.umn.edu> Subject: Re: increasing MAXNFSDCNT MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: increasing MAXNFSDCNT Thread-Index: vueAz0/lBiN82Qpf2Eic22K1cUpNyQ== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 20:27:51 -0000 Graham Allan wrote: > Regarding the maximum number of nfs server threads, is increasing the > "#define MAXNFSDCNT 256" in nfsd.c the only thing needed to try higher > values? > I haven't done it for a little while, but I think that is all it takes. (I had actually planned on increasing this, but it slipped through the cracks. Maybe soon.) rick > My FreeBSD nfs servers are still just at 128 threads, but I was > experimenting with one of our linux servers which became happy > at 512 threads (really ~400 in use under load). The load profile is > pretty similar, so it might be reasonable to expect ending up around the > same value. 
> > Graham > -- > From owner-freebsd-fs@freebsd.org Mon Jul 6 20:46:53 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 264F79959DC for ; Mon, 6 Jul 2015 20:46:53 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id DD2E810E6 for ; Mon, 6 Jul 2015 20:46:52 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2BrAwDa6JpV/61jaINcDoNYYAaDGbo+CYFkCoUtSgKBcxQBAQEBAQEBgQqEIwEBAQMBAQEBICsgCwULAgEIGAICDRkCAicBCSYCBAgHBAEcBIgFCA20bJZkAQEBAQEFAQEBAQEBAQEagSGKKoQtBwEBBRc0B4ItOxKBMQWUFYRihDaECkSGYIwbg1sCJmOCWloiMQd9CRcjgQQBAQE X-IronPort-AV: E=Sophos;i="5.15,417,1432612800"; d="scan'208";a="224070687" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 06 Jul 2015 16:46:15 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id C196615F542; Mon, 6 Jul 2015 16:46:15 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id sGTNhsyW7zTJ; Mon, 6 Jul 2015 16:46:15 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 4F4C215F54D; Mon, 6 Jul 2015 16:46:15 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id pE26ETPH4wsh;
Mon, 6 Jul 2015 16:46:15 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 3492915F542; Mon, 6 Jul 2015 16:46:15 -0400 (EDT) Date: Mon, 6 Jul 2015 16:46:15 -0400 (EDT) From: Rick Macklem To: Graham Allan Cc: freebsd-fs@freebsd.org Message-ID: <1318238880.4985990.1436215575189.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <20150706191956.GL12772@physics.umn.edu> References: <20150706191956.GL12772@physics.umn.edu> Subject: Re: increasing MAXNFSDCNT MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: increasing MAXNFSDCNT Thread-Index: 443y2hyN1u9INvTdgeWX2pNBtFKiLQ== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 20:46:53 -0000 Graham Allan wrote: > Regarding the maximum number of nfs server threads, is increasing the > "#define MAXNFSDCNT 256" in nfsd.c the only thing needed to try higher > values? > Oops, my turn to admit that I should have read the man page before answering. If you have a fairly recent nfsd that supports the "-maxthreads" option, you can set this and I'm pretty sure you can make it larger than 256. The system actually dynamically adjusts the # of nfsd kernel threads between "-minthreads" and "-maxthreads". The "-n" option (which should probably be deprecated) sets both of these to the same value and is limited to MAXNFSDCNT (256). If your nfsd doesn't support "-maxthreads", increasing MAXNFSDCNT should work. rick > My FreeBSD nfs servers are still just at 128 threads, but I was > experimenting with one of our linux servers which became happy > at 512 threads (really ~400 in use under load). 
The load profile is > pretty similar, so it might be reasonable to expect ending up around the > same value. > > Graham > -- > From owner-freebsd-fs@freebsd.org Mon Jul 6 21:23:23 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0C8A1995056 for ; Mon, 6 Jul 2015 21:23:23 +0000 (UTC) (envelope-from daved@nostrum.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id E88C91EC1 for ; Mon, 6 Jul 2015 21:23:22 +0000 (UTC) (envelope-from daved@nostrum.com) Received: by mailman.ysv.freebsd.org (Postfix) id E4E84995055; Mon, 6 Jul 2015 21:23:22 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E4481995053 for ; Mon, 6 Jul 2015 21:23:22 +0000 (UTC) (envelope-from daved@nostrum.com) Received: from nostrum.com (raven-v6.nostrum.com [IPv6:2001:470:d:1130::1]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id C0BF21EC0 for ; Mon, 6 Jul 2015 21:23:22 +0000 (UTC) (envelope-from daved@nostrum.com) Received: from [10.1.12.128] (vpn.net.tamu.edu [128.194.177.117]) (authenticated bits=0) by nostrum.com (8.15.2/8.14.9) with ESMTPSA id t66LNLAC049007 (version=TLSv1 cipher=ECDHE-RSA-AES128-SHA bits=128 verify=NO) for ; Mon, 6 Jul 2015 16:23:22 -0500 (CDT) (envelope-from daved@nostrum.com) X-Authentication-Warning: raven.nostrum.com: Host vpn.net.tamu.edu [128.194.177.117] claimed to be [10.1.12.128]
From: Dave Duchscher Subject: ZFS system lockup Message-Id: <1BCFA515-BF3D-4E64-B826-BA475B13E770@nostrum.com> Date: Mon, 6 Jul 2015 16:23:16 -0500 To: fs@freebsd.org Mime-Version: 1.0 (Mac OS X Mail 8.2 \(2102\)) X-Mailer: Apple Mail (2.2102) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 21:23:23 -0000 In the process of diagnosing an I/O performance problem with our virtual environment, we ran into FreeBSD test instances locking up and needing to be reset. Moving to real hardware and running the same tests, we were able to reproduce the lockup. We are testing with fio, running a few read and write tests over and over again. Watching via top, the system locks up, and the last update from top reports that wired memory has taken all the memory (2G in the system; top shows 1947M wired). ARC size at the time of the latest lockup was around 437M. I can keep the system from locking up if I reduce the maximum ARC size to 512M; wired memory then floats around 1G. With the maximum ARC set to 768M or higher, we get consistent lockups after running for a few hours. What is using this wired memory? Is there a way to keep wired memory under control with ZFS besides shrinking the ARC? Is there any guidance on how much wired memory will be used for various ARC sizes? Is 2G just too little memory to run ZFS? We understand that the maximum ARC size will need to be tuned in some cases, but shrinking it down to 512M seems low. This test hardware has a single 250G disk and 2G of RAM. The OS is FreeBSD 10.1-RELEASE. We upgraded the system to stable and saw similar results. Currently, the system is running 10.1-RELEASE since that is what is used elsewhere.
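One way to start on "What is using this wired memory?" is to sample how much of it the ARC accounts for. A minimal Python sketch, assuming FreeBSD 10.x sysctl names (`vm.stats.vm.v_wire_count`, `hw.pagesize`, `kstat.zfs.misc.arcstats.size`, `vfs.zfs.arc_max`) that this message does not itself mention; the helper names are hypothetical, and the result only narrows the question rather than answering it.

```python
# Rough diagnostic sketch (FreeBSD 10.x assumed): how much wired memory is
# not accounted for by the ZFS ARC? Reads sysctls via sysctl(8); confirm
# the OID names on your system with `sysctl -d <name>` before relying on it.
import subprocess
from typing import Dict

MIB = 1024 * 1024


def sysctl_int(name: str) -> int:
    """Read a numeric sysctl value using `sysctl -n`."""
    result = subprocess.run(["sysctl", "-n", name],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def non_arc_wired_mib(wired_pages: int, page_size: int, arc_bytes: int) -> float:
    """Wired memory minus the ARC's reported size, in MiB.

    The remainder is everything else the kernel has wired (which may
    include caches used by ZFS itself); this figure alone does not say
    which subsystem owns it.
    """
    return (wired_pages * page_size - arc_bytes) / MIB


def snapshot() -> Dict[str, float]:
    """One sample of wired memory, ARC size and the ARC cap, in MiB."""
    wired = sysctl_int("vm.stats.vm.v_wire_count")
    page = sysctl_int("hw.pagesize")
    arc = sysctl_int("kstat.zfs.misc.arcstats.size")
    return {
        "wired_mib": wired * page / MIB,
        "arc_mib": arc / MIB,
        "arc_max_mib": sysctl_int("vfs.zfs.arc_max") / MIB,
        "non_arc_wired_mib": non_arc_wired_mib(wired, page, arc),
    }
```

Logging `snapshot()` every few seconds while fio runs would show whether it is the non-ARC portion that grows toward a lockup. With the figures reported above (1947M wired, a 437M ARC), that portion would be about 1947M - 437M = 1510M.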
We have seen a lockup on one of our database nodes, which has 20G of RAM, that we thought was caused by a SAN switch on our VM system. Now we are not so sure.

-- Dave

From owner-freebsd-fs@freebsd.org Mon Jul 6 21:29:58 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C96E999509A for ; Mon, 6 Jul 2015 21:29:58 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from mail.physics.umn.edu (smtp.spa.umn.edu [128.101.220.4]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id ABBD61F97 for ; Mon, 6 Jul 2015 21:29:57 +0000 (UTC) (envelope-from allan@physics.umn.edu) Received: from peevish.spa.umn.edu ([128.101.220.230]) by mail.physics.umn.edu with esmtp (Exim 4.77 (FreeBSD)) (envelope-from ) id 1ZCDxc-000DqW-JW; Mon, 06 Jul 2015 16:29:56 -0500 Received: by peevish.spa.umn.edu (Postfix, from userid 5000) id 86818A0F; Mon, 6 Jul 2015 16:29:56 -0500 (CDT) Date: Mon, 6 Jul 2015 16:29:56 -0500 From: Graham Allan To: Rick Macklem Cc: freebsd-fs@freebsd.org Subject: Re: increasing MAXNFSDCNT Message-ID: <20150706212956.GO12772@physics.umn.edu> References: <20150706191956.GL12772@physics.umn.edu> <1318238880.4985990.1436215575189.JavaMail.zimbra@uoguelph.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1318238880.4985990.1436215575189.JavaMail.zimbra@uoguelph.ca> User-Agent: Mutt/1.5.20 (2009-12-10) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 06 Jul 2015 21:29:58 -0000 On Mon, Jul 06, 2015 at 04:46:15PM -0400, Rick Macklem wrote: > Graham Allan wrote: > > Regarding the maximum number of nfs server threads, is
increasing the > > "#define MAXNFSDCNT 256" in nfsd.c the only thing needed to try higher > > values? > > > Oops, my turn to admit that I should have read the man page before answering. > If you have a fairly recent nfsd that supports the "-maxthreads" option, you > can set this and I'm pretty sure you can make it larger than 256. > The system actually dynamically adjusts the # of nfsd kernel threads between > "-minthreads" and "-maxthreads". The "-n" option (which should probably be deprecated) > sets both of these to the same value and is limited to MAXNFSDCNT (256). > > If your nfsd doesn't support "-maxthreads", increasing MAXNFSDCNT should work. Would have helped if I stated the version too - I'm using 9.3 so it just supports "-n" (I can see "-minthreads" and "-maxthreads" in the 10.1 man pages). I guess I'm a bit cautious about increasing it though, since I expect it's possible I might rebuild a new patch release without remembering to re-edit nfsd.c. If that were the case and I'd specified "-n" > MAXNFSDCNT then this version of nfsd would revert back to DEFNFSDCNT = 4, which could be a painful shot in the foot. 
Thanks, Graham From owner-freebsd-fs@freebsd.org Tue Jul 7 05:22:04 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C38869941CE for ; Tue, 7 Jul 2015 05:22:04 +0000 (UTC) (envelope-from matthew.ahrens@delphix.com) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id 9F5861025 for ; Tue, 7 Jul 2015 05:22:04 +0000 (UTC) (envelope-from matthew.ahrens@delphix.com) Received: by mailman.ysv.freebsd.org (Postfix) id 9C1F79941CD; Tue, 7 Jul 2015 05:22:04 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9A8099941CC for ; Tue, 7 Jul 2015 05:22:04 +0000 (UTC) (envelope-from matthew.ahrens@delphix.com) Received: from mail-ig0-x235.google.com (mail-ig0-x235.google.com [IPv6:2607:f8b0:4001:c05::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 668F51024 for ; Tue, 7 Jul 2015 05:22:03 +0000 (UTC) (envelope-from matthew.ahrens@delphix.com) Received: by igrv9 with SMTP id v9so132464684igr.1 for ; Mon, 06 Jul 2015 22:22:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=delphix.com; s=google; h=mime-version:date:message-id:subject:from:to:content-type; bh=nikEb4YXV0B2qzTXTiwNecZBTymnMnYynPvTutPSBig=; b=MWsCuu4FVEco/aCwUe0lj6mbk0y5oZRsT0uu0apFCINIPaqRccQSgsJ12LbpWRcMlx HvhS3DTmxYKSTJAwhzdbWBBGYw0ieQ98gjGIThVGd6KzJ2CXUpK0TMJxp7eJVz//OaLR eLG0M+TFPcKpQS6g3OdD2aJjdjfdifRJZY4js= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:date:message-id:subject:from:to :content-type; 
bh=nikEb4YXV0B2qzTXTiwNecZBTymnMnYynPvTutPSBig=; b=CCoy8uxd4wvgBGcsOeLyl8LwjLH2/rHlABJM2qqPQLrYiqZJrM2F7TC7V0wPrBUZGQ NrEU0Jk7cxxFpxLcQNwgDG7/6L20PqkS7Y6Bo7a0qcekoTIRTfi9LNzguMyBprFzW1jp 6ERzKYWloPJYLBXqTT1fHau5Tu+/JlWny5WX1Uh4pxQqpePszeVBwBTiF63bHPf7TQSX owFpBJb7wnWnOQN4r1c5BWfZ59u49tqCSMNmBMWgcGE5SomfZYwwnBFISIacs15sqtAK b5icmWONBqLtyL0FtTOXksKCOA1cVubFDOM5/aDwD9ma0sutbKoY+GRCCwLw19Ot7z92 CWUg== X-Gm-Message-State: ALoCoQnMGxiYjgiRKwWEr1PosLUeX9C6NCNHKUcAa6GQStKe9epY+XUTV9GGRV8+L95tTsLFdCP7 MIME-Version: 1.0 X-Received: by 10.50.129.40 with SMTP id nt8mr45427057igb.24.1436246523025; Mon, 06 Jul 2015 22:22:03 -0700 (PDT) Received: by 10.36.85.197 with HTTP; Mon, 6 Jul 2015 22:22:02 -0700 (PDT) Date: Mon, 6 Jul 2015 22:22:02 -0700 Message-ID: Subject: OpenZFS European Conference videos posted From: Matthew Ahrens To: developer , illumos-zfs , zfs-discuss , fs@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 05:22:04 -0000

Videos from the OpenZFS European Conference are now available on YouTube. This was the 2nd annual conference, held in Paris in May. There were 9 talks, including some ZFS deep dives from myself, George Wilson, Saso Kiselkov, and Dan Vatva, as well as overviews of how storage hardware actually works from representatives of HGST (Hitachi) and Toshiba.

All the videos are posted here: http://www.open-zfs.org/wiki/Publications#2015_OpenZFS_European_Conference

Notably, the audio and video are much better than last year's OpenZFS Developer Summit (we'll be striving to improve that this year).
While I have your attention, I'll take a moment to plug 2 more things: - Kirk McKusick (of FreeBSD and UFS fame) gave a great overview of ZFS at BSDCAN last month, video now available: https://www.youtube.com/watch?v=UP_JfUUmDZo&index=35&list=PLWW0CjV-TafY0NqFDvD4k31CtnX-CGn8f - A reminder that information about the OpenZFS Developer Summit is now available on the website: http://open-zfs.org/wiki/OpenZFS_Developer_Summit_2015 The deadline for talk proposals is August 31st, and we welcome everyone - both old hats and those new to the community - to submit proposals. --matt From owner-freebsd-fs@freebsd.org Tue Jul 7 08:59:04 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id CE7819BB7 for ; Tue, 7 Jul 2015 08:59:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 5817419FD; Tue, 7 Jul 2015 08:59:04 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t678wwIP057032 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 7 Jul 2015 11:58:59 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t678wwIP057032 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t678wvTF057031; Tue, 7 Jul 2015 11:58:57 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 7 Jul 2015 11:58:57 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: freebsd-fs@freebsd.org, kib@freebsd.org, rwatson@FreeBSD.org, Mateusz Guzik Subject: Re: [PATCH 1/2] vfs: avoid spurious 
vref/vrele for absolute lookups Message-ID: <20150707085857.GZ2080@kib.kiev.ua> References: <1436152035-12564-1-git-send-email-mjguzik@gmail.com> <1436152035-12564-2-git-send-email-mjguzik@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1436152035-12564-2-git-send-email-mjguzik@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 08:59:04 -0000 On Mon, Jul 06, 2015 at 05:07:14AM +0200, Mateusz Guzik wrote: > From: Mateusz Guzik > > namei used to vref fd_cdir, which was immediatley vrele'd on entry to > the loop. Does it make sense to do this, if the other patch, for interlock-less vref/vrele on holdcount > 0, is in progress ? > > Simplify error handling and remove type checking for ni_startdir vnode. > It is only set by nfs which does the check on its own. Assert the > correct type instead. 
> ---
>  sys/kern/vfs_lookup.c | 92 ++++++++++++++++++++++++++++-----------------------
>  1 file changed, 51 insertions(+), 41 deletions(-)
>
> diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c
> index 5dc07dc..c5218ec 100644
> --- a/sys/kern/vfs_lookup.c
> +++ b/sys/kern/vfs_lookup.c
> @@ -109,6 +109,27 @@ namei_cleanup_cnp(struct componentname *cnp)
>  #endif
>  }
>
> +static int
> +namei_handle_root(struct nameidata *ndp, struct vnode **dpp)
> +{
> +	struct componentname *cnp = &ndp->ni_cnd;
> +
> +	if (ndp->ni_strictrelative != 0) {
> +#ifdef KTRACE
> +		if (KTRPOINT(curthread, KTR_CAPFAIL))
> +			ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL);
> +#endif
> +		return (ENOTCAPABLE);
> +	}
> +	while (*(cnp->cn_nameptr) == '/') {
> +		cnp->cn_nameptr++;
> +		ndp->ni_pathlen--;
> +	}
> +	*dpp = ndp->ni_rootdir;
> +	VREF(*dpp);
> +	return (0);
> +}
> +
>  /*
>   * Convert a pathname into a pointer to a locked vnode.
>   *
> @@ -148,6 +169,8 @@ namei(struct nameidata *ndp)
>  	    ("namei: nameiop contaminated with flags"));
>  	KASSERT((cnp->cn_flags & OPMASK) == 0,
>  	    ("namei: flags contaminated with nameiops"));
> +	if (ndp->ni_startdir != NULL)
> +		MPASS(ndp->ni_startdir->v_type == VDIR);

ni_startdir is not locked, am I correct? If so, the assert is not safe.

>  	if (!lookup_shared)
>  		cnp->cn_flags &= ~LOCKSHARED;
>  	fdp = p->p_fd;

Could this patch be split further? E.g., could the introduction of namei_handle_root() and its two uses be done in the first patch, with the loop logic reorganization coming in a follow-up? As it is now, the patch is almost impossible to review without rewriting the logic independently.
From owner-freebsd-fs@freebsd.org Tue Jul 7 09:31:30 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BD703AF5F for ; Tue, 7 Jul 2015 09:31:30 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from mailman.ysv.freebsd.org (mailman.ysv.freebsd.org [IPv6:2001:1900:2254:206a::50:5]) by mx1.freebsd.org (Postfix) with ESMTP id A152E1D88 for ; Tue, 7 Jul 2015 09:31:30 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: by mailman.ysv.freebsd.org (Postfix) id A003EAF5E; Tue, 7 Jul 2015 09:31:30 +0000 (UTC) Delivered-To: fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F87FAF5D for ; Tue, 7 Jul 2015 09:31:30 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from smtp.digiware.nl (smtp.digiware.nl [31.223.170.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 605D51D87; Tue, 7 Jul 2015 09:31:30 +0000 (UTC) (envelope-from wjw@digiware.nl) Received: from rack1.digiware.nl (unknown [127.0.0.1]) by smtp.digiware.nl (Postfix) with ESMTP id DED5B153416; Tue, 7 Jul 2015 11:31:20 +0200 (CEST) X-Virus-Scanned: amavisd-new at digiware.nl Received: from smtp.digiware.nl ([127.0.0.1]) by rack1.digiware.nl (rack1.digiware.nl [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id nI7N6dzDh7-R; Tue, 7 Jul 2015 11:31:01 +0200 (CEST) Received: from [IPv6:2001:4cb8:3:1:301d:d194:f8e3:4290] (unknown [IPv6:2001:4cb8:3:1:301d:d194:f8e3:4290]) by smtp.digiware.nl (Postfix) with ESMTP id 3999515344D; Tue, 7 Jul 2015 11:31:01 +0200 (CEST) Message-ID: <559B9C54.9060903@digiware.nl> Date: Tue, 07 Jul 2015 11:31:00 +0200 From: Willem Jan Withagen Organization: Digiware Management b.v. 
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: Bob Friesenhahn , Steve Wills CC: fs@freebsd.org Subject: Re: This diskfailure should not panic a system, but just disconnect disk from ZFS References: <5585767B.4000206@digiware.nl> <20150620221431.GB26416@mouf.net> In-Reply-To: Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 09:31:30 -0000 On 21-6-2015 23:01, Bob Friesenhahn wrote: > On Sat, 20 Jun 2015, Steve Wills wrote: >>> rev=0x00 hdr=0x00 >>> vendor = 'Areca Technology Corp.' >>> device = 'ARC-1120 8-Port PCI-X to SATA RAID Controller' >>> class = mass storage >>> subclass = RAID >> >> You may be hitting the zfs deadman panic, which is triggered when the >> controller hangs. This can in some cases be caused by disks that die >> in unusual >> ways. > > Notice that the RAID controller is a PCI-X device (shared parallel, not > dedicated serial like PCIe). The whole PCI backplane could have hung.

I had this panic problem a while ago, and since then it has recurred quite a few times. This time, however, I was working on the system and noticed it right away, so I went down to the basement and checked the box. The console is not completely dead:
- I can switch terminals but cannot log in.
- I can ping it but cannot ssh into it.
- I cannot break into the kernel.
There is no disk I/O at all; none of the LEDs flashed for at least 30 seconds. Only the reset button gets me back to normal. That strongly suggests that something is really hung.

The question is: how can I debug this? Breaking into the kernel (ctl-del-esc) does not seem to work...
I am also contemplating getting an Areca controller for PCIe instead, but that means shelling out another $250, just to get JBODs.

--WjW

From owner-freebsd-fs@freebsd.org Tue Jul 7 12:38:19 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id F361A9954A8 for ; Tue, 7 Jul 2015 12:38:19 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 999B21EFA for ; Tue, 7 Jul 2015 12:38:19 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t67Cc90P008779 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Tue, 7 Jul 2015 15:38:09 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t67Cc90P008779 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t67Cc90i008778; Tue, 7 Jul 2015 15:38:09 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Tue, 7 Jul 2015 15:38:09 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: freebsd-fs@freebsd.org Subject: Re: atomic v_usecount and v_holdcnt Message-ID: <20150707123809.GH2080@kib.kiev.ua> References: <20141122002812.GA32289@dft-labs.eu> <20141122092527.GT17068@kib.kiev.ua> <20141122211147.GA23623@dft-labs.eu> <20141124095251.GH17068@kib.kiev.ua> <20150314225226.GA15302@dft-labs.eu> <20150316094643.GZ2379@kib.kiev.ua> <20150317014412.GA10819@dft-labs.eu> <20150318104442.GS2379@kib.kiev.ua> <20150625123156.GA29667@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
In-Reply-To: <20150625123156.GA29667@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 12:38:20 -0000 On Thu, Jun 25, 2015 at 02:31:57PM +0200, Mateusz Guzik wrote: > On Wed, Mar 18, 2015 at 12:44:42PM +0200, Konstantin Belousov wrote: > > On Tue, Mar 17, 2015 at 02:44:12AM +0100, Mateusz Guzik wrote: > > > On Mon, Mar 16, 2015 at 11:46:43AM +0200, Konstantin Belousov wrote: > > > > On Sat, Mar 14, 2015 at 11:52:26PM +0100, Mateusz Guzik wrote: > > > > > On Mon, Nov 24, 2014 at 11:52:52AM +0200, Konstantin Belousov wrote: > > > > > > On Sat, Nov 22, 2014 at 10:11:47PM +0100, Mateusz Guzik wrote: > > > > > > > On Sat, Nov 22, 2014 at 11:25:27AM +0200, Konstantin Belousov wrote: > > > > > > > > On Sat, Nov 22, 2014 at 01:28:12AM +0100, Mateusz Guzik wrote: > > > > > > > > > The idea is that we don't need an interlock as long as we don't > > > > > > > > > transition either counter 1->0 or 0->1. > > > > > > > > I already said that something along the lines of the patch should work. > > > > > > > > In fact, you need vnode lock when hold count changes between 0 and 1, > > > > > > > > and probably the same for use count. > > > > > > > > > > > > > > > > > > > > > > I don't see why this would be required (not that I'm an VFS expert). > > > > > > > vnode recycling seems to be protected with the interlock. > > > > > > > > > > > > > > In fact I would argue that if this is really needed, current code is > > > > > > > buggy. > > > > > > Yes, it is already (somewhat) buggy. 
> > > > > > > > > > > > Most need of the lock is for the case of counts coming from 1 to 0. > > > > > > The reason is the handling of the active vnode list, which is used > > > > > > for limiting the amount of vnode list walking in syncer. When hold > > > > > > count is decremented to 0, vnode is removed from the active list. > > > > > > When use count is decremented to 0, vnode is supposedly inactivated, > > > > > > and vinactive() cleans the cached pages belonging to vnode. In other > > > > > > words, VI_OWEINACT for dirty vnode is sort of bug. > > > > > > > > > > > > > > > > Modified the patch to no longer have the usecount + interlock dropped + > > > > > VI_OWEINACT set window. > > > > > > > > > > Extended 0->1 hold count + vnode not locked window remains. I can fix > > > > > that if it is really necessary by having _vhold return with interlock > > > > > held if it did such transition. > > > > > > > > In v_upgrade_usecount(), you call v_incr_devcount() without without interlock > > > > held. What prevents the devfs vnode from being recycled, in particular, > > > > from invalidation of v_rdev pointer ? > > > > > > > > > > Right, that was buggy. Fixed in the patch below. > > Why non-atomicity of updates to several counters is safe ? This at least > > requires an explanation in the comment, I mean holdcnt/usecnt pair. > > > > The patch below was tested with make -j 40 buildworld in a loop for 7 hours > and it survived. > > I started a comment above vget, unfinished yet. Indeed, the following: * The hold count prevents the vnode from being freed, while the * use count prevents it from being recycled. is not true. usecount > 0 does not prevent recycling, it only gives a hint to VFS to try to avoid recycling, but could be ignored, e.g. for forced unmount. 
> > Further playing around revealed that zfs will vref a vnode with no > usecount (zfs_lookup -> zfs_dirlook -> zfs_dirent_lock -> zfs_zget -> > VN_HOLD) and it is possible that it will have VI_OWEINACT set (tested on > a kernel without my patch). VN_HOLD is defined as vref(). The code can > sleep, so some shuffling around can be done to call vinactive() if it > happens to be exclusively locked (but most of the time it is locked > shared). > > However, it seems that vputx deals with such consumers: > if (vp->v_usecount > 0) > vp->v_iflag &= ~VI_OWEINACT; > > Given that there are possibly more consumers like zfs how about: > In vputx assert that the flag is unset if the usecount went to > 0. Clear > the flag in vref and vget if transitioning 0->1 and assert it is unset > otherwise. This should be fine, I would only be concerned with VI_OWEINACT not cleared for unreferenced vnodes, see r284495. > > The way I read it is that in the stock kernel with properly timed vref > the flag would be cleared anyway, with vinactive() only called if it was > done by vget and only with the vnode exclusively locked. > > With a aforementioned change likelyhood of vinactive() remains the same, > but now the flag state can be asserted. > > > Assume the thread increased the v_usecount, but did not managed to > > acquire dev_mtx. Another thread performs vrele() and progressed to > > v_decr_devcount(). It decreases the si_usecount, which might allow yet > > another thread to see the si_usecount as too low and start unwanted > > action. I think that the tests for VCHR must be done at the very > > start of the functions, and devfs vnodes must hold vnode interlock > > unconditionally. > > > > Inserted v_type != VCHR checks in relevant places, vi_usecount > manipulation functions now assert that the interlock is held. Ok. > > > > > > > > I think that refcount_acquire_if_greater() KPI is excessive. You always > > > > calls acquire with val == 0, and release with val == 1. 
> > > > > > > > > > Yea i noted in my previous e-mail it should be changed (see below). > > > > > > I replaced them with refcount_acquire_if_not_zero and > > > refcount_release_if_not_last. > > I dislike the length of the names. Can you propose something shorter ? > > > > Unfortunately the original API is already quite verbose and I don't have > anything readable which would retain "refcount_acquire" (instead of a > "ref_get" or "ref_acq"). Adding "_nz" as a suffix does not look good > ("refcount_acquire_if_nz"). There were some proposals given ? BTW, refcount_acquire() has an acquire semantic. It is debatable whether the acq is needed there, or not. But your if_not functions are not acq. > > > The type for the local variable old in both functions should be u_int. > > > > Done. > > > > > > > > WRT to _refcount_release_lock, why is lock_object->lc_lock/lc_unlock KPI > > > > cannot be used ? This allows to make refcount_release_lock() a function > > > > instead of gcc extension macros. Not to mention that the macro is unused. > > > > > > These were supposed to be used by other code, forgot to remove it from > > > the patch I sent here. > > > > > > We can discuss this in another thread. > > > > > > Strictly speaking we could use it here for vnode interlock, but I did > > > not want to get around VI_LOCK macro (which right now is just a > > > mtx_lock, but this may change). > > > > > > Updated patch is below: > > Do not introduce ASSERT_VI_LOCK, the name difference between > > ASSERT_VI_LOCKED and ASSERT_VI_LOCK is only in the broken grammar. > > I do not see anything wrong with explicit if() statements where needed, > > in all four places. > > Done. > > > > > In vputx(), wrap the long line (if (refcount_release() || VI_DOINGINACT)). > > Done. Overall, the patch looks fine.
I still have to convince myself each time I read the following changes:

-	VI_LOCK(ddvp);
+	vhold(ddvp);
 	CACHE_RUNLOCK();
-	if (vget(ddvp, LK_INTERLOCK | LK_SHARED | LK_NOWAIT, curthread))
+	if (vget(ddvp, LK_SHARED | LK_NOWAIT | LK_VNHELD, curthread))

but I indeed cannot find anything wrong. I.e., before, it could fail if the vnode was locked; now it can fail if the vnode is locked or was reclaimed in the meantime, which is, surprisingly, not harmful.

From owner-freebsd-fs@freebsd.org Tue Jul 7 12:42:46 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 28A41995761 for ; Tue, 7 Jul 2015 12:42:46 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: from mail-la0-x236.google.com (mail-la0-x236.google.com [IPv6:2a00:1450:4010:c03::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id A72E11206 for ; Tue, 7 Jul 2015 12:42:45 +0000 (UTC) (envelope-from kpaasial@gmail.com) Received: by laar3 with SMTP id r3so193833314laa.0 for ; Tue, 07 Jul 2015 05:42:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=6nl3iFA+KQP4icWJ5ph4r2kzCRztocBabqD/nuGRZR8=; b=OePTIZ4kGfcWY/hqTjj2KEaqPHYTmygTbEUNb1C6OwWPzLOF5dHt8oreDlnz3ZS3hA qKgqxk5UEjjhOIq7CfnVNS5qCzmQ3Nme10baFdihQzIzF1Sy2YEgqH9QG8zU7d0p6gQb QXFXwRSU2UfseRyfRvAd7FieZyPZpUvnLsKxbps61pyJrK2eC1Dh9ak1naEklt0iVcvW TOJEjquLU3FPpimW7TH0xZSi66Vxjd9uT72eWKrJNKHyCTaBo1OJYix7E4OIqwogDX15 377yzJQC6GABdwsFPn3Nm/cFU1IF/UfIoXR5r3YbT/+sYWUBf6F4gnkzR/LeTDoX0IYc KZqg== MIME-Version: 1.0 X-Received: by 10.152.6.1 with SMTP id w1mr3766042law.91.1436272963706; Tue, 07 Jul 2015 05:42:43 -0700 (PDT) Received: by 10.152.219.35 with HTTP; Tue,
7 Jul 2015 05:42:43 -0700 (PDT) In-Reply-To: <559AC1E4.6050906@sneakertech.com> References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> <559A87FE.70309@kateley.com> <559AB32A.7070702@sneakertech.com> <559AC1E4.6050906@sneakertech.com> Date: Tue, 7 Jul 2015 15:42:43 +0300 Message-ID: Subject: Re: A question about ZFS built-in SMB From: Kimmo Paasiala To: Quartz Cc: freebsd-fs Content-Type: text/plain; charset=UTF-8 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 12:42:46 -0000 On Mon, Jul 6, 2015 at 8:59 PM, Quartz wrote: > >>> Oh ok. I was under the impression that Linux ZFS was basically a >>> hack/port >>> of the FreeBSD version due to licensing issues, and the FreeBSD version >>> was >>> itself a port of the illumos version. >>> >> >> http://zfsonlinux.org/faq.html#WhatAboutTheLicensingIssue >> >> There is no real licensing issue since ZFS can be compiled into a >> loadable kernel module(s) on Linux (just as it is used in FreeBSD >> anyway) and that avoids the clash between the two mutually >> incompatible licenses. > > > Did this change recently? Last time I looked was a few years ago and ZFS on > Linux was still largely an awkward hack because what they had to do to work > around the licensing issue. I heavily investigated the Debian/kFreeBSD > project before just going with mainline FreeBSD. > > > As far as I understand the main issue was that the ZFS sources couldn't be imported to the kernel source tree and built into a kernel module with the other similar kernel modules. The solution was to build the kernel module separately treating it as a 3rd party "binary blob". 
-Kimmo From owner-freebsd-fs@freebsd.org Tue Jul 7 13:18:05 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9EA1C995E1F for ; Tue, 7 Jul 2015 13:18:05 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: from mail-wg0-x230.google.com (mail-wg0-x230.google.com [IPv6:2a00:1450:400c:c00::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 30D6B1885 for ; Tue, 7 Jul 2015 13:18:05 +0000 (UTC) (envelope-from kraduk@gmail.com) Received: by wgbgr6 with SMTP id gr6so13831232wgb.3 for ; Tue, 07 Jul 2015 06:18:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=ZRP2OFffz9Oo0KAbHlOnBNWplV4iJj2sprGGIoqV7is=; b=CC8CZcZnjEg6g5aGNhcegT6QYgtzpOxEvsVxPW6kKyZbGDcbILjJp66R1Ji6YeJ3sg eqtI2MY+ZHBqBn0OGWD2jy6i+FVD8XTTEwlWgA0qAyLpI/G3rD6hjyUlnUpw/iC+VcDe W9SkezXYBot83P4WVtOM5uDzxUpr4EeIXKqqmdec4mNNBbjqmtjZkZ/Zw7D7HTGcZRkL HpxnLPZXjVR6guwKbG6ODzJXBaVKim0+UJv9IJ42+ewMU9L+LrjOcMhRqR92//7Ge9lR sFTFmDPWg0YXPA2UbEMyayZnb5B2k9d8/I5GJ00ZpJ4GFLKNOUzLgHfiew9qIxQdWlRx ZYqQ== MIME-Version: 1.0 X-Received: by 10.180.104.197 with SMTP id gg5mr102980064wib.27.1436275081650; Tue, 07 Jul 2015 06:18:01 -0700 (PDT) Received: by 10.180.73.5 with HTTP; Tue, 7 Jul 2015 06:18:01 -0700 (PDT) In-Reply-To: References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> <559A14DB.3080905@sneakertech.com> Date: Tue, 7 Jul 2015 14:18:01 +0100 Message-ID: Subject: Re: A question about ZFS built-in SMB From: krad To: Freddie Cash Cc: Quartz , FreeBSD Filesystems Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: 
Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 13:18:05 -0000

"but it's the 3rd party nfs server that actually does the sharing."

You are misrepresenting things a little here, as the NFS server in FreeBSD is a core part of the OS and not some third-party add-on. It has a few userland daemons, but it is mostly in the kernel. Linux is much the same. The key difference between FreeBSD and Linux is that, because no mainstream Linux distro ships with production-grade zfsonlinux, there has been little work on userland integration compared to Solaris, so things like automatic NFS configuration aren't always there yet. Samba, though, is a third-party product on both FreeBSD and Linux, which explains why there isn't any automatic configuration for it.

freebsd
# kldstat -v | grep nfs
208 nfslockd
158 nfscommon
207 nfssvc
161 nfsd
159 nfs
179 acl_nfs4
160 nfscl
206 nfslock

Linux
[cscott@SL1VSIKS ~]$ lsmod | grep nfs
nfsd                  299008  13
auth_rpcgss            61440  1 nfsd
nfs_acl                16384  1 nfsd
lockd                  94208  1 nfsd
grace                  16384  2 nfsd,lockd
sunrpc                327680  19 nfsd,auth_rpcgss,lockd,nfs_acl

On 6 July 2015 at 15:19, Freddie Cash wrote: > On Jul 5, 2015 10:40 PM, "Quartz" wrote: > >> > >> No, actually, it isn't. :) It works in a similar manner to sharenfs on > >> FreeBSD. You still require a separate NFS server installed, and ask it > >> does it copy the info to an exports file. > >> > >> Similar for sharesmb. You still require Samba being installed on Linux. > >> All the property does is add the filesystem to a separate smb config > >> file (or something like that; never actually used it on Linux). > >> > >> You still require the NFS and SMB packages installed for your distro. > >> Same as you would for any other FS on Linux. > > > > > > > > So I'm a little confused here.
> > > > On Linux, the property is active and usable but only creates the share, > meaning you still need the sever software to host it. On FreeBSD, the > property doesn't work at all, and you need the server software to do > everything...... > > Correct. > > On Solaris derivatives, you only need the OS installed and ZFS configured > in order to share filesystems via the share{nfs|smb} properties. They have > in-kernel NFS and CIFS servers. No 3rd party software required, the zfs > system does everything. > > On Linux, you need the OS, ZFS, an NFS server, and Samba installed in order > to share filesystems. You can use the share{nfs|smb} properties to > configure the shares, but it's 3rd party software (external to zfs) that > actually does the sharing. > > On FreeBSD, you need the OS, ZFS, an NFS server, and Samba installed in > order to share filesystems. You can use the sharenfs property to configure > an nfs share, but it's the 3rd party nfs server that actually does the > sharing. And you need to do everything manually via samba to share > filesystems as the sharesmb property isn't supported. > > Or, you can just ignore the share{nfs|smb} properties and do everything > manually, the way you would for any filesystem. 
> _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Tue Jul 7 14:32:02 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 456AF995FF2 for ; Tue, 7 Jul 2015 14:32:02 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 2D9001E34 for ; Tue, 7 Jul 2015 14:32:02 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t67EW2F4012411 for ; Tue, 7 Jul 2015 14:32:02 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 184478] [smbfs] mount_smbfs cannot read/write files Date: Tue, 07 Jul 2015 14:32:02 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 10.1-STABLE X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Only Me X-Bugzilla-Who: gjb@FreeBSD.org X-Bugzilla-Status: In Progress X-Bugzilla-Priority: Normal X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: version Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 
2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 07 Jul 2015 14:32:02 -0000

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=184478

Glen Barber changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|10.3-BETA1                  |10.1-STABLE

--- Comment #2 from Glen Barber ---
Changing the version, since this cannot possibly affect 10.3-BETA1. It does not exist. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.

From owner-freebsd-fs@freebsd.org Wed Jul 8 06:13:10 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 31194995C30 for ; Wed, 8 Jul 2015 06:13:10 +0000 (UTC) (envelope-from marcus@odin.blazingdot.com) Received: from odin.blazingdot.com (odin.blazingdot.com [204.109.60.170]) by mx1.freebsd.org (Postfix) with ESMTP id 19B3C1AD7 for ; Wed, 8 Jul 2015 06:13:09 +0000 (UTC) (envelope-from marcus@odin.blazingdot.com) Received: by odin.blazingdot.com (Postfix, from userid 1001) id B024F1320ED; Tue, 7 Jul 2015 23:13:01 -0700 (PDT) Date: Tue, 7 Jul 2015 23:13:01 -0700 From: Marcus Reid To: Quartz Cc: FreeBSD FS Subject: Re: A question about ZFS built-in SMB Message-ID: <20150708061301.GA76767@blazingdot.com> References: <5599496C.6010702@sneakertech.com> <20150705210306.GA1048@in-addr.com> <559A08AF.9050809@sneakertech.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <559A08AF.9050809@sneakertech.com> X-Coffee-Level: nearly-fatal User-Agent: Mutt/1.5.23 (2014-03-12) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 06:13:10 -0000 On Mon, Jul
06, 2015 at 12:48:47AM -0400, Quartz wrote: > On 2015-07-05 12:03 PM, Freddie Cash wrote: > > SMB support is only built-in on Solaris derivatives. You need Samba on > > everything else. > > On 2015-07-05 5:03 PM, Gary Palmer wrote: > > The sharesmb option to zfs does not work on FreeBSD. You need to use > > Samba. > > > Ok wait, it IS implemented on the Linux versions of ZFS. I thought the > FreeBSD version of ZFS superseded all the features of the Linux port? I don't think "superseded" is the word you were looking for. FreeBSD supports more feature flags: http://open-zfs.org/wiki/Feature_Flags ZFS on Linux will apparently go out and modify Samba's smb.conf for sharesmb functionality. If this seems to be an acceptable approach, I'll bet the code that does it wouldn't be too hard to port over or reinvent. Marcus From owner-freebsd-fs@freebsd.org Wed Jul 8 09:35:12 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B4D83995F1B for ; Wed, 8 Jul 2015 09:35:12 +0000 (UTC) (envelope-from gergely.czuczy@harmless.hu) Received: from marvin.harmless.hu (marvin.harmless.hu [195.56.55.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 6917B1518 for ; Wed, 8 Jul 2015 09:35:11 +0000 (UTC) (envelope-from gergely.czuczy@harmless.hu) Received: from business-89-133-214-250.business.broadband.hu ([89.133.214.250] helo=[10.128.1.202]) by marvin.harmless.hu with esmtpsa (TLSv1.2:DHE-RSA-AES128-SHA:128) (Exim 4.84 (FreeBSD)) (envelope-from ) id 1ZClgf-000I6h-4g for freebsd-fs@freebsd.org; Wed, 08 Jul 2015 09:30:41 +0000 To: freebsd-fs@freebsd.org From: Gergely Czuczy Subject: Crashed ZFS pool Message-ID: <559CEDC3.2040107@harmless.hu> Date: Wed, 8 Jul 2015 11:30:43 +0200 User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:38.0) 
Gecko/20100101 Thunderbird/38.0.1 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 09:35:12 -0000

Hello,

We have a crashed ZFS pool. The system was initially running 8; we upgraded it to 9, then to 10-STABLE yesterday. Upon importing the pool, the system crashes with a panic.

The pool used to have a file-backed ZIL device under /usr/zfslog; however, the file size was 0 when this happened, and it used to be bigger.

We've set vfs.zfs.recover=1 in /boot/loader.conf and tried to import the pool with:

# zpool import -fm tank

but it crashes the system. We've also tried removing /boot/zfs/zpool.cache (we actually renamed it), but that resulted in the same panic.

# uname -a
FreeBSD $x 10.2-PRERELEASE FreeBSD 10.2-PRERELEASE #0: Tue Jul 7 20:30:27 CEST 2015 toor@$x:/usr/obj/usr/src/sys/REFLECTION amd64

When running zdb -AAAFXve tank, it dumps some info, then gets stuck. The zdb output can be found here: http://czg.harmless.hu/zfscrash/tank.zdb-AAAFXve.script

The suspicious part is:

Assertion failed: zap_lookup(ddt->ddt_os, ddt->ddt_spa->spa_ddt_stat_object, name, sizeof (uint64_t), sizeof (ddt_histogram_t) / sizeof (uint64_t), &ddt->ddt_histogram[type][class]) == 0 (0x6 == 0x0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 127.
Assertion failed: (ddt_object_info(ddt, type, class, &doi) == 0), file /usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/ddt.c, line 132.
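The first failed assertion shows that zap_lookup() returned 0x6 where 0 was expected. This sketch assumes the value is an errno-style code and reads it against FreeBSD's numbering, where 6 is ENXIO. ZFS routines generally return errno values, but the assertion text alone does not confirm that. `assertion_values` and `describe` are hypothetical helpers. The table is a small, hand-copied subset of FreeBSD's <sys/errno.h>, not a complete list.

```python
import re

# Hand-copied subset of FreeBSD <sys/errno.h>. Python's errno module is not
# used because it reflects the host OS (ETIMEDOUT is 60 on FreeBSD, 110 on Linux).
FREEBSD_ERRNO = {
    2: ("ENOENT", "No such file or directory"),
    5: ("EIO", "Input/output error"),
    6: ("ENXIO", "Device not configured"),
    60: ("ETIMEDOUT", "Operation timed out"),
}

# Matches the "(0xACTUAL == 0xEXPECTED)" pair printed after a failed assertion.
_VALUES = re.compile(r"\((0x[0-9a-fA-F]+) == (0x[0-9a-fA-F]+)\)")


def assertion_values(message):
    """Return (actual, expected) as integers from a libzpool assertion message."""
    match = _VALUES.search(message)
    if match is None:
        raise ValueError("no '(0x.. == 0x..)' pair found")
    return int(match.group(1), 16), int(match.group(2), 16)


def describe(code):
    """Render an error code using the FreeBSD subset above."""
    if code == 0:
        return "0 (success)"
    name, text = FREEBSD_ERRNO.get(code, ("unknown", "not in this table"))
    return f"{code} = {name} ({text})"


line = ("Assertion failed: zap_lookup(...) == 0 (0x6 == 0x0), "
        "file .../zfs/ddt.c, line 127.")
actual, expected = assertion_values(line)
print("zap_lookup returned", describe(actual))
print("expected", describe(expected))
```

The first print shows `zap_lookup returned 6 = ENXIO (Device not configured)`. The decoding only names the code. It does not by itself explain why the dedup-table statistics object could not be looked up.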
zdb seems to be stuck in the following state:

21697 zdb RET read 8
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL read(0x4,0x7fffd7fbdf50,0x8)
21697 zdb GIO fd 4 read 8 bytes
      0x0000 02be fe70 08a4 2335 |...p..#5|
21697 zdb RET read 8
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL read(0x4,0x7fffd7fbdf50,0x8)
21697 zdb GIO fd 4 read 8 bytes
      0x0000 459a ca93 c54b 9922 |E....K."|
21697 zdb RET read 8
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)

However, I wasn't able to find out what FD 4 is. There were no disk read errors in dmesg/messages, so I'm not sure what would be timing out.

And here's a screenshot of the crash: http://czg.harmless.hu/zfscrash/zfspanic.jpg

So, does anyone have any idea what to do with it? It would be nice to get it back to a functional state, or at least to a state where the data can be accessed.
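Read event by event, the kdump excerpt above is a short loop. Each pass makes two _umtx_op(UMTX_OP_WAIT_UINT_PRIVATE) calls that fail with errno 60 (ETIMEDOUT on FreeBSD), plus one 8-byte read() from fd 4. One plausible reading, offered as an assumption rather than a diagnosis: the trailing non-NULL arguments look like a timeout. That would make these timed sleeps on a userland synchronization word, the kind of primitive thread libraries build locks and condition variables on. ETIMEDOUT would then mean a wait period expired, not that a disk I/O timed out. To see what fd 4 refers to, `procstat -f <pid>` on FreeBSD lists a process's open file descriptors. The sketch below tallies these events from pasted kdump text. `summarize` is a hypothetical helper, and it assumes the simplified "PID COMMAND KIND DETAIL" line shape shown in the message.

```python
from collections import Counter

# One pass of the loop from the kdump excerpt (hex-dump line omitted).
TRACE = """\
21697 zdb CALL _umtx_op(0x800638608,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffd7fbde70)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL _umtx_op(0x800638f68,UMTX_OP_WAIT_UINT_PRIVATE,0,0x18,0x7fffc7b3be80)
21697 zdb RET _umtx_op -1 errno 60 Operation timed out
21697 zdb CALL read(0x4,0x7fffd7fbdf50,0x8)
21697 zdb GIO fd 4 read 8 bytes
21697 zdb RET read 8
"""


def summarize(trace):
    """Count syscalls made and (syscall, errno) failures in simplified kdump text.

    Lines whose third field is not CALL or RET (GIO records, hex dumps) are ignored.
    """
    calls = Counter()
    failures = Counter()
    for line in trace.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        kind, detail = fields[2], fields[3]
        if kind == "CALL":
            calls[detail.split("(", 1)[0]] += 1  # "read(0x4,...)" -> "read"
        elif kind == "RET" and "errno" in fields:
            code = int(fields[fields.index("errno") + 1])
            failures[(detail, code)] += 1
    return calls, failures


calls, failures = summarize(TRACE)
print(dict(calls))     # {'_umtx_op': 2, 'read': 1}
print(dict(failures))  # {('_umtx_op', 60): 2}
```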
Thanks in advance, -czg From owner-freebsd-fs@freebsd.org Wed Jul 8 13:51:06 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 48CEF995CF3 for ; Wed, 8 Jul 2015 13:51:06 +0000 (UTC) (envelope-from matthew@FreeBSD.org) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id C8B721902 for ; Wed, 8 Jul 2015 13:51:05 +0000 (UTC) (envelope-from matthew@FreeBSD.org) Received: from zero-gravitas.local (no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged)) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.1/8.15.1) with ESMTPSA id t68Dosnm062574 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Wed, 8 Jul 2015 14:50:55 +0100 (BST) (envelope-from matthew@FreeBSD.org) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=FreeBSD.org DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t68Dosnm062574 Authentication-Results: smtp.infracaninophile.co.uk/t68Dosnm062574; dkim=none reason="no signature"; dkim-adsp=none; dkim-atps=neutral X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged) claimed to be zero-gravitas.local Message-ID: <559D2AB6.5070007@FreeBSD.org> Date: Wed, 08 Jul 2015 14:50:46 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Speeding up resilvering Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; 
boundary="K4lg1kdEEpMeN4clwKawFbeEhR1CVa3EQ" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 13:51:06 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --K4lg1kdEEpMeN4clwKawFbeEhR1CVa3EQ Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable

Hi,

I've a zpool which is taking an inordinately long time to resilver and replace a device. It had only got to about 8% completion after a day, implying over a week to resilver about 6TB data.

Now, I've applied the resilver performance tuning sysctls from https://wiki.freebsd.org/ZFSTuningGuide:

  vfs.zfs.scrub_delay=0
  vfs.zfs.top_maxinflight=128
  vfs.zfs.resilver_min_time_ms=5000
  vfs.zfs.resilver_delay=0

but it doesn't seem to have made a great deal of difference. Most of the benefit there would apparently come from reducing scrub_delay or resilver_delay -- but this system is only booted to single user mode, so the drives are otherwise idle and the resilver should have automatically switched to running full throttle anyhow.

Given the machine is not going to be doing anything else other than resilvering for the time being, is there any more aggressive tuning or tricks to get it to go faster that people would recommend?
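This sketch checks the "over a week" figure by extrapolating the one progress sample in the message (about 8% after one day, about 6TB) linearly. It assumes 6TB means 6 × 10^12 bytes and that the average rate so far holds for the rest of the run. Real resilver rates can vary over a run, so treat the result as a rough figure only. `resilver_estimate` is a hypothetical helper, not part of any ZFS tool.

```python
SECONDS_PER_DAY = 86400


def resilver_estimate(fraction_done, elapsed_s, total_bytes):
    """Linear extrapolation of a resilver from a single progress sample.

    Returns (average rate in bytes/s, projected total seconds, remaining seconds),
    assuming the average rate so far holds for the rest of the run.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    rate = fraction_done * total_bytes / elapsed_s
    total_s = elapsed_s / fraction_done
    return rate, total_s, total_s - elapsed_s


# Figures from the message: ~8% done after one day, ~6TB (taken as 6e12 bytes).
rate, total_s, remaining_s = resilver_estimate(0.08, SECONDS_PER_DAY, 6e12)
print(f"average rate ~ {rate / 1e6:.1f} MB/s")                    # ~ 5.6 MB/s
print(f"projected total ~ {total_s / SECONDS_PER_DAY:.1f} days")  # ~ 12.5 days
print(f"remaining ~ {remaining_s / SECONDS_PER_DAY:.1f} days")    # ~ 11.5 days
```

An average of roughly 5.6 MB/s is well below what a single idle modern disk can stream sequentially. That gap is why the tuning question matters here.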
Cheers, Matthew --K4lg1kdEEpMeN4clwKawFbeEhR1CVa3EQ--

From owner-freebsd-fs@freebsd.org Wed Jul 8 14:12:20 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 96D7B996199 for ; Wed, 8 Jul 2015 14:12:20 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 44CAD15ED for ; Wed, 8 Jul 2015 14:12:19 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (localhost [127.0.0.1]) by
fs.denninger.net (8.14.9/8.14.8) with ESMTP id t68DwSjM045732 for ; Wed, 8 Jul 2015 08:58:28 -0500 (CDT) (envelope-from karl@denninger.net) Received: from [192.168.1.40] [192.168.1.40] (Via SSLv3 AES128-SHA) ; by Spamblock-sys (LOCAL/AUTH) Wed Jul 8 08:58:28 2015 Message-ID: <559D2C7C.6060201@denninger.net> Date: Wed, 08 Jul 2015 08:58:20 -0500 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> In-Reply-To: <559D2AB6.5070007@FreeBSD.org> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms040408040201010207050800" X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:12:20 -0000 This is a cryptographically signed message in MIME format. --------------ms040408040201010207050800 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/8/2015 08:50, Matthew Seaman wrote: > Hi, > > I've a zpool which is taking an inordinately long time to resilver and > replace a device. It had only got to about 8% completion after a day, > implying over a week to resilver about 6TB data. > > Now, I've applied the resilver performance tuning sysctls from > https://wiki.freebsd.org/ZFSTuningGuide: > > vfs.zfs.scrub_delay=0 > vfs.zfs.top_maxinflight=128 > vfs.zfs.resilver_min_time_ms=5000 > vfs.zfs.resilver_delay=0 > > but it doesn't seem to have made a great deal of difference.
Most of > the benefit there would apparently come from reducing scrub_delay or > resilver_delay -- but this system is only booted to single user mode, so > the drives are otherwise idle and the resilver should have automatically > switched to running full throttle anyhow. > > Given the machine is not going to be doing anything else other than > resilvering for the time being, is there any more aggressive tuning or > tricks to get it to go faster that people would recommend? > > Cheers, > > Matthew >

What OS version?

-- 
Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms040408040201010207050800--

From owner-freebsd-fs@freebsd.org Wed Jul 8 14:19:19 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 745AC9962ED for ; Wed, 8 Jul 2015 14:19:19 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com [IPv6:2a00:1450:400c:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 0A42919DA; Wed, 8 Jul 2015 14:19:18 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wiwl6 with SMTP id l6so346332544wiw.0; Wed, 08 Jul 2015 07:19:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type;
bh=AYYlwZRaEhVzGHCgNVxbkXR9S04Hue6NM36zIH8a5yo=; b=rrw+BmMjCG8FDxMbY/shbB8CS+vBreohCnCvUFQytREaoZwnzGkFf30b+Bf6RbSlVj RSP0U3D1EZGfD/N9SALS+1uyPBKnU12tYZ72zwO/UfqVJzzrDIB6mYmso9KXDVL2EF7N bF7ADGj2uFEGP5ZJ4B5LvomU1NfBawjyIk2ubmZTw6rdyGRHsPGqHJcceaQPmbyTADUw HusbFY6cyxFXv+WbWIqdiO2v1jy32F01mLgSyrkN86lg5Mf/xpYGsyETKfHFQZqFQ9mm gyLgcM0CvmXH+1FmQ4VaS8oeYagMCcWuhutzVwZAUbV+HdkLPR1uZ/SYQCE4RSmnmcTv H6FQ== X-Received: by 10.194.6.229 with SMTP id e5mr20897818wja.158.1436365157272; Wed, 08 Jul 2015 07:19:17 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Wed, 8 Jul 2015 07:18:57 -0700 (PDT) In-Reply-To: <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> References: <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Wed, 8 Jul 2015 16:18:57 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Julian Elischer , freebsd-fs@freebsd.org, Xin LI Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:19:19 -0000

Hi folks,

I have tested Xin's patches. Unfortunately, the problem didn't go away :/ Many users are still reporting hung processes. If it would help, can you show me how to dump a network trace that would help you identify the issue?

Also, is there any way to have my trusted NFSv3 handle the case where every ZFS /home folder is its own dataset?

On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem wrote: > Ahmed Kamal wrote: > > Hi folks, > > > > Just a quick update. I did not test Xin's patches yet .. 
What I did so > far > > is to increase the tcp highwater tunable and increase nfsd threads to 60. > > Today (a working day) I noticed I only got one bad sequence error > message! > > Check this: > > > > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c > > 1 messages:Jul5 > > 39 messages.1:Jun28 > > 15 messages.1:Jun29 > > 4 messages.1:Jun30 > > 9 messages.1:Jul1 > > 23 messages.1:Jul2 > > 1 messages.1:Jul4 > > 1 messages.2:Jun28 > > > > So there seems to be an improvement! Not sure if the Linux nfs4 client is > > able to somehow recover from those bad-sequence situations or not .. I > did > > get some user complaints that running "ls -l" is sometimes slow and > takes a > > couple of seconds to finish. > > > > One final question .. Do you folks think nfs4.1 is more reliable in > general > > than nfs4 .. I've always only used nfs3 (I guess it can't work here with > > /home/* being separate zfs filesystems) .. So should I go through the > pain > > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do you > > expect the protocol to be more solid ? I know it's a fluffy question, > just > > give me your thoughts. Thanks a lot! > > > All I can say is that the "bad seqid" errors should not occur, since > NFSv4.1 > doesn't use the seqid#s to order RPCs. > > Also I would say that a correctly implemented NFSv4.1 protocol should > function > "more correctly" since all RPCs and performed "exactly once". (How much > effect > this will have in practice, I can't say.) > > On the other hand, NFSv4.1 is a newer protocol (with an RFC of over > 500pages), > so it is hard to say how mature the implementations are. > I think only testing will give you the answer. > > I would suggest that you test Xi Lin's patch that allows the "seqid + 2" > case > and see if that makes the "bad seqid" errors go away. (Even though I think > this > would indicate a client bug, adding this in way that it can be enabled via > a sysctl > seems reasonable.) 
> > Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, rick > > > > > > > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem > wrote: > > > > > Ahmed Kamal wrote: > > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming > > > > reports from users about hung vnc sessions. So maybe just maybe, > linux > > > > clients are able to somehow recover from this bad sequence messages. > I > > > > could still see the bad sequence error message in logs though > > > > > > > > Why isn't the highwater tunable set to something better by default ? > I > > > mean > > > > this server is certainly not under a high or unusual load (it's only > 40 > > > PCs > > > > mounting from it) > > > > > > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal < > > > email.ahmedkamal@googlemail.com > > > > > wrote: > > > > > > > > > Thanks all .. I understand now we're doing the "right thing" .. > > > Although > > > > > if mounting keeps wedging, I will have to solve it somehow! Either > > > using > > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > > > > > > > > > > Regarding Xin's patch, is it possible to build the patched nfsd > code, > > > as a > > > > > kernel module ? I'm looking to minimize my delta to upstream. > > > > > > > > Yes, you can build the nfsd as a module. If your kernel config does not > > > include > > > "options NFSD" the module will get loaded/used. It is also possible to > > > replace > > > the module without rebooting, but you need to kill of the nfsd daemon > then > > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In > > > /boot/.) > > > > > > > > Also would adopting Xin's patch and hiding it behind a > > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not > the > > > last > > > > > person on earth to hit this) ? > > > > > > > > If it fixes your problem, I think this is reasonable. > > > I'm also hoping that someone that works on the Linux client reports > > > if/when this > > > was changed. 
> > > > > > rick > > > > > > > > Thanks a lot for all the help! > > > > > > > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < > rmacklem@uoguelph.ca> > > > > > wrote: > > > > > > > > > >> Ahmed Kamal wrote: > > > > >> > Appreciating the fruitful discussion! Can someone please > explain to > > > me, > > > > >> > what would happen in the current situation (linux client doing > this > > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect > of > > > that? > > > > >> Well, as you've seen, the Linux client doesn't function correctly > > > against > > > > >> the FreeBSD server (and probably others that don't support this > > > > >> "skip-by-1" > > > > >> case). > > > > >> > > > > >> > What do users see? Any chances of data loss? > > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what > the > > > Linux > > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the > guy > > > > >> observing > > > > >> it. > > > > >> > > > > >> > > > > > >> > Also, I find it strange that netapp have acknowledged this is a > bug > > > on > > > > >> > their side, which has been fixed since then! > > > > >> Yea, I think Netapp screwed up. For some reason their server > allowed > > > this, > > > > >> then was fixed to not allow it and then someone decided that was > > > broken > > > > >> and > > > > >> reversed it. > > > > >> > > > > >> > I also find it strange that I'm the first to hit this :) Is no > one > > > > >> running > > > > >> > nfs4 yet! > > > > >> > > > > > >> Well, it seems to be slowly catching on. I suspect that the Linux > > > client > > > > >> mounting a Netapp is the most common use of it. Since it appears > that > > > they > > > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. 
> > > > >> > > > > >> It may turn out that the Linux client has been fixed or it may > turn > > > out > > > > >> that most servers allowed this "skip-by-1" even though David > Noveck > > > (one > > > > >> of the main authors of the protocol) seems to agree with me that > it > > > should > > > > >> not be allowed. > > > > >> > > > > >> It is possible that others have bumped into this, but it wasn't > > > isolated > > > > >> (I wouldn't have guessed it, so it was good you pointed to the > RedHat > > > > >> discussion) > > > > >> and they worked around it by reverting to NFSv3 or similar. > > > > >> The protocol is rather complex in this area and changed > completely for > > > > >> NFSv4.1, > > > > >> so many have also probably moved onto NFSv4.1 where this won't be > an > > > > >> issue. > > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and > > > doesn't > > > > >> use > > > > >> these seqid fields.) > > > > >> > > > > >> This is all just mho, rick > > > > >> > > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > rmacklem@uoguelph.ca> > > > > >> wrote: > > > > >> > > > > > >> > > Julian Elischer wrote: > > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they > say. > > > Please > > > > >> > > > > let me know if Xin Li's patch resolves your problem, even > > > though I > > > > >> > > > > don't believe it is correct except for the UINT32_MAX > case. > > > Good > > > > >> > > > > luck with it, rick > > > > >> > > > and please keep us all in the loop as to what they say! > > > > >> > > > > > > > >> > > > the general N+2 bit sounds like bullshit to me.. its always > N+1 > > > in a > > > > >> > > > number field that has a > > > > >> > > > bit of slack at wrap time (probably due to some ambiguity > in the > > > > >> > > > original spec). 
> > > > >> > > > > > > > >> > > Actually, since N is the lock op already done, N + 1 is the > next > > > lock > > > > >> > > operation in order. Since lock ops need to be strictly > ordered, > > > > >> allowing > > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no > > > sense. > > > > >> > > > > > > >> > > I think the author of the RFC meant that N + 2 or greater > fails, > > > but > > > > >> it > > > > >> > > was poorly worded. > > > > >> > > > > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There > is > > > an > > > > >> archive > > > > >> > > of it somewhere, but I can't remember where.;-) > > > > >> > > > > > > >> > > rick > > > > >> > > _______________________________________________ > > > > >> > > freebsd-fs@freebsd.org mailing list > > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > > >> > > To unsubscribe, send any mail to " > > > freebsd-fs-unsubscribe@freebsd.org" > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > > > > From owner-freebsd-fs@freebsd.org Wed Jul 8 14:19:40 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3929D996312 for ; Wed, 8 Jul 2015 14:19:40 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D1B861A51 for ; Wed, 8 Jul 2015 14:19:39 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from zero-gravitas.local (no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged)) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.1/8.15.1) with ESMTPSA id 
t68EJSUe063085 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Wed, 8 Jul 2015 15:19:29 +0100 (BST) (envelope-from m.seaman@infracaninophile.co.uk) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=infracaninophile.co.uk DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t68EJSUe063085 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infracaninophile.co.uk; s=201001-infracaninophile; t=1436365169; bh=IENb1z9AtQvAkCOSfo4dXBcPHJAY2lKrhp5piImxFG8=; h=Date:From:To:Subject:References:In-Reply-To; z=Date:=20Wed,=2008=20Jul=202015=2015:19:19=20+0100|From:=20Matthew =20Seaman=20|To:=20freebsd-fs@fre ebsd.org|Subject:=20Re:=20Speeding=20up=20resilvering|References:= 20<559D2AB6.5070007@FreeBSD.org>=20<559D2C7C.6060201@denninger.net >|In-Reply-To:=20<559D2C7C.6060201@denninger.net>; b=v9SA5nSJ9yMpj5mNEndU0O5TnjYfoaBO2GGJ9MVtyog7pwyWdZl1HvJ5vBgp1i0CX 3JwvJJKTfIHlpwWi4ETjOAh6CdwwGxxWmGbCtnTQtDJzbwWWFSIHU2BnHEzwsyppXz jPCOOI9hn9jzQovXP2UStr3T4ptkFeae0GTRSC/Y= X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged) claimed to be zero-gravitas.local Message-ID: <559D3167.1000705@infracaninophile.co.uk> Date: Wed, 08 Jul 2015 15:19:19 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> <559D2C7C.6060201@denninger.net> In-Reply-To: <559D2C7C.6060201@denninger.net> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="0vIn11cr7atgHNNdPN6Dr7L5Aol8EfJva" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU autolearn=ham 
autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:19:40 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --0vIn11cr7atgHNNdPN6Dr7L5Aol8EfJva Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 2015/07/08 14:58, Karl Denninger wrote: > On 7/8/2015 08:50, Matthew Seaman wrote: >> Hi, >> >> I've a zpool which is taking an inordinately long time to resilver and >> replace a device. It had only got to about 8% completion after a day, >> implying over a week to resilver about 6TB data. >> >> Now, I've applied the resilver performance tuning sysctls from >> https://wiki.freebsd.org/ZFSTuningGuide: >> >> vfs.zfs.scrub_delay=0 >> vfs.zfs.top_maxinflight=128 >> vfs.zfs.resilver_min_time_ms=5000 >> vfs.zfs.resilver_delay=0 >> >> but it doesn't seem to have made a great deal of difference. Most of >> the benefit there would apparently come from reducing scrub_delay or >> resilver_delay -- but this system is only booted to single user mode, so >> the drives are otherwise idle and the resilver should have automatically >> switched to running full throttle anyhow. >> >> Given the machine is not going to be doing anything else other than >> resilvering for the time being, is there any more aggressive tuning or >> tricks to get it to go faster that people would recommend? >> >> Cheers, >> >> Matthew >> > > What OS version?
> 10.1-RELEASE-p10 Cheers, Matthew --0vIn11cr7atgHNNdPN6Dr7L5Aol8EfJva-- From owner-freebsd-fs@freebsd.org Wed Jul 8 14:21:07 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8685A9964AC for ; Wed, 8 Jul 2015 14:21:07 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x235.google.com (mail-wi0-x235.google.com [IPv6:2a00:1450:400c:c05::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id EF5801C69; Wed, 8 Jul 2015 14:21:06 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com)
Received: by wifm2 with SMTP id m2so91158379wif.1; Wed, 08 Jul 2015 07:21:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=1EhHi3t63se/jWQid+AX5mG+L6QhjZbv47JRuo1Byl0=; b=a1qN6xe+EbOuhrk4j1lIbC5/D0oPIwm7T63JwgSgLSAL/yUGxU0vlxf73oqSul+gty D6hmUhBTWe0uRIpmyVz+D+vSMN5HT3PFxccCdBXb9FDID6L8+EuxgCsn/tzYHhOiMP7y ifYX3fRryBiv1wYexB/+MTHErOvrRlpEtUjxSHM4XkPVIJTFdb5KKAp2OrksmATfZmhs D5vJ/DuClSpGWgsqIRfM8kn7pkIgdukUF+N/3pfjBaZh9j73lKdIZmBPlyoxJeL1WfZV QnWqDxAZV0Q/pjun2Ywf/XqN0+uDwG79uk2grMBofXNod9wS8YLLpXXXWtlBqH2BpnV8 FbYA== X-Received: by 10.194.192.33 with SMTP id hd1mr20709204wjc.96.1436365265462; Wed, 08 Jul 2015 07:21:05 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Wed, 8 Jul 2015 07:20:45 -0700 (PDT) In-Reply-To: References: <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Wed, 8 Jul 2015 16:20:45 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Julian Elischer , freebsd-fs@freebsd.org, Xin LI Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:21:07 -0000 Another note .. is that the linux boxes when they have hung processes .. They have a process (rpciod) taking 10-15% CPU On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal wrote: > Hi folks, > > I have tested Xin's patches .. Unfortunately the problem didn't go away :/ > Many users are still reporting hung processes. 
If it would help, can you > show me how to dump a network trace that would help you identify the issue ? > > Also, is it possible in any way to have my trusted nfs3, handle the case > where every zfs /home folder is its own dataset ? > > On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem > wrote: > >> Ahmed Kamal wrote: >> > Hi folks, >> > >> > Just a quick update. I did not test Xin's patches yet .. What I did so >> far >> > is to increase the tcp highwater tunable and increase nfsd threads to >> 60. >> > Today (a working day) I noticed I only got one bad sequence error >> message! >> > Check this: >> > >> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c >> > 1 messages:Jul5 >> > 39 messages.1:Jun28 >> > 15 messages.1:Jun29 >> > 4 messages.1:Jun30 >> > 9 messages.1:Jul1 >> > 23 messages.1:Jul2 >> > 1 messages.1:Jul4 >> > 1 messages.2:Jun28 >> > >> > So there seems to be an improvement! Not sure if the Linux nfs4 client >> is >> > able to somehow recover from those bad-sequence situations or not .. I >> did >> > get some user complaints that running "ls -l" is sometimes slow and >> takes a >> > couple of seconds to finish. >> > >> > One final question .. Do you folks think nfs4.1 is more reliable in >> general >> > than nfs4 .. I've always only used nfs3 (I guess it can't work here with >> > /home/* being separate zfs filesystems) .. So should I go through the >> pain >> > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do >> you >> > expect the protocol to be more solid ? I know it's a fluffy question, >> just >> > give me your thoughts. Thanks a lot! >> > >> All I can say is that the "bad seqid" errors should not occur, since >> NFSv4.1 >> doesn't use the seqid#s to order RPCs. >> >> Also I would say that a correctly implemented NFSv4.1 protocol should >> function >> "more correctly" since all RPCs and performed "exactly once". (How much >> effect >> this will have in practice, I can't say.) 
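The "exactly once" point above can be made concrete with a small sketch. Below is a minimal C model of how an NFSv4.1 server might classify the sequence id carried in a SEQUENCE operation against one session slot, loosely following the slot and reply-cache rules of RFC 5661. The struct, enum and function names are hypothetical, the reply cache itself is left out, and this is not FreeBSD's nfsd code.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of one NFSv4.1 session slot. A real server also
 * caches the reply to the last request executed on each slot. */
struct slot {
    uint32_t seqid;     /* sequence id of the last request executed here */
};

enum seq_result {
    SEQ_NEW,            /* seqid == slot seqid + 1: execute the request */
    SEQ_REPLAY,         /* seqid == slot seqid: resend the cached reply */
    SEQ_MISORDERED      /* anything else: NFS4ERR_SEQ_MISORDERED */
};

/* Classify a request's seqid for slot `s`. uint32_t arithmetic wraps
 * modulo 2^32. Only a new request advances the slot. */
enum seq_result
slot_check(struct slot *s, uint32_t seqid)
{
    if (seqid == (uint32_t)(s->seqid + 1)) {
        s->seqid = seqid;   /* the request will be executed exactly once */
        return SEQ_NEW;
    }
    if (seqid == s->seqid)
        return SEQ_REPLAY;  /* retransmission: answered, never re-executed */
    return SEQ_MISORDERED;
}

/* Usage: this model starts a fresh slot at 0, so the first request uses 1. */
void
slot_examples(void)
{
    struct slot s = { 0 };

    assert(slot_check(&s, 1) == SEQ_NEW);
    assert(slot_check(&s, 1) == SEQ_REPLAY);      /* client retransmits */
    assert(slot_check(&s, 3) == SEQ_MISORDERED);  /* seqid 2 was skipped */
    assert(s.seqid == 1);
}
```

Because each slot carries at most one outstanding request, ordering is tracked per slot rather than per open-owner, which is why the NFSv4.0 open-owner seqid checks do not come into play under NFSv4.1.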
>> >> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over >> 500pages), >> so it is hard to say how mature the implementations are. >> I think only testing will give you the answer. >> >> I would suggest that you test Xi Lin's patch that allows the "seqid + 2" >> case >> and see if that makes the "bad seqid" errors go away. (Even though I >> think this >> would indicate a client bug, adding this in way that it can be enabled >> via a sysctl >> seems reasonable.) >> >> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, >> rick >> >> > >> > >> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem >> wrote: >> > >> > > Ahmed Kamal wrote: >> > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming >> > > > reports from users about hung vnc sessions. So maybe just maybe, >> linux >> > > > clients are able to somehow recover from this bad sequence >> messages. I >> > > > could still see the bad sequence error message in logs though >> > > > >> > > > Why isn't the highwater tunable set to something better by default >> ? I >> > > mean >> > > > this server is certainly not under a high or unusual load (it's >> only 40 >> > > PCs >> > > > mounting from it) >> > > > >> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal < >> > > email.ahmedkamal@googlemail.com >> > > > > wrote: >> > > > >> > > > > Thanks all .. I understand now we're doing the "right thing" .. >> > > Although >> > > > > if mounting keeps wedging, I will have to solve it somehow! Either >> > > using >> > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. >> > > > > >> > > > > Regarding Xin's patch, is it possible to build the patched nfsd >> code, >> > > as a >> > > > > kernel module ? I'm looking to minimize my delta to upstream. >> > > > > >> > > Yes, you can build the nfsd as a module. If your kernel config does >> not >> > > include >> > > "options NFSD" the module will get loaded/used. 
It is also possible to >> > > replace >> > > the module without rebooting, but you need to kill of the nfsd daemon >> then >> > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In >> > > /boot/.) >> > > >> > > > > Also would adopting Xin's patch and hiding it behind a >> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not >> the >> > > last >> > > > > person on earth to hit this) ? >> > > > > >> > > If it fixes your problem, I think this is reasonable. >> > > I'm also hoping that someone that works on the Linux client reports >> > > if/when this >> > > was changed. >> > > >> > > rick >> > > >> > > > > Thanks a lot for all the help! >> > > > > >> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < >> rmacklem@uoguelph.ca> >> > > > > wrote: >> > > > > >> > > > >> Ahmed Kamal wrote: >> > > > >> > Appreciating the fruitful discussion! Can someone please >> explain to >> > > me, >> > > > >> > what would happen in the current situation (linux client doing >> this >> > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the >> effect of >> > > that? >> > > > >> Well, as you've seen, the Linux client doesn't function correctly >> > > against >> > > > >> the FreeBSD server (and probably others that don't support this >> > > > >> "skip-by-1" >> > > > >> case). >> > > > >> >> > > > >> > What do users see? Any chances of data loss? >> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what >> the >> > > Linux >> > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're >> the guy >> > > > >> observing >> > > > >> it. >> > > > >> >> > > > >> > >> > > > >> > Also, I find it strange that netapp have acknowledged this is >> a bug >> > > on >> > > > >> > their side, which has been fixed since then! >> > > > >> Yea, I think Netapp screwed up. 
For some reason their server >> allowed >> > > this, >> > > > >> then was fixed to not allow it and then someone decided that was >> > > broken >> > > > >> and >> > > > >> reversed it. >> > > > >> >> > > > >> > I also find it strange that I'm the first to hit this :) Is no >> one >> > > > >> running >> > > > >> > nfs4 yet! >> > > > >> > >> > > > >> Well, it seems to be slowly catching on. I suspect that the Linux >> > > client >> > > > >> mounting a Netapp is the most common use of it. Since it appears >> that >> > > they >> > > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. >> > > > >> >> > > > >> It may turn out that the Linux client has been fixed or it may >> turn >> > > out >> > > > >> that most servers allowed this "skip-by-1" even though David >> Noveck >> > > (one >> > > > >> of the main authors of the protocol) seems to agree with me that >> it >> > > should >> > > > >> not be allowed. >> > > > >> >> > > > >> It is possible that others have bumped into this, but it wasn't >> > > isolated >> > > > >> (I wouldn't have guessed it, so it was good you pointed to the >> RedHat >> > > > >> discussion) >> > > > >> and they worked around it by reverting to NFSv3 or similar. >> > > > >> The protocol is rather complex in this area and changed >> completely for >> > > > >> NFSv4.1, >> > > > >> so many have also probably moved onto NFSv4.1 where this won't >> be an >> > > > >> issue. >> > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and >> > > doesn't >> > > > >> use >> > > > >> these seqid fields.) >> > > > >> >> > > > >> This is all just mho, rick >> > > > >> >> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < >> rmacklem@uoguelph.ca> >> > > > >> wrote: >> > > > >> > >> > > > >> > > Julian Elischer wrote: >> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: >> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they >> say. 
>> > > Please >> > > > >> > > > > let me know if Xin Li's patch resolves your problem, even >> > > though I >> > > > >> > > > > don't believe it is correct except for the UINT32_MAX >> case. >> > > Good >> > > > >> > > > > luck with it, rick >> > > > >> > > > and please keep us all in the loop as to what they say! >> > > > >> > > > >> > > > >> > > > the general N+2 bit sounds like bullshit to me.. its >> always N+1 >> > > in a >> > > > >> > > > number field that has a >> > > > >> > > > bit of slack at wrap time (probably due to some ambiguity >> in the >> > > > >> > > > original spec). >> > > > >> > > > >> > > > >> > > Actually, since N is the lock op already done, N + 1 is the >> next >> > > lock >> > > > >> > > operation in order. Since lock ops need to be strictly >> ordered, >> > > > >> allowing >> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no >> > > sense. >> > > > >> > > >> > > > >> > > I think the author of the RFC meant that N + 2 or greater >> fails, >> > > but >> > > > >> it >> > > > >> > > was poorly worded. >> > > > >> > > >> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. 
>> (There is >> > > an >> > > > >> archive >> > > > >> > > of it somewhere, but I can't remember where.;-) >> > > > >> > > >> > > > >> > > rick >> > > > >> > > _______________________________________________ >> > > > >> > > freebsd-fs@freebsd.org mailing list >> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >> > > > >> > > To unsubscribe, send any mail to " >> > > freebsd-fs-unsubscribe@freebsd.org" >> > > > >> > > >> > > > >> > >> > > > >> >> > > > > >> > > > > >> > > > >> > > >> > >> > > From owner-freebsd-fs@freebsd.org Wed Jul 8 14:22:56 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8A4A0996522 for ; Wed, 8 Jul 2015 14:22:56 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 505031F68 for ; Wed, 8 Jul 2015 14:22:55 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (localhost [127.0.0.1]) by fs.denninger.net (8.14.9/8.14.8) with ESMTP id t68EMsw0056880 for ; Wed, 8 Jul 2015 09:22:54 -0500 (CDT) (envelope-from karl@denninger.net) Received: from [192.168.1.40] [192.168.1.40] (Via SSLv3 AES128-SHA) ; by Spamblock-sys (LOCAL/AUTH) Wed Jul 8 09:22:54 2015 Message-ID: <559D3236.1060102@denninger.net> Date: Wed, 08 Jul 2015 09:22:46 -0500 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> <559D2C7C.6060201@denninger.net> <559D3167.1000705@infracaninophile.co.uk> In-Reply-To: <559D3167.1000705@infracaninophile.co.uk> 
Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms080401080600040505030604" X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:22:56 -0000 This is a cryptographically signed message in MIME format. --------------ms080401080600040505030604 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/8/2015 09:19, Matthew Seaman wrote: > On 2015/07/08 14:58, Karl Denninger wrote: >> On 7/8/2015 08:50, Matthew Seaman wrote: >>> Hi, >>> >>> I've a zpool which is taking an inordinately long time to resilver and >>> replace a device. It had only got to about 8% completion after a day, >>> implying over a week to resilver about 6TB data. >>> >>> Now, I've applied the resilver performance tuning sysctls from >>> https://wiki.freebsd.org/ZFSTuningGuide: >>> >>> vfs.zfs.scrub_delay=0 >>> vfs.zfs.top_maxinflight=128 >>> vfs.zfs.resilver_min_time_ms=5000 >>> vfs.zfs.resilver_delay=0 >>> >>> but it doesn't seem to have made a great deal of difference. Most of >>> the benefit there would apparently come from reducing scrub_delay or >>> resilver_delay -- but this system is only booted to single user mode, so >>> the drives are otherwise idle and the resilver should have automatically >>> switched to running full throttle anyhow. >>> >>> Given the machine is not going to be doing anything else other than >>> resilvering for the time being, is there any more aggressive tuning or >>> tricks to get it to go faster that people would recommend? >>> >>> Cheers, >>> >>> Matthew >>> >> What OS version? >> > 10.1-RELEASE-p10 > > Cheers, > > Matthew Look at the IO saturation on the disk channel(s) involved with either systat -vm or iostat.
If the channel is saturated then there's nothing you can do in terms of tuning; the question then turns to why actual I/O performance is so poor and has to be addressed there. -- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms080401080600040505030604-- From owner-freebsd-fs@freebsd.org Wed Jul 8 14:27:59 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 98D2899658B for ; Wed, 8 Jul 2015 14:27:59 +0000 (UTC) (envelope-from mike@sentex.net) Received: from smarthost1.sentex.ca (smarthost1.sentex.ca [IPv6:2607:f3e0:0:1::12]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client CN "smarthost.sentex.ca", Issuer "smarthost.sentex.ca" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 63F241077 for ; Wed, 8 Jul 2015 14:27:59 +0000 (UTC) (envelope-from mike@sentex.net) Received: from [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a] (saphire3.sentex.ca [IPv6:2607:f3e0:0:4:f025:8813:7603:7e4a]) by smarthost1.sentex.ca (8.14.9/8.14.9) with ESMTP id t68ERwwW058738 for ; Wed, 8 Jul 2015 10:27:58 -0400 (EDT) (envelope-from mike@sentex.net) Message-ID: <559D3380.8050703@sentex.net> Date: Wed, 08 Jul 2015 10:28:16 -0400 From: Mike Tancsa Organization: Sentex Communications User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> <559D2C7C.6060201@denninger.net> <559D3167.1000705@infracaninophile.co.uk>
<559D3236.1060102@denninger.net> In-Reply-To: <559D3236.1060102@denninger.net> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Scanned-By: MIMEDefang 2.75 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:27:59 -0000 On 7/8/2015 10:22 AM, Karl Denninger wrote: >>> What OS version? >>> >> 10.1-RELEASE-p10 >> >> Cheers, >> >> Matthew > Look at the IO saturation on the disk channel(s) involved with either > systat -vm or iostat. If the channel is saturated then there's nothing > you can do in terms of tuning; the question then turns to why actual I/O > performance is so poor and has to be addressed there. I had one server that was taking ages, and it turned out to be the controller, not the disk, that was hosed. As Karl suggested, take a look at the throughput in gstat. Is any disk or group of disks lagging far behind on write speeds? In my case, it was a very obvious and glaring outlier.
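The outlier check suggested above can be sketched as a small C function that flags the slowest disk when its write rate falls below some fraction of the median across all disks. The per-disk rates are assumed to be read off gstat by hand (nothing here parses gstat output); the function name, the 64-disk limit and the 0.5 threshold used below are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DISKS 64    /* arbitrary limit for this sketch */

/* Ascending comparison of doubles for qsort(). */
static int
cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;

    return (x > y) - (x < y);
}

/* Return the index of the slowest disk if its write rate is below
 * `fraction` of the median rate of all n disks, or -1 if none stands out.
 * Rates may be in any consistent unit, e.g. the kBps shown by gstat. */
int
find_slow_disk(const double *rates, size_t n, double fraction)
{
    double sorted[MAX_DISKS];
    double median;
    size_t i, slowest = 0;

    assert(n > 0 && n <= MAX_DISKS);
    memcpy(sorted, rates, n * sizeof(*rates));
    qsort(sorted, n, sizeof(*sorted), cmp_double);
    median = (n % 2 != 0) ? sorted[n / 2]
                          : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

    for (i = 1; i < n; i++)
        if (rates[i] < rates[slowest])
            slowest = i;
    return rates[slowest] < fraction * median ? (int)slowest : -1;
}
```

For example, write rates of 95000, 98000, 12000 and 97000 kBps give a median of 96000, so with a fraction of 0.5 the third disk (index 2) is reported. During a resilver the replacement disk naturally sees a different load from the rest, so it is probably best to compare disks that are doing the same job.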
---Mike -- ------------------- Mike Tancsa, tel +1 519 651 3400 Sentex Communications, mike@sentex.net Providing Internet services since 1994 www.sentex.net Cambridge, Ontario Canada http://www.tancsa.com/ From owner-freebsd-fs@freebsd.org Wed Jul 8 14:57:40 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 3857C996A32 for ; Wed, 8 Jul 2015 14:57:40 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com [IPv6:2a00:1450:400c:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B502512E6; Wed, 8 Jul 2015 14:57:39 +0000 (UTC) (envelope-from email.ahmedkamal@googlemail.com) Received: by wiclp1 with SMTP id lp1so83171833wic.0; Wed, 08 Jul 2015 07:57:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=googlemail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=WewXYziLqNzfN93R2XQB1uX7oljD4Q6H21edUciYGNs=; b=DkVxyVHzcML0zRDUvDnDm8nNh6bsu+g1i1RPAByPblEp+cwjdwjhGzAQe3UU315vls Sq9JljeM+GXnP1KLx1jcyq0oa/2r/m76Ndk2xwC2anID1qxrbCFAGVETkqSU20rpxOyy 7pEgZpaxNK7q4cd5d2hMa4pgT5Itfn53fgRyWGnj8yhpnJBqMe7mic3C6n3AcFUNDpBj jqp2Oi5mUvvd1bjjwo/gqmeAN5OzxImvIBbpmZu00gIzsr6v/SoTqaHLq2GMh8vhZWOj lBJ1SXmucCN23+MOoyyco9ebKw4aG3slmqiRwKARL7bSvti9p4F1RxbaDe0opMDVSxnr jKXw== X-Received: by 10.194.192.33 with SMTP id hd1mr21034299wjc.96.1436367458211; Wed, 08 Jul 2015 07:57:38 -0700 (PDT) MIME-Version: 1.0 Received: by 10.28.6.143 with HTTP; Wed, 8 Jul 2015 07:57:18 -0700 (PDT) In-Reply-To: References: <1022558302.2863702.1435838360534.JavaMail.zimbra@uoguelph.ca> <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> 
<2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> From: Ahmed Kamal Date: Wed, 8 Jul 2015 16:57:18 +0200 Message-ID: Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) To: Rick Macklem Cc: Julian Elischer , freebsd-fs@freebsd.org, Xin LI Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 14:57:40 -0000 I have a test rhel6 box (one that can mount nfs with vers=4.1) .. However this is an old server with no users on it .. Can you kindly show me how to stress test this mount to either induce the bad sequence error, or prove nfs-4.1 is rock solid ? If upgrading all boxes to rhel-6 and nfs-4.1 is the only way to solve this .. then so be it .. I just want to be sure it's solid before the upgrade Thanks folks! On Wed, Jul 8, 2015 at 4:20 PM, Ahmed Kamal wrote: > Another note .. is that the linux boxes when they have hung processes .. > They have a process (rpciod) taking 10-15% CPU > > On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal < > email.ahmedkamal@googlemail.com> wrote: > >> Hi folks, >> >> I have tested Xin's patches .. Unfortunately the problem didn't go away >> :/ Many users are still reporting hung processes. If it would help, can you >> show me how to dump a network trace that would help you identify the issue ? >> >> Also, is it possible in any way to have my trusted nfs3, handle the case >> where every zfs /home folder is its own dataset ? >> >> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem >> wrote: >> >>> Ahmed Kamal wrote: >>> > Hi folks, >>> > >>> > Just a quick update. I did not test Xin's patches yet .. What I did so >>> far >>> > is to increase the tcp highwater tunable and increase nfsd threads to >>> 60. 
>>> > Today (a working day) I noticed I only got one bad sequence error >>> message! >>> > Check this: >>> > >>> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c >>> > 1 messages:Jul5 >>> > 39 messages.1:Jun28 >>> > 15 messages.1:Jun29 >>> > 4 messages.1:Jun30 >>> > 9 messages.1:Jul1 >>> > 23 messages.1:Jul2 >>> > 1 messages.1:Jul4 >>> > 1 messages.2:Jun28 >>> > >>> > So there seems to be an improvement! Not sure if the Linux nfs4 client >>> is >>> > able to somehow recover from those bad-sequence situations or not .. I >>> did >>> > get some user complaints that running "ls -l" is sometimes slow and >>> takes a >>> > couple of seconds to finish. >>> > >>> > One final question .. Do you folks think nfs4.1 is more reliable in >>> general >>> > than nfs4 .. I've always only used nfs3 (I guess it can't work here >>> with >>> > /home/* being separate zfs filesystems) .. So should I go through the >>> pain >>> > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do >>> you >>> > expect the protocol to be more solid ? I know it's a fluffy question, >>> just >>> > give me your thoughts. Thanks a lot! >>> > >>> All I can say is that the "bad seqid" errors should not occur, since >>> NFSv4.1 >>> doesn't use the seqid#s to order RPCs. >>> >>> Also I would say that a correctly implemented NFSv4.1 protocol should >>> function >>> "more correctly" since all RPCs are performed "exactly once". (How much >>> effect >>> this will have in practice, I can't say.) >>> >>> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over >>> 500 pages), >>> so it is hard to say how mature the implementations are. >>> I think only testing will give you the answer. >>> >>> I would suggest that you test Xin Li's patch that allows the "seqid + 2" >>> case >>> and see if that makes the "bad seqid" errors go away. (Even though I >>> think this >>> would indicate a client bug, adding this in a way that it can be enabled >>> via a sysctl >>> seems reasonable.)
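Rick's suggestion, accepting the "seqid + 2" case only when an administrator turns it on through a sysctl, can be illustrated with a small C sketch. This is not FreeBSD's nfsd code: `check_seqid`, `enum seqid_result` and the `allow_skip_by_1` flag are hypothetical names (the flag stands in for a knob like the proposed, not existing, kern.nfs.allow_linux_broken_client). It encodes the ordering rule discussed in this thread (if N is the last operation done, N + 1 is the next one), treats a repeated seqid as a retransmission (an assumption of this sketch), and assumes plain modulo-2^32 wraparound, which is itself one of the points the thread was unsure about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Possible outcomes of checking an NFSv4.0 open/lock-owner seqid. */
enum seqid_result {
    SEQID_OK,     /* next in order: process the request           */
    SEQID_REPLAY, /* same as last: retransmission, replay reply    */
    SEQID_BAD     /* anything else: reply NFS4ERR_BAD_SEQID        */
};

/*
 * Hypothetical helper.  `last` is the seqid of the last request processed
 * for this owner.  `allow_skip_by_1` models an opt-in sysctl that also
 * tolerates last + 2 (the Linux client behaviour discussed in the thread).
 * The casts keep the arithmetic modulo 2^32, so UINT32_MAX is followed by 0
 * (an assumption about wrap handling).
 */
enum seqid_result
check_seqid(uint32_t last, uint32_t seqid, bool allow_skip_by_1)
{
    if (seqid == last)
        return SEQID_REPLAY;
    if (seqid == (uint32_t)(last + 1))
        return SEQID_OK;
    if (allow_skip_by_1 && seqid == (uint32_t)(last + 2))
        return SEQID_OK;
    return SEQID_BAD;
}
```

With the flag off, a client that jumps from N to N + 2 gets NFS4ERR_BAD_SEQID; with it on, the request is processed, which is the case Xin Li's patch allows. Keeping the strict rule as the default matches Rick's view that the skip indicates a client bug.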
>>> >>> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, >>> rick >>> >>> > >>> > >>> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem >>> wrote: >>> > >>> > > Ahmed Kamal wrote: >>> > > > PS: Today (after adjusting tcp.highwater) I didn't get any >>> screaming >>> > > > reports from users about hung vnc sessions. So maybe just maybe, >>> linux >>> > > > clients are able to somehow recover from these bad sequence >>> messages. I >>> > > > could still see the bad sequence error message in logs though >>> > > > >>> > > > Why isn't the highwater tunable set to something better by default >>> ? I >>> > > mean >>> > > > this server is certainly not under a high or unusual load (it's >>> only 40 >>> > > PCs >>> > > > mounting from it) >>> > > > >>> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal < >>> > > email.ahmedkamal@googlemail.com >>> > > > > wrote: >>> > > > >>> > > > > Thanks all .. I understand now we're doing the "right thing" .. >>> > > Although >>> > > > > if mounting keeps wedging, I will have to solve it somehow! >>> Either >>> > > using >>> > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. >>> > > > > >>> > > > > Regarding Xin's patch, is it possible to build the patched nfsd >>> code, >>> > > as a >>> > > > > kernel module ? I'm looking to minimize my delta to upstream. >>> > > > > >>> > > Yes, you can build the nfsd as a module. If your kernel config does >>> not >>> > > include >>> > > "options NFSD" the module will get loaded/used. It is also possible >>> to >>> > > replace >>> > > the module without rebooting, but you need to kill off the nfsd >>> daemon, then >>> > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In >>> > > /boot/.) >>> > > >>> > > > > Also would adopting Xin's patch and hiding it behind a >>> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably >>> not the >>> > > last >>> > > > > person on earth to hit this) ?
>>> > > > > >>> > > If it fixes your problem, I think this is reasonable. >>> > > I'm also hoping that someone that works on the Linux client reports >>> > > if/when this >>> > > was changed. >>> > > >>> > > rick >>> > > >>> > > > > Thanks a lot for all the help! >>> > > > > >>> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < >>> rmacklem@uoguelph.ca> >>> > > > > wrote: >>> > > > > >>> > > > >> Ahmed Kamal wrote: >>> > > > >> > Appreciating the fruitful discussion! Can someone please >>> explain to >>> > > me, >>> > > > >> > what would happen in the current situation (linux client >>> doing this >>> > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the >>> effect of >>> > > that? >>> > > > >> Well, as you've seen, the Linux client doesn't function >>> correctly >>> > > against >>> > > > >> the FreeBSD server (and probably others that don't support this >>> > > > >> "skip-by-1" >>> > > > >> case). >>> > > > >> >>> > > > >> > What do users see? Any chances of data loss? >>> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what >>> the >>> > > Linux >>> > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're >>> the guy >>> > > > >> observing >>> > > > >> it. >>> > > > >> >>> > > > >> > >>> > > > >> > Also, I find it strange that netapp have acknowledged this is >>> a bug >>> > > on >>> > > > >> > their side, which has been fixed since then! >>> > > > >> Yea, I think Netapp screwed up. For some reason their server >>> allowed >>> > > this, >>> > > > >> then was fixed to not allow it and then someone decided that was >>> > > broken >>> > > > >> and >>> > > > >> reversed it. >>> > > > >> >>> > > > >> > I also find it strange that I'm the first to hit this :) Is >>> no one >>> > > > >> running >>> > > > >> > nfs4 yet! >>> > > > >> > >>> > > > >> Well, it seems to be slowly catching on. I suspect that the >>> Linux >>> > > client >>> > > > >> mounting a Netapp is the most common use of it. 
Since it >>> appears that >>> > > they >>> > > > >> flip flopped w.r.t. whose bug this is, it has probably >>> persisted. >>> > > > >> >>> > > > >> It may turn out that the Linux client has been fixed or it may >>> turn >>> > > out >>> > > > >> that most servers allowed this "skip-by-1" even though David >>> Noveck >>> > > (one >>> > > > >> of the main authors of the protocol) seems to agree with me >>> that it >>> > > should >>> > > > >> not be allowed. >>> > > > >> >>> > > > >> It is possible that others have bumped into this, but it wasn't >>> > > isolated >>> > > > >> (I wouldn't have guessed it, so it was good you pointed to the >>> RedHat >>> > > > >> discussion) >>> > > > >> and they worked around it by reverting to NFSv3 or similar. >>> > > > >> The protocol is rather complex in this area and changed >>> completely for >>> > > > >> NFSv4.1, >>> > > > >> so many have also probably moved onto NFSv4.1 where this won't >>> be an >>> > > > >> issue. >>> > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and >>> > > doesn't >>> > > > >> use >>> > > > >> these seqid fields.) >>> > > > >> >>> > > > >> This is all just mho, rick >>> > > > >> >>> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < >>> rmacklem@uoguelph.ca> >>> > > > >> wrote: >>> > > > >> > >>> > > > >> > > Julian Elischer wrote: >>> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: >>> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they >>> say. >>> > > Please >>> > > > >> > > > > let me know if Xin Li's patch resolves your problem, >>> even >>> > > though I >>> > > > >> > > > > don't believe it is correct except for the UINT32_MAX >>> case. >>> > > Good >>> > > > >> > > > > luck with it, rick >>> > > > >> > > > and please keep us all in the loop as to what they say! >>> > > > >> > > > >>> > > > >> > > > the general N+2 bit sounds like bullshit to me.. 
it's >>> always N+1 >>> > > in a >>> > > > >> > > > number field that has a >>> > > > >> > > > bit of slack at wrap time (probably due to some ambiguity >>> in the >>> > > > >> > > > original spec). >>> > > > >> > > > >>> > > > >> > > Actually, since N is the lock op already done, N + 1 is the >>> next >>> > > lock >>> > > > >> > > operation in order. Since lock ops need to be strictly >>> ordered, >>> > > > >> allowing >>> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes >>> no >>> > > sense. >>> > > > >> > > >>> > > > >> > > I think the author of the RFC meant that N + 2 or greater >>> fails, >>> > > but >>> > > > >> it >>> > > > >> > > was poorly worded. >>> > > > >> > > >>> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. >>> (There is >>> > > an >>> > > > >> archive >>> > > > >> > > of it somewhere, but I can't remember where.;-) >>> > > > >> > > >>> > > > >> > > rick >>> > > > >> > > _______________________________________________ >>> > > > >> > > freebsd-fs@freebsd.org mailing list >>> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs >>> > > > >> > > To unsubscribe, send any mail to " >>> > > freebsd-fs-unsubscribe@freebsd.org" >>> > > > >> > > >>> > > > >> > >>> > > > >> >>> > > > > >>> > > > > >>> > > > >>> > > >>> > >>> >> >> > From owner-freebsd-fs@freebsd.org Wed Jul 8 16:20:51 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 72DA1996912 for ; Wed, 8 Jul 2015 16:20:51 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id
19EE914F4 for ; Wed, 8 Jul 2015 16:20:50 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from ox-dell39.ox.adestra.com (no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged)) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.1/8.15.1) with ESMTPSA id t68GKbtb065258 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Wed, 8 Jul 2015 17:20:39 +0100 (BST) (envelope-from m.seaman@infracaninophile.co.uk) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=infracaninophile.co.uk DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t68GKbtb065258 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infracaninophile.co.uk; s=201001-infracaninophile; t=1436372440; bh=+Ikz/eR9WdypCqz9yso8QSSQFMnPU4AbXmBUDhpA/lw=; h=Date:From:To:Subject:References:In-Reply-To; z=Date:=20Wed,=2008=20Jul=202015=2017:20:35=20+0100|From:=20Matthew =20Seaman=20|To:=20freebsd-fs@fre ebsd.org|Subject:=20Re:=20Speeding=20up=20resilvering|References:= 20<559D2AB6.5070007@FreeBSD.org>=20<559D2C7C.6060201@denninger.net >=20<559D3167.1000705@infracaninophile.co.uk>=20<559D3236.1060102@ denninger.net>=20<559D3380.8050703@sentex.net>|In-Reply-To:=20<559 D3380.8050703@sentex.net>; b=GjhnYvSX+WZqyB3xKJBifrZQBDOUIndavDzj1Te6M3R5ga1ozwUFT3t86SPBVl9DX kqcBkmcuJcCfLK1fJ8170ZnYVNkdvyDUmhWmPxME072bTCvitbT4hmY+R2THGDnLvb HG76zJ7JAbyDSP6PpImLjtvu2QVBaSrpAyn31xaY= X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged) claimed to be ox-dell39.ox.adestra.com Message-ID: <559D4DD3.10903@infracaninophile.co.uk> Date: Wed, 08 Jul 2015 17:20:35 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> <559D2C7C.6060201@denninger.net> 
<559D3167.1000705@infracaninophile.co.uk> <559D3236.1060102@denninger.net> <559D3380.8050703@sentex.net> In-Reply-To: <559D3380.8050703@sentex.net> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="060uth7oEQeKXehX28voNVE36nwF4EXRs" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 16:20:51 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --060uth7oEQeKXehX28voNVE36nwF4EXRs Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 07/08/15 15:28, Mike Tancsa wrote: >> Look at the IO saturation on the disk channel(s) involved with either >> > systat -vm or iostat. If the channel is saturated then there's nothing >> > you can do in terms of tuning; the question then turns to why actual I/O >> > performance is so poor and has to be addressed there. > I had one server that was taking ages, and it turned out to be the > controller, not the disk that was hosed. As Karl suggested, take a look > at the throughput on gstat. Is any one disk or group of disks lagging > far behind on write speeds? In my case, it was a very obvious and > glaring outlier. Thanks for the suggestions. I've been playing with gstat et al, and as far as I can tell, all the drives are behaving reasonably well. I'm certainly getting 90-100% capacity (mostly reads) continually on the original drives whilst the new one (mostly writes) seems to go in bursts.
Which is pretty much what I'd expect when resilvering a RAIDZ. So, having exhausted that, I actually sat down and timed what progress it was making rather more carefully. Turns out my impression that applying the sysctl tweaks I mentioned previously had little effect was wrong. Current projection is about 50h total to do the resilvering, which is much, much better than the approx. 12 days I was expecting previously. In fact, that's pretty much in line with what I'd expect from this hardware. Cheers, Matthew --060uth7oEQeKXehX28voNVE36nwF4EXRs-- From owner-freebsd-fs@freebsd.org Wed Jul 8 16:37:19 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 59DB0996B8C for ; Wed, 8 Jul 2015 16:37:19 +0000 (UTC) (envelope-from karl@denninger.net) Received: from fs.denninger.net (wsip-70-169-168-7.pn.at.cox.net [70.169.168.7]) (using TLSv1.2 with
cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "NewFS.denninger.net", Issuer "NewFS.denninger.net" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id 2760C11FA for ; Wed, 8 Jul 2015 16:37:18 +0000 (UTC) (envelope-from karl@denninger.net) Received: from [192.168.1.40] (localhost [127.0.0.1]) by fs.denninger.net (8.14.9/8.14.8) with ESMTP id t68GbHlx022866 for ; Wed, 8 Jul 2015 11:37:17 -0500 (CDT) (envelope-from karl@denninger.net) Received: from [192.168.1.40] [192.168.1.40] (Via SSLv3 AES128-SHA) ; by Spamblock-sys (LOCAL/AUTH) Wed Jul 8 11:37:17 2015 Message-ID: <559D51B5.3050709@denninger.net> Date: Wed, 08 Jul 2015 11:37:09 -0500 From: Karl Denninger User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> <559D2C7C.6060201@denninger.net> <559D3167.1000705@infracaninophile.co.uk> <559D3236.1060102@denninger.net> <559D3380.8050703@sentex.net> <559D4DD3.10903@infracaninophile.co.uk> In-Reply-To: <559D4DD3.10903@infracaninophile.co.uk> Content-Type: multipart/signed; protocol="application/pkcs7-signature"; micalg=sha1; boundary="------------ms030104030200090303040803" X-Content-Filtered-By: Mailman/MimeDel 2.1.20 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 16:37:19 -0000 This is a cryptographically signed message in MIME format. --------------ms030104030200090303040803 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 7/8/2015 11:20, Matthew Seaman wrote: > On 07/08/15 15:28, Mike Tancsa wrote: >>> Look at the IO saturation on the disk channel(s) involved with either >>>> systat -vm or iostat.
If the channel is saturated then there's nothing >>>> you can do in terms of tuning; the question then turns to why actual I/O >>>> performance is so poor and has to be addressed there. >> I had one server that was taking ages, and it turned out to be the >> controller, not the disk that was hosed. As Karl suggested, take a look >> at the throughput on gstat. Is any one disk or group of disks lagging >> far behind on write speeds? In my case, it was a very obvious and >> glaring outlier. > Thanks for the suggestions. I've been playing with gstat et al, and as > far as I can tell, all the drives are behaving reasonably well. I'm > certainly getting 90-100% capacity (mostly reads) continually on the > original drives whilst the new one (mostly writes) seems to go in > bursts. Which is pretty much what I'd expect when resilvering a RAIDZ. > > So, having exhausted that, I actually sat down and timed what progress > it was making rather more carefully. Turns out my impression that > applying the sysctl tweaks I mentioned previously had little effect was > wrong. Current projection is about 50h total to do the resilvering, > which is much, much better than the approx. 12 days I was expecting > previously. In fact, that's pretty much in line with what I'd expect > from this hardware. > > Cheers, > > Matthew > > > Be aware that when a resilver /starts/ it does a number of very small and diverse I/O operations. Throughput during that time _*stinks*_, and as a result in the first few minutes to half-hour or so (with a very large vdev) you will see utterly ridiculous projections of completion time. Once that part of the process completes (usually within a few minutes on modest to moderate size vdevs, but it will be longer on very large ones) the performance level of the resilver will go up a lot and the projected completion time will become far more reasonable.
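Karl's point about misleading early projections follows from simple arithmetic if an estimate divides the remaining work by the average rate since the start. The C sketch here is only an illustration under that assumption, not the code zpool(8) uses; the `struct progress` type, the `eta` function and all numbers in the usage note are hypothetical. Measuring the rate over a recent window instead of from the start keeps the slow opening phase from dominating the estimate.

```c
#include <assert.h>

/* A progress sample: amount resilvered (any unit) at time t (any unit). */
struct progress {
    double t;
    double done;
};

/*
 * Estimated time remaining: work left divided by the rate observed between
 * the samples `from` and `now`.  Pass the very first sample as `from` for a
 * cumulative-average estimate, or a recent sample for a windowed estimate.
 */
double eta(double total, struct progress from, struct progress now)
{
    double dt = now.t - from.t;
    double dw = now.done - from.done;

    assert(dt > 0.0 && dw > 0.0); /* need some measurable progress */
    return (total - now.done) / (dw / dt);
}
```

For example, with a hypothetical 10 TB to resilver and time in hours: after a slow first half hour that covered 0.01 TB, `eta(10, (struct progress){0, 0}, (struct progress){0.5, 0.01})` projects about 500 hours (roughly 21 days). If progress later goes from 0.05 TB at hour 1 to 2.05 TB at hour 5, the windowed estimate `eta(10, (struct progress){1, 0.05}, (struct progress){5, 2.05})` is about 15.9 hours, while the cumulative estimate from hour 0 is still about 19.4 hours.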
-- Karl Denninger karl@denninger.net /The Market Ticker/ /[S/MIME encrypted email preferred]/ --------------ms030104030200090303040803-- From owner-freebsd-fs@freebsd.org Wed Jul 8 16:54:29 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4065F996074 for ; Wed, 8 Jul 2015 16:54:29 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from smtp.infracaninophile.co.uk (smtp.infracaninophile.co.uk [IPv6:2001:8b0:151:1:3cd3:cd67:fafa:3d78]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.infracaninophile.co.uk", Issuer "infracaninophile.co.uk" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id D59AD15DF for ; Wed, 8 Jul 2015 16:54:28 +0000 (UTC) (envelope-from m.seaman@infracaninophile.co.uk) Received: from ox-dell39.ox.adestra.com (no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged)) (authenticated bits=0) by smtp.infracaninophile.co.uk (8.15.1/8.15.1) with ESMTPSA id t68GsHwT065943 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128 verify=NO) for ; Wed, 8 Jul 2015 17:54:17 +0100 (BST) (envelope-from m.seaman@infracaninophile.co.uk) Authentication-Results: smtp.infracaninophile.co.uk; dmarc=none header.from=infracaninophile.co.uk DKIM-Filter: OpenDKIM Filter v2.9.2 smtp.infracaninophile.co.uk t68GsHwT065943 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=infracaninophile.co.uk; s=201001-infracaninophile; t=1436374458; bh=sg6aKJ/eL4Jwr1dOKMRZQ+de+JwWTCXeo8e17okTkZQ=; h=Date:From:To:Subject:References:In-Reply-To;
z=Date:=20Wed,=2008=20Jul=202015=2017:54:10=20+0100|From:=20Matthew =20Seaman=20|To:=20freebsd-fs@fre ebsd.org|Subject:=20Re:=20Speeding=20up=20resilvering|References:= 20<559D2AB6.5070007@FreeBSD.org>=20<559D2C7C.6060201@denninger.net >=20<559D3167.1000705@infracaninophile.co.uk>=20<559D3236.1060102@ denninger.net>=20<559D3380.8050703@sentex.net>=20<559D4DD3.10903@i nfracaninophile.co.uk>=20<559D51B5.3050709@denninger.net>|In-Reply -To:=20<559D51B5.3050709@denninger.net>; b=WfmmxVH3kUqb7GKellqkp62uwX4V/GVPOKG8U49j5o8E4i6hIdK2l8xvkHtx6+jus dyHP4USBTVVgh/WgTThqIcNFqiCAXa+UeE1YfAR6jjhGIBj6M4ZDc5BoLMPYiC5KOu ufL8puwLvjj0DfB2QZyd5iY6BZLS8oRus73cF7uI= X-Authentication-Warning: lucid-nonsense.infracaninophile.co.uk: Host no-reverse-dns.metronet-uk.com [85.199.232.226] (may be forged) claimed to be ox-dell39.ox.adestra.com Message-ID: <559D55B2.3010607@infracaninophile.co.uk> Date: Wed, 08 Jul 2015 17:54:10 +0100 From: Matthew Seaman User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Speeding up resilvering References: <559D2AB6.5070007@FreeBSD.org> <559D2C7C.6060201@denninger.net> <559D3167.1000705@infracaninophile.co.uk> <559D3236.1060102@denninger.net> <559D3380.8050703@sentex.net> <559D4DD3.10903@infracaninophile.co.uk> <559D51B5.3050709@denninger.net> In-Reply-To: <559D51B5.3050709@denninger.net> Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="9dcXhIptaMNTSJ1xpS9EbFKokD1psgkg6" X-Virus-Scanned: clamav-milter 0.98.7 at lucid-nonsense.infracaninophile.co.uk X-Virus-Status: Clean X-Spam-Status: No, score=-1.5 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU autolearn=ham autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on lucid-nonsense.infracaninophile.co.uk X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: 
Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 16:54:29 -0000 This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --9dcXhIptaMNTSJ1xpS9EbFKokD1psgkg6 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 07/08/15 17:37, Karl Denninger wrote: > Be aware that when a resilver /starts/ it does a number of very small > and diverse I/O operations. Throughput during that time _*stinks*_, and > as a result in the first few minutes to half-hour or so (with a very > large vdev) you will see utterly ridiculous projections of completion time. > > Once that part of the process completes (usually within a few minutes on > modest to moderate size vdevs, but it will be longer on very large ones) > the performance level of the resilver will go up a lot and the projected > completion time will become far more reasonable. Yeah. This resilver has been running for over a day already, and it is refusing to project what the finishing time will be, mostly because I had to yank it out of the DC in order to substitute another machine to do what this one should have been doing. (So it spent a night powered down in the middle of the resilver operation.) The VDEV is about 20TB, which I suppose is pretty big, but possibly not what some ZFS practitioners would call "very large..."
6 x 4TB drives in a RAIDZ Cheers, Matthew --9dcXhIptaMNTSJ1xpS9EbFKokD1psgkg6-- From owner-freebsd-fs@freebsd.org Wed Jul 8 18:12:25 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id B8E7E995470 for ; Wed, 8 Jul 2015 18:12:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A54B61EA6 for ; Wed, 8 Jul 2015 18:12:25 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t68ICPGH021848 for ; Wed, 8 Jul 2015 18:12:25 GMT (envelope-from bugzilla-noreply@freebsd.org) From:
bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 194587] [zfs] kernel panic when running zpool/add/open-f_type_mismatch.t Date: Wed, 08 Jul 2015 18:12:25 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: gjb@FreeBSD.org X-Bugzilla-Status: Closed X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status resolution Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 18:12:25 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194587 Glen Barber changed: What |Removed |Added ---------------------------------------------------------------------------- Status|New |Closed Resolution|--- |FIXED --- Comment #2 from Glen Barber --- Close PRs that have had a corresponding fix committed. -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-fs@freebsd.org Wed Jul 8 18:12:51 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0B60C9954C7 for ; Wed, 8 Jul 2015 18:12:51 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id EBBC4107C for ; Wed, 8 Jul 2015 18:12:50 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t68ICoa0040817 for ; Wed, 8 Jul 2015 18:12:50 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 194589] [zfs] kernel panic when running zpool/create/files.t Date: Wed, 08 Jul 2015 18:12:51 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: gjb@FreeBSD.org X-Bugzilla-Status: Closed X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status resolution Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 18:12:51 -0000 
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194589 Glen Barber changed: What |Removed |Added ---------------------------------------------------------------------------- Status|New |Closed Resolution|--- |FIXED --- Comment #2 from Glen Barber --- Close PRs that have had a corresponding fix committed. -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Wed Jul 8 18:32:10 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BC80D9959FD for ; Wed, 8 Jul 2015 18:32:10 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id A828E1E16 for ; Wed, 8 Jul 2015 18:32:10 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t68IWAtU057799 for ; Wed, 8 Jul 2015 18:32:10 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 194586] [zfs] kernel panic when running zpool/add/option-f_size_mismatch.t Date: Wed, 08 Jul 2015 18:32:10 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System X-Bugzilla-Component: kern X-Bugzilla-Version: 11.0-CURRENT X-Bugzilla-Keywords: X-Bugzilla-Severity: Affects Some People X-Bugzilla-Who: gjb@FreeBSD.org X-Bugzilla-Status: New X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" 
Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 18:32:10 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=194586 --- Comment #3 from Glen Barber --- To originators/assignees of this PR: A commit to the tree references this PR, however the PR is still in a non-closed state. Please review this PR and close as appropriate, or if closing the PR requires a merge to stable/10, please let re@ know as soon as possible. Thank you. Glen -- You are receiving this mail because: You are the assignee for the bug. From owner-freebsd-fs@freebsd.org Wed Jul 8 18:32:18 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7911995A89 for ; Wed, 8 Jul 2015 18:32:18 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from kenobi.freebsd.org (kenobi.freebsd.org [IPv6:2001:1900:2254:206a::16:76]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id B3AA51EF1 for ; Wed, 8 Jul 2015 18:32:18 +0000 (UTC) (envelope-from bugzilla-noreply@freebsd.org) Received: from bugs.freebsd.org ([127.0.1.118]) by kenobi.freebsd.org (8.14.9/8.14.9) with ESMTP id t68IWI7o057990 for ; Wed, 8 Jul 2015 18:32:18 GMT (envelope-from bugzilla-noreply@freebsd.org) From: bugzilla-noreply@freebsd.org To: freebsd-fs@FreeBSD.org Subject: [Bug 193803] fix zvol rename failing due to out of order locking Date: Wed, 08 Jul 2015 18:32:19 +0000 X-Bugzilla-Reason: AssignedTo X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: Base System 
X-Bugzilla-Component: kern X-Bugzilla-Version: 10.1-RELEASE X-Bugzilla-Keywords: patch X-Bugzilla-Severity: Affects Many People X-Bugzilla-Who: gjb@FreeBSD.org X-Bugzilla-Status: Open X-Bugzilla-Priority: --- X-Bugzilla-Assigned-To: freebsd-fs@FreeBSD.org X-Bugzilla-Target-Milestone: --- X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit X-Bugzilla-URL: https://bugs.freebsd.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 18:32:18 -0000 https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193803 --- Comment #35 from Glen Barber --- To originators/assignees of this PR: A commit to the tree references this PR, however the PR is still in a non-closed state. Please review this PR and close as appropriate, or if closing the PR requires a merge to stable/10, please let re@ know as soon as possible. Thank you. Glen -- You are receiving this mail because: You are the assignee for the bug. 
From owner-freebsd-fs@freebsd.org Wed Jul 8 22:07:17 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 77DED996509 for ; Wed, 8 Jul 2015 22:07:17 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x22e.google.com (mail-wg0-x22e.google.com [IPv6:2a00:1450:400c:c00::22e]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 116223084; Wed, 8 Jul 2015 22:07:17 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgov12 with SMTP id v12so23027257wgo.1; Wed, 08 Jul 2015 15:07:15 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=xPDiIhuPmA9c8YtF3XJrW3Tfst+wQXiSu7NJ7Yjzk0I=; b=k52YzkcwzSlJUXp8pr7BnGp+CTjX5sBPz9SnsZIiuC7aP6ZhU8K6dXsfQGIwxnv17z RxFF+bF4paILRlXoMVykrMNEicenLnY3d1szT5TKIH/+djgICpJ8XG8c72C6jwvmLDx8 A9NRjAweOGN23lr0aOvIuWgn0/ilHO9Uk0uML5CyjrTaufOCbCh0aV+UU52WeFTh3TOH 5lqNfoCNoqHgsz0LB+148/yrTLZPAhQWkergRUV0e52sg4q/LZki3REpuyv2XwUp5Nf6 M7RL94/OS+0x/0EyBNPABz699jM4tToYuYRtnKphWgVTgSDa9SGO8KtjRgm6HTfkUyER d6GQ== X-Received: by 10.180.82.199 with SMTP id k7mr55241414wiy.54.1436393235602; Wed, 08 Jul 2015 15:07:15 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. 
[89.102.11.63]) by smtp.gmail.com with ESMTPSA id fo17sm5483921wjc.46.2015.07.08.15.07.13 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 08 Jul 2015 15:07:14 -0700 (PDT) From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: [PATCH 0/4] namei + audit changes to prepare for getting rid of filedesc lock Date: Thu, 9 Jul 2015 00:07:07 +0200 Message-Id: <1436393231-5831-1-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <20150707085857.GZ2080@kib.kiev.ua> References: <20150707085857.GZ2080@kib.kiev.ua> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 22:07:17 -0000 From: Mateusz Guzik On Tue, Jul 07, 2015 at 11:58:57AM +0300, Konstantin Belousov wrote: > On Mon, Jul 06, 2015 at 05:07:14AM +0200, Mateusz Guzik wrote: > > From: Mateusz Guzik > > > > namei used to vref fd_cdir, which was immediatley vrele'd on entry to > > the loop. > Does it make sense to do this, if the other patch, for interlock-less > vref/vrele on holdcount > 0, is in progress ? > Well it is optional, but I would argue it makes the code more readable. It also simplifies future code which may remove the need to vref root vnode for lookups. > > > > Simplify error handling and remove type checking for ni_startdir vnode. > > It is only set by nfs which does the check on its own. Assert the > > correct type instead. 
> > --- > > sys/kern/vfs_lookup.c | 92 ++++++++++++++++++++++++++++----------------------- > > 1 file changed, 51 insertions(+), 41 deletions(-) > > > > diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > > index 5dc07dc..c5218ec 100644 > > --- a/sys/kern/vfs_lookup.c > > +++ b/sys/kern/vfs_lookup.c > > @@ -109,6 +109,27 @@ namei_cleanup_cnp(struct componentname *cnp) > > #endif > > } > > > > +static int > > +namei_handle_root(struct nameidata *ndp, struct vnode **dpp) > > +{ > > + struct componentname *cnp = &ndp->ni_cnd; > > + > > + if (ndp->ni_strictrelative != 0) { > > +#ifdef KTRACE > > + if (KTRPOINT(curthread, KTR_CAPFAIL)) > > + ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); > > +#endif > > + return (ENOTCAPABLE); > > + } > > + while (*(cnp->cn_nameptr) == '/') { > > + cnp->cn_nameptr++; > > + ndp->ni_pathlen--; > > + } > > + *dpp = ndp->ni_rootdir; > > + VREF(*dpp); > > + return (0); > > +} > > + > > /* > > * Convert a pathname into a pointer to a locked vnode. > > * > > @@ -148,6 +169,8 @@ namei(struct nameidata *ndp) > > ("namei: nameiop contaminated with flags")); > > KASSERT((cnp->cn_flags & OPMASK) == 0, > > ("namei: flags contaminated with nameiops")); > > + if (ndp->ni_startdir != NULL) > > + MPASS(ndp->ni_startdir->v_type == VDIR); > ni_startdir is not locked, am I correct ? If yes, the assert is not safe. > Added a || v_type == BAD check. > > if (!lookup_shared) > > cnp->cn_flags &= ~LOCKSHARED; > > fdp = p->p_fd; > > Could this patch be further split ? E.g. could the introduction of the > namei_handle_root() and its use twice be done in the first patch, while > the loop logic reorganization come into the follow-up ? > > As it is now, the patch is almost impossible to review without rewriting > the logic independently. Patch split. I completely forgot about a pre-existing bug with a use-after-free of fd_rdir vnode when writing the previous patchset. see the first patch in this one. 
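The namei_handle_root() helper in the quoted diff does two things. It refuses absolute paths when ni_strictrelative is set (capability mode). Otherwise it skips every leading '/' and decrements ni_pathlen for each one.

Below is a minimal userspace sketch of only that path-handling logic. struct lookup_state and handle_root() are hypothetical stand-ins, not kernel APIs, and the VREF of ni_rootdir is left out. ENOTCAPABLE is taken from <errno.h> where the system defines it; the fallback value is only a placeholder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOTCAPABLE
#define ENOTCAPABLE 93	/* placeholder where the system lacks it */
#endif

/* Hypothetical, trimmed-down stand-in for struct nameidata. */
struct lookup_state {
	const char *nameptr;	/* like cn_nameptr: next part of the path */
	size_t pathlen;		/* like ni_pathlen: length still to consume */
	int strictrelative;	/* like ni_strictrelative: forbid absolute paths */
};

/*
 * Same shape as namei_handle_root(): fail in strict-relative mode,
 * otherwise skip all leading slashes so the lookup restarts at the root.
 */
static int
handle_root(struct lookup_state *ls)
{
	/* Only called for absolute paths, like the MPASS-style checks in namei. */
	assert(ls->nameptr != NULL && ls->nameptr[0] == '/');
	if (ls->strictrelative != 0)
		return (ENOTCAPABLE);
	while (*ls->nameptr == '/') {
		ls->nameptr++;
		ls->pathlen--;
	}
	return (0);
}
```

Note that in strict-relative mode the state is left untouched, so the caller can report the failure without undoing anything.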
Mateusz Guzik (4): vfs: plug a use-after-free of fd_rdir in namei vfs: avoid spurious vref/vrele for absolute lookups vfs: simplify error handling in namei audit: utilize vnode pointer found by namei instead of looking it up again sys/kern/vfs_lookup.c | 127 +++++++++++++++++++++--------------- sys/security/audit/audit.h | 14 ++++ sys/security/audit/audit_arg.c | 36 ++++++++++ sys/security/audit/audit_bsm_klib.c | 82 +++++++++++++++-------- sys/security/audit/audit_private.h | 2 + 5 files changed, 181 insertions(+), 80 deletions(-) -- 2.4.5 From owner-freebsd-fs@freebsd.org Wed Jul 8 22:07:19 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 63B6F996515 for ; Wed, 8 Jul 2015 22:07:19 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x22c.google.com (mail-wg0-x22c.google.com [IPv6:2a00:1450:400c:c00::22c]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id F20F3308A; Wed, 8 Jul 2015 22:07:18 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgov12 with SMTP id v12so23027652wgo.1; Wed, 08 Jul 2015 15:07:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=vVh5qwdnhdbrklFRsIxSAWyra/kw/uYJn6cJi0fxIqA=; b=sNLqDpWqF6QTpytS81+xazo3aBfQBLQPwSCOi4jTTEAT71RH7Wmy++/Y1Wg/2bQYo9 x3nhZgITngAW6MHGZdWIpa2hkvemFm8pomzEkYLYgX1Oo3xbjdtgXsKs/W/62B1xMAr4 15WkZ8hUJ8bxEzDm5D+qOtH4baDaMCrRCDxxealmP83scliZO4KuNilwmBD0QqQ4BuY+ JAVki2k/UqXu4jkoxOJuDvs9rT9qphsAmYENBLwkQmS8tHaWSV0YiwbhsC0ehDJOmHOZ KKcwsvpNtX5KRl0trFH4LLiBdCSvxIcHlpwhJYh0KPvidxZv3TyzVveTix9S4UISXTIX /sNA== X-Received: by 10.194.220.100 with SMTP id pv4mr25141887wjc.71.1436393237190; Wed, 08 Jul 2015 
15:07:17 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by smtp.gmail.com with ESMTPSA id fo17sm5483921wjc.46.2015.07.08.15.07.15 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 08 Jul 2015 15:07:16 -0700 (PDT) From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: [PATCH 1/4] vfs: plug a use-after-free of fd_rdir in namei Date: Thu, 9 Jul 2015 00:07:08 +0200 Message-Id: <1436393231-5831-2-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436393231-5831-1-git-send-email-mjguzik@gmail.com> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 22:07:19 -0000 From: Mateusz Guzik The fd_rdir vnode was stored in ni_rootdir without taking a reference on it, after which the filedesc lock was dropped. The vnode could then have been freed by mountcheckdirs or by another thread doing chroot. VREF the vnode while the lock is held.
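The rule this patch applies is to copy the pointer and take a reference before dropping the lock that protects it. A minimal, single-threaded userspace sketch of that rule follows.

All of the names are hypothetical stand-ins:
- struct obj, obj_ref() and obj_rele() stand in for a vnode, VREF() and vrele().
- struct fdtable stands in for struct filedesc.
- set_root() stands in for a concurrent chroot(2).

The lock is only marked by comments, and the race is simulated by calling set_root() directly.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a vnode. */
struct obj {
	int refcount;
};

static int objs_freed;	/* counts frees so the effect is observable */

static struct obj *
obj_alloc(void)
{
	struct obj *o = malloc(sizeof(*o));

	assert(o != NULL);
	o->refcount = 1;	/* reference owned by whoever installs it */
	return (o);
}

static void
obj_ref(struct obj *o)		/* analogous to VREF() */
{
	assert(o->refcount > 0);
	o->refcount++;
}

static void
obj_rele(struct obj *o)		/* analogous to vrele() */
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		free(o);
		objs_freed++;
	}
}

/* Hypothetical stand-in for struct filedesc. */
struct fdtable {
	struct obj *rdir;	/* like fd_rdir */
};

/* Like chroot(2): install a new root and drop the table's old reference. */
static void
set_root(struct fdtable *fdp, struct obj *newroot)
{
	struct obj *old = fdp->rdir;

	fdp->rdir = newroot;
	obj_rele(old);
}

/*
 * Copy the pointer AND take a reference while the table lock would be
 * held, so a later set_root() cannot free the object under the caller.
 */
static struct obj *
snapshot_root(struct fdtable *fdp)
{
	struct obj *root;

	/* FILEDESC_SLOCK(fdp) would be taken here. */
	root = fdp->rdir;
	obj_ref(root);
	/* FILEDESC_SUNLOCK(fdp) would be dropped here. */
	return (root);
}
```

Without the obj_ref() call, the set_root() in the usage below would free the object while the caller still holds a pointer to it. That is the use-after-free described above.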
MFC after: 1 week --- sys/kern/vfs_lookup.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index 5dc07dc..20f8e96 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -210,6 +210,7 @@ namei(struct nameidata *ndp) */ FILEDESC_SLOCK(fdp); ndp->ni_rootdir = fdp->fd_rdir; + VREF(ndp->ni_rootdir); ndp->ni_topdir = fdp->fd_jdir; /* @@ -260,6 +261,7 @@ namei(struct nameidata *ndp) } } if (error) { + vrele(ndp->ni_rootdir); namei_cleanup_cnp(cnp); return (error); } @@ -286,6 +288,7 @@ namei(struct nameidata *ndp) if (KTRPOINT(curthread, KTR_CAPFAIL)) ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); #endif + vrele(ndp->ni_rootdir); namei_cleanup_cnp(cnp); return (ENOTCAPABLE); } @@ -299,6 +302,7 @@ namei(struct nameidata *ndp) ndp->ni_startdir = dp; error = lookup(ndp); if (error) { + vrele(ndp->ni_rootdir); namei_cleanup_cnp(cnp); SDT_PROBE(vfs, namei, lookup, return, error, NULL, 0, 0, 0); @@ -308,6 +312,7 @@ namei(struct nameidata *ndp) * If not a symbolic link, we're done. 
*/ if ((cnp->cn_flags & ISSYMLINK) == 0) { + vrele(ndp->ni_rootdir); if ((cnp->cn_flags & (SAVENAME | SAVESTART)) == 0) { namei_cleanup_cnp(cnp); } else @@ -371,6 +376,7 @@ namei(struct nameidata *ndp) vput(ndp->ni_vp); dp = ndp->ni_dvp; } + vrele(ndp->ni_rootdir); namei_cleanup_cnp(cnp); vput(ndp->ni_vp); ndp->ni_vp = NULL; -- 2.4.5 From owner-freebsd-fs@freebsd.org Wed Jul 8 22:07:22 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6F31C99652D for ; Wed, 8 Jul 2015 22:07:22 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x22b.google.com (mail-wg0-x22b.google.com [IPv6:2a00:1450:400c:c00::22b]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 08ADD30D2; Wed, 8 Jul 2015 22:07:22 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgjx7 with SMTP id x7so207479549wgj.2; Wed, 08 Jul 2015 15:07:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=jvj7cH8+cgwNkf73qZezOjznS2iqXNeBVNnPWVldj3k=; b=ASCzcuo5Kon1sQKTnLWubUvJ3WzTPI7WB+i9tuJZ0dKTnu4hvK2rnENiv5F+wz9T7L KOIIvsI7NmcjKSeDdw9hnOEuhiOIvz3HKrxG2MwGV8jkhwPk56kFeZjiXqGVcJcnqFln 8+vN8UVeu4ZBRw22bhEMmfYF4h0ibXechqc4gwXN5e2MH6O/dl1YXJWssw09ZcJcgJn1 YSMZNztdTgczAp+MRDKMYNWp/+Qbmq0YFvpmWhhdl6KooeB1/xlJm/hYM8GdfQCDllaK 3OSsEtc85c5hp08xqlhpqDdCfAPSFZRvW16AtnznxjpQFL5bIjGJmTZP/N284t+dylS0 U5vA== X-Received: by 10.194.9.161 with SMTP id a1mr23408484wjb.39.1436393240551; Wed, 08 Jul 2015 15:07:20 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. 
[89.102.11.63]) by smtp.gmail.com with ESMTPSA id fo17sm5483921wjc.46.2015.07.08.15.07.19 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 08 Jul 2015 15:07:19 -0700 (PDT) From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: [PATCH 3/4] vfs: simplify error handling in namei Date: Thu, 9 Jul 2015 00:07:10 +0200 Message-Id: <1436393231-5831-4-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436393231-5831-1-git-send-email-mjguzik@gmail.com> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 22:07:22 -0000 From: Mateusz Guzik The logic is reorganised so that there is one exit point prior to the lookup loop. This is an intermediate step to making audit logging functions use found vnode instead of translating ni_dirfd on their own. ni_startdir validation is removed. The only in-tree consumer is nfs which already makes sure it is a directory. 
--- sys/kern/vfs_lookup.c | 50 +++++++++++++++++++++----------------------------- 1 file changed, 21 insertions(+), 29 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index e434464..d48fcff 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) struct vnode *dp; /* the directory we are searching */ struct iovec aiov; /* uio for reading symbolic links */ struct uio auio; - int error, linklen; + int error, linklen, startdir_used; struct componentname *cnp = &ndp->ni_cnd; struct thread *td = cnp->cn_thread; struct proc *p = td->td_proc; @@ -169,6 +169,9 @@ namei(struct nameidata *ndp) ("namei: nameiop contaminated with flags")); KASSERT((cnp->cn_flags & OPMASK) == 0, ("namei: flags contaminated with nameiops")); + if (ndp->ni_startdir != NULL) + MPASS(ndp->ni_startdir->v_type == VDIR || + ndp->ni_startdir->v_type == VBAD); if (!lookup_shared) cnp->cn_flags &= ~LOCKSHARED; fdp = p->p_fd; @@ -242,23 +245,19 @@ namei(struct nameidata *ndp) if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); + startdir_used = 0; dp = NULL; cnp->cn_nameptr = cnp->cn_pnbuf; if (cnp->cn_pnbuf[0] == '/') { error = namei_handle_root(ndp, &dp); - FILEDESC_SUNLOCK(fdp); - if (error != 0) { - vrele(ndp->ni_rootdir); - if (ndp->ni_startdir != NULL) - vrele(ndp->ni_startdir); - namei_cleanup_cnp(cnp); - return (error); - } } else { if (ndp->ni_startdir != NULL) { dp = ndp->ni_startdir; - error = 0; - } else if (ndp->ni_dirfd != AT_FDCWD) { + startdir_used = 1; + } else if (ndp->ni_dirfd == AT_FDCWD) { + dp = fdp->fd_cdir; + VREF(dp); + } else { cap_rights_t rights; rights = ndp->ni_rightsneeded; @@ -285,25 +284,18 @@ namei(struct nameidata *ndp) } #endif } - if (error != 0 || dp != NULL) { - FILEDESC_SUNLOCK(fdp); - if (error == 0 && dp->v_type != VDIR) { - vrele(dp); - error = ENOTDIR; - } - } - if (error) { - vrele(ndp->ni_rootdir); - namei_cleanup_cnp(cnp); - return (error); - } + if 
(error == 0 && dp->v_type != VDIR) + error = ENOTDIR; } - if (dp == NULL) { - dp = fdp->fd_cdir; - VREF(dp); - FILEDESC_SUNLOCK(fdp); - if (ndp->ni_startdir != NULL) - vrele(ndp->ni_startdir); + FILEDESC_SUNLOCK(fdp); + if (ndp->ni_startdir != NULL && !startdir_used) + vrele(ndp->ni_startdir); + if (error != 0) { + if (dp != NULL) + vrele(dp); + vrele(ndp->ni_rootdir); + namei_cleanup_cnp(cnp); + return (error); } SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, cnp->cn_flags, 0, 0); -- 2.4.5 From owner-freebsd-fs@freebsd.org Wed Jul 8 22:07:21 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 259CA996520 for ; Wed, 8 Jul 2015 22:07:21 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x22d.google.com (mail-wg0-x22d.google.com [IPv6:2a00:1450:400c:c00::22d]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B3B5E30A0; Wed, 8 Jul 2015 22:07:20 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgxm20 with SMTP id m20so24334281wgx.3; Wed, 08 Jul 2015 15:07:19 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=U8n52GDhp+0XaI8eZaqbkqW1/W004y/nkmgvLLFcxh8=; b=nFd+kfLfkvrSiSaRnh+2EjiBACC9mepQXcboRsS5bgRlpwilj9/E0cTUjXrEVJGi72 aa5OdVMq75R6eitqeTMB9o6xd4mOo1slopxjEoUIAOVGhrjXWn8AwHoFWHwkn51t+ptJ rCk1XNmq5yyE0fK+dbDKGsIuD2dCMHKnjiRuwYZ/peNxS3zB9sOzSpkhBuUHVXcqaPpA HIle3L9TtofoQR6t5zKRyvw0FxcOhteF5r41p9wiA//EqyGFfeqEKMOIHX7Z6NFYS9B1 OrCKFY4iRYg4uAOwGW79v+OUf7awObljbRfN82NTXeq3f85ishP/gfbmUQEg/Q+1RRbq cHZw== X-Received: by 10.181.12.20 with SMTP id em20mr29528246wid.28.1436393239034; Wed, 08 Jul 2015 15:07:19 -0700 (PDT) Received: from 
localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by smtp.gmail.com with ESMTPSA id fo17sm5483921wjc.46.2015.07.08.15.07.17 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 08 Jul 2015 15:07:17 -0700 (PDT) From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: [PATCH 2/4] vfs: avoid spurious vref/vrele for absolute lookups Date: Thu, 9 Jul 2015 00:07:09 +0200 Message-Id: <1436393231-5831-3-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436393231-5831-1-git-send-email-mjguzik@gmail.com> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 22:07:21 -0000 From: Mateusz Guzik namei used to vref fd_cdir, which was immediatley vrele'd on entry to the loop. Check for absolute lookup and vref the right vnode the first time. --- sys/kern/vfs_lookup.c | 70 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 46 insertions(+), 24 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index 20f8e96..e434464 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -109,6 +109,27 @@ namei_cleanup_cnp(struct componentname *cnp) #endif } +static int +namei_handle_root(struct nameidata *ndp, struct vnode **dpp) +{ + struct componentname *cnp = &ndp->ni_cnd; + + if (ndp->ni_strictrelative != 0) { +#ifdef KTRACE + if (KTRPOINT(curthread, KTR_CAPFAIL)) + ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); +#endif + return (ENOTCAPABLE); + } + while (*(cnp->cn_nameptr) == '/') { + cnp->cn_nameptr++; + ndp->ni_pathlen--; + } + *dpp = ndp->ni_rootdir; + VREF(*dpp); + return (0); +} + /* * Convert a pathname into a pointer to a locked vnode. 
* @@ -222,7 +243,18 @@ namei(struct nameidata *ndp) AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); dp = NULL; - if (cnp->cn_pnbuf[0] != '/') { + cnp->cn_nameptr = cnp->cn_pnbuf; + if (cnp->cn_pnbuf[0] == '/') { + error = namei_handle_root(ndp, &dp); + FILEDESC_SUNLOCK(fdp); + if (error != 0) { + vrele(ndp->ni_rootdir); + if (ndp->ni_startdir != NULL) + vrele(ndp->ni_startdir); + namei_cleanup_cnp(cnp); + return (error); + } + } else { if (ndp->ni_startdir != NULL) { dp = ndp->ni_startdir; error = 0; @@ -276,29 +308,6 @@ namei(struct nameidata *ndp) SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, cnp->cn_flags, 0, 0); for (;;) { - /* - * Check if root directory should replace current directory. - * Done at start of translation and after symbolic link. - */ - cnp->cn_nameptr = cnp->cn_pnbuf; - if (*(cnp->cn_nameptr) == '/') { - vrele(dp); - if (ndp->ni_strictrelative != 0) { -#ifdef KTRACE - if (KTRPOINT(curthread, KTR_CAPFAIL)) - ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); -#endif - vrele(ndp->ni_rootdir); - namei_cleanup_cnp(cnp); - return (ENOTCAPABLE); - } - while (*(cnp->cn_nameptr) == '/') { - cnp->cn_nameptr++; - ndp->ni_pathlen--; - } - dp = ndp->ni_rootdir; - VREF(dp); - } ndp->ni_startdir = dp; error = lookup(ndp); if (error) { @@ -375,6 +384,19 @@ namei(struct nameidata *ndp) ndp->ni_pathlen += linklen; vput(ndp->ni_vp); dp = ndp->ni_dvp; + /* + * Check if root directory should replace current directory. 
+ */ + cnp->cn_nameptr = cnp->cn_pnbuf; + if (*(cnp->cn_nameptr) == '/') { + vrele(dp); + error = namei_handle_root(ndp, &dp); + if (error != 0) { + vrele(ndp->ni_rootdir); + namei_cleanup_cnp(cnp); + return (error); + } + } } vrele(ndp->ni_rootdir); namei_cleanup_cnp(cnp); -- 2.4.5 From owner-freebsd-fs@freebsd.org Wed Jul 8 22:07:24 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2D05B996548 for ; Wed, 8 Jul 2015 22:07:24 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x229.google.com (mail-wi0-x229.google.com [IPv6:2a00:1450:400c:c05::229]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id AD44130EC; Wed, 8 Jul 2015 22:07:23 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wiclp1 with SMTP id lp1so93150230wic.0; Wed, 08 Jul 2015 15:07:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=D4zuIoia69j1TtLKGQrffm2srmPYfbFbX+THxuMbwv0=; b=nTweBj1PSMgsZy9uc3FpSaRVF1KOuSkhYZvsJCoppidcfySHjAkn+GK5RQKqyYZ+cm wv7GKfXEON7/Iy303Dc5WeOnlCuAqYd/IdZAmPlCTHELHql1xwCYp9UQiwH4flvunppu 7dgbtfyt7gMSUyNG+URqZYX8t/aDk639TMuJ38IVN+2OhzBSHhRhNViMWIhmKzg7Piym i1s1Y6J/H/BEySmlFGNU7Ufhs2M2recwpMiOeP/iW7mu8vz0A7ntJCBhRZv0QI1ulwG8 KDvEEWFDU7gJkMi6vMbsgK/UBeeIeQ9dn8Zbq2+U4+Wql5ScXbURht6EtCOi1u9t5d9F lJbw== X-Received: by 10.194.192.72 with SMTP id he8mr24125279wjc.11.1436393242272; Wed, 08 Jul 2015 15:07:22 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. 
[89.102.11.63]) by smtp.gmail.com with ESMTPSA id fo17sm5483921wjc.46.2015.07.08.15.07.20 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 08 Jul 2015 15:07:21 -0700 (PDT) From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: [PATCH 4/4] audit: utilize vnode pointer found by namei instead of looking it up again Date: Thu, 9 Jul 2015 00:07:11 +0200 Message-Id: <1436393231-5831-5-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 In-Reply-To: <1436393231-5831-1-git-send-email-mjguzik@gmail.com> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 22:07:24 -0000 From: Mateusz Guzik With the file descriptor translated only once, the code no longer needs to hold the filedesc lock, which was previously required to make sure namei and audit translation return the same vnode. --- sys/kern/vfs_lookup.c | 21 ++++++---- sys/security/audit/audit.h | 14 +++++++ sys/security/audit/audit_arg.c | 36 ++++++++++++++++ sys/security/audit/audit_bsm_klib.c | 82 ++++++++++++++++++++++++------------- sys/security/audit/audit_private.h | 2 + 5 files changed, 118 insertions(+), 37 deletions(-) diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index d48fcff..48d92e6 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -237,14 +237,6 @@ namei(struct nameidata *ndp) VREF(ndp->ni_rootdir); ndp->ni_topdir = fdp->fd_jdir; - /* - * If we are auditing the kernel pathname, save the user pathname.
- */ - if (cnp->cn_flags & AUDITVNODE1) - AUDIT_ARG_UPATH1(td, ndp->ni_dirfd, cnp->cn_pnbuf); - if (cnp->cn_flags & AUDITVNODE2) - AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); - startdir_used = 0; dp = NULL; cnp->cn_nameptr = cnp->cn_pnbuf; @@ -268,8 +260,12 @@ namei(struct nameidata *ndp) if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_ATFD2(ndp->ni_dirfd); error = fgetvp_rights(td, ndp->ni_dirfd, - &rights, &ndp->ni_filecaps, &dp); + NULL, &ndp->ni_filecaps, &dp); #ifdef CAPABILITIES + if (error == 0) { + error = cap_check(&ndp->ni_filecaps.fc_rights, + &rights); + } /* * If file descriptor doesn't have all rights, * all lookups relative to it must also be @@ -287,6 +283,13 @@ namei(struct nameidata *ndp) if (error == 0 && dp->v_type != VDIR) error = ENOTDIR; } + /* + * If we are auditing the kernel pathname, save the user pathname. + */ + if (cnp->cn_flags & AUDITVNODE1) + AUDIT_ARG_UPATH1_VP(td, dp, cnp->cn_pnbuf); + if (cnp->cn_flags & AUDITVNODE2) + AUDIT_ARG_UPATH2_VP(td, dp, cnp->cn_pnbuf); FILEDESC_SUNLOCK(fdp); if (ndp->ni_startdir != NULL && !startdir_used) vrele(ndp->ni_startdir); diff --git a/sys/security/audit/audit.h b/sys/security/audit/audit.h index 559d571..2f6420f 100644 --- a/sys/security/audit/audit.h +++ b/sys/security/audit/audit.h @@ -101,6 +101,10 @@ void audit_arg_auditinfo(struct auditinfo *au_info); void audit_arg_auditinfo_addr(struct auditinfo_addr *au_info); void audit_arg_upath1(struct thread *td, int dirfd, char *upath); void audit_arg_upath2(struct thread *td, int dirfd, char *upath); +void audit_arg_upath1_vp(struct thread *td, struct vnode *dirvp, + char *upath); +void audit_arg_upath2_vp(struct thread *td, struct vnode *dirvp, + char *upath); void audit_arg_vnode1(struct vnode *vp); void audit_arg_vnode2(struct vnode *vp); void audit_arg_text(char *text); @@ -297,6 +301,16 @@ void audit_thread_free(struct thread *td); audit_arg_upath2((td), (dirfd), (upath)); \ } while (0) +#define AUDIT_ARG_UPATH1_VP(td, dirvp, upath) do { \ 
+ if (AUDITING_TD(curthread)) \ + audit_arg_upath1_vp((td), (dirvp), (upath)); \ +} while (0) + +#define AUDIT_ARG_UPATH2_VP(td, dirvp, upath) do { \ + if (AUDITING_TD(curthread)) \ + audit_arg_upath2_vp((td), (dirvp), (upath)); \ +} while (0) + #define AUDIT_ARG_VALUE(value) do { \ if (AUDITING_TD(curthread)) \ audit_arg_value((value)); \ diff --git a/sys/security/audit/audit_arg.c b/sys/security/audit/audit_arg.c index c006b90..c019bad 100644 --- a/sys/security/audit/audit_arg.c +++ b/sys/security/audit/audit_arg.c @@ -719,6 +719,16 @@ audit_arg_upath(struct thread *td, int dirfd, char *upath, char **pathp) audit_canon_path(td, dirfd, upath, *pathp); } +static void +audit_arg_upath_vp(struct thread *td, struct vnode *dirvp, char *upath, + char **pathp) +{ + + if (*pathp == NULL) + *pathp = malloc(MAXPATHLEN, M_AUDITPATH, M_WAITOK); + audit_canon_path_vp(td, dirvp, upath, *pathp); +} + void audit_arg_upath1(struct thread *td, int dirfd, char *upath) { @@ -745,6 +755,32 @@ audit_arg_upath2(struct thread *td, int dirfd, char *upath) ARG_SET_VALID(ar, ARG_UPATH2); } +void +audit_arg_upath1_vp(struct thread *td, struct vnode *dirvp, char *upath) +{ + struct kaudit_record *ar; + + ar = currecord(); + if (ar == NULL) + return; + + audit_arg_upath_vp(td, dirvp, upath, &ar->k_ar.ar_arg_upath1); + ARG_SET_VALID(ar, ARG_UPATH1); +} + +void +audit_arg_upath2_vp(struct thread *td, struct vnode *dirvp, char *upath) +{ + struct kaudit_record *ar; + + ar = currecord(); + if (ar == NULL) + return; + + audit_arg_upath_vp(td, dirvp, upath, &ar->k_ar.ar_arg_upath2); + ARG_SET_VALID(ar, ARG_UPATH2); +} + /* * Function to save the path and vnode attr information into the audit * record. 
diff --git a/sys/security/audit/audit_bsm_klib.c b/sys/security/audit/audit_bsm_klib.c index b687a15..7e8dac5 100644 --- a/sys/security/audit/audit_bsm_klib.c +++ b/sys/security/audit/audit_bsm_klib.c @@ -461,23 +461,19 @@ auditon_command_event(int cmd) * but this results in a volfs name written to the audit log. So we will * leave the filename starting with '/' in the audit log in this case. */ -void -audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) +static void +audit_canon_path_common(struct thread *td, struct vnode *dirvp, + char *path, char *cpath) { struct vnode *cvnp, *rvnp; - char *rbuf, *fbuf, *copy; struct filedesc *fdp; + char *rbuf, *fbuf, *copy; struct sbuf sbf; - cap_rights_t rights; int error, needslash; - WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", - __func__, __FILE__, __LINE__); - copy = path; - rvnp = cvnp = NULL; + cvnp = rvnp = NULL; fdp = td->td_proc->p_fd; - FILEDESC_SLOCK(fdp); /* * Make sure that we handle the chroot(2) case. If there is an * alternate root directory, prepend it to the audited pathname. @@ -492,22 +488,7 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) * path. 
*/ if (*path != '/') { - if (dirfd == AT_FDCWD) { - cvnp = fdp->fd_cdir; - vhold(cvnp); - } else { - /* XXX: fgetvp() that vhold()s vnode instead of vref()ing it would be better */ - error = fgetvp(td, dirfd, cap_rights_init(&rights), &cvnp); - if (error) { - FILEDESC_SUNLOCK(fdp); - cpath[0] = '\0'; - if (rvnp != NULL) - vdrop(rvnp); - return; - } - vhold(cvnp); - vrele(cvnp); - } + cvnp = dirvp; needslash = (fdp->fd_rdir != cvnp); } else { needslash = 1; @@ -536,8 +517,6 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) vdrop(rvnp); if (error) { cpath[0] = '\0'; - if (cvnp != NULL) - vdrop(cvnp); return; } (void) sbuf_cat(&sbf, rbuf); @@ -545,7 +524,6 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) } if (cvnp != NULL) { error = vn_fullpath(td, cvnp, &rbuf, &fbuf); - vdrop(cvnp); if (error) { cpath[0] = '\0'; return; @@ -571,3 +549,51 @@ audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) } sbuf_finish(&sbf); } + +void +audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath) +{ + struct vnode *dirvp; + struct filedesc *fdp; + cap_rights_t rights; + + WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", + __func__, __FILE__, __LINE__); + + dirvp = NULL; + fdp = td->td_proc->p_fd; + FILEDESC_SLOCK(fdp); + if (*path != '/') { + if (dirfd == AT_FDCWD) { + dirvp = fdp->fd_cdir; + vhold(dirvp); + } else { + /* XXX: fgetvp() that vhold()s vnode instead of vref()ing it would be better */ + if (fgetvp(td, dirfd, cap_rights_init(&rights), &dirvp) != 0) { + FILEDESC_SUNLOCK(fdp); + cpath[0] = '\0'; + return; + } + vhold(dirvp); + vrele(dirvp); + } + } + + audit_canon_path_common(td, dirvp, path, cpath); + if (dirvp != NULL) + vdrop(dirvp); +} + +void +audit_canon_path_vp(struct thread *td, struct vnode *dirvp, char *path, char *cpath) +{ + + WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK, NULL, "%s: at %s:%d", + __func__, __FILE__, __LINE__); + + if (dirvp == NULL) + return; + + 
FILEDESC_SLOCK(td->td_proc->p_fd); + audit_canon_path_common(td, dirvp, path, cpath); +} diff --git a/sys/security/audit/audit_private.h b/sys/security/audit/audit_private.h index b5c373a..7ecf3a6 100644 --- a/sys/security/audit/audit_private.h +++ b/sys/security/audit/audit_private.h @@ -394,6 +394,8 @@ au_event_t audit_msgctl_to_event(int cmd); au_event_t audit_semctl_to_event(int cmr); void audit_canon_path(struct thread *td, int dirfd, char *path, char *cpath); +void audit_canon_path_vp(struct thread *td, struct vnode *dirvp, + char *path, char *cpath); au_event_t auditon_command_event(int cmd); /* -- 2.4.5 From owner-freebsd-fs@freebsd.org Wed Jul 8 23:27:30 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 286EC997361 for ; Wed, 8 Jul 2015 23:27:30 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id A9435162E; Wed, 8 Jul 2015 23:27:29 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2AHBQBvsJ1V/61jaINRBAaDZmAGgxq4AYFmCoUtSgKCGBIBAQEBAQEBgQqEIwEBAQMBAQEBICsgCwULAgEIDgoCAg0ZAgInAQkmAgQIBwICARwEh3gDCggNt0SQTw2FUwEBAQcBAQEBAR2BIYoqgk2BVgYKAgEFCAEONAeCaIFDBYwjiACEZ4JcgWqEDEWDU4sFhCuDXQImggwcgW8iBC0Hf0GBBAEBAQ X-IronPort-AV: E=Sophos;i="5.15,435,1432612800"; d="scan'208";a="222589959" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 08 Jul 2015 19:27:22 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id F29E615F563; Wed, 8 Jul 2015 19:27:21 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 
Qs794W8mAvht; Wed, 8 Jul 2015 19:27:20 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 9963A15F564; Wed, 8 Jul 2015 19:27:20 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id 7H8-h1AGb7SL; Wed, 8 Jul 2015 19:27:20 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 76EC415F563; Wed, 8 Jul 2015 19:27:20 -0400 (EDT) Date: Wed, 8 Jul 2015 19:27:20 -0400 (EDT) From: Rick Macklem To: Ahmed Kamal Cc: Julian Elischer , freebsd-fs@freebsd.org, Xin LI Message-ID: <1274495343.6405799.1436398040440.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.12] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: Linux NFSv4 clients are getting (bad sequence-id error!) Thread-Index: NVLQsgvxW+S3eoQd00KVKp2fBohkYw== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 23:27:30 -0000 Ahmed Kamal wrote: > I have a test rhel6 box (one that can mount nfs with vers=4.1) .. However > this is an old server with no users on it .. Can you kindly show me how to > stress test this mount to either induce the bad sequence error, or prove > nfs-4.1 is rock solid ? > Don't ask me. 
You are the one that sees the problem, so all I can suggest is to get this client to do the same stuff as your other clients that exhibit the problem. > If upgrading all boxes to rhel-6 and nfs-4.1 is the only way to solve this > .. then so be it .. I just want to be sure it's solid before the upgrade > As I recall, you've never tried Xin Li's patch. rick > Thanks folks! > > On Wed, Jul 8, 2015 at 4:20 PM, Ahmed Kamal > wrote: > > > Another note .. is that the linux boxes when they have hung processes .. > > They have a process (rpciod) taking 10-15% CPU > > > > On Wed, Jul 8, 2015 at 4:18 PM, Ahmed Kamal < > > email.ahmedkamal@googlemail.com> wrote: > > > >> Hi folks, > >> > >> I have tested Xin's patches .. Unfortunately the problem didn't go away > >> :/ Many users are still reporting hung processes. If it would help, can > >> you > >> show me how to dump a network trace that would help you identify the issue > >> ? > >> > >> Also, is it possible in any way to have my trusted nfs3, handle the case > >> where every zfs /home folder is its own dataset ? > >> > >> On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem > >> wrote: > >> > >>> Ahmed Kamal wrote: > >>> > Hi folks, > >>> > > >>> > Just a quick update. I did not test Xin's patches yet .. What I did so > >>> far > >>> > is to increase the tcp highwater tunable and increase nfsd threads to > >>> 60. > >>> > Today (a working day) I noticed I only got one bad sequence error > >>> message! > >>> > Check this: > >>> > > >>> > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c > >>> > 1 messages:Jul5 > >>> > 39 messages.1:Jun28 > >>> > 15 messages.1:Jun29 > >>> > 4 messages.1:Jun30 > >>> > 9 messages.1:Jul1 > >>> > 23 messages.1:Jul2 > >>> > 1 messages.1:Jul4 > >>> > 1 messages.2:Jun28 > >>> > > >>> > So there seems to be an improvement! Not sure if the Linux nfs4 client > >>> is > >>> > able to somehow recover from those bad-sequence situations or not ..
I > >>> did > >>> > get some user complaints that running "ls -l" is sometimes slow and > >>> takes a > >>> > couple of seconds to finish. > >>> > > >>> > One final question .. Do you folks think nfs4.1 is more reliable in > >>> general > >>> > than nfs4 .. I've always only used nfs3 (I guess it can't work here > >>> with > >>> > /home/* being separate zfs filesystems) .. So should I go through the > >>> pain > >>> > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do > >>> you > >>> > expect the protocol to be more solid ? I know it's a fluffy question, > >>> just > >>> > give me your thoughts. Thanks a lot! > >>> > > >>> All I can say is that the "bad seqid" errors should not occur, since > >>> NFSv4.1 > >>> doesn't use the seqid#s to order RPCs. > >>> > >>> Also I would say that a correctly implemented NFSv4.1 protocol should > >>> function > >>> "more correctly" since all RPCs and performed "exactly once". (How much > >>> effect > >>> this will have in practice, I can't say.) > >>> > >>> On the other hand, NFSv4.1 is a newer protocol (with an RFC of over > >>> 500pages), > >>> so it is hard to say how mature the implementations are. > >>> I think only testing will give you the answer. > >>> > >>> I would suggest that you test Xi Lin's patch that allows the "seqid + 2" > >>> case > >>> and see if that makes the "bad seqid" errors go away. (Even though I > >>> think this > >>> would indicate a client bug, adding this in way that it can be enabled > >>> via a sysctl > >>> seems reasonable.) > >>> > >>> Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, > >>> rick > >>> > >>> > > >>> > > >>> > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem > >>> wrote: > >>> > > >>> > > Ahmed Kamal wrote: > >>> > > > PS: Today (after adjusting tcp.highwater) I didn't get any > >>> screaming > >>> > > > reports from users about hung vnc sessions. 
So maybe just maybe, > >>> linux > >>> > > > clients are able to somehow recover from this bad sequence > >>> messages. I > >>> > > > could still see the bad sequence error message in logs though > >>> > > > > >>> > > > Why isn't the highwater tunable set to something better by default > >>> ? I > >>> > > mean > >>> > > > this server is certainly not under a high or unusual load (it's > >>> only 40 > >>> > > PCs > >>> > > > mounting from it) > >>> > > > > >>> > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal < > >>> > > email.ahmedkamal@googlemail.com > >>> > > > > wrote: > >>> > > > > >>> > > > > Thanks all .. I understand now we're doing the "right thing" .. > >>> > > Although > >>> > > > > if mounting keeps wedging, I will have to solve it somehow! > >>> Either > >>> > > using > >>> > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > >>> > > > > > >>> > > > > Regarding Xin's patch, is it possible to build the patched nfsd > >>> code, > >>> > > as a > >>> > > > > kernel module ? I'm looking to minimize my delta to upstream. > >>> > > > > > >>> > > Yes, you can build the nfsd as a module. If your kernel config does > >>> not > >>> > > include > >>> > > "options NFSD" the module will get loaded/used. It is also possible > >>> to > >>> > > replace > >>> > > the module without rebooting, but you need to kill of the nfsd > >>> daemon then > >>> > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In > >>> > > /boot/.) > >>> > > > >>> > > > > Also would adopting Xin's patch and hiding it behind a > >>> > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably > >>> not the > >>> > > last > >>> > > > > person on earth to hit this) ? > >>> > > > > > >>> > > If it fixes your problem, I think this is reasonable. > >>> > > I'm also hoping that someone that works on the Linux client reports > >>> > > if/when this > >>> > > was changed. > >>> > > > >>> > > rick > >>> > > > >>> > > > > Thanks a lot for all the help! 
> >>> > > > > > >>> > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < > >>> rmacklem@uoguelph.ca> > >>> > > > > wrote: > >>> > > > > > >>> > > > >> Ahmed Kamal wrote: > >>> > > > >> > Appreciating the fruitful discussion! Can someone please > >>> explain to > >>> > > me, > >>> > > > >> > what would happen in the current situation (linux client > >>> doing this > >>> > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the > >>> effect of > >>> > > that? > >>> > > > >> Well, as you've seen, the Linux client doesn't function > >>> correctly > >>> > > against > >>> > > > >> the FreeBSD server (and probably others that don't support this > >>> > > > >> "skip-by-1" > >>> > > > >> case). > >>> > > > >> > >>> > > > >> > What do users see? Any chances of data loss? > >>> > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what > >>> the > >>> > > Linux > >>> > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're > >>> the guy > >>> > > > >> observing > >>> > > > >> it. > >>> > > > >> > >>> > > > >> > > >>> > > > >> > Also, I find it strange that netapp have acknowledged this is > >>> a bug > >>> > > on > >>> > > > >> > their side, which has been fixed since then! > >>> > > > >> Yea, I think Netapp screwed up. For some reason their server > >>> allowed > >>> > > this, > >>> > > > >> then was fixed to not allow it and then someone decided that was > >>> > > broken > >>> > > > >> and > >>> > > > >> reversed it. > >>> > > > >> > >>> > > > >> > I also find it strange that I'm the first to hit this :) Is > >>> no one > >>> > > > >> running > >>> > > > >> > nfs4 yet! > >>> > > > >> > > >>> > > > >> Well, it seems to be slowly catching on. I suspect that the > >>> Linux > >>> > > client > >>> > > > >> mounting a Netapp is the most common use of it. Since it > >>> appears that > >>> > > they > >>> > > > >> flip flopped w.r.t. who's bug this is, it has probably > >>> persisted. 
> >>> > > > >> > >>> > > > >> It may turn out that the Linux client has been fixed or it may > >>> turn > >>> > > out > >>> > > > >> that most servers allowed this "skip-by-1" even though David > >>> Noveck > >>> > > (one > >>> > > > >> of the main authors of the protocol) seems to agree with me > >>> that it > >>> > > should > >>> > > > >> not be allowed. > >>> > > > >> > >>> > > > >> It is possible that others have bumped into this, but it wasn't > >>> > > isolated > >>> > > > >> (I wouldn't have guessed it, so it was good you pointed to the > >>> RedHat > >>> > > > >> discussion) > >>> > > > >> and they worked around it by reverting to NFSv3 or similar. > >>> > > > >> The protocol is rather complex in this area and changed > >>> completely for > >>> > > > >> NFSv4.1, > >>> > > > >> so many have also probably moved onto NFSv4.1 where this won't > >>> be an > >>> > > > >> issue. > >>> > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and > >>> > > doesn't > >>> > > > >> use > >>> > > > >> these seqid fields.) > >>> > > > >> > >>> > > > >> This is all just mho, rick > >>> > > > >> > >>> > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > >>> rmacklem@uoguelph.ca> > >>> > > > >> wrote: > >>> > > > >> > > >>> > > > >> > > Julian Elischer wrote: > >>> > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > >>> > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they > >>> say. > >>> > > Please > >>> > > > >> > > > > let me know if Xin Li's patch resolves your problem, > >>> even > >>> > > though I > >>> > > > >> > > > > don't believe it is correct except for the UINT32_MAX > >>> case. > >>> > > Good > >>> > > > >> > > > > luck with it, rick > >>> > > > >> > > > and please keep us all in the loop as to what they say! > >>> > > > >> > > > > >>> > > > >> > > > the general N+2 bit sounds like bullshit to me.. 
its > >>> always N+1 > >>> > > in a > >>> > > > >> > > > number field that has a > >>> > > > >> > > > bit of slack at wrap time (probably due to some ambiguity > >>> in the > >>> > > > >> > > > original spec). > >>> > > > >> > > > > >>> > > > >> > > Actually, since N is the lock op already done, N + 1 is the > >>> next > >>> > > lock > >>> > > > >> > > operation in order. Since lock ops need to be strictly > >>> ordered, > >>> > > > >> allowing > >>> > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes > >>> no > >>> > > sense. > >>> > > > >> > > > >>> > > > >> > > I think the author of the RFC meant that N + 2 or greater > >>> fails, > >>> > > but > >>> > > > >> it > >>> > > > >> > > was poorly worded. > >>> > > > >> > > > >>> > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. > >>> (There is > >>> > > an > >>> > > > >> archive > >>> > > > >> > > of it somewhere, but I can't remember where.;-) > >>> > > > >> > > > >>> > > > >> > > rick > >>> > > > >> > > _______________________________________________ > >>> > > > >> > > freebsd-fs@freebsd.org mailing list > >>> > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > >>> > > > >> > > To unsubscribe, send any mail to " > >>> > > freebsd-fs-unsubscribe@freebsd.org" > >>> > > > >> > > > >>> > > > >> > > >>> > > > >> > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >>> > > >>> > >> > >> > > > From owner-freebsd-fs@freebsd.org Wed Jul 8 23:30:51 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 690059974AB for ; Wed, 8 Jul 2015 23:30:51 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-annu.net.uoguelph.ca (esa-annu.mail.uoguelph.ca [131.104.91.36]) by mx1.freebsd.org (Postfix) with ESMTP id D7DCA1A23; Wed, 8 Jul 2015 23:30:50 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: 
true X-IronPort-Anti-Spam-Result: A2AIBQAbsp1V/61jaINRBAaDZmAGgxq5ZwqFLUoCghkRAQEBAQEBAYEKhCMBAQEDAQEBASArIAsFCwIBCA4KAgINGQICJwEJJgIECAcCAgEcBId4AwoIDbc/kFANhVMBAQEHAQEBAQEdgSGKKoJNgVYGBQUCAQUIAQ40B4JogUMFjCOIAIRnglyBaoQMRYNTiwWEK4NdAiaCDByBbyIELQd+AR4jgQQBAQE X-IronPort-AV: E=Sophos;i="5.15,435,1432612800"; d="scan'208";a="224453036" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-annu.net.uoguelph.ca with ESMTP; 08 Jul 2015 19:30:50 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 3C65715F563; Wed, 8 Jul 2015 19:30:49 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id pH7bwdGUBR2D; Wed, 8 Jul 2015 19:30:47 -0400 (EDT) Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id C693B15F564; Wed, 8 Jul 2015 19:30:47 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id NroFybeF7073; Wed, 8 Jul 2015 19:30:47 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 9140415F563; Wed, 8 Jul 2015 19:30:47 -0400 (EDT) Date: Wed, 8 Jul 2015 19:30:47 -0400 (EDT) From: Rick Macklem To: Ahmed Kamal Cc: Julian Elischer , freebsd-fs@freebsd.org, Xin LI Message-ID: <502673468.6406432.1436398247559.JavaMail.zimbra@uoguelph.ca> In-Reply-To: References: <791936587.3443190.1435873993955.JavaMail.zimbra@uoguelph.ca> <2010996878.3611963.1435884702063.JavaMail.zimbra@uoguelph.ca> <1463698530.4486572.1436135333962.JavaMail.zimbra@uoguelph.ca> Subject: Re: Linux NFSv4 clients are getting (bad sequence-id error!) 
MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.11] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: Linux NFSv4 clients are getting (bad sequence-id error!) Thread-Index: 2IXTVT1xRu0B4urmn1qoL08Y1Am1BQ== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Jul 2015 23:30:51 -0000 Ahmed Kamal wrote: > Hi folks, > > I have tested Xin's patches .. Unfortunately the problem didn't go away :/ > Many users are still reporting hung processes. If it would help, can you > show me how to dump a network trace that would help you identify the issue ? > Oops, I didn't see this. Ignore my comment w.r.t. testing it in the other post. rick > Also, is it possible in any way to have my trusted nfs3, handle the case > where every zfs /home folder is its own dataset ? > These would all need to be separate mounts. If the # of mounts is very large, maybe using an automounter would be helpful? (As far as I know, there is no limit to the # of mounts, so I don't see why you can't just mount them all.) rick > On Mon, Jul 6, 2015 at 12:28 AM, Rick Macklem wrote: > > > Ahmed Kamal wrote: > > > Hi folks, > > > > > > Just a quick update. I did not test Xin's patches yet .. What I did so > > far > > > is to increase the tcp highwater tunable and increase nfsd threads to 60. > > > Today (a working day) I noticed I only got one bad sequence error > > message! > > > Check this: > > > > > > # grep 'bad sequence' messages* | awk '{print $1 $2}' | uniq -c > > > 1 messages:Jul5 > > > 39 messages.1:Jun28 > > > 15 messages.1:Jun29 > > > 4 messages.1:Jun30 > > > 9 messages.1:Jul1 > > > 23 messages.1:Jul2 > > > 1 messages.1:Jul4 > > > 1 messages.2:Jun28 > > > > > > So there seems to be an improvement!
Not sure if the Linux nfs4 client is > > > able to somehow recover from those bad-sequence situations or not .. I > > did > > > get some user complaints that running "ls -l" is sometimes slow and > > takes a > > > couple of seconds to finish. > > > > > > One final question .. Do you folks think nfs4.1 is more reliable in > > general > > > than nfs4 .. I've always only used nfs3 (I guess it can't work here with > > > /home/* being separate zfs filesystems) .. So should I go through the > > pain > > > of upgrading a few servers to RHEL-6 to try out nfs4.1 ? Basically do you > > > expect the protocol to be more solid ? I know it's a fluffy question, > > just > > > give me your thoughts. Thanks a lot! > > > > > All I can say is that the "bad seqid" errors should not occur, since > > NFSv4.1 > > doesn't use the seqid#s to order RPCs. > > > > Also I would say that a correctly implemented NFSv4.1 protocol should > > function > > "more correctly" since all RPCs and performed "exactly once". (How much > > effect > > this will have in practice, I can't say.) > > > > On the other hand, NFSv4.1 is a newer protocol (with an RFC of over > > 500pages), > > so it is hard to say how mature the implementations are. > > I think only testing will give you the answer. > > > > I would suggest that you test Xi Lin's patch that allows the "seqid + 2" > > case > > and see if that makes the "bad seqid" errors go away. (Even though I think > > this > > would indicate a client bug, adding this in way that it can be enabled via > > a sysctl > > seems reasonable.) > > > > Btw, I haven't seen any additional posts from nfsv4@ietf.org on this, rick > > > > > > > > > > > On Fri, Jul 3, 2015 at 2:51 AM, Rick Macklem > > wrote: > > > > > > > Ahmed Kamal wrote: > > > > > PS: Today (after adjusting tcp.highwater) I didn't get any screaming > > > > > reports from users about hung vnc sessions. So maybe just maybe, > > linux > > > > > clients are able to somehow recover from this bad sequence messages. 
> > I > > > > > could still see the bad sequence error message in logs though > > > > > > > > > > Why isn't the highwater tunable set to something better by default ? > > I > > > > mean > > > > > this server is certainly not under a high or unusual load (it's only > > 40 > > > > PCs > > > > > mounting from it) > > > > > > > > > > On Fri, Jul 3, 2015 at 1:15 AM, Ahmed Kamal < > > > > email.ahmedkamal@googlemail.com > > > > > > wrote: > > > > > > > > > > > Thanks all .. I understand now we're doing the "right thing" .. > > > > Although > > > > > > if mounting keeps wedging, I will have to solve it somehow! Either > > > > using > > > > > > Xin's patch .. or Upgrading RHEL to 6.x and using NFS4.1. > > > > > > > > > > > > Regarding Xin's patch, is it possible to build the patched nfsd > > code, > > > > as a > > > > > > kernel module ? I'm looking to minimize my delta to upstream. > > > > > > > > > > Yes, you can build the nfsd as a module. If your kernel config does not > > > > include > > > > "options NFSD" the module will get loaded/used. It is also possible to > > > > replace > > > > the module without rebooting, but you need to kill of the nfsd daemon > > then > > > > kldunload nfsd.ko and replace nfsd.ko with the new one. (In > > > > /boot/.) > > > > > > > > > > Also would adopting Xin's patch and hiding it behind a > > > > > > kern.nfs.allow_linux_broken_client be an option (I'm probably not > > the > > > > last > > > > > > person on earth to hit this) ? > > > > > > > > > > If it fixes your problem, I think this is reasonable. > > > > I'm also hoping that someone that works on the Linux client reports > > > > if/when this > > > > was changed. > > > > > > > > rick > > > > > > > > > > Thanks a lot for all the help! > > > > > > > > > > > > On Thu, Jul 2, 2015 at 11:53 PM, Rick Macklem < > > rmacklem@uoguelph.ca> > > > > > > wrote: > > > > > > > > > > > >> Ahmed Kamal wrote: > > > > > >> > Appreciating the fruitful discussion! 
Can someone please > > explain to > > > > me, > > > > > >> > what would happen in the current situation (linux client doing > > this > > > > > >> > skip-by-1 thing, and freebsd not doing it) ? What is the effect > > of > > > > that? > > > > > >> Well, as you've seen, the Linux client doesn't function correctly > > > > against > > > > > >> the FreeBSD server (and probably others that don't support this > > > > > >> "skip-by-1" > > > > > >> case). > > > > > >> > > > > > >> > What do users see? Any chances of data loss? > > > > > >> Hmm. Mostly it will cause Opens to fail, but I can't guess what > > the > > > > Linux > > > > > >> client behaviour is after receiving NFS4ERR_BAD_SEQID. You're the > > guy > > > > > >> observing > > > > > >> it. > > > > > >> > > > > > >> > > > > > > >> > Also, I find it strange that netapp have acknowledged this is a > > bug > > > > on > > > > > >> > their side, which has been fixed since then! > > > > > >> Yea, I think Netapp screwed up. For some reason their server > > allowed > > > > this, > > > > > >> then was fixed to not allow it and then someone decided that was > > > > broken > > > > > >> and > > > > > >> reversed it. > > > > > >> > > > > > >> > I also find it strange that I'm the first to hit this :) Is no > > one > > > > > >> running > > > > > >> > nfs4 yet! > > > > > >> > > > > > > >> Well, it seems to be slowly catching on. I suspect that the Linux > > > > client > > > > > >> mounting a Netapp is the most common use of it. Since it appears > > that > > > > they > > > > > >> flip flopped w.r.t. who's bug this is, it has probably persisted. > > > > > >> > > > > > >> It may turn out that the Linux client has been fixed or it may > > turn > > > > out > > > > > >> that most servers allowed this "skip-by-1" even though David > > Noveck > > > > (one > > > > > >> of the main authors of the protocol) seems to agree with me that > > it > > > > should > > > > > >> not be allowed. 
> > > > > >> > > > > > >> It is possible that others have bumped into this, but it wasn't > > > > isolated > > > > > >> (I wouldn't have guessed it, so it was good you pointed to the > > RedHat > > > > > >> discussion) > > > > > >> and they worked around it by reverting to NFSv3 or similar. > > > > > >> The protocol is rather complex in this area and changed > > completely for > > > > > >> NFSv4.1, > > > > > >> so many have also probably moved onto NFSv4.1 where this won't be > > an > > > > > >> issue. > > > > > >> (NFSv4.1 uses sessions to provide exactly once RPC semantics and > > > > doesn't > > > > > >> use > > > > > >> these seqid fields.) > > > > > >> > > > > > >> This is all just mho, rick > > > > > >> > > > > > >> > On Thu, Jul 2, 2015 at 1:59 PM, Rick Macklem < > > rmacklem@uoguelph.ca> > > > > > >> wrote: > > > > > >> > > > > > > >> > > Julian Elischer wrote: > > > > > >> > > > On 7/2/15 9:09 AM, Rick Macklem wrote: > > > > > >> > > > > I am going to post to nfsv4@ietf.org to see what they > > say. > > > > Please > > > > > >> > > > > let me know if Xin Li's patch resolves your problem, even > > > > though I > > > > > >> > > > > don't believe it is correct except for the UINT32_MAX > > case. > > > > Good > > > > > >> > > > > luck with it, rick > > > > > >> > > > and please keep us all in the loop as to what they say! > > > > > >> > > > > > > > > >> > > > the general N+2 bit sounds like bullshit to me.. its always > > N+1 > > > > in a > > > > > >> > > > number field that has a > > > > > >> > > > bit of slack at wrap time (probably due to some ambiguity > > in the > > > > > >> > > > original spec). > > > > > >> > > > > > > > > >> > > Actually, since N is the lock op already done, N + 1 is the > > next > > > > lock > > > > > >> > > operation in order. Since lock ops need to be strictly > > ordered, > > > > > >> allowing > > > > > >> > > N + 2 (which means N + 2 would be done before N + 1) makes no > > > > sense. 
> > > > > >> > > > > > > > >> > > I think the author of the RFC meant that N + 2 or greater > > fails, > > > > but > > > > > >> it > > > > > >> > > was poorly worded. > > > > > >> > > > > > > > >> > > I will pass along whatever I get from nfsv4@ietf.org. (There > > is > > > > an > > > > > >> archive > > > > > >> > > of it somewhere, but I can't remember where.;-) > > > > > >> > > > > > > > >> > > rick > > > > > >> > > _______________________________________________ > > > > > >> > > freebsd-fs@freebsd.org mailing list > > > > > >> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > > > > > >> > > To unsubscribe, send any mail to " > > > > freebsd-fs-unsubscribe@freebsd.org" > > > > > >> > > > > > > > >> > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > From owner-freebsd-fs@freebsd.org Thu Jul 9 01:14:06 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 17C6299647C for ; Thu, 9 Jul 2015 01:14:06 +0000 (UTC) (envelope-from michelle@sorbs.net) Received: from hades.sorbs.net (hades.sorbs.net [67.231.146.201]) by mx1.freebsd.org (Postfix) with ESMTP id 094F615D2 for ; Thu, 9 Jul 2015 01:14:05 +0000 (UTC) (envelope-from michelle@sorbs.net) MIME-version: 1.0 Content-transfer-encoding: 7BIT Content-type: text/plain; CHARSET=US-ASCII Received: from isux.com (firewall.isux.com [213.165.190.213]) by hades.sorbs.net (Oracle Communications Messaging Server 7.0.5.29.0 64bit (built Jul 9 2013)) with ESMTPSA id <0NR700B4X2908X00@hades.sorbs.net> for freebsd-fs@freebsd.org; Wed, 08 Jul 2015 17:19:50 -0700 (PDT) Message-id: <559DBCC5.3000601@sorbs.net> Date: Thu, 09 Jul 2015 02:13:57 +0200 From: Michelle Sullivan User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.24) Gecko/20100301 SeaMonkey/1.1.19 To: "freebsd-fs@freebsd.org" Subject: Thanks... 
X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 01:14:06 -0000 Well I'm thinking someone patched something in ZFS since my last 'trouble'... because replacing a dead drive just worked this time... (currently using 9.2-GENERIC-p15)... was so smooth I thought it had failed at first.. last time I ended up with a crashed pool that took 3 weeks to recover...! So whoever you are, thank you...!!! Posting the log just so people searching the web can find tips on how it should work.... Replaced a dead drive (slot 2) with a new one... LSI-9260-16i controller + zfs. root@colossus:~ # ./lsi.sh drives Slot Number: 0 - Online, Spun Up Slot Number: 1 - Online, Spun Up Slot Number: 2 - Unconfigured(good), Spun Up Slot Number: 3 - Online, Spun Up Slot Number: 4 - Online, Spun Up Slot Number: 5 - Online, Spun Up Slot Number: 6 - Online, Spun Up Slot Number: 7 - Online, Spun Up Slot Number: 8 - Online, Spun Up Slot Number: 9 - Online, Spun Up Slot Number: 10 - Online, Spun Up Slot Number: 11 - Online, Spun Up Slot Number: 12 - Online, Spun Up Slot Number: 13 - Online, Spun Up Slot Number: 14 - Online, Spun Up Slot Number: 15 - Online, Spun Up root@colossus:~ # praid MegaCli Tools Used to Gather Raid Info ==== Controllers ======================================================================================================================= C# Name CacheSize FirmwareVer BIOSver BBU 0 lsi megaraid sas 9260-16i 512mb 2.130.403-3066 3.30.02.0 4.16.08.00 0x06060900 Missing ---- BBUs ------------------------------------------------------------------------------------------------------------------------------ C# Battery Type Initialized Voltage Temperature Charge State Alerts 0 - ---- Virtual Disks --------------------------------------------------------------------------------------------------------------------- C# Logical
Volumes Physical Disks Degraded Offline Critical Failed 0 15 16 0 0 0 0 ---- Virtual Disks Info ---------------------------------------------------------------------------------------------------------------- C# ID Name State Size Raid Level 0 L0 mfid0 optimal 2.728 tb raid-0 0 L1 optimal 2.728 tb raid-0 0 L10 optimal 2.728 tb raid-0 0 L11 optimal 2.728 tb raid-0 0 L12 optimal 2.728 tb raid-0 0 L13 optimal 2.728 tb raid-0 0 L14 optimal 2.728 tb raid-0 0 L15 optimal 2.728 tb raid-0 0 L3 optimal 2.728 tb raid-0 0 L4 optimal 2.728 tb raid-0 0 L5 optimal 2.728 tb raid-0 0 L6 optimal 2.728 tb raid-0 0 L7 optimal 2.728 tb raid-0 0 L8 optimal 2.728 tb raid-0 0 L9 optimal 2.728 tb raid-0 ---- Controller 0 Physical Drives ------------------------------------------------------------------------------------------------------ Enclosure: 245 (a0 => unavailable) C# Virtual Member Slot Model Size Serial Media Err Other Err Status F-State SAS Address 0 0 - 2.728tb - 0 0 online,spun_up none 0x500062b200320010 0 1 - 2.728tb - 0 0 online,spun_up none 0x500062b200320011 0 10 - 2.728tb - 0 0 online,spun_up none 0x500062b200320022 0 11 - 2.728tb - 0 0 online,spun_up none 0x500062b200320023 0 12 - 2.728tb - 0 0 online,spun_up none 0x500062b200320018 0 13 - 2.728tb - 0 0 online,spun_up none 0x500062b200320019 0 14 - 2.728tb - 1 0 online,spun_up none 0x500062b20032001a 0 15 - 2.728tb - 0 0 online,spun_up none 0x500062b20032001b 0 2 - 2.728tb - 0 0 unconfigured(good),spun_up none 0x500062b200320012 0 3 - 2.728tb - 0 0 online,spun_up none 0x500062b200320013 0 4 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000c 0 5 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000d 0 6 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000e 0 7 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000f 0 8 - 2.728tb - 0 0 online,spun_up none 0x500062b200320020 0 9 - 2.728tb - 0 0 online,spun_up none 0x500062b200320021 root@colossus:~ # megacli -PDMakeGood -PhysDrv\[245:2\] -a0 Adapter: 0: Failed to change 
PD state at EnclId-245 SlotId-2. Exit Code: 0x01 root@colossus:~ # megacli -CfgForeign -Clear -aALL -NoLog There is no foreign configuration on controller 0. Exit Code: 0x00 root@colossus:~ # megacli -cfgldadd -r0\[245:2\] WB RA Cached CachedBadBBU -strpsz512 -a0 Adapter 0: Created VD 2 Adapter 0: Configured the Adapter!! Exit Code: 0x00 root@colossus:~ # praid MegaCli Tools Used to Gather Raid Info ==== Controllers ======================================================================================================================= C# Name CacheSize FirmwareVer BIOSver BBU 0 lsi megaraid sas 9260-16i 512mb 2.130.403-3066 3.30.02.0 4.16.08.00 0x06060900 Missing ---- BBUs ------------------------------------------------------------------------------------------------------------------------------ C# Battery Type Initialized Voltage Temperature Charge State Alerts 0 - ---- Virtual Disks --------------------------------------------------------------------------------------------------------------------- C# Logical Volumes Physical Disks Degraded Offline Critical Failed 0 16 16 0 0 0 0 ---- Virtual Disks Info ---------------------------------------------------------------------------------------------------------------- C# ID Name State Size Raid Level 0 L0 mfid0 optimal 2.728 tb raid-0 0 L1 optimal 2.728 tb raid-0 0 L10 optimal 2.728 tb raid-0 0 L11 optimal 2.728 tb raid-0 0 L12 optimal 2.728 tb raid-0 0 L13 optimal 2.728 tb raid-0 0 L14 optimal 2.728 tb raid-0 0 L15 optimal 2.728 tb raid-0 0 L2 optimal 2.728 tb raid-0 0 L3 optimal 2.728 tb raid-0 0 L4 optimal 2.728 tb raid-0 0 L5 optimal 2.728 tb raid-0 0 L6 optimal 2.728 tb raid-0 0 L7 optimal 2.728 tb raid-0 0 L8 optimal 2.728 tb raid-0 0 L9 optimal 2.728 tb raid-0 ---- Controller 0 Physical Drives ------------------------------------------------------------------------------------------------------ Enclosure: 245 (a0 => unavailable) C# Virtual Member Slot Model Size Serial Media Err Other Err Status F-State SAS 
Address 0 0 - 2.728tb - 0 0 online,spun_up none 0x500062b200320010 0 1 - 2.728tb - 0 0 online,spun_up none 0x500062b200320011 0 10 - 2.728tb - 0 0 online,spun_up none 0x500062b200320022 0 11 - 2.728tb - 0 0 online,spun_up none 0x500062b200320023 0 12 - 2.728tb - 0 0 online,spun_up none 0x500062b200320018 0 13 - 2.728tb - 0 0 online,spun_up none 0x500062b200320019 0 14 - 2.728tb - 1 0 online,spun_up none 0x500062b20032001a 0 15 - 2.728tb - 0 0 online,spun_up none 0x500062b20032001b 0 2 - 2.728tb - 0 0 online,spun_up none 0x500062b200320012 0 3 - 2.728tb - 0 0 online,spun_up none 0x500062b200320013 0 4 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000c 0 5 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000d 0 6 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000e 0 7 - 2.728tb - 0 0 online,spun_up none 0x500062b20032000f 0 8 - 2.728tb - 0 0 online,spun_up none 0x500062b200320020 0 9 - 2.728tb - 0 0 online,spun_up none 0x500062b200320021 root@colossus:~ # zpool list NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT storage 40.8T 27.0T 13.8T 66% 1.00x DEGRADED - root@colossus:~ # zpool status -x pool: storage state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. 
see: http://illumos.org/msg/ZFS-8000-9P scan: scrub repaired 0 in 83h12m with 0 errors on Wed Jul 8 11:12:07 2015 config: NAME STATE READ WRITE CKSUM storage DEGRADED 0 0 0 raidz2-0 DEGRADED 0 0 0 mfid14 ONLINE 0 0 0 mfid12 ONLINE 0 0 0 spare-2 DEGRADED 0 0 0 15820272272734706674 REMOVED 0 0 0 was /dev/mfid0 mfid15 ONLINE 0 0 0 mfid1 ONLINE 0 0 0 mfid2 ONLINE 0 0 1 mfid3 ONLINE 0 0 0 mfid4 ONLINE 0 0 0 mfid11 ONLINE 0 0 0 mfid5 ONLINE 0 0 0 mfid13 ONLINE 0 0 0 mfid6 ONLINE 0 0 0 mfid7 ONLINE 0 0 0 mfid8 ONLINE 0 0 0 mfid9 ONLINE 0 0 0 mfid10 ONLINE 0 0 0 spares 14948854088277424304 INUSE was /dev/mfid15 errors: No known data errors root@colossus:~ # ls -l /dev/mfid* mfid0% mfid1% mfid10% mfid11% mfid12% mfid13% mfid14% mfid15% mfid2% mfid3% mfid4% mfid5% mfid6% mfid7% mfid8% mfid9% root@colossus:~ # zpool replace missing pool name argument usage: replace [-f] [new-device] root@colossus:~ # zpool replace storage 15820272272734706674 mfid0 root@colossus:~ # zpool status -x pool: storage state: DEGRADED status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state. action: Wait for the resilver to complete. 
scan: resilver in progress since Thu Jul 9 02:00:48 2015 15.9M scanned out of 27.0T at 1.99M/s, (scan is slow, no estimated time) 980K resilvered, 0.00% done config: NAME STATE READ WRITE CKSUM storage DEGRADED 0 0 0 raidz2-0 DEGRADED 0 0 0 mfid14 ONLINE 0 0 0 mfid12 ONLINE 0 0 0 spare-2 DEGRADED 0 0 0 replacing-0 REMOVED 0 0 0 15820272272734706674 REMOVED 0 0 0 was /dev/mfid0/old mfid0 ONLINE 0 0 0 (resilvering) mfid15 ONLINE 0 0 0 mfid1 ONLINE 0 0 0 mfid2 ONLINE 0 0 1 mfid3 ONLINE 0 0 0 mfid4 ONLINE 0 0 0 mfid11 ONLINE 0 0 0 mfid5 ONLINE 0 0 0 mfid13 ONLINE 0 0 0 mfid6 ONLINE 0 0 0 mfid7 ONLINE 0 0 0 mfid8 ONLINE 0 0 0 mfid9 ONLINE 0 0 0 mfid10 ONLINE 0 0 0 spares 14948854088277424304 INUSE was /dev/mfid15 errors: No known data errors root@colossus:~ # uname -a FreeBSD colossus 9.2-RELEASE-p15 FreeBSD 9.2-RELEASE-p15 #0: Mon Nov 3 20:31:29 UTC 2014 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64 root@colossus:~ # -- Michelle Sullivan http://www.mhix.org/ From owner-freebsd-fs@freebsd.org Thu Jul 9 10:16:14 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id A8D33996EEF for ; Thu, 9 Jul 2015 10:16:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 370F21F94; Thu, 9 Jul 2015 10:16:14 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t69AG4nZ011653 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 9 Jul 2015 13:16:04 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t69AG4nZ011653 Received: (from kostik@localhost) by 
tom.home (8.14.9/8.14.9/Submit) id t69AG47N011617; Thu, 9 Jul 2015 13:16:04 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 9 Jul 2015 13:16:04 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: Re: [PATCH 1/4] vfs: plug a use-after-free of fd_rdir in namei Message-ID: <20150709101604.GM2080@kib.kiev.ua> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> <1436393231-5831-2-git-send-email-mjguzik@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1436393231-5831-2-git-send-email-mjguzik@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 10:16:14 -0000 On Thu, Jul 09, 2015 at 12:07:08AM +0200, Mateusz Guzik wrote: > From: Mateusz Guzik > > fd_rdir vnode was stored in ni_rootdir without refing it in any way, > after which the filedsc lock was being dropped. > > The vnode could have been freed by mountcheckdirs or another thread doing > chroot. > > VREF the vnode while the lock is held. Patch looks fine. Would it make sense to extend namei_cleanup to also handle deref ? 
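The race the patch closes can be shown with a userspace analogue: read the shared root pointer and take a reference while the lock is still held, then drop that reference on every exit path. This is a sketch under stated assumptions, not the kernel code: the mutex stands in for the filedesc lock (FILEDESC_SLOCK), the atomic counter for VREF/vrele, and table_set_root() plays the role of chroot or mountcheckdirs replacing fd_rdir. struct obj, struct table and all function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vnode: an object with a reference count. */
struct obj {
	atomic_int refcnt;
};

/* Hypothetical stand-in for struct filedesc: a lock guarding the root. */
struct table {
	pthread_mutex_t lock;
	struct obj *root;	/* the table itself owns one reference */
};

static void
obj_ref(struct obj *o)
{
	atomic_fetch_add(&o->refcnt, 1);
}

static void
obj_rele(struct obj *o)
{
	int old;

	old = atomic_fetch_sub(&o->refcnt, 1);
	assert(old > 0);
	if (old == 1)
		free(o);	/* last reference gone: object is freed */
}

/* Like chroot: install a new root and drop the table's old reference. */
static void
table_set_root(struct table *t, struct obj *newroot)
{
	struct obj *old;

	obj_ref(newroot);
	pthread_mutex_lock(&t->lock);
	old = t->root;
	t->root = newroot;
	pthread_mutex_unlock(&t->lock);
	obj_rele(old);	/* may free old if nobody else holds it */
}

/* Like namei after the fix: ref under the lock, release on every exit. */
static int
lookup_sketch(struct table *t, int fail)
{
	struct obj *root;

	pthread_mutex_lock(&t->lock);
	root = t->root;
	obj_ref(root);	/* without this, root may be freed once we unlock */
	pthread_mutex_unlock(&t->lock);

	if (fail) {
		obj_rele(root);	/* error path */
		return (-1);
	}
	/* ... walk the path relative to root ... */
	obj_rele(root);		/* success path */
	return (0);
}
```

Before the fix, the equivalent of lookup_sketch() copied the pointer without obj_ref(), so a concurrent table_set_root() could free the object while the lookup was still using it. Every early return then needs its own release, which is why the patch adds a vrele() next to each namei_cleanup_cnp() call.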
> > MFC after: 1 week > --- > sys/kern/vfs_lookup.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > index 5dc07dc..20f8e96 100644 > --- a/sys/kern/vfs_lookup.c > +++ b/sys/kern/vfs_lookup.c > @@ -210,6 +210,7 @@ namei(struct nameidata *ndp) > */ > FILEDESC_SLOCK(fdp); > ndp->ni_rootdir = fdp->fd_rdir; > + VREF(ndp->ni_rootdir); > ndp->ni_topdir = fdp->fd_jdir; > > /* > @@ -260,6 +261,7 @@ namei(struct nameidata *ndp) > } > } > if (error) { > + vrele(ndp->ni_rootdir); > namei_cleanup_cnp(cnp); > return (error); > } > @@ -286,6 +288,7 @@ namei(struct nameidata *ndp) > if (KTRPOINT(curthread, KTR_CAPFAIL)) > ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); > #endif > + vrele(ndp->ni_rootdir); > namei_cleanup_cnp(cnp); > return (ENOTCAPABLE); > } > @@ -299,6 +302,7 @@ namei(struct nameidata *ndp) > ndp->ni_startdir = dp; > error = lookup(ndp); > if (error) { > + vrele(ndp->ni_rootdir); > namei_cleanup_cnp(cnp); > SDT_PROBE(vfs, namei, lookup, return, error, NULL, 0, > 0, 0); > @@ -308,6 +312,7 @@ namei(struct nameidata *ndp) > * If not a symbolic link, we're done. 
> */ > if ((cnp->cn_flags & ISSYMLINK) == 0) { > + vrele(ndp->ni_rootdir); > if ((cnp->cn_flags & (SAVENAME | SAVESTART)) == 0) { > namei_cleanup_cnp(cnp); > } else > @@ -371,6 +376,7 @@ namei(struct nameidata *ndp) > vput(ndp->ni_vp); > dp = ndp->ni_dvp; > } > + vrele(ndp->ni_rootdir); > namei_cleanup_cnp(cnp); > vput(ndp->ni_vp); > ndp->ni_vp = NULL; > -- > 2.4.5 From owner-freebsd-fs@freebsd.org Thu Jul 9 10:17:48 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 79AE2996F24 for ; Thu, 9 Jul 2015 10:17:48 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 0465F102B; Thu, 9 Jul 2015 10:17:47 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t69AHhDh011726 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 9 Jul 2015 13:17:43 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t69AHhDh011726 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t69AHgOv011725; Thu, 9 Jul 2015 13:17:43 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 9 Jul 2015 13:17:42 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: Re: [PATCH 2/4] vfs: avoid spurious vref/vrele for absolute lookups Message-ID: <20150709101742.GN2080@kib.kiev.ua> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> 
<1436393231-5831-3-git-send-email-mjguzik@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1436393231-5831-3-git-send-email-mjguzik@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 10:17:48 -0000 On Thu, Jul 09, 2015 at 12:07:09AM +0200, Mateusz Guzik wrote: > From: Mateusz Guzik > > namei used to vref fd_cdir, which was immediatley vrele'd on entry to > the loop. > > Check for absolute lookup and vref the right vnode the first time. > --- > sys/kern/vfs_lookup.c | 70 +++++++++++++++++++++++++++++++++------------------ > 1 file changed, 46 insertions(+), 24 deletions(-) > > diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > index 20f8e96..e434464 100644 > --- a/sys/kern/vfs_lookup.c > +++ b/sys/kern/vfs_lookup.c > @@ -109,6 +109,27 @@ namei_cleanup_cnp(struct componentname *cnp) > #endif > } > > +static int > +namei_handle_root(struct nameidata *ndp, struct vnode **dpp) > +{ > + struct componentname *cnp = &ndp->ni_cnd; Do not put initialization into declaration. Otherwise, the patch looks fine. > + > + if (ndp->ni_strictrelative != 0) { > +#ifdef KTRACE > + if (KTRPOINT(curthread, KTR_CAPFAIL)) > + ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); > +#endif > + return (ENOTCAPABLE); > + } > + while (*(cnp->cn_nameptr) == '/') { > + cnp->cn_nameptr++; > + ndp->ni_pathlen--; > + } > + *dpp = ndp->ni_rootdir; > + VREF(*dpp); > + return (0); > +} > + > /* > * Convert a pathname into a pointer to a locked vnode. 
> * > @@ -222,7 +243,18 @@ namei(struct nameidata *ndp) > AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); > > dp = NULL; > - if (cnp->cn_pnbuf[0] != '/') { > + cnp->cn_nameptr = cnp->cn_pnbuf; > + if (cnp->cn_pnbuf[0] == '/') { > + error = namei_handle_root(ndp, &dp); > + FILEDESC_SUNLOCK(fdp); > + if (error != 0) { > + vrele(ndp->ni_rootdir); > + if (ndp->ni_startdir != NULL) > + vrele(ndp->ni_startdir); > + namei_cleanup_cnp(cnp); > + return (error); > + } > + } else { > if (ndp->ni_startdir != NULL) { > dp = ndp->ni_startdir; > error = 0; > @@ -276,29 +308,6 @@ namei(struct nameidata *ndp) > SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, > cnp->cn_flags, 0, 0); > for (;;) { > - /* > - * Check if root directory should replace current directory. > - * Done at start of translation and after symbolic link. > - */ > - cnp->cn_nameptr = cnp->cn_pnbuf; > - if (*(cnp->cn_nameptr) == '/') { > - vrele(dp); > - if (ndp->ni_strictrelative != 0) { > -#ifdef KTRACE > - if (KTRPOINT(curthread, KTR_CAPFAIL)) > - ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); > -#endif > - vrele(ndp->ni_rootdir); > - namei_cleanup_cnp(cnp); > - return (ENOTCAPABLE); > - } > - while (*(cnp->cn_nameptr) == '/') { > - cnp->cn_nameptr++; > - ndp->ni_pathlen--; > - } > - dp = ndp->ni_rootdir; > - VREF(dp); > - } > ndp->ni_startdir = dp; > error = lookup(ndp); > if (error) { > @@ -375,6 +384,19 @@ namei(struct nameidata *ndp) > ndp->ni_pathlen += linklen; > vput(ndp->ni_vp); > dp = ndp->ni_dvp; > + /* > + * Check if root directory should replace current directory. 
> + */ > + cnp->cn_nameptr = cnp->cn_pnbuf; > + if (*(cnp->cn_nameptr) == '/') { > + vrele(dp); > + error = namei_handle_root(ndp, &dp); > + if (error != 0) { > + vrele(ndp->ni_rootdir); > + namei_cleanup_cnp(cnp); > + return (error); > + } > + } > } > vrele(ndp->ni_rootdir); > namei_cleanup_cnp(cnp); > -- > 2.4.5 From owner-freebsd-fs@freebsd.org Thu Jul 9 10:19:44 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 0E0C2996F97 for ; Thu, 9 Jul 2015 10:19:44 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x234.google.com (mail-wg0-x234.google.com [IPv6:2a00:1450:400c:c00::234]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 9976D10BE; Thu, 9 Jul 2015 10:19:43 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgxm20 with SMTP id m20so35928510wgx.3; Thu, 09 Jul 2015 03:19:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=QT94yjnO6DKnTitoibu6VQxJINAI5BNmY0FM03HcQK8=; b=WCejUtSsExkfmisD+zS2QOm7ZxhCSg1481kZDofVxU/cZ9jZw6QxlYfV2K7WKt5gwB WgbNayQP4CLPh/L1Q+kAMuDAO1/PJm1uAoe2ao/nCPsfHCtaunGCWguFnmvck4SUEujk 2MjXkvGHuUNbPKdIDAka1zRcZYX0yNs+IkCFobinnpXOM+LXvg3yvtA4elK+RYkemGnl /1DvrshyLbUULvTBA2/F2I7kWz1CNGFJw8BmDPe3lm4HAwNICCvxGKS1kkZ8MuK1VqTQ DB0MEpvOwmomBQeqM5lHun7qchHVsliE/qAkXYqTQfitNbonL5mdICdMu+DYqme5dGV9 s7zw== X-Received: by 10.194.58.69 with SMTP id o5mr29414898wjq.22.1436437182041; Thu, 09 Jul 2015 03:19:42 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id ee1sm7489358wic.8.2015.07.09.03.19.40 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 09 Jul 2015 03:19:40 -0700 (PDT) Date: Thu, 9 Jul 2015 12:19:38 +0200 From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: Re: [PATCH 1/4] vfs: plug a use-after-free of fd_rdir in namei Message-ID: <20150709101937.GB1718@dft-labs.eu> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> <1436393231-5831-2-git-send-email-mjguzik@gmail.com> <20150709101604.GM2080@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150709101604.GM2080@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 10:19:44 -0000 On Thu, Jul 09, 2015 at 01:16:04PM +0300, Konstantin Belousov wrote: > On Thu, Jul 09, 2015 at 12:07:08AM +0200, Mateusz Guzik wrote: > > From: Mateusz Guzik > > > > fd_rdir vnode was stored in ni_rootdir without refing it in any way, > > after which the filedsc lock was being dropped. > > > > The vnode could have been freed by mountcheckdirs or another thread doing > > chroot. > > > > VREF the vnode while the lock is held. > Patch looks fine. > > Would it make sense to extend namei_cleanup to also handle deref ? > namei_cleanup_cnp is possibly called once prior to obtaining the reference, also within the lookup loop there is one call which may or may not call it prior to exiting. 
> > > > MFC after: 1 week > > --- > > sys/kern/vfs_lookup.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > > index 5dc07dc..20f8e96 100644 > > --- a/sys/kern/vfs_lookup.c > > +++ b/sys/kern/vfs_lookup.c > > @@ -210,6 +210,7 @@ namei(struct nameidata *ndp) > > */ > > FILEDESC_SLOCK(fdp); > > ndp->ni_rootdir = fdp->fd_rdir; > > + VREF(ndp->ni_rootdir); > > ndp->ni_topdir = fdp->fd_jdir; > > > > /* > > @@ -260,6 +261,7 @@ namei(struct nameidata *ndp) > > } > > } > > if (error) { > > + vrele(ndp->ni_rootdir); > > namei_cleanup_cnp(cnp); > > return (error); > > } > > @@ -286,6 +288,7 @@ namei(struct nameidata *ndp) > > if (KTRPOINT(curthread, KTR_CAPFAIL)) > > ktrcapfail(CAPFAIL_LOOKUP, NULL, NULL); > > #endif > > + vrele(ndp->ni_rootdir); > > namei_cleanup_cnp(cnp); > > return (ENOTCAPABLE); > > } > > @@ -299,6 +302,7 @@ namei(struct nameidata *ndp) > > ndp->ni_startdir = dp; > > error = lookup(ndp); > > if (error) { > > + vrele(ndp->ni_rootdir); > > namei_cleanup_cnp(cnp); > > SDT_PROBE(vfs, namei, lookup, return, error, NULL, 0, > > 0, 0); > > @@ -308,6 +312,7 @@ namei(struct nameidata *ndp) > > * If not a symbolic link, we're done. 
> > */ > > if ((cnp->cn_flags & ISSYMLINK) == 0) { > > + vrele(ndp->ni_rootdir); > > if ((cnp->cn_flags & (SAVENAME | SAVESTART)) == 0) { > > namei_cleanup_cnp(cnp); > > } else > > @@ -371,6 +376,7 @@ namei(struct nameidata *ndp) > > vput(ndp->ni_vp); > > dp = ndp->ni_dvp; > > } > > + vrele(ndp->ni_rootdir); > > namei_cleanup_cnp(cnp); > > vput(ndp->ni_vp); > > ndp->ni_vp = NULL; > > -- > > 2.4.5 -- Mateusz Guzik From owner-freebsd-fs@freebsd.org Thu Jul 9 10:25:39 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 6787C9970F6 for ; Thu, 9 Jul 2015 10:25:39 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id E767214D0; Thu, 9 Jul 2015 10:25:38 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t69APX3J013244 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 9 Jul 2015 13:25:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t69APX3J013244 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t69APXxD013243; Thu, 9 Jul 2015 13:25:33 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 9 Jul 2015 13:25:33 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org, Mateusz Guzik Subject: Re: [PATCH 3/4] vfs: simplify error handling in namei Message-ID: <20150709102533.GO2080@kib.kiev.ua> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> 
<1436393231-5831-4-git-send-email-mjguzik@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1436393231-5831-4-git-send-email-mjguzik@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 10:25:39 -0000 On Thu, Jul 09, 2015 at 12:07:10AM +0200, Mateusz Guzik wrote: > From: Mateusz Guzik > > The logic is reorganised so that there is one exit point prior to the > lookup loop. This is an intermediate step to making audit logging > functions use found vnode instead of translating ni_dirfd on their own. > > ni_startdir validation is removed. The only in-tree consumer is nfs > which already makes sure it is a directory. 
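The shape of this reorganisation (each branch only sets dp and error, then one place drops an unused start directory and one error exit releases whatever was acquired) can be sketched in userspace. pick_startdir(), struct dir and the plain integer refcounts are hypothetical stand-ins for namei(), vnodes and VREF/vrele; base == NULL models a failed directory-descriptor lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reference-counted directory stand-in. */
struct dir {
	int refcnt;
	int is_dir;
};

static void
dref(struct dir *d)
{
	d->refcnt++;
}

static void
drele(struct dir *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/*
 * Choose the starting directory.  The branches only set dp and error;
 * cleanup is done once, after them.  On success the caller owns one
 * reference on *dpp and one on root.
 */
static int
pick_startdir(struct dir *root, struct dir *startdir, struct dir *base,
    int absolute, struct dir **dpp)
{
	struct dir *dp;
	int error, startdir_used;

	dref(root);		/* like ni_rootdir: held for the whole lookup */
	startdir_used = 0;
	dp = NULL;
	error = 0;
	if (absolute) {
		dp = root;
		dref(dp);
	} else if (startdir != NULL) {
		dp = startdir;	/* caller's reference is handed over */
		startdir_used = 1;
	} else if (base == NULL) {
		error = EBADF;	/* e.g. the dirfd lookup failed */
	} else {
		dp = base;
		dref(dp);
	}
	if (error == 0 && !dp->is_dir)
		error = ENOTDIR;
	/* Single place to drop a start directory given to us but not used. */
	if (startdir != NULL && !startdir_used)
		drele(startdir);
	/* Single error exit: release exactly what was acquired above. */
	if (error != 0) {
		if (dp != NULL)
			drele(dp);
		drele(root);
		return (error);
	}
	*dpp = dp;
	return (0);
}
```

Because every failure funnels through the same exit, adding a new case (or, as the commit message says, moving audit logging onto the found vnode) only has to get the reference accounting right once.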
> --- > sys/kern/vfs_lookup.c | 50 +++++++++++++++++++++----------------------------- > 1 file changed, 21 insertions(+), 29 deletions(-) > > diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > index e434464..d48fcff 100644 > --- a/sys/kern/vfs_lookup.c > +++ b/sys/kern/vfs_lookup.c > @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) > struct vnode *dp; /* the directory we are searching */ > struct iovec aiov; /* uio for reading symbolic links */ > struct uio auio; > - int error, linklen; > + int error, linklen, startdir_used; > struct componentname *cnp = &ndp->ni_cnd; > struct thread *td = cnp->cn_thread; > struct proc *p = td->td_proc; > @@ -169,6 +169,9 @@ namei(struct nameidata *ndp) > ("namei: nameiop contaminated with flags")); > KASSERT((cnp->cn_flags & OPMASK) == 0, > ("namei: flags contaminated with nameiops")); > + if (ndp->ni_startdir != NULL) > + MPASS(ndp->ni_startdir->v_type == VDIR || > + ndp->ni_startdir->v_type == VBAD); Write this as MPASS(ndp->ni_startdir == NULL || ... == VDIR || ... == VBAD); ? I think that the two previous patches are self-contained ? Please commit them, after that I think review of this patch can be finished. 
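Kostik's suggested form folds the NULL check into the assertion itself, so no surrounding if is needed. A userspace sketch, with assert() standing in for the kernel's MPASS and a hypothetical cut-down enum in place of the real vnode type list:

```c
#include <assert.h>
#include <stddef.h>

enum vtype_stub { VNON, VREG, VDIR, VBAD };	/* hypothetical subset */

struct vnode_stub {
	enum vtype_stub v_type;
};

/* A NULL start directory is allowed; otherwise it must be VDIR or VBAD. */
static int
startdir_ok(const struct vnode_stub *sd)
{
	return (sd == NULL || sd->v_type == VDIR || sd->v_type == VBAD);
}

static void
check_startdir(const struct vnode_stub *sd)
{
	/* One expression, as in MPASS(a == NULL || b || c). */
	assert(startdir_ok(sd));
}
```

Short-circuit evaluation of || guarantees the v_type fields are only read when the pointer is non-NULL.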
> if (!lookup_shared) > cnp->cn_flags &= ~LOCKSHARED; > fdp = p->p_fd; > @@ -242,23 +245,19 @@ namei(struct nameidata *ndp) > if (cnp->cn_flags & AUDITVNODE2) > AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); > > + startdir_used = 0; > dp = NULL; > cnp->cn_nameptr = cnp->cn_pnbuf; > if (cnp->cn_pnbuf[0] == '/') { > error = namei_handle_root(ndp, &dp); > - FILEDESC_SUNLOCK(fdp); > - if (error != 0) { > - vrele(ndp->ni_rootdir); > - if (ndp->ni_startdir != NULL) > - vrele(ndp->ni_startdir); > - namei_cleanup_cnp(cnp); > - return (error); > - } > } else { > if (ndp->ni_startdir != NULL) { > dp = ndp->ni_startdir; > - error = 0; > - } else if (ndp->ni_dirfd != AT_FDCWD) { > + startdir_used = 1; > + } else if (ndp->ni_dirfd == AT_FDCWD) { > + dp = fdp->fd_cdir; > + VREF(dp); > + } else { > cap_rights_t rights; > > rights = ndp->ni_rightsneeded; > @@ -285,25 +284,18 @@ namei(struct nameidata *ndp) > } > #endif > } > - if (error != 0 || dp != NULL) { > - FILEDESC_SUNLOCK(fdp); > - if (error == 0 && dp->v_type != VDIR) { > - vrele(dp); > - error = ENOTDIR; > - } > - } > - if (error) { > - vrele(ndp->ni_rootdir); > - namei_cleanup_cnp(cnp); > - return (error); > - } > + if (error == 0 && dp->v_type != VDIR) > + error = ENOTDIR; > } > - if (dp == NULL) { > - dp = fdp->fd_cdir; > - VREF(dp); > - FILEDESC_SUNLOCK(fdp); > - if (ndp->ni_startdir != NULL) > - vrele(ndp->ni_startdir); > + FILEDESC_SUNLOCK(fdp); > + if (ndp->ni_startdir != NULL && !startdir_used) > + vrele(ndp->ni_startdir); > + if (error != 0) { > + if (dp != NULL) > + vrele(dp); > + vrele(ndp->ni_rootdir); > + namei_cleanup_cnp(cnp); > + return (error); > } > SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, > cnp->cn_flags, 0, 0); > -- > 2.4.5 From owner-freebsd-fs@freebsd.org Thu Jul 9 15:40:27 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 
94099997D19 for ; Thu, 9 Jul 2015 15:40:27 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x230.google.com (mail-wi0-x230.google.com [IPv6:2a00:1450:400c:c05::230]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 3692A1AA9; Thu, 9 Jul 2015 15:40:27 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wiga1 with SMTP id a1so316554312wig.0; Thu, 09 Jul 2015 08:40:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=7i9jVucnMTv1nOzhtOFZE8szPhOAg+ow1nDpYr4HAqk=; b=BGqG368jjgzmnaoCBLzaMGGHfg8lc3zaHBBrlpkgqPdsD7rSkLRJwLtGS6dKnrkSUO Kx9rf8LriHnbVmCwrIUuKdqD4GCSoZllG3lqh0kxGzBwo3WTsWvIY47Z0syiZaW38ypE JxzuwY5yWMMC4I32m8cxUuY96UU/INho93ZjNLPAt1zVkc9toRFFqQyc/B68/jEyp6EV vCVrAus45o9VzcEnDhE+94/E+0gZrzIowDzeuUMZDNlJD78l0b+ZZhOFYE5JZyzlvGLU n4bHbVkQT146CRcu0LOpUXjeSQUsbzyi5o1POJdmpm6CaMgQPzUSpkWs52BkTdYRW200 xGxA== X-Received: by 10.194.60.226 with SMTP id k2mr30120485wjr.10.1436456425632; Thu, 09 Jul 2015 08:40:25 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. 
[2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id ej5sm9409291wjd.22.2015.07.09.08.40.23 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 09 Jul 2015 08:40:24 -0700 (PDT) Date: Thu, 9 Jul 2015 17:40:21 +0200 From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org Subject: Re: [PATCH 3/4] vfs: simplify error handling in namei Message-ID: <20150709154021.GC1718@dft-labs.eu> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> <1436393231-5831-4-git-send-email-mjguzik@gmail.com> <20150709102533.GO2080@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150709102533.GO2080@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 15:40:27 -0000 On Thu, Jul 09, 2015 at 01:25:33PM +0300, Konstantin Belousov wrote: > On Thu, Jul 09, 2015 at 12:07:10AM +0200, Mateusz Guzik wrote: > > From: Mateusz Guzik > > > > The logic is reorganised so that there is one exit point prior to the > > lookup loop. This is an intermediate step to making audit logging > > functions use found vnode instead of translating ni_dirfd on their own. > > > > ni_startdir validation is removed. The only in-tree consumer is nfs > > which already makes sure it is a directory. 
> > --- > > sys/kern/vfs_lookup.c | 50 +++++++++++++++++++++----------------------------- > > 1 file changed, 21 insertions(+), 29 deletions(-) > > > > diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > > index e434464..d48fcff 100644 > > --- a/sys/kern/vfs_lookup.c > > +++ b/sys/kern/vfs_lookup.c > > @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) > > struct vnode *dp; /* the directory we are searching */ > > struct iovec aiov; /* uio for reading symbolic links */ > > struct uio auio; > > - int error, linklen; > > + int error, linklen, startdir_used; > > struct componentname *cnp = &ndp->ni_cnd; > > struct thread *td = cnp->cn_thread; > > struct proc *p = td->td_proc; > > @@ -169,6 +169,9 @@ namei(struct nameidata *ndp) > > ("namei: nameiop contaminated with flags")); > > KASSERT((cnp->cn_flags & OPMASK) == 0, > > ("namei: flags contaminated with nameiops")); > > + if (ndp->ni_startdir != NULL) > > + MPASS(ndp->ni_startdir->v_type == VDIR || > > + ndp->ni_startdir->v_type == VBAD); > Write this as > MPASS(ndp->ni_startdir == NULL || ... == VDIR || ... == VBAD); > ? > Done. > I think that the two previous patches are self-contained ? > Please commit them, after that I think review of this patch can > be finished. > Committed. 
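The folded MPASS requested in the review rests on a simple identity: a guarded assertion `if (p) assert(q);` checks exactly the same condition as `assert(!p || q);`, and short-circuit evaluation keeps the NULL case safe. A minimal userland sketch of that equivalence, assuming hypothetical stand-in types (vn_s, vtype_s) rather than the kernel's struct vnode, and plain predicates instead of MPASS so both forms can be compared:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for vnode types; not the kernel's enum vtype. */
enum vtype_s { VNON_T, VDIR_T, VBAD_T, VREG_T };

struct vn_s {
	enum vtype_s v_type;
};

/* Guarded form, as in the first revision: check only when non-NULL. */
static bool
startdir_ok_guarded(const struct vn_s *startdir)
{
	if (startdir != NULL)
		return (startdir->v_type == VDIR_T ||
		    startdir->v_type == VBAD_T);
	return (true);
}

/* Folded form, as committed: NULL short-circuits before dereference. */
static bool
startdir_ok_folded(const struct vn_s *startdir)
{
	return (startdir == NULL || startdir->v_type == VDIR_T ||
	    startdir->v_type == VBAD_T);
}
```

In the kernel the folded expression is simply the MPASS argument; the predicates here only exist so the two forms can be checked against each other for NULL and every type.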
diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c index e434464..a93314e 100644 --- a/sys/kern/vfs_lookup.c +++ b/sys/kern/vfs_lookup.c @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) struct vnode *dp; /* the directory we are searching */ struct iovec aiov; /* uio for reading symbolic links */ struct uio auio; - int error, linklen; + int error, linklen, startdir_used; struct componentname *cnp = &ndp->ni_cnd; struct thread *td = cnp->cn_thread; struct proc *p = td->td_proc; @@ -169,6 +169,8 @@ namei(struct nameidata *ndp) ("namei: nameiop contaminated with flags")); KASSERT((cnp->cn_flags & OPMASK) == 0, ("namei: flags contaminated with nameiops")); + MPASS(ndp->ni_startdir == NULL || ndp->ni_startdir->v_type == VDIR || + ndp->ni_startdir->v_type == VBAD); if (!lookup_shared) cnp->cn_flags &= ~LOCKSHARED; fdp = p->p_fd; @@ -242,23 +244,19 @@ namei(struct nameidata *ndp) if (cnp->cn_flags & AUDITVNODE2) AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); + startdir_used = 0; dp = NULL; cnp->cn_nameptr = cnp->cn_pnbuf; if (cnp->cn_pnbuf[0] == '/') { error = namei_handle_root(ndp, &dp); - FILEDESC_SUNLOCK(fdp); - if (error != 0) { - vrele(ndp->ni_rootdir); - if (ndp->ni_startdir != NULL) - vrele(ndp->ni_startdir); - namei_cleanup_cnp(cnp); - return (error); - } } else { if (ndp->ni_startdir != NULL) { dp = ndp->ni_startdir; - error = 0; - } else if (ndp->ni_dirfd != AT_FDCWD) { + startdir_used = 1; + } else if (ndp->ni_dirfd == AT_FDCWD) { + dp = fdp->fd_cdir; + VREF(dp); + } else { cap_rights_t rights; rights = ndp->ni_rightsneeded; @@ -285,25 +283,18 @@ namei(struct nameidata *ndp) } #endif } - if (error != 0 || dp != NULL) { - FILEDESC_SUNLOCK(fdp); - if (error == 0 && dp->v_type != VDIR) { - vrele(dp); - error = ENOTDIR; - } - } - if (error) { - vrele(ndp->ni_rootdir); - namei_cleanup_cnp(cnp); - return (error); - } + if (error == 0 && dp->v_type != VDIR) + error = ENOTDIR; } - if (dp == NULL) { - dp = fdp->fd_cdir; - VREF(dp); - FILEDESC_SUNLOCK(fdp); - 
if (ndp->ni_startdir != NULL) - vrele(ndp->ni_startdir); + FILEDESC_SUNLOCK(fdp); + if (ndp->ni_startdir != NULL && !startdir_used) + vrele(ndp->ni_startdir); + if (error != 0) { + if (dp != NULL) + vrele(dp); + vrele(ndp->ni_rootdir); + namei_cleanup_cnp(cnp); + return (error); } SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, cnp->cn_flags, 0, 0); -- Mateusz Guzik From owner-freebsd-fs@freebsd.org Thu Jul 9 15:54:55 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 84BE49962D6 for ; Thu, 9 Jul 2015 15:54:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 11A23158A; Thu, 9 Jul 2015 15:54:54 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t69FrjCY017843 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 9 Jul 2015 18:53:45 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t69FrjCY017843 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t69FrjnS017842; Thu, 9 Jul 2015 18:53:45 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Thu, 9 Jul 2015 18:53:45 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org Subject: Re: [PATCH 3/4] vfs: simplify error handling in namei Message-ID: <20150709155345.GQ2080@kib.kiev.ua> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> <1436393231-5831-4-git-send-email-mjguzik@gmail.com> 
<20150709102533.GO2080@kib.kiev.ua> <20150709154021.GC1718@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150709154021.GC1718@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 15:54:55 -0000 On Thu, Jul 09, 2015 at 05:40:21PM +0200, Mateusz Guzik wrote: > On Thu, Jul 09, 2015 at 01:25:33PM +0300, Konstantin Belousov wrote: > > On Thu, Jul 09, 2015 at 12:07:10AM +0200, Mateusz Guzik wrote: > > > From: Mateusz Guzik > > > > > > The logic is reorganised so that there is one exit point prior to the > > > lookup loop. This is an intermediate step to making audit logging > > > functions use found vnode instead of translating ni_dirfd on their own. > > > > > > ni_startdir validation is removed. The only in-tree consumer is nfs > > > which already makes sure it is a directory. Looks fine. 
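The reorganisation described in the quoted commit message (one exit point before the lookup loop, with startdir_used recording whether the caller-donated ni_startdir reference was consumed) can be illustrated with a small userland sketch. Everything here is hypothetical: vnode_sketch, vref_sketch(), vrele_sketch() and pick_startdir_sketch() are simplified stand-ins, not the kernel vnode API, and the FILEDESC lock, namei_handle_root() and the dirfd/capability branch are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode: only a type and a reference count. */
enum vtype_sketch { VNON_S, VDIR_S, VREG_S };

struct vnode_sketch {
	enum vtype_sketch v_type;
	int refcount;
};

static void vref_sketch(struct vnode_sketch *vp) { vp->refcount++; }
static void vrele_sketch(struct vnode_sketch *vp) { vp->refcount--; }

/*
 * Choose the directory a lookup starts from.  "startdir" is a reference
 * donated by the caller (may be NULL) and is consumed on every path.
 * On success *dpp holds exactly one reference the caller must release.
 */
static int
pick_startdir_sketch(bool absolute, struct vnode_sketch *root,
    struct vnode_sketch *cwd, struct vnode_sketch *startdir,
    struct vnode_sketch **dpp)
{
	struct vnode_sketch *dp;
	bool startdir_used = false;
	int error = 0;

	if (absolute) {
		dp = root;              /* absolute path: start at root */
		vref_sketch(dp);
	} else if (startdir != NULL) {
		dp = startdir;          /* donated reference moves to dp */
		startdir_used = true;
	} else {
		dp = cwd;               /* relative path: start at cwd */
		vref_sketch(dp);
	}
	if (dp->v_type != VDIR_S)
		error = ENOTDIR;

	/* Single exit: drop the donated reference if nothing consumed it. */
	if (startdir != NULL && !startdir_used)
		vrele_sketch(startdir);
	if (error != 0) {
		vrele_sketch(dp);       /* also covers a consumed startdir */
		*dpp = NULL;
		return (error);
	}
	*dpp = dp;
	return (0);
}
```

The design point is that every path reaches the same two cleanup checks, so the "was startdir already released?" question is answered by one flag instead of being repeated in each error branch.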
> diff --git a/sys/kern/vfs_lookup.c b/sys/kern/vfs_lookup.c > index e434464..a93314e 100644 > --- a/sys/kern/vfs_lookup.c > +++ b/sys/kern/vfs_lookup.c > @@ -158,7 +158,7 @@ namei(struct nameidata *ndp) > struct vnode *dp; /* the directory we are searching */ > struct iovec aiov; /* uio for reading symbolic links */ > struct uio auio; > - int error, linklen; > + int error, linklen, startdir_used; > struct componentname *cnp = &ndp->ni_cnd; > struct thread *td = cnp->cn_thread; > struct proc *p = td->td_proc; > @@ -169,6 +169,8 @@ namei(struct nameidata *ndp) > ("namei: nameiop contaminated with flags")); > KASSERT((cnp->cn_flags & OPMASK) == 0, > ("namei: flags contaminated with nameiops")); > + MPASS(ndp->ni_startdir == NULL || ndp->ni_startdir->v_type == VDIR || > + ndp->ni_startdir->v_type == VBAD); > if (!lookup_shared) > cnp->cn_flags &= ~LOCKSHARED; > fdp = p->p_fd; > @@ -242,23 +244,19 @@ namei(struct nameidata *ndp) > if (cnp->cn_flags & AUDITVNODE2) > AUDIT_ARG_UPATH2(td, ndp->ni_dirfd, cnp->cn_pnbuf); > > + startdir_used = 0; > dp = NULL; > cnp->cn_nameptr = cnp->cn_pnbuf; > if (cnp->cn_pnbuf[0] == '/') { > error = namei_handle_root(ndp, &dp); > - FILEDESC_SUNLOCK(fdp); > - if (error != 0) { > - vrele(ndp->ni_rootdir); > - if (ndp->ni_startdir != NULL) > - vrele(ndp->ni_startdir); > - namei_cleanup_cnp(cnp); > - return (error); > - } > } else { > if (ndp->ni_startdir != NULL) { > dp = ndp->ni_startdir; > - error = 0; > - } else if (ndp->ni_dirfd != AT_FDCWD) { > + startdir_used = 1; > + } else if (ndp->ni_dirfd == AT_FDCWD) { > + dp = fdp->fd_cdir; > + VREF(dp); > + } else { > cap_rights_t rights; > > rights = ndp->ni_rightsneeded; > @@ -285,25 +283,18 @@ namei(struct nameidata *ndp) > } > #endif > } > - if (error != 0 || dp != NULL) { > - FILEDESC_SUNLOCK(fdp); > - if (error == 0 && dp->v_type != VDIR) { > - vrele(dp); > - error = ENOTDIR; > - } > - } > - if (error) { > - vrele(ndp->ni_rootdir); > - namei_cleanup_cnp(cnp); > - return (error); > - } > + 
if (error == 0 && dp->v_type != VDIR) > + error = ENOTDIR; > } > - if (dp == NULL) { > - dp = fdp->fd_cdir; > - VREF(dp); > - FILEDESC_SUNLOCK(fdp); > - if (ndp->ni_startdir != NULL) > - vrele(ndp->ni_startdir); > + FILEDESC_SUNLOCK(fdp); > + if (ndp->ni_startdir != NULL && !startdir_used) > + vrele(ndp->ni_startdir); > + if (error != 0) { > + if (dp != NULL) > + vrele(dp); > + vrele(ndp->ni_rootdir); > + namei_cleanup_cnp(cnp); > + return (error); > } > SDT_PROBE(vfs, namei, lookup, entry, dp, cnp->cn_pnbuf, > cnp->cn_flags, 0, 0); From owner-freebsd-fs@freebsd.org Thu Jul 9 16:34:13 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 2A61A996EF4 for ; Thu, 9 Jul 2015 16:34:13 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x22f.google.com (mail-wi0-x22f.google.com [IPv6:2a00:1450:400c:c05::22f]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id B5EBA1B2A; Thu, 9 Jul 2015 16:34:12 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by widjy10 with SMTP id jy10so249852583wid.1; Thu, 09 Jul 2015 09:34:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=YtK2sJIfQ+Sm9hCzYlWt8znOTG9GsZCsjxGJRzQ2Ifo=; b=SOXOX1UCzwVedhwsvikg5uNyfsO3fFJm+jLhwlgaB/FQ3IrHVS2IKLf3K1fDojXhF7 /wm1bjNaIKmMG8wqs6wEYz25osyv6T/ZyifbpruP7w+d2s6E1KrZ+XCPTWodfYMtJiPU T1FIm25QbOWEQWcQDux24WxF9qT89b3qxKIzgXPIdCE3LZQtdJ+NN/LBej9NkG5/pqfx 1PBd/eO7Nd2QxDvXP/hMz73Bu9uQ2WSpT4xFc4Fbu6LInEzqeMrPblGAxQRowQzfDwEs 1rfTtuWNxlXUytXH3834qJ77UMFn1Iqatxuvt1fV3o+O3AEGp/bEK4mhkGiTX7KxhjPw HbMA== X-Received: by 10.180.81.38 with SMTP id 
w6mr91841210wix.14.1436459651220; Thu, 09 Jul 2015 09:34:11 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id um5sm9636001wjc.1.2015.07.09.09.34.09 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Thu, 09 Jul 2015 09:34:10 -0700 (PDT) Date: Thu, 9 Jul 2015 18:34:07 +0200 From: Mateusz Guzik To: Konstantin Belousov Cc: rwatson@FreeBSD.org, freebsd-fs@freebsd.org Subject: Re: [PATCH 3/4] vfs: simplify error handling in namei Message-ID: <20150709163407.GD1718@dft-labs.eu> References: <20150707085857.GZ2080@kib.kiev.ua> <1436393231-5831-1-git-send-email-mjguzik@gmail.com> <1436393231-5831-4-git-send-email-mjguzik@gmail.com> <20150709102533.GO2080@kib.kiev.ua> <20150709154021.GC1718@dft-labs.eu> <20150709155345.GQ2080@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150709155345.GQ2080@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 16:34:13 -0000 On Thu, Jul 09, 2015 at 06:53:45PM +0300, Konstantin Belousov wrote: > On Thu, Jul 09, 2015 at 05:40:21PM +0200, Mateusz Guzik wrote: > > On Thu, Jul 09, 2015 at 01:25:33PM +0300, Konstantin Belousov wrote: > > > On Thu, Jul 09, 2015 at 12:07:10AM +0200, Mateusz Guzik wrote: > > > > From: Mateusz Guzik > > > > > > > > The logic is reorganised so that there is one exit point prior to the > > > > lookup loop. This is an intermediate step to making audit logging > > > > functions use found vnode instead of translating ni_dirfd on their own. > > > > > > > > ni_startdir validation is removed. The only in-tree consumer is nfs > > > > which already makes sure it is a directory. > > Looks fine. 
> Thanks, went in as https://svnweb.freebsd.org/base?view=revision&revision=285326 -- Mateusz Guzik From owner-freebsd-fs@freebsd.org Thu Jul 9 18:32:17 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4DE7999730B for ; Thu, 9 Jul 2015 18:32:17 +0000 (UTC) (envelope-from wollman@khavrinen.csail.mit.edu) Received: from khavrinen.csail.mit.edu (khavrinen.csail.mit.edu [IPv6:2001:470:8b2d:1e1c:21b:21ff:feb8:d7b0]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (Client CN "khavrinen.csail.mit.edu", Issuer "Client CA" (not verified)) by mx1.freebsd.org (Postfix) with ESMTPS id E40221E7E; Thu, 9 Jul 2015 18:32:16 +0000 (UTC) (envelope-from wollman@khavrinen.csail.mit.edu) Received: from khavrinen.csail.mit.edu (localhost [127.0.0.1]) by khavrinen.csail.mit.edu (8.14.9/8.14.9) with ESMTP id t69IWEkZ007902 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL CN=khavrinen.csail.mit.edu issuer=Client+20CA); Thu, 9 Jul 2015 14:32:14 -0400 (EDT) (envelope-from wollman@khavrinen.csail.mit.edu) Received: (from wollman@localhost) by khavrinen.csail.mit.edu (8.14.9/8.14.9/Submit) id t69IWEIX007899; Thu, 9 Jul 2015 14:32:14 -0400 (EDT) (envelope-from wollman) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <21918.48686.157217.979707@khavrinen.csail.mit.edu> Date: Thu, 9 Jul 2015 14:32:14 -0400 From: Garrett Wollman To: freebsd-fs@freebsd.org Cc: rmacklem@freebsd.org Subject: How does NFS respond when a VFS operation gives ERESTART? 
X-Mailer: VM 7.17 under 21.4 (patch 22) "Instant Classic" XEmacs Lucid X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.4.3 (khavrinen.csail.mit.edu [127.0.0.1]); Thu, 09 Jul 2015 14:32:14 -0400 (EDT) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 18:32:17 -0000 When networked filesystems are not involved, the special error code [ERESTART] can be returned by the implementation of any system call, with the effect of causing the system call to be restarted when execution hits the kernel-user boundary, rather than returning to userland. This is used to allow certain system calls to be restarted after being interrupted by a signal. However, this normally only applies to system calls which might potentially sleep for a long time -- such as write() to a socket or a tty -- and not to disk I/O, which is normally uninterruptible. In investigating an issue reported by our users, it appears to me from an inspection of the code that ZFS can sometimes give an [ERESTART] condition, specifically when writing to a dataset that has reached its quota, AND there are pending block free operations that would reduce usage below the quota. But I don't see any code in the NFS (or kernel RPC) implementation that would actually handle this case, and of course the NFS server doesn't normally hit the user-kernel boundary at all. So does anyone have a theory about what actually happens in this case, and what *should* happen? It doesn't seem useful to just spin on the one operation over and over again until the blocks are freed (which I think might take a full ZFS transaction sync interval). The actual symptom which I'm investigating is that sometimes -- despite my fixes to the throttling code -- the server is still getting throttled, with thousands of requests enqueued for the same file. 
(The FHA code does a nice job of directing them all to the appropriate set of service threads, but that doesn't help the other clients get anything done because of the global throttle.) These seem not to make any progress for a long time, but the condition ultimately clears by itself -- what I'm trying to figure out is why so many requests get queued and don't make progress, and so far this seems to be related to hitting the quota on the filesystem. So [ERESTART] may be a total red herring, but it was something that stuck out at me when I was reviewing the code paths that could set [EDQUOT]. -GAWollman From owner-freebsd-fs@freebsd.org Thu Jul 9 20:12:14 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C7F319978AD for ; Thu, 9 Jul 2015 20:12:14 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) Received: from esa-jnhn.mail.uoguelph.ca (esa-jnhn.mail.uoguelph.ca [131.104.91.44]) by mx1.freebsd.org (Postfix) with ESMTP id 6283018C8; Thu, 9 Jul 2015 20:12:13 +0000 (UTC) (envelope-from rmacklem@uoguelph.ca) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: A2DyBAAf1Z5V/61jaINbDoNYYAaDGrgQgWcKhS1KAoIaEwEBAQEBAQGBCoQjAQEBAwEBAQEgKyALBQsCAQgYAgINGQICJwEJJgIECAcEARwEiAUIDbkBljcBAQEHAQEBAR6BIYoqhDQBAQIDFzQHgi07EoExBZQthGeESIRTlmMCJmOCWloiMQd+CBcjgQQBAQE X-IronPort-AV: E=Sophos;i="5.15,441,1432612800"; d="scan'208";a="222736514" Received: from nipigon.cs.uoguelph.ca (HELO zcs1.mail.uoguelph.ca) ([131.104.99.173]) by esa-jnhn.mail.uoguelph.ca with ESMTP; 09 Jul 2015 16:12:12 -0400 Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 0E3DD15F564; Thu, 9 Jul 2015 16:12:12 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10032) with ESMTP id 4ywuhBGDCSW7; Thu, 9 Jul 2015 16:12:11 -0400 (EDT) 
Received: from localhost (localhost [127.0.0.1]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 62A1515F565; Thu, 9 Jul 2015 16:12:11 -0400 (EDT) X-Virus-Scanned: amavisd-new at zcs1.mail.uoguelph.ca Received: from zcs1.mail.uoguelph.ca ([127.0.0.1]) by localhost (zcs1.mail.uoguelph.ca [127.0.0.1]) (amavisd-new, port 10026) with ESMTP id ji70xgUtr_h7; Thu, 9 Jul 2015 16:12:11 -0400 (EDT) Received: from zcs1.mail.uoguelph.ca (zcs1.mail.uoguelph.ca [172.17.95.18]) by zcs1.mail.uoguelph.ca (Postfix) with ESMTP id 46BD915F564; Thu, 9 Jul 2015 16:12:11 -0400 (EDT) Date: Thu, 9 Jul 2015 16:12:11 -0400 (EDT) From: Rick Macklem To: Garrett Wollman Cc: freebsd-fs@freebsd.org, rmacklem@freebsd.org Message-ID: <689709398.6876771.1436472731160.JavaMail.zimbra@uoguelph.ca> In-Reply-To: <21918.48686.157217.979707@khavrinen.csail.mit.edu> References: <21918.48686.157217.979707@khavrinen.csail.mit.edu> Subject: Re: How does NFS respond when a VFS operation gives ERESTART? MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Originating-IP: [172.17.95.10] X-Mailer: Zimbra 8.0.9_GA_6191 (ZimbraWebClient - FF34 (Win)/8.0.9_GA_6191) Thread-Topic: How does NFS respond when a VFS operation gives ERESTART? Thread-Index: 11bCQDl03U1gWGMn/4kqmO60HV0mvw== X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 09 Jul 2015 20:12:15 -0000 Garrett Wollman wrote: > When networked filesystems are not involved, the special error code > [ERESTART] can be returned by the implementation of any system call, > with the effect of causing the system call to be restarted when > execution hits the kernel-user boundary, rather than returning to > userland. This is used to allow certain system calls to be restarted > after being interrupted by a signal. 
However, this normally only > applies to system calls which might potentially sleep for a long time > -- such as write() to a socket or a tty -- and not to disk I/O, which > is normally uninterruptible. > > In investigating an issue reported by our users, it appears to me from > an inspection of the code that ZFS can sometimes give an [ERESTART] > condition, specifically when writing to a dataset that has reached its > quota, AND there are pending block free operations that would reduce > usage below the quota. But I don't see any code in the NFS (or kernel > RPC) implementation that would actually handle this case, and of > course the NFS server doesn't normally hit the user-kernel boundary at > all. So does anyone have a theory about what actually happens in this > case, and what *should* happen? It doesn't seem useful to just spin > on the one operation over and over again until the blocks are freed > (which I think might take a full ZFS transaction sync interval). > Well, I'll admit I'm not sure I really understand the situation, but... My best guess would be to have the NFS server reply with NFSERR_DELAY to the client. (NFSERR_DELAY doesn't exist for NFSv2, but I suspect you don't care about NFSv2?) NFSERR_DELAY - Tells the client to wait a while (the RFCs don't define how long) and then try the RPC again. Does this sound like it would work? If it sounds reasonable, I think patching the server to do this shouldn't be too hard. rick > The actual symptom which I'm investigating is that sometimes -- > despite my fixes to the throttling code -- the server is still getting > throttled, with thousands of requests enqueued for the same file. > (The FHA code does a nice job of directing them all to the appropriate > set of service threads, but that doesn't help the other clients get > anything done because of the global throttle.)
These seem not to make > any progress for a long time, but the condition ultimately clears by > itself -- what I'm trying to figure out is why so many requests get > queued and don't make progress, and so far this seems to be related to > hitting the quota on the filesystem. So [ERESTART] may be a total red > herring, but it was something that stuck out at me when I was > reviewing the code paths that could set [EDQUOT]. > > -GAWollman > _______________________________________________ > freebsd-fs@freebsd.org mailing list > http://lists.freebsd.org/mailman/listinfo/freebsd-fs > To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org" > From owner-freebsd-fs@freebsd.org Fri Jul 10 16:35:57 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 4EA8A997B41 for ; Fri, 10 Jul 2015 16:35:57 +0000 (UTC) (envelope-from arcade@b1t.name) Received: from limbo.b1t.name (limbo.b1t.name [213.111.203.167]) by mx1.freebsd.org (Postfix) with ESMTP id EBEDC16EF for ; Fri, 10 Jul 2015 16:35:56 +0000 (UTC) (envelope-from arcade@b1t.name) Received: from limbo.b1t.name (limbo.b1t.name [213.111.203.167]) by limbo.b1t.name (Postfix) with ESMTPSA id A812A6C for ; Fri, 10 Jul 2015 19:35:44 +0300 (EEST) Message-ID: <559FF45E.1090100@b1t.name> Date: Fri, 10 Jul 2015 19:35:42 +0300 From: Volodymyr Kostyrko User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.7.0 MIME-Version: 1.0 To: freebsd-fs@freebsd.org Subject: Re: Crashed ZFS pool References: <559CEDC3.2040107@harmless.hu> In-Reply-To: <559CEDC3.2040107@harmless.hu> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: 
Fri, 10 Jul 2015 16:35:57 -0000 On 08.07.2015 12:30, Gergely Czuczy wrote: > So, anyone has any idea what to do with it? It would be nice to get it > back to a functional state. Or at least to a state where the data can be > accessed. Have you tried importing the pool with: 1. Read-only mode. ( -o readonly=on ) 2. No datasets mounted. ( -N ) 3. A check of whether some previous transactions are good. ( -Fn and then -F). There are also extra options intended for debugging: 1. -X - extreme rewind, tries earlier transactions in turn, scrubbing each one. 2. -T txg - specify a starting txg for import. This might be particularly useful if some previous transactions are not only damaged beyond repair but also make the system dump core. Once I had a problem with a pool where I had to use -T - the controller was buggy and the last few transactions were badly damaged. -- Sphinx of black quartz judge my vow. From owner-freebsd-fs@freebsd.org Fri Jul 10 23:08:11 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 33ADD997FED for ; Fri, 10 Jul 2015 23:08:11 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wg0-x232.google.com (mail-wg0-x232.google.com [IPv6:2a00:1450:400c:c00::232]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id C54213AB; Fri, 10 Jul 2015 23:08:10 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wgjx7 with SMTP id x7so259564097wgj.2; Fri, 10 Jul 2015 16:08:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=sdeUlmmrlzDm05mO+XvLPvykROOxIsMwYKHQ/h1oJTs=; b=oRI8MlHqeEVgeSBWN/frEJ5xMc5H76icjY/i7CSpeavf0fCxwy1o+w+I8ycaWhvNeN aF9pnTmXTPq583fOlApl8mZASl10i2lQ+RfWaha7kd3CJUiaZzd6Rus2h5IW0Mgh9UkG
/RS6Uen60ZYmaDxYYnvAnpx4nFCT72InTHypDNVJ1GEoQYqQVxQFN0X0OCQTAAv7MBWl 4BUlSi2kmxXOkRJtFXgTH8TDEYRVpqkJg9aXzk72F9o55u5b20eBFA+NLMJB3qQz/drG wXeaerPYKt68bV/j1x4AQ3MTv5iT5ACCZqrE+Gy1ETwmin7OOkrByUTpBqMv8qyICp5J XqXw== X-Received: by 10.180.82.230 with SMTP id l6mr1890768wiy.61.1436569688214; Fri, 10 Jul 2015 16:08:08 -0700 (PDT) Received: from localhost.localdomain (ip-89-102-11-63.net.upcbroadband.cz. [89.102.11.63]) by smtp.gmail.com with ESMTPSA id se11sm1052887wic.2.2015.07.10.16.08.06 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 10 Jul 2015 16:08:07 -0700 (PDT) From: Mateusz Guzik To: kib@freebsd.org Cc: freebsd-fs@freebsd.org Subject: [PATCH 0/2] start consolidatin code manipulating fd_*dir vnodes Date: Sat, 11 Jul 2015 01:08:02 +0200 Message-Id: <1436569684-3939-1-git-send-email-mjguzik@gmail.com> X-Mailer: git-send-email 2.4.3 X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 10 Jul 2015 23:08:11 -0000 From: Mateusz Guzik Primary purpose is to ease future work implementing a copy-on-write struct for fd_*dir vnodes. IMHO even if that idea turns out to be bad/rejected after all, these patches provide a cleanup which should be done regardless, especially the second one. It can be modified later to warn, and eventually panic, as the condition it tests for should not realistically be present. Mateusz Guzik (2): Move chdir/chroot-related fdp manipulation to kern_descrip.c Create a dedicated function for ensuring that cdir and rdir are populated. 
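The cover letter names a copy-on-write structure for the fd_*dir vnodes as the future goal. The patches do not contain such a structure, so the following is only a hypothetical userland sketch of the general idea: the directories live in an immutable, reference-counted snapshot; readers take a reference under a short lock, and writers such as chdir publish a fresh copy instead of modifying the shared one. All names (dirs_cow, fdtable_cow, dirs_hold(), dirs_drop(), dirs_chdir()) are invented for illustration, and plain strings stand in for vnode pointers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable snapshot of a process's directories. */
struct dirs_cow {
	atomic_int refs;
	const char *cdir;       /* strings stand in for vnode pointers */
	const char *rdir;
	const char *jdir;
};

/* Hypothetical per-process table: a lock plus the current snapshot. */
struct fdtable_cow {
	pthread_mutex_t lock;
	struct dirs_cow *dirs;
};

static struct dirs_cow *
dirs_alloc(const char *cdir, const char *rdir, const char *jdir)
{
	struct dirs_cow *d = malloc(sizeof(*d));

	if (d == NULL)
		abort();
	atomic_init(&d->refs, 1);       /* reference owned by the creator */
	d->cdir = cdir;
	d->rdir = rdir;
	d->jdir = jdir;
	return (d);
}

/* Readers take a reference; after that they need no lock to use it. */
static struct dirs_cow *
dirs_hold(struct fdtable_cow *fdt)
{
	struct dirs_cow *d;

	pthread_mutex_lock(&fdt->lock);
	d = fdt->dirs;
	atomic_fetch_add(&d->refs, 1);
	pthread_mutex_unlock(&fdt->lock);
	return (d);
}

static void
dirs_drop(struct dirs_cow *d)
{
	if (atomic_fetch_sub(&d->refs, 1) == 1)
		free(d);                /* last reference gone */
}

/* Writers never modify a published snapshot: copy, change, swap. */
static void
dirs_chdir(struct fdtable_cow *fdt, const char *newcdir)
{
	struct dirs_cow *old, *nd;

	pthread_mutex_lock(&fdt->lock);
	old = fdt->dirs;
	nd = dirs_alloc(newcdir, old->rdir, old->jdir);
	fdt->dirs = nd;
	pthread_mutex_unlock(&fdt->lock);
	dirs_drop(old);                 /* drop the table's reference */
}
```

A reader holding the old snapshot keeps seeing a consistent set of directories even while another thread changes cdir; the old copy is freed only when its last holder drops it.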
 sys/cam/ctl/ctl_backend_block.c                |  13 +--
 .../compat/opensolaris/kern/opensolaris_kobj.c |  13 +--
 sys/cddl/compat/opensolaris/sys/vnode.h        |  13 +--
 sys/compat/ndis/subr_ndis.c                    |   5 +-
 sys/compat/svr4/svr4_misc.c                    |   2 +-
 sys/dev/xen/blkback/blkback.c                  |  13 +--
 sys/kern/kern_descrip.c                        | 107 +++++++++++++++++++++
 sys/kern/kern_jail.c                           |   2 +-
 sys/kern/subr_firmware.c                       |  13 +--
 sys/kern/vfs_syscalls.c                        |  96 +-----------------
 sys/sys/filedesc.h                             |   5 +
 sys/sys/vnode.h                                |   1 -
 sys/ufs/ffs/ffs_alloc.c                        |  10 +-
 13 files changed, 126 insertions(+), 167 deletions(-)

-- 
2.4.5

From owner-freebsd-fs@freebsd.org Fri Jul 10 23:08:13 2015
From: Mateusz Guzik
To: kib@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: [PATCH 2/2] Create a dedicated function for ensuring that cdir and rdir are populated.
Date: Sat, 11 Jul 2015 01:08:04 +0200
Message-Id: <1436569684-3939-3-git-send-email-mjguzik@gmail.com>
In-Reply-To: <1436569684-3939-1-git-send-email-mjguzik@gmail.com>
References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com>

From: Mateusz Guzik

Previously several places were doing it on its own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by assigning rootvnode without vreling it or populating jdir.

This functionality should not exist and will be garbage collected after
all callers are properly reviewed.
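A userspace sketch of the invariant pwd_ensure_dirs() is meant to provide may help when reading the callers converted in this patch. It is a model under stated assumptions, not kernel code: `struct vnode_model`, `struct filedesc_model`, the integer `locked` flag standing in for FILEDESC_XLOCK()/FILEDESC_XUNLOCK(), and `root_model` standing in for rootvnode are all hypothetical. The properties it checks are that every NULL slot filled with the root vnode takes exactly one reference, that fd_jdir is left alone, and that a second call changes nothing. The removed ndis code broke the first property by storing rootvnode without VREF(), and the removed vn_openat() code called vref() on fd_rdir while filling in fd_cdir.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vnode: only the use count matters here. */
struct vnode_model {
	int usecount;			/* models v_usecount */
};

/* Hypothetical stand-in for struct filedesc. */
struct filedesc_model {
	int locked;			/* models the filedesc exclusive lock */
	struct vnode_model *cdir;	/* models fd_cdir */
	struct vnode_model *rdir;	/* models fd_rdir */
	struct vnode_model *jdir;	/* models fd_jdir; never touched here */
};

/* Models rootvnode; the initial count stands for the global's own reference. */
static struct vnode_model root_model = { 1 };

static void
fd_xlock(struct filedesc_model *fdp)
{
	assert(!fdp->locked);		/* single-threaded model of XLOCK */
	fdp->locked = 1;
}

static void
fd_xunlock(struct filedesc_model *fdp)
{
	assert(fdp->locked);
	fdp->locked = 0;
}

/* Models pwd_ensure_dirs(): fill missing cdir/rdir, one reference each. */
static void
ensure_dirs_model(struct filedesc_model *fdp)
{
	fd_xlock(fdp);
	if (fdp->cdir == NULL) {
		fdp->cdir = &root_model;
		root_model.usecount++;	/* models VREF(rootvnode) */
	}
	if (fdp->rdir == NULL) {
		fdp->rdir = &root_model;
		root_model.usecount++;
	}
	fd_xunlock(fdp);
}
```

Because the check and the assignment happen under the same lock and a populated slot is never touched again, calling it twice is harmless, which is what lets each converted caller invoke it unconditionally before NDINIT().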
---
 sys/cam/ctl/ctl_backend_block.c                     | 13 +------------
 sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c | 13 +------------
 sys/cddl/compat/opensolaris/sys/vnode.h             | 13 +------------
 sys/compat/ndis/subr_ndis.c                         |  5 +----
 sys/dev/xen/blkback/blkback.c                       | 13 +------------
 sys/kern/kern_descrip.c                             | 19 +++++++++++++++++++
 sys/kern/subr_firmware.c                            | 13 +------------
 sys/sys/filedesc.h                                  |  1 +
 8 files changed, 26 insertions(+), 64 deletions(-)

diff --git a/sys/cam/ctl/ctl_backend_block.c b/sys/cam/ctl/ctl_backend_block.c
index c56023b..8ea52aa 100644
--- a/sys/cam/ctl/ctl_backend_block.c
+++ b/sys/cam/ctl/ctl_backend_block.c
@@ -2123,18 +2123,7 @@ ctl_be_block_open(struct ctl_be_block_softc *softc,
 		return (1);
 	}
 
-	if (!curthread->td_proc->p_fd->fd_cdir) {
-		curthread->td_proc->p_fd->fd_cdir = rootvnode;
-		VREF(rootvnode);
-	}
-	if (!curthread->td_proc->p_fd->fd_rdir) {
-		curthread->td_proc->p_fd->fd_rdir = rootvnode;
-		VREF(rootvnode);
-	}
-	if (!curthread->td_proc->p_fd->fd_jdir) {
-		curthread->td_proc->p_fd->fd_jdir = rootvnode;
-		VREF(rootvnode);
-	}
+	pwd_ensure_dirs();
 
 again:
 	NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, be_lun->dev_path, curthread);
diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c b/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c
index 9ff798a..52d695b 100644
--- a/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c
+++ b/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c
@@ -67,21 +67,10 @@ static void *
 kobj_open_file_vnode(const char *file)
 {
 	struct thread *td = curthread;
-	struct filedesc *fd;
 	struct nameidata nd;
 	int error, flags;
 
-	fd = td->td_proc->p_fd;
-	FILEDESC_XLOCK(fd);
-	if (fd->fd_rdir == NULL) {
-		fd->fd_rdir = rootvnode;
-		vref(fd->fd_rdir);
-	}
-	if (fd->fd_cdir == NULL) {
-		fd->fd_cdir = rootvnode;
-		vref(fd->fd_cdir);
-	}
-	FILEDESC_XUNLOCK(fd);
+	pwd_ensure_dirs();
 
 	flags = FREAD | O_NOFOLLOW;
 	NDINIT(&nd, LOOKUP, 0, UIO_SYSSPACE, file, td);
diff --git a/sys/cddl/compat/opensolaris/sys/vnode.h b/sys/cddl/compat/opensolaris/sys/vnode.h
index 22256cf..d7bc7f7 100644
--- a/sys/cddl/compat/opensolaris/sys/vnode.h
+++ b/sys/cddl/compat/opensolaris/sys/vnode.h
@@ -162,7 +162,6 @@ vn_openat(char *pnamep, enum uio_seg seg, int filemode, int createmode,
     int fd)
 {
 	struct thread *td = curthread;
-	struct filedesc *fdc;
 	struct nameidata nd;
 	int error, operation;
 
@@ -179,17 +178,7 @@ vn_openat(char *pnamep, enum uio_seg seg, int filemode, int createmode,
 	}
 	ASSERT(umask == 0);
 
-	fdc = td->td_proc->p_fd;
-	FILEDESC_XLOCK(fdc);
-	if (fdc->fd_rdir == NULL) {
-		fdc->fd_rdir = rootvnode;
-		vref(fdc->fd_rdir);
-	}
-	if (fdc->fd_cdir == NULL) {
-		fdc->fd_cdir = rootvnode;
-		vref(fdc->fd_rdir);
-	}
-	FILEDESC_XUNLOCK(fdc);
+	pwd_ensure_dirs();
 
 	if (startvp != NULL)
 		vref(startvp);
diff --git a/sys/compat/ndis/subr_ndis.c b/sys/compat/ndis/subr_ndis.c
index f3ba700..ac26a2e 100644
--- a/sys/compat/ndis/subr_ndis.c
+++ b/sys/compat/ndis/subr_ndis.c
@@ -2817,10 +2817,7 @@ NdisOpenFile(status, filehandle, filelength, filename, highestaddr)
 
 	/* Some threads don't have a current working directory. */
 
-	if (td->td_proc->p_fd->fd_rdir == NULL)
-		td->td_proc->p_fd->fd_rdir = rootvnode;
-	if (td->td_proc->p_fd->fd_cdir == NULL)
-		td->td_proc->p_fd->fd_cdir = rootvnode;
+	pwd_ensure_dirs();
 
 	NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, path, td);
 
diff --git a/sys/dev/xen/blkback/blkback.c b/sys/dev/xen/blkback/blkback.c
index 459271e..f266ffd 100644
--- a/sys/dev/xen/blkback/blkback.c
+++ b/sys/dev/xen/blkback/blkback.c
@@ -2692,18 +2692,7 @@ xbb_open_backend(struct xbb_softc *xbb)
 	if ((xbb->flags & XBBF_READ_ONLY) == 0)
 		flags |= FWRITE;
 
-	if (!curthread->td_proc->p_fd->fd_cdir) {
-		curthread->td_proc->p_fd->fd_cdir = rootvnode;
-		VREF(rootvnode);
-	}
-	if (!curthread->td_proc->p_fd->fd_rdir) {
-		curthread->td_proc->p_fd->fd_rdir = rootvnode;
-		VREF(rootvnode);
-	}
-	if (!curthread->td_proc->p_fd->fd_jdir) {
-		curthread->td_proc->p_fd->fd_jdir = rootvnode;
-		VREF(rootvnode);
-	}
+	pwd_ensure_dirs();
 
 again:
 	NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, xbb->dev_name, curthread);
diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c
index 37381ee..dea9d35 100644
--- a/sys/kern/kern_descrip.c
+++ b/sys/kern/kern_descrip.c
@@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$");
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -308,6 +309,24 @@ fdfree(struct filedesc *fdp, int fd)
 #endif
 }
 
+void
+pwd_ensure_dirs(void)
+{
+	struct filedesc *fdp;
+
+	fdp = curproc->p_fd;
+	FILEDESC_XLOCK(fdp);
+	if (fdp->fd_cdir == NULL) {
+		fdp->fd_cdir = rootvnode;
+		VREF(rootvnode);
+	}
+	if (fdp->fd_rdir == NULL) {
+		fdp->fd_rdir = rootvnode;
+		VREF(rootvnode);
+	}
+	FILEDESC_XUNLOCK(fdp);
+}
+
 /*
  * System calls on descriptors.
  */
diff --git a/sys/kern/subr_firmware.c b/sys/kern/subr_firmware.c
index 20ab76e..172d719 100644
--- a/sys/kern/subr_firmware.c
+++ b/sys/kern/subr_firmware.c
@@ -383,19 +383,8 @@ firmware_put(const struct firmware *p, int flags)
 static void
 set_rootvnode(void *arg, int npending)
 {
-	struct thread *td = curthread;
-	struct proc *p = td->td_proc;
 
-	FILEDESC_XLOCK(p->p_fd);
-	if (p->p_fd->fd_cdir == NULL) {
-		p->p_fd->fd_cdir = rootvnode;
-		VREF(rootvnode);
-	}
-	if (p->p_fd->fd_rdir == NULL) {
-		p->p_fd->fd_rdir = rootvnode;
-		VREF(rootvnode);
-	}
-	FILEDESC_XUNLOCK(p->p_fd);
+	pwd_ensure_dirs();
 
 	free(arg, M_TEMP);
 }
diff --git a/sys/sys/filedesc.h b/sys/sys/filedesc.h
index e569a3b..727a098 100644
--- a/sys/sys/filedesc.h
+++ b/sys/sys/filedesc.h
@@ -208,6 +208,7 @@ fd_modified(struct filedesc *fdp, int fd, seq_t seq)
 /* cdir/rdir/jdir manipulation functions. */
 void	pwd_chdir(struct thread *td, struct vnode *vp);
 int	pwd_chroot(struct thread *td, struct vnode *vp);
+void	pwd_ensure_dirs(void);
 
 #endif /* _KERNEL */
 
-- 
2.4.5

From owner-freebsd-fs@freebsd.org Fri Jul 10 23:08:12 2015
From: Mateusz Guzik
To: kib@freebsd.org
Cc: freebsd-fs@freebsd.org
Subject: [PATCH 1/2] Move chdir/chroot-related fdp manipulation to kern_descrip.c
Date: Sat, 11 Jul 2015 01:08:03 +0200
Message-Id: <1436569684-3939-2-git-send-email-mjguzik@gmail.com>
In-Reply-To: <1436569684-3939-1-git-send-email-mjguzik@gmail.com>
References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com>

From: Mateusz Guzik

Prefix exported functions with pwd_.

This also adds a helper which sets fd_cdir which deduplicates some code.
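The pwd_chdir() helper introduced by this patch relies on a hand-off convention that is worth spelling out. The caller passes in a vnode it already holds a reference on, that reference becomes fd_cdir's reference (no VREF() is taken), and the reference held by the previous fd_cdir is dropped with vrele() only after FILEDESC_XUNLOCK(), presumably so that whatever work vrele() triggers does not run with the descriptor table locked. The sketch below models that convention in userspace; the `_model` types and helper names are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

struct vnode_model {
	int usecount;			/* models v_usecount */
};

struct filedesc_model {
	int locked;			/* models FILEDESC_XLOCK() state */
	struct vnode_model *cdir;	/* models fd_cdir */
};

/* Models vrele(): drop one reference, which must exist. */
static void
vrele_model(struct vnode_model *vp)
{
	assert(vp->usecount > 0);
	vp->usecount--;
}

/*
 * Models pwd_chdir(): no new reference is taken on vp, because the
 * caller's reference is transferred to the filedesc.
 */
static void
chdir_model(struct filedesc_model *fdp, struct vnode_model *vp)
{
	struct vnode_model *oldvp;

	fdp->locked = 1;		/* FILEDESC_XLOCK(fdp) */
	oldvp = fdp->cdir;
	fdp->cdir = vp;
	fdp->locked = 0;		/* FILEDESC_XUNLOCK(fdp) */
	vrele_model(oldvp);		/* released outside the lock */
}
```

This convention is why kern_chdir() hands nd.ni_vp to pwd_chdir() without calling vrele() on it afterwards, whereas sys_chroot() does call vrele(nd.ni_vp) after pwd_chroot(), which takes its own references.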
--- sys/compat/svr4/svr4_misc.c | 2 +- sys/kern/kern_descrip.c | 88 +++++++++++++++++++++++++++++++++++++++++ sys/kern/kern_jail.c | 2 +- sys/kern/vfs_syscalls.c | 96 ++------------------------------------------- sys/sys/filedesc.h | 4 ++ sys/sys/vnode.h | 1 - sys/ufs/ffs/ffs_alloc.c | 10 +---- 7 files changed, 100 insertions(+), 103 deletions(-) diff --git a/sys/compat/svr4/svr4_misc.c b/sys/compat/svr4/svr4_misc.c index ec4504e..5e53874 100644 --- a/sys/compat/svr4/svr4_misc.c +++ b/sys/compat/svr4/svr4_misc.c @@ -643,7 +643,7 @@ svr4_sys_fchroot(td, uap) goto fail; #endif VOP_UNLOCK(vp, 0); - error = change_root(vp, td); + error = pwd_chroot(td, vp); vrele(vp); return (error); fail: diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c index 0d5ce41..37381ee 100644 --- a/sys/kern/kern_descrip.c +++ b/sys/kern/kern_descrip.c @@ -2855,6 +2855,94 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int dfd, int mode, } /* + * This sysctl determines if we will allow a process to chroot(2) if it + * has a directory open: + * 0: disallowed for all processes. + * 1: allowed for processes that were not already chroot(2)'ed. + * 2: allowed for all processes. + */ + +static int chroot_allow_open_directories = 1; + +SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, + &chroot_allow_open_directories, 0, + "Allow a process to chroot(2) if it has a directory open"); + +/* + * Helper function for raised chroot(2) security function: Refuse if + * any filedescriptors are open directories. + */ +static int +chroot_refuse_vdir_fds(struct filedesc *fdp) +{ + struct vnode *vp; + struct file *fp; + int fd; + + FILEDESC_LOCK_ASSERT(fdp); + + for (fd = 0; fd <= fdp->fd_lastfile; fd++) { + fp = fget_locked(fdp, fd); + if (fp == NULL) + continue; + if (fp->f_type == DTYPE_VNODE) { + vp = fp->f_vnode; + if (vp->v_type == VDIR) + return (EPERM); + } + } + return (0); +} + +/* + * Common routine for kern_chroot() and jail_attach(). 
The caller is + * responsible for invoking priv_check() and mac_vnode_check_chroot() to + * authorize this operation. + */ +int +pwd_chroot(struct thread *td, struct vnode *vp) +{ + struct filedesc *fdp; + struct vnode *oldvp; + int error; + + fdp = td->td_proc->p_fd; + FILEDESC_XLOCK(fdp); + if (chroot_allow_open_directories == 0 || + (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { + error = chroot_refuse_vdir_fds(fdp); + if (error != 0) { + FILEDESC_XUNLOCK(fdp); + return (error); + } + } + oldvp = fdp->fd_rdir; + fdp->fd_rdir = vp; + VREF(fdp->fd_rdir); + if (!fdp->fd_jdir) { + fdp->fd_jdir = vp; + VREF(fdp->fd_jdir); + } + FILEDESC_XUNLOCK(fdp); + vrele(oldvp); + return (0); +} + +void +pwd_chdir(struct thread *td, struct vnode *vp) +{ + struct filedesc *fdp; + struct vnode *oldvp; + + fdp = td->td_proc->p_fd; + FILEDESC_XLOCK(fdp); + oldvp = fdp->fd_cdir; + fdp->fd_cdir = vp; + FILEDESC_XUNLOCK(fdp); + vrele(oldvp); +} + +/* * Scan all active processes and prisons to see if any of them have a current * or root directory of `olddp'. If so, replace them with the new mount point. 
*/ diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c index c118d74..6bdddd7 100644 --- a/sys/kern/kern_jail.c +++ b/sys/kern/kern_jail.c @@ -2432,7 +2432,7 @@ do_jail_attach(struct thread *td, struct prison *pr) goto e_unlock; #endif VOP_UNLOCK(pr->pr_root, 0); - if ((error = change_root(pr->pr_root, td))) + if ((error = pwd_chroot(td, pr->pr_root))) goto e_revert_osd; newcred = crget(); diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c index 9088017..258a1ef 100644 --- a/sys/kern/vfs_syscalls.c +++ b/sys/kern/vfs_syscalls.c @@ -728,8 +728,7 @@ sys_fchdir(td, uap) int fd; } */ *uap; { - register struct filedesc *fdp = td->td_proc->p_fd; - struct vnode *vp, *tdp, *vpold; + struct vnode *vp, *tdp; struct mount *mp; struct file *fp; cap_rights_t rights; @@ -761,11 +760,7 @@ sys_fchdir(td, uap) return (error); } VOP_UNLOCK(vp, 0); - FILEDESC_XLOCK(fdp); - vpold = fdp->fd_cdir; - fdp->fd_cdir = vp; - FILEDESC_XUNLOCK(fdp); - vrele(vpold); + pwd_chdir(td, vp); return (0); } @@ -791,9 +786,7 @@ sys_chdir(td, uap) int kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) { - register struct filedesc *fdp = td->td_proc->p_fd; struct nameidata nd; - struct vnode *vp; int error; NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF | AUDITVNODE1, @@ -807,56 +800,11 @@ kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) } VOP_UNLOCK(nd.ni_vp, 0); NDFREE(&nd, NDF_ONLY_PNBUF); - FILEDESC_XLOCK(fdp); - vp = fdp->fd_cdir; - fdp->fd_cdir = nd.ni_vp; - FILEDESC_XUNLOCK(fdp); - vrele(vp); - return (0); -} - -/* - * Helper function for raised chroot(2) security function: Refuse if - * any filedescriptors are open directories. 
- */ -static int -chroot_refuse_vdir_fds(fdp) - struct filedesc *fdp; -{ - struct vnode *vp; - struct file *fp; - int fd; - - FILEDESC_LOCK_ASSERT(fdp); - - for (fd = 0; fd <= fdp->fd_lastfile; fd++) { - fp = fget_locked(fdp, fd); - if (fp == NULL) - continue; - if (fp->f_type == DTYPE_VNODE) { - vp = fp->f_vnode; - if (vp->v_type == VDIR) - return (EPERM); - } - } + pwd_chdir(td, nd.ni_vp); return (0); } /* - * This sysctl determines if we will allow a process to chroot(2) if it - * has a directory open: - * 0: disallowed for all processes. - * 1: allowed for processes that were not already chroot(2)'ed. - * 2: allowed for all processes. - */ - -static int chroot_allow_open_directories = 1; - -SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, - &chroot_allow_open_directories, 0, - "Allow a process to chroot(2) if it has a directory open"); - -/* * Change notion of root (``/'') directory. */ #ifndef _SYS_SYSPROTO_H_ @@ -891,7 +839,7 @@ sys_chroot(td, uap) goto e_vunlock; #endif VOP_UNLOCK(nd.ni_vp, 0); - error = change_root(nd.ni_vp, td); + error = pwd_chroot(td, nd.ni_vp); vrele(nd.ni_vp); NDFREE(&nd, NDF_ONLY_PNBUF); return (error); @@ -926,42 +874,6 @@ change_dir(vp, td) return (VOP_ACCESS(vp, VEXEC, td->td_ucred, td)); } -/* - * Common routine for kern_chroot() and jail_attach(). The caller is - * responsible for invoking priv_check() and mac_vnode_check_chroot() to - * authorize this operation. 
- */ -int -change_root(vp, td) - struct vnode *vp; - struct thread *td; -{ - struct filedesc *fdp; - struct vnode *oldvp; - int error; - - fdp = td->td_proc->p_fd; - FILEDESC_XLOCK(fdp); - if (chroot_allow_open_directories == 0 || - (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { - error = chroot_refuse_vdir_fds(fdp); - if (error != 0) { - FILEDESC_XUNLOCK(fdp); - return (error); - } - } - oldvp = fdp->fd_rdir; - fdp->fd_rdir = vp; - VREF(fdp->fd_rdir); - if (!fdp->fd_jdir) { - fdp->fd_jdir = vp; - VREF(fdp->fd_jdir); - } - FILEDESC_XUNLOCK(fdp); - vrele(oldvp); - return (0); -} - static __inline void flags_to_rights(int flags, cap_rights_t *rightsp) { diff --git a/sys/sys/filedesc.h b/sys/sys/filedesc.h index ab7ce9f..e569a3b 100644 --- a/sys/sys/filedesc.h +++ b/sys/sys/filedesc.h @@ -205,6 +205,10 @@ fd_modified(struct filedesc *fdp, int fd, seq_t seq) return (!seq_consistent(fd_seq(fdp->fd_files, fd), seq)); } +/* cdir/rdir/jdir manipulation functions. */ +void pwd_chdir(struct thread *td, struct vnode *vp); +int pwd_chroot(struct thread *td, struct vnode *vp); + #endif /* _KERNEL */ #endif /* !_SYS_FILEDESC_H_ */ diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index 36ef8af..6d5da32 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -616,7 +616,6 @@ void cache_purge(struct vnode *vp); void cache_purge_negative(struct vnode *vp); void cache_purgevfs(struct mount *mp); int change_dir(struct vnode *vp, struct thread *td); -int change_root(struct vnode *vp, struct thread *td); void cvtstat(struct stat *st, struct ostat *ost); void cvtnstat(struct stat *sb, struct nstat *nsb); int getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops, diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c index 2b9c334..c587dfb 100644 --- a/sys/ufs/ffs/ffs_alloc.c +++ b/sys/ufs/ffs/ffs_alloc.c @@ -2748,13 +2748,12 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) struct thread *td = curthread; struct fsck_cmd cmd; struct ufsmount *ump; - struct 
vnode *vp, *vpold, *dvp, *fdvp; + struct vnode *vp, *dvp, *fdvp; struct inode *ip, *dp; struct mount *mp; struct fs *fs; ufs2_daddr_t blkno; long blkcnt, blksize; - struct filedesc *fdp; struct file *fp, *vfp; cap_rights_t rights; int filetype, error; @@ -2968,12 +2967,7 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) break; } VOP_UNLOCK(vp, 0); - fdp = td->td_proc->p_fd; - FILEDESC_XLOCK(fdp); - vpold = fdp->fd_cdir; - fdp->fd_cdir = vp; - FILEDESC_XUNLOCK(fdp); - vrele(vpold); + pwd_chdir(td, vp); break; case FFS_SET_DOTDOT: -- 2.4.5 From owner-freebsd-fs@freebsd.org Sat Jul 11 12:43:48 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 7566F3A8A for ; Sat, 11 Jul 2015 12:43:48 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 00FAA110F for ; Sat, 11 Jul 2015 12:43:47 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t6BChabr043863 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 11 Jul 2015 15:43:36 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t6BChabr043863 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t6BChZvU043862; Sat, 11 Jul 2015 15:43:35 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 11 Jul 2015 15:43:35 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: freebsd-fs@freebsd.org Subject: Re: [PATCH 1/2] Move chdir/chroot-related fdp manipulation to kern_descrip.c Message-ID: 
<20150711124335.GF2080@kib.kiev.ua>
References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com>
 <1436569684-3939-2-git-send-email-mjguzik@gmail.com>
In-Reply-To: <1436569684-3939-2-git-send-email-mjguzik@gmail.com>

On Sat, Jul 11, 2015 at 01:08:03AM +0200, Mateusz Guzik wrote:
> From: Mateusz Guzik
>
> Prefix exported functions with pwd_.
>
> This also adds a helper which sets fd_cdir which deduplicates some code.

Patch is fine, I like the cleanup.  Please use the opportunity to fix
minor existing issues, mostly with style, see below.
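For pwd_chroot(), the inline comments in the quoted patch ask to take the reference on vp before publishing it in fd_rdir and fd_jdir, and to test fd_jdir against NULL explicitly. A userspace model of pwd_chroot() with those changes applied, together with the kern.chroot_allow_open_directories policy it enforces, might look like the following. The `_model` types, the fixed eight-slot descriptor table, and the `is_dir` flag standing in for `v_type == VDIR` are assumptions made for this sketch; the real function walks the table with fget_locked() up to fd_lastfile.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NFILES_MODEL 8			/* hypothetical descriptor table size */

struct vnode_model {
	int usecount;			/* models v_usecount */
	int is_dir;			/* models v_type == VDIR */
};

struct filedesc_model {
	int locked;			/* models FILEDESC_XLOCK() state */
	struct vnode_model *rdir;	/* models fd_rdir */
	struct vnode_model *jdir;	/* models fd_jdir */
	struct vnode_model *files[NFILES_MODEL]; /* open vnodes or NULL */
};

static struct vnode_model root_model = { 1, 1 };	/* models rootvnode */
static int chroot_allow_open_directories = 1;	/* models the sysctl */

/* Models chroot_refuse_vdir_fds(): EPERM if any open file is a directory. */
static int
refuse_vdir_fds_model(const struct filedesc_model *fdp)
{
	int fd;

	assert(fdp->locked);		/* models FILEDESC_LOCK_ASSERT() */
	for (fd = 0; fd < NFILES_MODEL; fd++)
		if (fdp->files[fd] != NULL && fdp->files[fd]->is_dir)
			return (EPERM);
	return (0);
}

/* Models pwd_chroot() with the reference taken before each assignment. */
static int
chroot_model(struct filedesc_model *fdp, struct vnode_model *vp)
{
	struct vnode_model *oldvp;
	int error;

	fdp->locked = 1;		/* FILEDESC_XLOCK(fdp) */
	if (chroot_allow_open_directories == 0 ||
	    (chroot_allow_open_directories == 1 && fdp->rdir != &root_model)) {
		error = refuse_vdir_fds_model(fdp);
		if (error != 0) {
			fdp->locked = 0;
			return (error);
		}
	}
	oldvp = fdp->rdir;		/* assumed non-NULL, as in the kernel */
	vp->usecount++;			/* VREF(vp) first ... */
	fdp->rdir = vp;			/* ... then publish it */
	if (fdp->jdir == NULL) {
		vp->usecount++;
		fdp->jdir = vp;
	}
	fdp->locked = 0;		/* FILEDESC_XUNLOCK(fdp) */
	assert(oldvp->usecount > 0);
	oldvp->usecount--;		/* vrele(oldvp) */
	return (0);
}
```

With the default policy value of 1, a first chroot succeeds even though a directory is open, because the process still has the real root, and a second one is refused with EPERM because the process is already chrooted. That matches the sysctl's documented meaning ("1: allowed for processes that were not already chroot(2)'ed").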
> --- > sys/compat/svr4/svr4_misc.c | 2 +- > sys/kern/kern_descrip.c | 88 +++++++++++++++++++++++++++++++++++++++++ > sys/kern/kern_jail.c | 2 +- > sys/kern/vfs_syscalls.c | 96 ++------------------------------------------- > sys/sys/filedesc.h | 4 ++ > sys/sys/vnode.h | 1 - > sys/ufs/ffs/ffs_alloc.c | 10 +---- > 7 files changed, 100 insertions(+), 103 deletions(-) > > diff --git a/sys/compat/svr4/svr4_misc.c b/sys/compat/svr4/svr4_misc.c > index ec4504e..5e53874 100644 > --- a/sys/compat/svr4/svr4_misc.c > +++ b/sys/compat/svr4/svr4_misc.c > @@ -643,7 +643,7 @@ svr4_sys_fchroot(td, uap) > goto fail; > #endif > VOP_UNLOCK(vp, 0); > - error = change_root(vp, td); > + error = pwd_chroot(td, vp); > vrele(vp); > return (error); > fail: > diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c > index 0d5ce41..37381ee 100644 > --- a/sys/kern/kern_descrip.c > +++ b/sys/kern/kern_descrip.c > @@ -2855,6 +2855,94 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int dfd, int mode, > } > > /* > + * This sysctl determines if we will allow a process to chroot(2) if it > + * has a directory open: > + * 0: disallowed for all processes. > + * 1: allowed for processes that were not already chroot(2)'ed. > + * 2: allowed for all processes. > + */ > + > +static int chroot_allow_open_directories = 1; > + > +SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, > + &chroot_allow_open_directories, 0, > + "Allow a process to chroot(2) if it has a directory open"); > + > +/* > + * Helper function for raised chroot(2) security function: Refuse if > + * any filedescriptors are open directories. 
> + */
> +static int
> +chroot_refuse_vdir_fds(struct filedesc *fdp)
> +{
> +	struct vnode *vp;
> +	struct file *fp;
> +	int fd;
> +
> +	FILEDESC_LOCK_ASSERT(fdp);
> +
> +	for (fd = 0; fd <= fdp->fd_lastfile; fd++) {
> +		fp = fget_locked(fdp, fd);
> +		if (fp == NULL)
> +			continue;
> +		if (fp->f_type == DTYPE_VNODE) {
> +			vp = fp->f_vnode;
> +			if (vp->v_type == VDIR)
> +				return (EPERM);
> +		}
> +	}
> +	return (0);
> +}
> +
> +/*
> + * Common routine for kern_chroot() and jail_attach().  The caller is
> + * responsible for invoking priv_check() and mac_vnode_check_chroot() to
> + * authorize this operation.
> + */
> +int
> +pwd_chroot(struct thread *td, struct vnode *vp)
> +{
> +	struct filedesc *fdp;
> +	struct vnode *oldvp;
> +	int error;
> +
> +	fdp = td->td_proc->p_fd;
> +	FILEDESC_XLOCK(fdp);
> +	if (chroot_allow_open_directories == 0 ||
> +	    (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) {
> +		error = chroot_refuse_vdir_fds(fdp);
> +		if (error != 0) {
> +			FILEDESC_XUNLOCK(fdp);
> +			return (error);
> +		}
> +	}
> +	oldvp = fdp->fd_rdir;
	VREF(vp);
> +	fdp->fd_rdir = vp;
> +	VREF(fdp->fd_rdir);
Remove this line.

> +	if (!fdp->fd_jdir) {
	if (fdp->fd_jdir == NULL) {

> +		fdp->fd_jdir = vp;
> +		VREF(fdp->fd_jdir);
Again, swap the order: do VREF(vp), then assign.

> +	}
> +	FILEDESC_XUNLOCK(fdp);
> +	vrele(oldvp);
> +	return (0);
> +}
> +
> +void
> +pwd_chdir(struct thread *td, struct vnode *vp)
> +{
> +	struct filedesc *fdp;
> +	struct vnode *oldvp;
> +
> +	fdp = td->td_proc->p_fd;
> +	FILEDESC_XLOCK(fdp);
> +	oldvp = fdp->fd_cdir;
> +	fdp->fd_cdir = vp;
Assert that vp v_usecount > 0, as an inaccurate but still useful check.
We have no way to ensure that the reference is owned by the code, but we
do know that there must be such a reference.

> +	FILEDESC_XUNLOCK(fdp);
> +	vrele(oldvp);
> +}
> +
> +/*
>   * Scan all active processes and prisons to see if any of them have a current
>   * or root directory of `olddp'.
If so, replace them with the new mount point. > */ > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c > index c118d74..6bdddd7 100644 > --- a/sys/kern/kern_jail.c > +++ b/sys/kern/kern_jail.c > @@ -2432,7 +2432,7 @@ do_jail_attach(struct thread *td, struct prison *pr) > goto e_unlock; > #endif > VOP_UNLOCK(pr->pr_root, 0); > - if ((error = change_root(pr->pr_root, td))) > + if ((error = pwd_chroot(td, pr->pr_root))) > goto e_revert_osd; > > newcred = crget(); > diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c > index 9088017..258a1ef 100644 > --- a/sys/kern/vfs_syscalls.c > +++ b/sys/kern/vfs_syscalls.c > @@ -728,8 +728,7 @@ sys_fchdir(td, uap) > int fd; > } */ *uap; > { > - register struct filedesc *fdp = td->td_proc->p_fd; > - struct vnode *vp, *tdp, *vpold; > + struct vnode *vp, *tdp; > struct mount *mp; > struct file *fp; > cap_rights_t rights; > @@ -761,11 +760,7 @@ sys_fchdir(td, uap) > return (error); > } > VOP_UNLOCK(vp, 0); > - FILEDESC_XLOCK(fdp); > - vpold = fdp->fd_cdir; > - fdp->fd_cdir = vp; > - FILEDESC_XUNLOCK(fdp); > - vrele(vpold); > + pwd_chdir(td, vp); > return (0); > } > > @@ -791,9 +786,7 @@ sys_chdir(td, uap) > int > kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) > { > - register struct filedesc *fdp = td->td_proc->p_fd; > struct nameidata nd; > - struct vnode *vp; > int error; > > NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF | AUDITVNODE1, > @@ -807,56 +800,11 @@ kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) > } > VOP_UNLOCK(nd.ni_vp, 0); > NDFREE(&nd, NDF_ONLY_PNBUF); > - FILEDESC_XLOCK(fdp); > - vp = fdp->fd_cdir; > - fdp->fd_cdir = nd.ni_vp; > - FILEDESC_XUNLOCK(fdp); > - vrele(vp); > - return (0); > -} > - > -/* > - * Helper function for raised chroot(2) security function: Refuse if > - * any filedescriptors are open directories. 
> - */ > -static int > -chroot_refuse_vdir_fds(fdp) > - struct filedesc *fdp; > -{ > - struct vnode *vp; > - struct file *fp; > - int fd; > - > - FILEDESC_LOCK_ASSERT(fdp); > - > - for (fd = 0; fd <= fdp->fd_lastfile; fd++) { > - fp = fget_locked(fdp, fd); > - if (fp == NULL) > - continue; > - if (fp->f_type == DTYPE_VNODE) { > - vp = fp->f_vnode; > - if (vp->v_type == VDIR) > - return (EPERM); > - } > - } > + pwd_chdir(td, nd.ni_vp); > return (0); > } > > /* > - * This sysctl determines if we will allow a process to chroot(2) if it > - * has a directory open: > - * 0: disallowed for all processes. > - * 1: allowed for processes that were not already chroot(2)'ed. > - * 2: allowed for all processes. > - */ > - > -static int chroot_allow_open_directories = 1; > - > -SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, > - &chroot_allow_open_directories, 0, > - "Allow a process to chroot(2) if it has a directory open"); > - > -/* > * Change notion of root (``/'') directory. > */ > #ifndef _SYS_SYSPROTO_H_ > @@ -891,7 +839,7 @@ sys_chroot(td, uap) > goto e_vunlock; > #endif > VOP_UNLOCK(nd.ni_vp, 0); > - error = change_root(nd.ni_vp, td); > + error = pwd_chroot(td, nd.ni_vp); > vrele(nd.ni_vp); > NDFREE(&nd, NDF_ONLY_PNBUF); > return (error); > @@ -926,42 +874,6 @@ change_dir(vp, td) > return (VOP_ACCESS(vp, VEXEC, td->td_ucred, td)); > } > > -/* > - * Common routine for kern_chroot() and jail_attach(). The caller is > - * responsible for invoking priv_check() and mac_vnode_check_chroot() to > - * authorize this operation. 
> - */ > -int > -change_root(vp, td) > - struct vnode *vp; > - struct thread *td; > -{ > - struct filedesc *fdp; > - struct vnode *oldvp; > - int error; > - > - fdp = td->td_proc->p_fd; > - FILEDESC_XLOCK(fdp); > - if (chroot_allow_open_directories == 0 || > - (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { > - error = chroot_refuse_vdir_fds(fdp); > - if (error != 0) { > - FILEDESC_XUNLOCK(fdp); > - return (error); > - } > - } > - oldvp = fdp->fd_rdir; > - fdp->fd_rdir = vp; > - VREF(fdp->fd_rdir); > - if (!fdp->fd_jdir) { > - fdp->fd_jdir = vp; > - VREF(fdp->fd_jdir); > - } > - FILEDESC_XUNLOCK(fdp); > - vrele(oldvp); > - return (0); > -} > - > static __inline void > flags_to_rights(int flags, cap_rights_t *rightsp) > { > diff --git a/sys/sys/filedesc.h b/sys/sys/filedesc.h > index ab7ce9f..e569a3b 100644 > --- a/sys/sys/filedesc.h > +++ b/sys/sys/filedesc.h > @@ -205,6 +205,10 @@ fd_modified(struct filedesc *fdp, int fd, seq_t seq) > return (!seq_consistent(fd_seq(fdp->fd_files, fd), seq)); > } > > +/* cdir/rdir/jdir manipulation functions. 
*/ > +void pwd_chdir(struct thread *td, struct vnode *vp); > +int pwd_chroot(struct thread *td, struct vnode *vp); > + > #endif /* _KERNEL */ > > #endif /* !_SYS_FILEDESC_H_ */ > diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h > index 36ef8af..6d5da32 100644 > --- a/sys/sys/vnode.h > +++ b/sys/sys/vnode.h > @@ -616,7 +616,6 @@ void cache_purge(struct vnode *vp); > void cache_purge_negative(struct vnode *vp); > void cache_purgevfs(struct mount *mp); > int change_dir(struct vnode *vp, struct thread *td); > -int change_root(struct vnode *vp, struct thread *td); > void cvtstat(struct stat *st, struct ostat *ost); > void cvtnstat(struct stat *sb, struct nstat *nsb); > int getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops, > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c > index 2b9c334..c587dfb 100644 > --- a/sys/ufs/ffs/ffs_alloc.c > +++ b/sys/ufs/ffs/ffs_alloc.c > @@ -2748,13 +2748,12 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) > struct thread *td = curthread; > struct fsck_cmd cmd; > struct ufsmount *ump; > - struct vnode *vp, *vpold, *dvp, *fdvp; > + struct vnode *vp, *dvp, *fdvp; > struct inode *ip, *dp; > struct mount *mp; > struct fs *fs; > ufs2_daddr_t blkno; > long blkcnt, blksize; > - struct filedesc *fdp; > struct file *fp, *vfp; > cap_rights_t rights; > int filetype, error; > @@ -2968,12 +2967,7 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) > break; > } > VOP_UNLOCK(vp, 0); > - fdp = td->td_proc->p_fd; > - FILEDESC_XLOCK(fdp); > - vpold = fdp->fd_cdir; > - fdp->fd_cdir = vp; > - FILEDESC_XUNLOCK(fdp); > - vrele(vpold); > + pwd_chdir(td, vp); > break; > > case FFS_SET_DOTDOT: > -- > 2.4.5 From owner-freebsd-fs@freebsd.org Sat Jul 11 12:48:55 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id BF9E83ADF for ; Sat, 11 Jul 2015 12:48:55 +0000 (UTC) (envelope-from kostikbel@gmail.com) 
Date: Sat, 11 Jul 2015 15:48:48 +0300
From: Konstantin Belousov
To: Mateusz Guzik
Cc: freebsd-fs@freebsd.org
Subject: Re: [PATCH 2/2] Create a dedicated function for ensuring that cdir and rdir are populated.
Message-ID: <20150711124848.GG2080@kib.kiev.ua> References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com> <1436569684-3939-3-git-send-email-mjguzik@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1436569684-3939-3-git-send-email-mjguzik@gmail.com> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Jul 2015 12:48:55 -0000 On Sat, Jul 11, 2015 at 01:08:04AM +0200, Mateusz Guzik wrote: > From: Mateusz Guzik > > Previously several places were doing it on its own, partially > incorrectly (e.g. without the filedesc locked) or even actively harmful > by assigning rootvnode without vreling it or populating jdir. > > This functionality should not exist and will be garbage collected after > all callers are properly reviewed. Why do you think that this code 'should not exist'? The code exists because some kernel processes need to do file I/O, and also because of the desire to avoid having all kernel processes reference some vnode even when it is not needed. How do you propose to make lookups etc. possible from a kernel process? Regardless of the note above, the patch looks fine; it is a good cleanup.
> --- > sys/cam/ctl/ctl_backend_block.c | 13 +------------ > sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c | 13 +------------ > sys/cddl/compat/opensolaris/sys/vnode.h | 13 +------------ > sys/compat/ndis/subr_ndis.c | 5 +---- > sys/dev/xen/blkback/blkback.c | 13 +------------ > sys/kern/kern_descrip.c | 19 +++++++++++++++++++ > sys/kern/subr_firmware.c | 13 +------------ > sys/sys/filedesc.h | 1 + > 8 files changed, 26 insertions(+), 64 deletions(-) > > diff --git a/sys/cam/ctl/ctl_backend_block.c b/sys/cam/ctl/ctl_backend_block.c > index c56023b..8ea52aa 100644 > --- a/sys/cam/ctl/ctl_backend_block.c > +++ b/sys/cam/ctl/ctl_backend_block.c > @@ -2123,18 +2123,7 @@ ctl_be_block_open(struct ctl_be_block_softc *softc, > return (1); > } > > - if (!curthread->td_proc->p_fd->fd_cdir) { > - curthread->td_proc->p_fd->fd_cdir = rootvnode; > - VREF(rootvnode); > - } > - if (!curthread->td_proc->p_fd->fd_rdir) { > - curthread->td_proc->p_fd->fd_rdir = rootvnode; > - VREF(rootvnode); > - } > - if (!curthread->td_proc->p_fd->fd_jdir) { > - curthread->td_proc->p_fd->fd_jdir = rootvnode; > - VREF(rootvnode); > - } > + pwd_ensure_dirs(); > > again: > NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, be_lun->dev_path, curthread); > diff --git a/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c b/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c > index 9ff798a..52d695b 100644 > --- a/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c > +++ b/sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c > @@ -67,21 +67,10 @@ static void * > kobj_open_file_vnode(const char *file) > { > struct thread *td = curthread; > - struct filedesc *fd; > struct nameidata nd; > int error, flags; > > - fd = td->td_proc->p_fd; > - FILEDESC_XLOCK(fd); > - if (fd->fd_rdir == NULL) { > - fd->fd_rdir = rootvnode; > - vref(fd->fd_rdir); > - } > - if (fd->fd_cdir == NULL) { > - fd->fd_cdir = rootvnode; > - vref(fd->fd_cdir); > - } > - FILEDESC_XUNLOCK(fd); > + pwd_ensure_dirs(); > > flags = FREAD | 
O_NOFOLLOW; > NDINIT(&nd, LOOKUP, 0, UIO_SYSSPACE, file, td); > diff --git a/sys/cddl/compat/opensolaris/sys/vnode.h b/sys/cddl/compat/opensolaris/sys/vnode.h > index 22256cf..d7bc7f7 100644 > --- a/sys/cddl/compat/opensolaris/sys/vnode.h > +++ b/sys/cddl/compat/opensolaris/sys/vnode.h > @@ -162,7 +162,6 @@ vn_openat(char *pnamep, enum uio_seg seg, int filemode, int createmode, > int fd) > { > struct thread *td = curthread; > - struct filedesc *fdc; > struct nameidata nd; > int error, operation; > > @@ -179,17 +178,7 @@ vn_openat(char *pnamep, enum uio_seg seg, int filemode, int createmode, > } > ASSERT(umask == 0); > > - fdc = td->td_proc->p_fd; > - FILEDESC_XLOCK(fdc); > - if (fdc->fd_rdir == NULL) { > - fdc->fd_rdir = rootvnode; > - vref(fdc->fd_rdir); > - } > - if (fdc->fd_cdir == NULL) { > - fdc->fd_cdir = rootvnode; > - vref(fdc->fd_rdir); > - } > - FILEDESC_XUNLOCK(fdc); > + pwd_ensure_dirs(); > > if (startvp != NULL) > vref(startvp); > diff --git a/sys/compat/ndis/subr_ndis.c b/sys/compat/ndis/subr_ndis.c > index f3ba700..ac26a2e 100644 > --- a/sys/compat/ndis/subr_ndis.c > +++ b/sys/compat/ndis/subr_ndis.c > @@ -2817,10 +2817,7 @@ NdisOpenFile(status, filehandle, filelength, filename, highestaddr) > > /* Some threads don't have a current working directory. 
*/ > > - if (td->td_proc->p_fd->fd_rdir == NULL) > - td->td_proc->p_fd->fd_rdir = rootvnode; > - if (td->td_proc->p_fd->fd_cdir == NULL) > - td->td_proc->p_fd->fd_cdir = rootvnode; > + pwd_ensure_dirs(); > > NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, path, td); > > diff --git a/sys/dev/xen/blkback/blkback.c b/sys/dev/xen/blkback/blkback.c > index 459271e..f266ffd 100644 > --- a/sys/dev/xen/blkback/blkback.c > +++ b/sys/dev/xen/blkback/blkback.c > @@ -2692,18 +2692,7 @@ xbb_open_backend(struct xbb_softc *xbb) > if ((xbb->flags & XBBF_READ_ONLY) == 0) > flags |= FWRITE; > > - if (!curthread->td_proc->p_fd->fd_cdir) { > - curthread->td_proc->p_fd->fd_cdir = rootvnode; > - VREF(rootvnode); > - } > - if (!curthread->td_proc->p_fd->fd_rdir) { > - curthread->td_proc->p_fd->fd_rdir = rootvnode; > - VREF(rootvnode); > - } > - if (!curthread->td_proc->p_fd->fd_jdir) { > - curthread->td_proc->p_fd->fd_jdir = rootvnode; > - VREF(rootvnode); > - } > + pwd_ensure_dirs(); > > again: > NDINIT(&nd, LOOKUP, FOLLOW, UIO_SYSSPACE, xbb->dev_name, curthread); > diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c > index 37381ee..dea9d35 100644 > --- a/sys/kern/kern_descrip.c > +++ b/sys/kern/kern_descrip.c > @@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$"); > #include > #include > #include > +#include > #include > #include > #include > @@ -308,6 +309,24 @@ fdfree(struct filedesc *fdp, int fd) > #endif > } > > +void > +pwd_ensure_dirs(void) > +{ > + struct filedesc *fdp; > + > + fdp = curproc->p_fd; > + FILEDESC_XLOCK(fdp); > + if (fdp->fd_cdir == NULL) { > + fdp->fd_cdir = rootvnode; > + VREF(rootvnode); > + } > + if (fdp->fd_rdir == NULL) { > + fdp->fd_rdir = rootvnode; > + VREF(rootvnode); > + } > + FILEDESC_XUNLOCK(fdp); > +} > + > /* > * System calls on descriptors. 
> */ > diff --git a/sys/kern/subr_firmware.c b/sys/kern/subr_firmware.c > index 20ab76e..172d719 100644 > --- a/sys/kern/subr_firmware.c > +++ b/sys/kern/subr_firmware.c > @@ -383,19 +383,8 @@ firmware_put(const struct firmware *p, int flags) > static void > set_rootvnode(void *arg, int npending) > { > - struct thread *td = curthread; > - struct proc *p = td->td_proc; > > - FILEDESC_XLOCK(p->p_fd); > - if (p->p_fd->fd_cdir == NULL) { > - p->p_fd->fd_cdir = rootvnode; > - VREF(rootvnode); > - } > - if (p->p_fd->fd_rdir == NULL) { > - p->p_fd->fd_rdir = rootvnode; > - VREF(rootvnode); > - } > - FILEDESC_XUNLOCK(p->p_fd); > + pwd_ensure_dirs(); > > free(arg, M_TEMP); > } > diff --git a/sys/sys/filedesc.h b/sys/sys/filedesc.h > index e569a3b..727a098 100644 > --- a/sys/sys/filedesc.h > +++ b/sys/sys/filedesc.h > @@ -208,6 +208,7 @@ fd_modified(struct filedesc *fdp, int fd, seq_t seq) > /* cdir/rdir/jdir manipulation functions. */ > void pwd_chdir(struct thread *td, struct vnode *vp); > int pwd_chroot(struct thread *td, struct vnode *vp); > +void pwd_ensure_dirs(void); > > #endif /* _KERNEL */ > > -- > 2.4.5 From owner-freebsd-fs@freebsd.org Sat Jul 11 13:40:30 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 9F8E4998039 for ; Sat, 11 Jul 2015 13:40:30 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x236.google.com (mail-wi0-x236.google.com [IPv6:2a00:1450:400c:c05::236]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 218FC1BC0 for ; Sat, 11 Jul 2015 13:40:30 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by wifm2 with SMTP id m2so67440735wif.1 for ; Sat, 11 Jul 2015 06:40:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=/AkbFUdwD1umZA2datA7Kpjawzi+yyXqbKaFf+0yXcQ=; b=V/Kn58YCxkah9vk28KN0G47F2Nc1EmgPDFghcwVMbDz5nbnGuKePMJzpsO5AIjT43y dSfLnW5RNysP6EAVo4bYrQ0LEuEJpGvCa+5WY/CrH+LgYnA/aZxQzMFv5ElnPV3uSOzS rSANuelgu0Iku02R6GBbs39cAHV1IUP48oNCd3GZATwKqu3S6S/Hmx30oim1MkAe6iiC 25uWQ5Wh11Q0C1ERM98fEi5J2DYeK6fq8G+HqxoLpyOlh5IRc3bpIu7lxO5mj2DVd59U BsltmDGJOX13mWUtsL+nGPQ7N/eBwpeJ5/4D92hm0slD52DB/o5gAV8xInneiM42otq7 4Q4A== X-Received: by 10.194.172.130 with SMTP id bc2mr54011086wjc.85.1436622026259; Sat, 11 Jul 2015 06:40:26 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id d7sm3674535wij.0.2015.07.11.06.40.23 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 11 Jul 2015 06:40:24 -0700 (PDT) Date: Sat, 11 Jul 2015 15:40:21 +0200 From: Mateusz Guzik To: Konstantin Belousov Cc: freebsd-fs@freebsd.org Subject: Re: [PATCH 1/2] Move chdir/chroot-related fdp manipulation to kern_descrip.c Message-ID: <20150711134021.GA1433@dft-labs.eu> References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com> <1436569684-3939-2-git-send-email-mjguzik@gmail.com> <20150711124335.GF2080@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150711124335.GF2080@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Jul 2015 13:40:30 -0000 On Sat, Jul 11, 2015 at 03:43:35PM +0300, Konstantin Belousov wrote: > On Sat, Jul 11, 2015 at 01:08:03AM +0200, Mateusz Guzik wrote: > > From: Mateusz Guzik > > > > Prefix exported functions with pwd_. > > > > This also adds a helper which sets fd_cdir which deduplicates some code. 
> Patch is fine, I like the cleanup. Please use the opportunity to fix > minor existing issues, mostly with style, see below. > > > + oldvp = fdp->fd_rdir; > VREF(vp); > > + fdp->fd_rdir = vp; > > + VREF(fdp->fd_rdir); > Remove this line. > > > + if (!fdp->fd_jdir) { > if (fdp->fd_jdir == NULL) { > > > + fdp->fd_jdir = vp; > > + VREF(fdp->fd_jdir); > Again, swap the order: do VREF(vp), then assign. > Done. > > + } > > + FILEDESC_XUNLOCK(fdp); > > + vrele(oldvp); > > + return (0); > > +} > > + > > +void > > +pwd_chdir(struct thread *td, struct vnode *vp) > > +{ > > + struct filedesc *fdp; > > + struct vnode *oldvp; > > + > > + fdp = td->td_proc->p_fd; > > + FILEDESC_XLOCK(fdp); > > + oldvp = fdp->fd_cdir; > > + fdp->fd_cdir = vp; > Assert that vp v_usecount > 0, as the inaccurate but still useful check. > We have no way to ensure that the reference is owned by the code, but > we do know that there must be such reference. > Done. diff --git a/sys/compat/svr4/svr4_misc.c b/sys/compat/svr4/svr4_misc.c index ec4504e..5e53874 100644 --- a/sys/compat/svr4/svr4_misc.c +++ b/sys/compat/svr4/svr4_misc.c @@ -643,7 +643,7 @@ svr4_sys_fchroot(td, uap) goto fail; #endif VOP_UNLOCK(vp, 0); - error = change_root(vp, td); + error = pwd_chroot(td, vp); vrele(vp); return (error); fail: diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c index e8f0014..51e8b3c 100644 --- a/sys/kern/kern_descrip.c +++ b/sys/kern/kern_descrip.c @@ -2875,6 +2875,96 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int dfd, int mode, } /* + * This sysctl determines if we will allow a process to chroot(2) if it + * has a directory open: + * 0: disallowed for all processes. + * 1: allowed for processes that were not already chroot(2)'ed. + * 2: allowed for all processes. 
+ */ + +static int chroot_allow_open_directories = 1; + +SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, + &chroot_allow_open_directories, 0, + "Allow a process to chroot(2) if it has a directory open"); + +/* + * Helper function for raised chroot(2) security function: Refuse if + * any filedescriptors are open directories. + */ +static int +chroot_refuse_vdir_fds(struct filedesc *fdp) +{ + struct vnode *vp; + struct file *fp; + int fd; + + FILEDESC_LOCK_ASSERT(fdp); + + for (fd = 0; fd <= fdp->fd_lastfile; fd++) { + fp = fget_locked(fdp, fd); + if (fp == NULL) + continue; + if (fp->f_type == DTYPE_VNODE) { + vp = fp->f_vnode; + if (vp->v_type == VDIR) + return (EPERM); + } + } + return (0); +} + +/* + * Common routine for kern_chroot() and jail_attach(). The caller is + * responsible for invoking priv_check() and mac_vnode_check_chroot() to + * authorize this operation. + */ +int +pwd_chroot(struct thread *td, struct vnode *vp) +{ + struct filedesc *fdp; + struct vnode *oldvp; + int error; + + fdp = td->td_proc->p_fd; + FILEDESC_XLOCK(fdp); + if (chroot_allow_open_directories == 0 || + (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { + error = chroot_refuse_vdir_fds(fdp); + if (error != 0) { + FILEDESC_XUNLOCK(fdp); + return (error); + } + } + oldvp = fdp->fd_rdir; + VREF(vp); + fdp->fd_rdir = vp; + if (fdp->fd_jdir == NULL) { + VREF(vp); + fdp->fd_jdir = vp; + } + FILEDESC_XUNLOCK(fdp); + vrele(oldvp); + return (0); +} + +void +pwd_chdir(struct thread *td, struct vnode *vp) +{ + struct filedesc *fdp; + struct vnode *oldvp; + + fdp = td->td_proc->p_fd; + FILEDESC_XLOCK(fdp); + VNASSERT(vp->v_usecount > 0, vp, + ("chdir to a vnode with zero usecount")); + oldvp = fdp->fd_cdir; + fdp->fd_cdir = vp; + FILEDESC_XUNLOCK(fdp); + vrele(oldvp); +} + +/* * Scan all active processes and prisons to see if any of them have a current * or root directory of `olddp'. If so, replace them with the new mount point. 
*/ diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c index c118d74..6bdddd7 100644 --- a/sys/kern/kern_jail.c +++ b/sys/kern/kern_jail.c @@ -2432,7 +2432,7 @@ do_jail_attach(struct thread *td, struct prison *pr) goto e_unlock; #endif VOP_UNLOCK(pr->pr_root, 0); - if ((error = change_root(pr->pr_root, td))) + if ((error = pwd_chroot(td, pr->pr_root))) goto e_revert_osd; newcred = crget(); diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c index 9088017..258a1ef 100644 --- a/sys/kern/vfs_syscalls.c +++ b/sys/kern/vfs_syscalls.c @@ -728,8 +728,7 @@ sys_fchdir(td, uap) int fd; } */ *uap; { - register struct filedesc *fdp = td->td_proc->p_fd; - struct vnode *vp, *tdp, *vpold; + struct vnode *vp, *tdp; struct mount *mp; struct file *fp; cap_rights_t rights; @@ -761,11 +760,7 @@ sys_fchdir(td, uap) return (error); } VOP_UNLOCK(vp, 0); - FILEDESC_XLOCK(fdp); - vpold = fdp->fd_cdir; - fdp->fd_cdir = vp; - FILEDESC_XUNLOCK(fdp); - vrele(vpold); + pwd_chdir(td, vp); return (0); } @@ -791,9 +786,7 @@ sys_chdir(td, uap) int kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) { - register struct filedesc *fdp = td->td_proc->p_fd; struct nameidata nd; - struct vnode *vp; int error; NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF | AUDITVNODE1, @@ -807,56 +800,11 @@ kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) } VOP_UNLOCK(nd.ni_vp, 0); NDFREE(&nd, NDF_ONLY_PNBUF); - FILEDESC_XLOCK(fdp); - vp = fdp->fd_cdir; - fdp->fd_cdir = nd.ni_vp; - FILEDESC_XUNLOCK(fdp); - vrele(vp); - return (0); -} - -/* - * Helper function for raised chroot(2) security function: Refuse if - * any filedescriptors are open directories. 
- */ -static int -chroot_refuse_vdir_fds(fdp) - struct filedesc *fdp; -{ - struct vnode *vp; - struct file *fp; - int fd; - - FILEDESC_LOCK_ASSERT(fdp); - - for (fd = 0; fd <= fdp->fd_lastfile; fd++) { - fp = fget_locked(fdp, fd); - if (fp == NULL) - continue; - if (fp->f_type == DTYPE_VNODE) { - vp = fp->f_vnode; - if (vp->v_type == VDIR) - return (EPERM); - } - } + pwd_chdir(td, nd.ni_vp); return (0); } /* - * This sysctl determines if we will allow a process to chroot(2) if it - * has a directory open: - * 0: disallowed for all processes. - * 1: allowed for processes that were not already chroot(2)'ed. - * 2: allowed for all processes. - */ - -static int chroot_allow_open_directories = 1; - -SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, - &chroot_allow_open_directories, 0, - "Allow a process to chroot(2) if it has a directory open"); - -/* * Change notion of root (``/'') directory. */ #ifndef _SYS_SYSPROTO_H_ @@ -891,7 +839,7 @@ sys_chroot(td, uap) goto e_vunlock; #endif VOP_UNLOCK(nd.ni_vp, 0); - error = change_root(nd.ni_vp, td); + error = pwd_chroot(td, nd.ni_vp); vrele(nd.ni_vp); NDFREE(&nd, NDF_ONLY_PNBUF); return (error); @@ -926,42 +874,6 @@ change_dir(vp, td) return (VOP_ACCESS(vp, VEXEC, td->td_ucred, td)); } -/* - * Common routine for kern_chroot() and jail_attach(). The caller is - * responsible for invoking priv_check() and mac_vnode_check_chroot() to - * authorize this operation. 
- */ -int -change_root(vp, td) - struct vnode *vp; - struct thread *td; -{ - struct filedesc *fdp; - struct vnode *oldvp; - int error; - - fdp = td->td_proc->p_fd; - FILEDESC_XLOCK(fdp); - if (chroot_allow_open_directories == 0 || - (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { - error = chroot_refuse_vdir_fds(fdp); - if (error != 0) { - FILEDESC_XUNLOCK(fdp); - return (error); - } - } - oldvp = fdp->fd_rdir; - fdp->fd_rdir = vp; - VREF(fdp->fd_rdir); - if (!fdp->fd_jdir) { - fdp->fd_jdir = vp; - VREF(fdp->fd_jdir); - } - FILEDESC_XUNLOCK(fdp); - vrele(oldvp); - return (0); -} - static __inline void flags_to_rights(int flags, cap_rights_t *rightsp) { diff --git a/sys/sys/filedesc.h b/sys/sys/filedesc.h index ab7ce9f..e569a3b 100644 --- a/sys/sys/filedesc.h +++ b/sys/sys/filedesc.h @@ -205,6 +205,10 @@ fd_modified(struct filedesc *fdp, int fd, seq_t seq) return (!seq_consistent(fd_seq(fdp->fd_files, fd), seq)); } +/* cdir/rdir/jdir manipulation functions. */ +void pwd_chdir(struct thread *td, struct vnode *vp); +int pwd_chroot(struct thread *td, struct vnode *vp); + #endif /* _KERNEL */ #endif /* !_SYS_FILEDESC_H_ */ diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h index 36ef8af..6d5da32 100644 --- a/sys/sys/vnode.h +++ b/sys/sys/vnode.h @@ -616,7 +616,6 @@ void cache_purge(struct vnode *vp); void cache_purge_negative(struct vnode *vp); void cache_purgevfs(struct mount *mp); int change_dir(struct vnode *vp, struct thread *td); -int change_root(struct vnode *vp, struct thread *td); void cvtstat(struct stat *st, struct ostat *ost); void cvtnstat(struct stat *sb, struct nstat *nsb); int getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops, diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c index 2b9c334..c587dfb 100644 --- a/sys/ufs/ffs/ffs_alloc.c +++ b/sys/ufs/ffs/ffs_alloc.c @@ -2748,13 +2748,12 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) struct thread *td = curthread; struct fsck_cmd cmd; struct ufsmount *ump; - struct 
vnode *vp, *vpold, *dvp, *fdvp; + struct vnode *vp, *dvp, *fdvp; struct inode *ip, *dp; struct mount *mp; struct fs *fs; ufs2_daddr_t blkno; long blkcnt, blksize; - struct filedesc *fdp; struct file *fp, *vfp; cap_rights_t rights; int filetype, error; @@ -2968,12 +2967,7 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) break; } VOP_UNLOCK(vp, 0); - fdp = td->td_proc->p_fd; - FILEDESC_XLOCK(fdp); - vpold = fdp->fd_cdir; - fdp->fd_cdir = vp; - FILEDESC_XUNLOCK(fdp); - vrele(vpold); + pwd_chdir(td, vp); break; case FFS_SET_DOTDOT: -- Mateusz Guzik From owner-freebsd-fs@freebsd.org Sat Jul 11 13:53:11 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id C246F9982B0 for ; Sat, 11 Jul 2015 13:53:11 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from kib.kiev.ua (kib.kiev.ua [IPv6:2001:470:d5e7:1::1]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 67CF62577 for ; Sat, 11 Jul 2015 13:53:11 +0000 (UTC) (envelope-from kostikbel@gmail.com) Received: from tom.home (kostik@localhost [127.0.0.1]) by kib.kiev.ua (8.14.9/8.14.9) with ESMTP id t6BDr3Gl025009 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Sat, 11 Jul 2015 16:53:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) DKIM-Filter: OpenDKIM Filter v2.9.2 kib.kiev.ua t6BDr3Gl025009 Received: (from kostik@localhost) by tom.home (8.14.9/8.14.9/Submit) id t6BDr3OW024989; Sat, 11 Jul 2015 16:53:03 +0300 (EEST) (envelope-from kostikbel@gmail.com) X-Authentication-Warning: tom.home: kostik set sender to kostikbel@gmail.com using -f Date: Sat, 11 Jul 2015 16:53:03 +0300 From: Konstantin Belousov To: Mateusz Guzik Cc: freebsd-fs@freebsd.org Subject: Re: [PATCH 1/2] Move chdir/chroot-related fdp manipulation to kern_descrip.c Message-ID: 
<20150711135303.GH2080@kib.kiev.ua> References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com> <1436569684-3939-2-git-send-email-mjguzik@gmail.com> <20150711124335.GF2080@kib.kiev.ua> <20150711134021.GA1433@dft-labs.eu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20150711134021.GA1433@dft-labs.eu> User-Agent: Mutt/1.5.23 (2014-03-12) X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED,BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FROM,NML_ADSP_CUSTOM_MED autolearn=no autolearn_force=no version=3.4.1 X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on tom.home X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Jul 2015 13:53:11 -0000 On Sat, Jul 11, 2015 at 03:40:21PM +0200, Mateusz Guzik wrote: > diff --git a/sys/compat/svr4/svr4_misc.c b/sys/compat/svr4/svr4_misc.c > index ec4504e..5e53874 100644 > --- a/sys/compat/svr4/svr4_misc.c > +++ b/sys/compat/svr4/svr4_misc.c > @@ -643,7 +643,7 @@ svr4_sys_fchroot(td, uap) > goto fail; > #endif > VOP_UNLOCK(vp, 0); > - error = change_root(vp, td); > + error = pwd_chroot(td, vp); > vrele(vp); > return (error); > fail: > diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c > index e8f0014..51e8b3c 100644 > --- a/sys/kern/kern_descrip.c > +++ b/sys/kern/kern_descrip.c > @@ -2875,6 +2875,96 @@ dupfdopen(struct thread *td, struct filedesc *fdp, int dfd, int mode, > } > > /* > + * This sysctl determines if we will allow a process to chroot(2) if it > + * has a directory open: > + * 0: disallowed for all processes. > + * 1: allowed for processes that were not already chroot(2)'ed. > + * 2: allowed for all processes. 
> + */ > + > +static int chroot_allow_open_directories = 1; > + > +SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, > + &chroot_allow_open_directories, 0, > + "Allow a process to chroot(2) if it has a directory open"); > + > +/* > + * Helper function for raised chroot(2) security function: Refuse if > + * any filedescriptors are open directories. > + */ > +static int > +chroot_refuse_vdir_fds(struct filedesc *fdp) > +{ > + struct vnode *vp; > + struct file *fp; > + int fd; > + > + FILEDESC_LOCK_ASSERT(fdp); > + > + for (fd = 0; fd <= fdp->fd_lastfile; fd++) { > + fp = fget_locked(fdp, fd); > + if (fp == NULL) > + continue; > + if (fp->f_type == DTYPE_VNODE) { > + vp = fp->f_vnode; > + if (vp->v_type == VDIR) > + return (EPERM); > + } > + } > + return (0); > +} > + > +/* > + * Common routine for kern_chroot() and jail_attach(). The caller is > + * responsible for invoking priv_check() and mac_vnode_check_chroot() to > + * authorize this operation. > + */ > +int > +pwd_chroot(struct thread *td, struct vnode *vp) > +{ > + struct filedesc *fdp; > + struct vnode *oldvp; > + int error; > + > + fdp = td->td_proc->p_fd; > + FILEDESC_XLOCK(fdp); > + if (chroot_allow_open_directories == 0 || > + (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { > + error = chroot_refuse_vdir_fds(fdp); > + if (error != 0) { > + FILEDESC_XUNLOCK(fdp); > + return (error); > + } > + } > + oldvp = fdp->fd_rdir; > + VREF(vp); > + fdp->fd_rdir = vp; > + if (fdp->fd_jdir == NULL) { > + VREF(vp); > + fdp->fd_jdir = vp; > + } > + FILEDESC_XUNLOCK(fdp); > + vrele(oldvp); > + return (0); > +} > + > +void > +pwd_chdir(struct thread *td, struct vnode *vp) > +{ > + struct filedesc *fdp; > + struct vnode *oldvp; > + > + fdp = td->td_proc->p_fd; > + FILEDESC_XLOCK(fdp); > + VNASSERT(vp->v_usecount > 0, vp, > + ("chdir to a vnode with zero usecount")); > + oldvp = fdp->fd_cdir; > + fdp->fd_cdir = vp; > + FILEDESC_XUNLOCK(fdp); > + vrele(oldvp); > +} > + > +/* > * Scan 
all active processes and prisons to see if any of them have a current > * or root directory of `olddp'. If so, replace them with the new mount point. > */ > diff --git a/sys/kern/kern_jail.c b/sys/kern/kern_jail.c > index c118d74..6bdddd7 100644 > --- a/sys/kern/kern_jail.c > +++ b/sys/kern/kern_jail.c > @@ -2432,7 +2432,7 @@ do_jail_attach(struct thread *td, struct prison *pr) > goto e_unlock; > #endif > VOP_UNLOCK(pr->pr_root, 0); > - if ((error = change_root(pr->pr_root, td))) > + if ((error = pwd_chroot(td, pr->pr_root))) > goto e_revert_osd; > > newcred = crget(); > diff --git a/sys/kern/vfs_syscalls.c b/sys/kern/vfs_syscalls.c > index 9088017..258a1ef 100644 > --- a/sys/kern/vfs_syscalls.c > +++ b/sys/kern/vfs_syscalls.c > @@ -728,8 +728,7 @@ sys_fchdir(td, uap) > int fd; > } */ *uap; > { > - register struct filedesc *fdp = td->td_proc->p_fd; > - struct vnode *vp, *tdp, *vpold; > + struct vnode *vp, *tdp; > struct mount *mp; > struct file *fp; > cap_rights_t rights; > @@ -761,11 +760,7 @@ sys_fchdir(td, uap) > return (error); > } > VOP_UNLOCK(vp, 0); > - FILEDESC_XLOCK(fdp); > - vpold = fdp->fd_cdir; > - fdp->fd_cdir = vp; > - FILEDESC_XUNLOCK(fdp); > - vrele(vpold); > + pwd_chdir(td, vp); > return (0); > } > > @@ -791,9 +786,7 @@ sys_chdir(td, uap) > int > kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) > { > - register struct filedesc *fdp = td->td_proc->p_fd; > struct nameidata nd; > - struct vnode *vp; > int error; > > NDINIT(&nd, LOOKUP, FOLLOW | LOCKSHARED | LOCKLEAF | AUDITVNODE1, > @@ -807,56 +800,11 @@ kern_chdir(struct thread *td, char *path, enum uio_seg pathseg) > } > VOP_UNLOCK(nd.ni_vp, 0); > NDFREE(&nd, NDF_ONLY_PNBUF); > - FILEDESC_XLOCK(fdp); > - vp = fdp->fd_cdir; > - fdp->fd_cdir = nd.ni_vp; > - FILEDESC_XUNLOCK(fdp); > - vrele(vp); > - return (0); > -} > - > -/* > - * Helper function for raised chroot(2) security function: Refuse if > - * any filedescriptors are open directories. 
> - */ > -static int > -chroot_refuse_vdir_fds(fdp) > - struct filedesc *fdp; > -{ > - struct vnode *vp; > - struct file *fp; > - int fd; > - > - FILEDESC_LOCK_ASSERT(fdp); > - > - for (fd = 0; fd <= fdp->fd_lastfile; fd++) { > - fp = fget_locked(fdp, fd); > - if (fp == NULL) > - continue; > - if (fp->f_type == DTYPE_VNODE) { > - vp = fp->f_vnode; > - if (vp->v_type == VDIR) > - return (EPERM); > - } > - } > + pwd_chdir(td, nd.ni_vp); > return (0); > } > > /* > - * This sysctl determines if we will allow a process to chroot(2) if it > - * has a directory open: > - * 0: disallowed for all processes. > - * 1: allowed for processes that were not already chroot(2)'ed. > - * 2: allowed for all processes. > - */ > - > -static int chroot_allow_open_directories = 1; > - > -SYSCTL_INT(_kern, OID_AUTO, chroot_allow_open_directories, CTLFLAG_RW, > - &chroot_allow_open_directories, 0, > - "Allow a process to chroot(2) if it has a directory open"); > - > -/* > * Change notion of root (``/'') directory. > */ > #ifndef _SYS_SYSPROTO_H_ > @@ -891,7 +839,7 @@ sys_chroot(td, uap) > goto e_vunlock; > #endif > VOP_UNLOCK(nd.ni_vp, 0); > - error = change_root(nd.ni_vp, td); > + error = pwd_chroot(td, nd.ni_vp); > vrele(nd.ni_vp); > NDFREE(&nd, NDF_ONLY_PNBUF); > return (error); > @@ -926,42 +874,6 @@ change_dir(vp, td) > return (VOP_ACCESS(vp, VEXEC, td->td_ucred, td)); > } > > -/* > - * Common routine for kern_chroot() and jail_attach(). The caller is > - * responsible for invoking priv_check() and mac_vnode_check_chroot() to > - * authorize this operation. 
> - */ > -int > -change_root(vp, td) > - struct vnode *vp; > - struct thread *td; > -{ > - struct filedesc *fdp; > - struct vnode *oldvp; > - int error; > - > - fdp = td->td_proc->p_fd; > - FILEDESC_XLOCK(fdp); > - if (chroot_allow_open_directories == 0 || > - (chroot_allow_open_directories == 1 && fdp->fd_rdir != rootvnode)) { > - error = chroot_refuse_vdir_fds(fdp); > - if (error != 0) { > - FILEDESC_XUNLOCK(fdp); > - return (error); > - } > - } > - oldvp = fdp->fd_rdir; > - fdp->fd_rdir = vp; > - VREF(fdp->fd_rdir); > - if (!fdp->fd_jdir) { > - fdp->fd_jdir = vp; > - VREF(fdp->fd_jdir); > - } > - FILEDESC_XUNLOCK(fdp); > - vrele(oldvp); > - return (0); > -} > - > static __inline void > flags_to_rights(int flags, cap_rights_t *rightsp) > { > diff --git a/sys/sys/filedesc.h b/sys/sys/filedesc.h > index ab7ce9f..e569a3b 100644 > --- a/sys/sys/filedesc.h > +++ b/sys/sys/filedesc.h > @@ -205,6 +205,10 @@ fd_modified(struct filedesc *fdp, int fd, seq_t seq) > return (!seq_consistent(fd_seq(fdp->fd_files, fd), seq)); > } > > +/* cdir/rdir/jdir manipulation functions. 
*/ > +void pwd_chdir(struct thread *td, struct vnode *vp); > +int pwd_chroot(struct thread *td, struct vnode *vp); > + > #endif /* _KERNEL */ > > #endif /* !_SYS_FILEDESC_H_ */ > diff --git a/sys/sys/vnode.h b/sys/sys/vnode.h > index 36ef8af..6d5da32 100644 > --- a/sys/sys/vnode.h > +++ b/sys/sys/vnode.h > @@ -616,7 +616,6 @@ void cache_purge(struct vnode *vp); > void cache_purge_negative(struct vnode *vp); > void cache_purgevfs(struct mount *mp); > int change_dir(struct vnode *vp, struct thread *td); > -int change_root(struct vnode *vp, struct thread *td); > void cvtstat(struct stat *st, struct ostat *ost); > void cvtnstat(struct stat *sb, struct nstat *nsb); > int getnewvnode(const char *tag, struct mount *mp, struct vop_vector *vops, > diff --git a/sys/ufs/ffs/ffs_alloc.c b/sys/ufs/ffs/ffs_alloc.c > index 2b9c334..c587dfb 100644 > --- a/sys/ufs/ffs/ffs_alloc.c > +++ b/sys/ufs/ffs/ffs_alloc.c > @@ -2748,13 +2748,12 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) > struct thread *td = curthread; > struct fsck_cmd cmd; > struct ufsmount *ump; > - struct vnode *vp, *vpold, *dvp, *fdvp; > + struct vnode *vp, *dvp, *fdvp; > struct inode *ip, *dp; > struct mount *mp; > struct fs *fs; > ufs2_daddr_t blkno; > long blkcnt, blksize; > - struct filedesc *fdp; > struct file *fp, *vfp; > cap_rights_t rights; > int filetype, error; > @@ -2968,12 +2967,7 @@ sysctl_ffs_fsck(SYSCTL_HANDLER_ARGS) > break; > } > VOP_UNLOCK(vp, 0); > - fdp = td->td_proc->p_fd; > - FILEDESC_XLOCK(fdp); > - vpold = fdp->fd_cdir; > - fdp->fd_cdir = vp; > - FILEDESC_XUNLOCK(fdp); > - vrele(vpold); > + pwd_chdir(td, vp); > break; > > case FFS_SET_DOTDOT: Looks good. 
From owner-freebsd-fs@freebsd.org Sat Jul 11 14:57:05 2015 Return-Path: Delivered-To: freebsd-fs@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id E9ABE998F64 for ; Sat, 11 Jul 2015 14:57:04 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: from mail-wi0-x22a.google.com (mail-wi0-x22a.google.com [IPv6:2a00:1450:400c:c05::22a]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G2" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id 7D58A107E for ; Sat, 11 Jul 2015 14:57:04 +0000 (UTC) (envelope-from mjguzik@gmail.com) Received: by widjy10 with SMTP id jy10so36197752wid.1 for ; Sat, 11 Jul 2015 07:57:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:cc:subject:message-id:references:mime-version :content-type:content-disposition:in-reply-to:user-agent; bh=ySmpa49XauztUdX1z+Yo8Bnw7jFmzOE7LYYpvIbWUPE=; b=JfuGVgqJ6W2KG7lu8M1lQ76uYMz+LETiFK/2ZIxRoLdQpyt6s3bTzGZ4xP4WW1pNWk m0Ih40C4TzH+L++M926ujjwVQqP4EjskUSI908IlAfACngZR/ffXzqdfe+FFncbbhukV x3Dj3FVWXNA6GkVj6wUcEk1AniF36KX9Ud4H7Fz9VzusgyouYhe1AZ7FFuT1nBI3GJbZ zB0IYCk9uJQVI3d3oVkACmix58pwFQuQfvQ+2CIXDKzTXD6d3A/bAWEeZW9bnN+RRBsj +ofuouT7TTo8LrOZWba1dZiMEXsIEC9+Mu4S83DFeNMe5j3CpGqNBCfMBfUlPRzPK09L ozEA== X-Received: by 10.194.175.65 with SMTP id by1mr54712027wjc.152.1436626622935; Sat, 11 Jul 2015 07:57:02 -0700 (PDT) Received: from dft-labs.eu (n1x0n-1-pt.tunnel.tserv5.lon1.ipv6.he.net. [2001:470:1f08:1f7::2]) by smtp.gmail.com with ESMTPSA id j6sm3912040wix.5.2015.07.11.07.57.01 (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sat, 11 Jul 2015 07:57:01 -0700 (PDT) Date: Sat, 11 Jul 2015 16:56:59 +0200 From: Mateusz Guzik To: Konstantin Belousov Cc: freebsd-fs@freebsd.org Subject: Re: [PATCH 2/2] Create a dedicated function for ensuring that cdir and rdir are populated. 
Message-ID: <20150711145658.GB1433@dft-labs.eu> References: <1436569684-3939-1-git-send-email-mjguzik@gmail.com> <1436569684-3939-3-git-send-email-mjguzik@gmail.com> <20150711124848.GG2080@kib.kiev.ua> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: <20150711124848.GG2080@kib.kiev.ua> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: freebsd-fs@freebsd.org X-Mailman-Version: 2.1.20 Precedence: list List-Id: Filesystems List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 11 Jul 2015 14:57:05 -0000 On Sat, Jul 11, 2015 at 03:48:48PM +0300, Konstantin Belousov wrote: > On Sat, Jul 11, 2015 at 01:08:04AM +0200, Mateusz Guzik wrote: > > From: Mateusz Guzik > > > > Previously several places were doing it on its own, partially > > incorrectly (e.g. without the filedesc locked) or even actively harmful > > by assigning rootvnode without vreling it or populating jdir. > > > > This functionality should not exist and will be garbage collected after > > all callers are properly reviewed. > Why do you think that this code 'should not exist' ? The code comes due > to the need of some kernel processes to do file i/o, but also from the > desire to avoid having all kernel processes reference some vnode, even > if not needed. How do you propose to make it possible to use lookups > etc from a kernel process ? > Currently, arbitrary threads acquire the reference at arbitrary times, which imposes the need to iterate over them in mountcheckdirs whenever rootvnode changes, even though that is a rare event. If the vnode were set up front for all kernel procs, it would be cleaner, with no serious downside that I can see. There is an additional minor problem: rootvnode updates are not synchronised with the code filling in cdir+rdir, so there is a hypothetical potential for a use-after-free. Primarily, though, it is unclear whether there are correctness/security issues here.
Some of the original offenders can be reached in both kernel and userspace process context, which may or may not be problematic for them. Preferably the code would assert the context, if there are any requirements. But that's not something I'm interested in playing with right now. > Regardless of the note above, the patch looks fine, it is the good cleanup. Ok, thanks for the review. -- Mateusz Guzik