Date: Wed, 1 Jul 2015
    20:10:31 -0400 (EDT)
From: Rick Macklem
To: Graham Allan
Cc: freebsd-fs@freebsd.org
Message-ID: <972685551.2776991.1435795831472.JavaMail.zimbra@uoguelph.ca>
In-Reply-To: <55946FFE.8070402@physics.umn.edu>
References: <55946FFE.8070402@physics.umn.edu>
Subject: Re: Strange NFS problem implicating nfsuserd?

Graham Allan wrote:
> I spent a few days digging into a strange NFSv4 problem at our site,
> which I think I may have finally resolved but don't really understand why.
>
> We have a bunch of large-ish NFS servers running FreeBSD 9.3 exporting
> ZFS filesystems to mostly "RHEL-clone" linux clients. Over the last few
> weeks I started getting reports that peoples' jobs would fail
> erratically with i/o errors, and it became apparent that they pointed in
> general to all our FreeBSD NFS servers rather than just one.
>
> Ultimately I could trivially reproduce the problem running
> "find . -type f -exec cat {} > /dev/null \;"
> on one of the NFS-mounted filesystems.
>
> Linux clients would eventually error with "Input/output error"
> FreeBSD clients would eventually error with "Permission denied" on files
> or directories which should be readable.
>
> Reverting to earlier patch releases didn't make any difference, though
> it seemed like the problem started roughly when I updated p8->p13.
>
> Finally I seem to have pinpointed it to one change made in rc.conf for
> nfsuserd, which I committed at around the right date:
>
> nfsuserd_flags="-usermax 500 -usertimeout 600 16"
>
> became:
>
> nfsuserd_flags="-domain xxx.yyy.zzz -usermax 500 -usertimeout 600 16"
>
> probably because I saw a user mapping failure somewhere previously, and
> decided to make the domain explicit.
>
> Undoing this change appears to eliminate the problem - but this makes no
> sense to me. Starting nfsuserd with either set of options (adding
> -verbose) prints the same output:
>
> Starting nfsuserd.
> nfsuserd: domain=xxx.yyy.zzz usermax=500 usertimeout=36000
>
> So the domain chosen by default is the same as the one explicitly
> specified (as I would expect).
>
> I've reproduced this across 4-5 different servers and a similar number
> of different client systems. I'm wondering if any plausible explanation
> suggests itself?
>
As far as I know, the domain is only set when nfsuserd is started, and it
just uses the domain part of the machine's host name if it is not explicitly
defined by "-domain". Maybe there is some bug in nfsuserd.c that gets
tickled by the option, although I just looked and the argument parsing
looks ok. If your xxx.yyy.zzz is identical, then I can't see how this would
affect anything.

What will cause intermittent mapping problems is having more than one
username that maps to the same uid. (One of them will be cached at random.)
(A common case was having both "root" and "toor" in the password database
for uid == 0.)

rick

> Graham
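
For anyone wanting to rule out the duplicate-uid case mentioned above, here is a minimal sketch of one way to look for it. It is not part of nfsuserd and not necessarily how anyone on this thread checked; the function name duplicate_uids is made up for illustration, and it assumes passwd(5)-style lines (name:password:uid:...), so users that come only from NIS or LDAP would not be seen.

```python
from collections import defaultdict


def duplicate_uids(passwd_lines):
    """Return {uid: [names, ...]} for every uid shared by more than one name.

    Hypothetical helper: expects passwd(5)-style lines where field 1 is the
    user name and field 3 is the numeric uid.
    """
    names_by_uid = defaultdict(list)
    for line in passwd_lines:
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # Skip malformed entries rather than guessing at their meaning.
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        names_by_uid[int(fields[2])].append(fields[0])
    # Keep only the uids that have more than one user name attached.
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}
```

Feeding it the lines of /etc/passwd on the server (for example duplicate_uids(open("/etc/passwd"))) should list any uid with more than one name; a stock root/toor pair would show up under uid 0. An empty result only says the local file is clean, not that the mapping problem lies elsewhere.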