From: Xin LI
Date: Mon, 12 Jun 2017 12:28:55 -0700
Subject: Re: post ino64: lockd no runs?
To: John Baldwin
Cc: FreeBSD Current, stable@freebsd.org

On Mon, Jun 12, 2017 at 10:14 AM, John Baldwin wrote:
> On Sunday, June 11, 2017 11:12:25 AM David Wolfskill wrote:
>> On Sun, Jun 04, 2017 at 08:57:44AM -0400, Michael Butler wrote:
>> > It seems that {rpc.}lockd no longer runs after the ino64 changes on any
>> > of my systems after a full rebuild of src and ports.
>> > No log entries offer any insight as to why :-(
>> >
>> > imb
>>
>> I don't tend to use NFS on my systems that are running head, so I
>> haven't had occasion to test this as stated.
>>
>> However, I just completed my weekly update of the "production" systems
>> here at home, running stable/11. And I find that lockd seems to be ...
>> claiming that all is well, but declining to run (for long).
>>
>> To the best of my knowledge, that was not the case until this last
>> update, which was from:
>>
>> FreeBSD albert.catwhisker.org 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #316 r319566M/319569:1100514: Sun Jun 4 03:54:41 PDT 2017 root@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT amd64
>>
>> to
>>
>> FreeBSD albert.catwhisker.org 11.1-BETA1 FreeBSD 11.1-BETA1 #322 r319823M/319823:1100514: Sun Jun 11 03:56:10 PDT 2017 root@freebeast.catwhisker.org:/common/S1/obj/usr/src/sys/ALBERT amd64
>>
>> The "glaringly obvious" symptom in my case is that I am now unable
>> to (directly) save an email message from within mutt(1) by appending
>> it to an NFS-resident file. (Saving it to a local file, then using
>> cat(1) to append that to the NFS-resident file & removing the local
>> copy works....)
>>
>> After a few variations on a theme of:
>>
>> albert(11.1)[5] sudo service lockd restart
>> lockd not running?
>> Starting lockd.
>> albert(11.1)[6] echo $?
>> 0
>> albert(11.1)[7] service lockd status
>> lockd is not running.
>>
>> I finally(!) thought to ask ktrace what's going on (as tailing
>> /var/log/messages was completely unproductive, even after enabling
>> rc_debug).
>>
>> So I tried: "sudo ktrace -di service lockd restart"; upon examination of
>> the output of kdump(1), I see that the trace ends with:
>>
>> ...
>> 2811 rpc.lockd NAMI "/var/run/logpriv"
>> 2786 sh CALL read(0xa,0x627fc0,0x400)
>> 2786 sh GIO fd 10 read 0 bytes
>> ""
>> 2811 rpc.lockd RET connect 0
>> 2786 sh RET read 0
>> 2811 rpc.lockd CALL sendto(0x3,0x7fffffffe2c0,0x27,0,0,0)
>> 2786 sh CALL exit(0)
>> 2811 rpc.lockd GIO fd 3 wrote 39 bytes
>> "<30>Jun 11 15:43:10 rpc.lockd: Starting"
>> 2811 rpc.lockd RET sendto 39/0x27
>> 2811 rpc.lockd CALL sigaction(SIGALRM,0x7fffffffec20,0)
>> 2811 rpc.lockd RET sigaction 0
>> 2811 rpc.lockd CALL nlm_syscall(0,0x1e,0x4,0x801015040)
>> 2811 rpc.lockd RET nlm_syscall -1 errno 14 Bad address
>
> This is a really good clue. nlm_syscall is dying with EFAULT. The last
> argument is a pointer to an array of char * pointers, and the only way
> I can see it dying is if it fails to copyin() one of the strings pointed
> to by those pointers. You could try running rpc.lockd under gdb from
> ports and setting a breakpoint on 'nlm_syscall' and then printing out
> 'addr_count' and 'p addrs@(addr_count * 2)'.

Yes, I found that the kernel was trying to copyin() from NULL, and then
found that the NULL pointer corresponds to 'uaddr'. After some tracing I
found that the tightened condition for taddr2uaddr now (correctly)
enforces the buffer length passed in by the caller. That length has not
been set correctly since ~9 years ago (r177633, which sets the size to
sizeof(pointer)), but this was never noticed because nothing checked it.
So the solution seems to be to set the length values correctly (to the
allocated size), and that has fixed the issue for me.

The code could use some cleanup and I plan to do that at some later time.

> Unfortunately I'm not able to reproduce the failure on a test machine
> I have running head post-ino64.

This should have been fixed by r319852 in -HEAD
( https://svnweb.freebsd.org/base?view=revision&revision=319852 ), and
I'll MFC the change after 3 days of settling, assuming there are no
objections, as this is a regression.

Cheers,