Date: Wed, 15 May 2013 10:38:38 +0200
From: Hartmut Brandt
To: Rick Macklem
Cc: current@freebsd.org
Subject: Re: files disappearing from ls on NFS

On Wed, 15 May 2013, Rick Macklem wrote:

RM>Well, getdents() basically just calls kern_getdirentries() and it calls
RM>VOP_READDIR() { which is called nfs_readdir() in the NFS clients }.
RM>nfs_readdir() calls ncl_bioread() to do the real work of finding the
RM>buffer cache blocks and copying the data out of them.
RM>One thing you might check via printf()s is whether the buffer cache
RM>has the zero'd data in it before it copies it to userland.

I now dump the data just before the call to vn_io_fault_iomove in
ncl_bioread(). So what I do:

1. reboot
2. login
3. ls -> I see that it is moving 4 blocks of 4k each to the user and
   they look fine
4. wait half an hour
5. ls -> now there is only one block, which contains zeros starting from
   0x200.

Note that I don't do anything else on that machine during that time.

RM>Since you get valid data sometimes and partially zero'd out data others,
RM>I haven't a clue what is going on. One other person reported a problem
RM>when they used a small readdirsize, but it is hard to say they saw the
RM>same thing and no one else seems to be seeing this, so I have no idea
RM>what it might be.
RM>
RM>I remember you started seeing this after an upgrade of current. Do you
RM>happen to have dates (or rNNNNNN) for the old working version vs the
RM>one that broke this?
RM>(All I can think to do is scan the commits that seem to somehow relate
RM> to the buffer cache or copying to userland or ???)

It looks like I had copied the old kernel before installing the new one
and it is from February 5th. There is no SVN revision in it - looks like
that feature was added only recently.

harti

RM>
RM>rick
RM>
RM>> harti
RM>>
RM>> -----Original Message-----
RM>> From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
RM>> Sent: Tuesday, May 14, 2013 2:50 PM
RM>> To: Brandt, Hartmut
RM>> Cc: current@freebsd.org
RM>> Subject: Re: files disappearing from ls on NFS
RM>>
RM>> Hartmut Brandt wrote:
RM>> > On Mon, 13 May 2013, Rick Macklem wrote:
RM>> >
RM>> > RM>Hartmut Brandt wrote:
RM>> > RM>> On Sun, 12 May 2013, Rick Macklem wrote:
RM>> > RM>>
RM>> > RM>> RM>Hartmut Brandt wrote:
RM>> > RM>> RM>> Hi,
RM>> > RM>> RM>>
RM>> > RM>> RM>> I've updated one of my -current machines this week
RM>> > RM>> RM>> (previous update was in February).
RM>> > RM>> RM>> Now I see a strange effect (it seems only on NFS
RM>> > RM>> RM>> mounts): ls or even echo * will list only some files
RM>> > RM>> RM>> (strangely enough, the first files from the normal,
RM>> > RM>> RM>> alphabetically ordered list). If I change something in
RM>> > RM>> RM>> the directory (delete a file or create a new one), for
RM>> > RM>> RM>> some time the complete listing will appear, but after
RM>> > RM>> RM>> some time (seconds to a minute or so) again only part
RM>> > RM>> RM>> of the files is listed.
RM>> > RM>> RM>>
RM>> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that
RM>> > RM>> RM>> getdirentries is called only once (returning 4096). For
RM>> > RM>> RM>> a full listing getdirentries is called 5 times with the
RM>> > RM>> RM>> last returning 0.
RM>> > RM>> RM>>
RM>> > RM>> RM>> I can still open files that are not listed if I know
RM>> > RM>> RM>> their name, though.
RM>> > RM>> RM>>
RM>> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText
RM>> > RM>> RM>> NFS Server which works without problems with all the
RM>> > RM>> RM>> other FreeBSD machines.
RM>> > RM>> RM>>
RM>> > RM>> RM>> So what could that be?
RM>> > RM>> RM>>
RM>> > RM>> RM>I've attached a patch that might be worth trying. It is a
RM>> > RM>> RM>"shot in the dark", but brings the new NFS client's
RM>> > RM>> RM>readdir closer to the old one (which you mentioned still
RM>> > RM>> RM>works ok).
RM>> > RM>> RM>
RM>> > RM>> RM>Please let me know how it goes, if you have a chance to
RM>> > RM>> RM>test it, rick
RM>> > RM>>
RM>> > RM>> Hi Rick,
RM>> > RM>>
RM>> > RM>> the patch doesn't help.
RM>> > RM>>
RM>> > RM>> I wrote a small test program, which opens a directory, calls
RM>> > RM>> getdents(2) in a loop and dumps that.
RM>> > RM>> I figured out that the return of the system call depends
RM>> > RM>> on the buffer size I pass to it. The directory has a block
RM>> > RM>> size of 4k according to fstat(2). If I use that, I get some
RM>> > RM>> 300 of the almost 500 directory entries. If I use 8k, I get
RM>> > RM>> just around 200 and if I use 16k I get a handful. If I dump
RM>> > RM>> the buffer in this case I see 0x200 bytes filled with
RM>> > RM>> directory entries, then a lot of zeros and starting from
RM>> > RM>> 0x1000 again data. This is of course ignored because of the
RM>> > RM>> zeros before.
RM>> > RM>>
RM>> > RM>And for this case getdents(2) returned 16K? It is normal for
RM>> > RM>getdents(2) to return less than requested and when end of dir
RM>> > RM>occurs, it should return 0.
RM>> > RM>
RM>> > RM>But if it returns 16K, there shouldn't be zeroed space in the
RM>> > RM>middle of it.
RM>> > RM>
RM>> > RM>And this always occurs or only after you wait a while? (You
RM>> > RM>noted in the above description that it would be ok for a little
RM>> > RM>while after a directory change and then would break, which
RM>> > RM>suggests some kind of caching problem.)
RM>> >
RM>> > Today in the morning everything was fine. After waiting 5 minutes,
RM>> > again only partial directories. When I do a read with 8k buffer
RM>> > size, getdents(2) returns 8k, but starting from 0x200 until 0x1000
RM>> > the buffer is filled with zeros. The entry just before the zeros
RM>> > ends exactly at 0x200 (that would be the first byte of the next
RM>> > entry) and at 0x1000 a new entry starts. The rest of the buffer is
RM>> > fine. The next read returns only 4k and seems to be fine - although
RM>> > it contains some junk non-zero bytes in the padding.
RM>> >
RM>> Directory entries should never cross DIRBLKSIZ boundaries (512 or
RM>> 0x200), so it makes sense that one ends at 0x200 and one starts at
RM>> 0x1000. What doesn't make sense are the 0 bytes in between.
RM>>
RM>> One difference between the old and new NFS clients, which the patch I
RM>> sent you changed to the way the old one does it, is filling in the
RM>> last block.
RM>> The old NFS client just leaves the block short and depends on
RM>> n_direofoffset to recognize it is the last block with b_resid
RM>> indicating where it ends.
RM>> For the new client (unless you've applied the patch I emailed you), it
RM>> fills the rest of the last block in with "empty directories". This was
RM>> in the OpenBSD code when I did the original NFSv4 stuff and port. I
RM>> left it in, because I thought it might avoid problems if
RM>> n_direofoffset was ever bogus. That is why there might be "different
RM>> junk" at the end of the directory, but it shouldn't matter.
RM>>
RM>> It almost sounds like something else is bzero()ing out part of the
RM>> buffer cache block. Unless the directory has changed, the getdents()
RM>> after 5 minutes would just return the same buffer cache block that was
RM>> read in 5 minutes earlier (unless the buffer fell out of the cache and
RM>> had to be re-read from the server, which would only happen if there
RM>> was a lot of other file I/O going on during that 5 minutes).
RM>>
RM>> A couple of comments:
RM>> - You can run "nfsstat -m" as root to see what the mount is actually
RM>>   configured to use. This might be worth looking at, to see if any
RM>>   of the values are "weird".
RM>> - One other difference between the old and new NFS clients is the
RM>>   value of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K.
RM>>   You could change this in fs/nfs/nfsport.h, where it is defined
RM>>   and then rebuild the sources to see if it has any effect. I can't
RM>>   see why it should matter, but??
RM>> - Maybe you could post your system configuration.
RM>>   Someone might spot something that changed in Feb.->Mar. related
RM>>   to your hardware/setup?
RM>>
RM>> > Ten minutes later again everything is fine. I tried to spy on the
RM>> > NFS communication with tcpdump, but it seems unwilling to display
RM>> > something useful about the NFS. Is it able to decode the readdir
RM>> > stuff?
RM>> >
RM>> To look at NFS packets you need wireshark. You can capture the packets
RM>> with tcpdump using the -w option. Something like:
RM>> # tcpdump -s 0 -w file.pcap host server
RM>> - Then look at file.pcap in wireshark. (Often more convenient than
RM>>   installing wireshark on a particular machine.) If you'd like, you can
RM>>   email me the file.pcap and I can look at it.
RM>>
RM>> rick
RM>>
RM>> > harti
RM>> >
RM>> > _______________________________________________
RM>> > freebsd-current@freebsd.org mailing list
RM>> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
RM>> > To unsubscribe, send any mail to
RM>> > "freebsd-current-unsubscribe@freebsd.org"
RM>>
RM>
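
The symptom described in this thread (a getdents(2) buffer holding valid
entries up to 0x200, zeros up to 0x1000, then valid entries again) can be
checked mechanically on a dumped buffer. The following is a minimal sketch
in Python, assuming the buffer is already available as a bytes object. The
function name zeroed_runs and the synthetic buffer are illustrative only
and are not taken from the test program mentioned above.

```python
DIRBLKSIZ = 0x200  # 512 bytes, the directory block size named in the thread


def zeroed_runs(buf, blksiz=DIRBLKSIZ):
    """Return (start, end) byte offsets of maximal runs of all-zero blocks.

    The buffer is scanned in blksiz-sized steps; a block counts as zeroed
    only if every byte in it is zero. `end` is exclusive.
    """
    runs = []
    run_start = None
    for off in range(0, len(buf), blksiz):
        block = buf[off:off + blksiz]
        if not any(block):              # every byte in this block is zero
            if run_start is None:
                run_start = off         # a new zeroed run begins here
        elif run_start is not None:
            runs.append((run_start, off))
            run_start = None
    if run_start is not None:           # run extends to the end of the buffer
        runs.append((run_start, len(buf)))
    return runs


# Synthetic 8k buffer shaped like the reported one (hypothetical data):
# 0x200 bytes of "entries", zeros up to 0x1000, then 4k of "entries".
buf = b"\x01" * 0x200 + b"\x00" * (0x1000 - 0x200) + b"\x01" * 0x1000
for start, end in zeroed_runs(buf):
    print(f"zeros from {start:#x} to {end:#x}")  # zeros from 0x200 to 0x1000
```

Scanning in DIRBLKSIZ steps rather than byte by byte keeps ordinary
padding inside a directory block from being reported; only whole zeroed
blocks show up.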