Date: Tue, 28 May 2013 19:39:38 -0400 (EDT)
From: Rick Macklem <rmacklem@uoguelph.ca>
To: Hartmut Brandt <hartmut.brandt@dlr.de>
Cc: current@freebsd.org
Subject: Re: files disappearing from ls on NFS
Message-ID: <969497820.24821.1369784378065.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <alpine.BSF.2.00.1305151032540.35662@KNOP-BEAGLE.kn.op.dlr.de>
Hartmut Brandt wrote:
> On Wed, 15 May 2013, Rick Macklem wrote:
>
> RM>Well, getdents() basically just calls kern_getdirentries() and it calls
> RM>VOP_READDIR() { which is called nfs_readdir() in the NFS clients }.
> RM>nfs_readdir() calls ncl_bioread() to do the real work of finding the
> RM>buffer cache blocks and copying the data out of them.
> RM>One thing you might check via printf()s is whether the buffer cache
> RM>has the zero'd data in it before it copies it to userland.
>
> I now dump the data just before the call to vn_io_fault_iomove in
> ncl_bioread(). So what I do:
>
> 1. reboot
> 2. login
> 3. ls
>    -> I see that it is moving 4 blocks 4k each to the user and they look fine
> 4. wait half an hour
> 5. ls
>    -> now there is only one block, which contains zeros starting from 0x200.
>
> Note that I don't do anything else on that machine during that time.
>
> RM>Since you get valid data sometimes and partially zero'd out data others,
> RM>I haven't a clue what is going on. One other person reported a problem
> RM>when they used a small readdirsize, but it is hard to say they saw the
> RM>same thing and no one else seems to be seeing this, so I have no idea
> RM>what it might be.
> RM>
> RM>I remember you started seeing this after an upgrade of current. Do you
> RM>happen to have dates (or rNNNNNN) for the old working version vs the
> RM>one that broke this?
> RM>(All I can think to do is scan the commits that seem to somehow relate
> RM> to the buffer cache or copying to userland or ???)
>
> It looks like I had copied the old kernel before installing the new one
> and it is from February 5th. There is no SVN revision in it - looks like
> that feature was added only recently.
>
> harti
>
Thanks to Hartmut's testing, a patch to fix this problem has just been
committed to head as r251079.
The problem was caused by vnode_pager_setsize() being called for directories
(where the size reported by the server can be smaller than the size of the
ufs-like directory created in the client from the RPCs XDR).

r251079 will be MFC'd to stable/9 in 1 week if things go smoothly.

You might see this problem for head kernels between r248567-r251078 and
stable/9 kernels from r249078 (Apr. 4) until a week from now.

Sorry for any inconvenience and thanks go to Hartmut for his help
isolating this, rick

> RM>
> RM>rick
> RM>
> RM>> harti
> RM>>
> RM>> -----Original Message-----
> RM>> From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
> RM>> Sent: Tuesday, May 14, 2013 2:50 PM
> RM>> To: Brandt, Hartmut
> RM>> Cc: current@freebsd.org
> RM>> Subject: Re: files disappearing from ls on NFS
> RM>>
> RM>> Hartmut Brandt wrote:
> RM>> > On Mon, 13 May 2013, Rick Macklem wrote:
> RM>> >
> RM>> > RM>Hartmut Brandt wrote:
> RM>> > RM>> On Sun, 12 May 2013, Rick Macklem wrote:
> RM>> > RM>>
> RM>> > RM>> RM>Hartmut Brandt wrote:
> RM>> > RM>> RM>> Hi,
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> I've updated one of my -current machines this week (previous update
> RM>> > RM>> RM>> was in February). Now I see a strange effect (it seems only on NFS
> RM>> > RM>> RM>> mounts): ls or even echo * will list only some files (strange enough
> RM>> > RM>> RM>> the first files from the normal, alphabetically ordered list). If I
> RM>> > RM>> RM>> change something in the directory (delete a file or create a new one)
> RM>> > RM>> RM>> for some time the complete listing will appear but after some time
> RM>> > RM>> RM>> (seconds to a minute or so) again only part of the files is listed.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that getdirentries is
> RM>> > RM>> RM>> called only once (returning 4096). For a full listing getdirentries
> RM>> > RM>> RM>> is called 5 times with the last returning 0.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> I can still open files that are not listed if I know their name,
> RM>> > RM>> RM>> though.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText NFS Server
> RM>> > RM>> RM>> which works without problems to all the other FreeBSD machines.
> RM>> > RM>> RM>>
> RM>> > RM>> RM>> So what could that be?
> RM>> > RM>> RM>>
> RM>> > RM>> RM>I've attached a patch that might be worth trying. It is a "shot in
> RM>> > RM>> RM>the dark", but brings the new NFS client's readdir closer to the old
> RM>> > RM>> RM>one (which you mentioned still works ok).
> RM>> > RM>> RM>
> RM>> > RM>> RM>Please let me know how it goes, if you have a chance to test it, rick
> RM>> > RM>>
> RM>> > RM>> Hi Rick,
> RM>> > RM>>
> RM>> > RM>> the patch doesn't help.
> RM>> > RM>>
> RM>> > RM>> I wrote a small test program, which opens a directory, calls
> RM>> > RM>> getdents(2) in a loop and dumps that. I figured out that the return
> RM>> > RM>> of the system call depends on the buffer size I pass to it. The
> RM>> > RM>> directory has a block size of 4k according to fstat(2). If I use that,
> RM>> > RM>> I get some 300 of the almost 500 directory entries. If I use 8k, I get
> RM>> > RM>> just around 200 and if I use 16k I get a handful.
> RM>> > RM>> If I dump the buffer in this case I see 0x200 bytes filled with
> RM>> > RM>> directory entries, then a lot of zeros and starting from 0x1000 again
> RM>> > RM>> data. This is of course ignored because of the zeros before.
> RM>> > RM>>
> RM>> > RM>And for this case getdents(2) returned 16K? It is normal for getdents(2)
> RM>> > RM>to return less than requested and when end of dir occurs, it should
> RM>> > RM>return 0.
> RM>> > RM>
> RM>> > RM>But if it returns 16K, there shouldn't be zeroed space in the middle of
> RM>> > RM>it.
> RM>> > RM>
> RM>> > RM>And this always occurs or only after you wait a while? (You noted in the
> RM>> > RM>above description that it would be ok for a little while after a directory
> RM>> > RM>change and then would break, which suggests some kind of caching problem.)
> RM>> >
> RM>> > Today in the morning everything was fine. After waiting 5 minutes,
> RM>> > again only partial directories. When I do a read with 8k buffer size,
> RM>> > getdents(2) returns 8k, but starting from 0x200 until 0x1000 the
> RM>> > buffer is filled with zeros. The entry just before the zeroes ends
> RM>> > exactly at 0x200 (that would be the first byte of the next entry) and
> RM>> > at 0x1000 a new entry starts. The rest of the buffer is fine. The next
> RM>> > read returns only 4k and seems to be fine - although it contains some
> RM>> > junk non-zero bytes in the padding.
> RM>> >
> RM>> Directory entries should never cross DIRBLKSIZ boundaries (512 or
> RM>> 0x200), so it makes sense that one ends at 0x200 and one starts at
> RM>> 0x1000. What doesn't make sense are the 0 bytes in between.
> RM>>
> RM>> One difference between the old and new NFS clients, which the patch I
> RM>> sent you changed to the way the old one does it, is filling in the
> RM>> last block.
> RM>> The old NFS client just leaves the block short and depends on
> RM>> n_direofoffset to recognize it is the last block with b_resid
> RM>> indicating where it ends.
> RM>> For the new client (unless you've applied the patch I emailed you), it
> RM>> fills the rest of the last block in with "empty directories". This was
> RM>> in the OpenBSD code when I did the original NFSv4 stuff and port. I
> RM>> left it in, because I thought it might avoid problems if
> RM>> n_direofoffset was ever bogus. That is why there might be "different
> RM>> junk" at the end of the directory, but it shouldn't matter.
> RM>>
> RM>> It almost sounds like something else is bzero()ing out part of the
> RM>> buffer cache block. Unless the directory has changed, the getdents()
> RM>> after 5 minutes would just return the same buffer cache block that was
> RM>> read in 5 minutes earlier (unless the buffer fell out of the cache and
> RM>> had to be re-read from the server, which would only happen if there
> RM>> was a lot of other file I/O going on during that 5 minutes).
> RM>>
> RM>> A couple of comments:
> RM>> - You can run "nfsstat -m" as root to see what the mount is actually
> RM>>   configured to use. This might be worth looking at, to see if any
> RM>>   of the values are "weird".
> RM>> - One other difference between the old and new NFS clients is the
> RM>>   value of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K.
> RM>>   You could change this in fs/nfs/nfsport.h, where it is defined
> RM>>   and then rebuild the sources to see if it has any effect. I can't
> RM>>   see why it should matter, but??
> RM>> - Maybe you could post your system configuration. Someone might spot
> RM>>   something that changed in Feb.->Mar. related to your hardware/setup?
> RM>>
> RM>> > Ten minutes later again everything is fine. I tried to spy on the
> RM>> > NFS communication with tcpdump, but it seems unwilling to display
> RM>> > something useful about the NFS.
> RM>> > Is it able to decode the readdir stuff?
> RM>> >
> RM>> To look at NFS packets you need wireshark. You can capture the packets
> RM>> with tcpdump using the -w option. Something like:
> RM>> # tcpdump -s 0 -w file.pcap host server
> RM>> - Then look at file.pcap in wireshark. (Often more convenient than
> RM>>   installing wireshark on a particular machine.) If you'd like, you can
> RM>>   email me the file.pcap and I can look at it.
> RM>>
> RM>> rick
> RM>>
> RM>> > harti
> RM>> >
> RM>> > _______________________________________________
> RM>> > freebsd-current@freebsd.org mailing list
> RM>> > http://lists.freebsd.org/mailman/listinfo/freebsd-current
> RM>> > To unsubscribe, send any mail to
> RM>> > "freebsd-current-unsubscribe@freebsd.org"
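
For readers following the thread, here is a rough sketch of why a zeroed region in the middle of a getdents(2) buffer hides every entry after it, as Hartmut observed. It is not the test program from the thread and not the libc code. It assumes the record header that FreeBSD used at the time (32-bit d_fileno, 16-bit d_reclen, 8-bit d_type, 8-bit d_namlen, then the name), and the helper names make_entry, build_block and walk are made up for illustration. The walker treats a zero d_reclen as the end of usable data, which is an assumption modelled on the behaviour described above.

```python
import struct

DIRBLKSIZ = 0x200  # 512, the boundary entries must not cross (per the thread)

# Assumed on-disk-like record header: d_fileno, d_reclen, d_type, d_namlen.
HDR = struct.Struct("<IHBB")


def make_entry(fileno, name, reclen=None):
    """Build one directory record; pad with zeros up to reclen."""
    raw = name.encode()
    size = (HDR.size + len(raw) + 1 + 3) & ~3  # NUL terminator, 4-byte align
    if reclen is None:
        reclen = size
    body = HDR.pack(fileno, reclen, 8, len(raw)) + raw + b"\0"
    return body.ljust(reclen, b"\0")


def build_block(entries, size=DIRBLKSIZ):
    """Pack entries into one block; the last record is stretched to fill it."""
    out = b""
    for i, (fileno, name) in enumerate(entries):
        last = i == len(entries) - 1
        out += make_entry(fileno, name, size - len(out) if last else None)
    return out


def walk(buf):
    """Return (names, stop_offset), stopping at the first zero d_reclen."""
    names, off = [], 0
    while off + HDR.size <= len(buf):
        fileno, reclen, _dtype, namlen = HDR.unpack_from(buf, off)
        if reclen == 0:  # zeroed record: nothing after it can be reached
            break
        if fileno != 0:
            start = off + HDR.size
            names.append(buf[start:start + namlen].decode())
        off += reclen
    return names, off


good = build_block([(1, "a.c"), (2, "b.c")])
tail = build_block([(3, "y.c"), (4, "z.c")])

intact = good + tail
# Same data, but with zeros from 0x200 up to 0x1000 as in the report.
damaged = good + bytes(0x1000 - DIRBLKSIZ) + tail

print(walk(intact))
print(walk(damaged))
```

In this sketch the walk over the damaged buffer stops at offset 0x200 and never reaches y.c or z.c, even though their records are still present at 0x1000. That matches the symptom of ls showing only the first part of the directory.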