Date:      Tue, 14 May 2013 18:09:13 -0400 (EDT)
From:      Rick Macklem <rmacklem@uoguelph.ca>
To:        Hartmut Brandt <Hartmut.Brandt@dlr.de>
Cc:        current@freebsd.org
Subject:   Re: files disappearing from ls on NFS
Message-ID:  <1392815611.361000.1368569353890.JavaMail.root@erie.cs.uoguelph.ca>
In-Reply-To: <611243783F62AF48AFB07BC25FA4B1061C55188F@DLREXMBX01.intra.dlr.de>

Hartmut Brandt wrote:
> Hi Rick,
> 
> sorry for top-posting - this is Outlook :-(
> 
> Attached is the system configuration. I have used it more or less
> unchanged for years. The machine is an 8-core AMD64 with 144 GByte of
> memory.
> 
> The nfsstat -m output for the two file systems I'm testing with is:
> 
> knopfs01:/OP_UserUnix on /home
> nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=6126856,timeout=120,retrans=2
> knopfs01:/op_software on /software
> nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=6126856,timeout=120,retrans=2
> 
> I did the tcpdump/wireshark thing and I'm puzzled that I see no
> readdir requests. I see a lookup, followed by getattr, access and
> fsstat for the directory, and that's it. It looks like even after
> hours the stuff returned by getdents(2) comes from the cache. I assume
> that the NFS client uses getattr to check whether the directory has
> changed? If I knew what happens when calling getdents() I could add
> some debugging printf()s here and there to figure it out...
> 
Yes. The NFS client will check the mtime on the directory to see if it has
changed and just use whatever is in the buffer cache otherwise.

Well, getdents() basically just calls kern_getdirentries(), which calls
VOP_READDIR() { nfs_readdir() in the NFS clients }.
nfs_readdir() calls ncl_bioread() to do the real work of finding the
buffer cache blocks and copying the data out of them.
One thing you might check via printf()s is whether the buffer cache
block already has the zero'd data in it before it is copied to userland.
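
Something like this untested sketch, placed just before the uiomove()
that copies the directory data out in ncl_bioread(), would tell you
whether the zeros are already in the cached block. (The names "bp",
"on" and "n" follow the usual buffer/offset/byte-count pattern there,
but may not match the tree exactly.)

    {
        char *cp = bp->b_data + on;
        int i, zeros = 0;

        for (i = 0; i < n; i++)
            if (cp[i] == '\0')
                zeros++;
        if (zeros > n / 4)
            printf("ncl_bioread: dir block lblkno %jd has %d of %d "
                "zero bytes before uiomove\n",
                (intmax_t)bp->b_lblkno, zeros, n);
    }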

Since you get valid data sometimes and partially zero'd out data at
other times, I haven't a clue what is going on. One other person
reported a problem when they used a small readdirsize, but it is hard
to say whether they saw the same thing, and no one else seems to be
seeing this, so I have no idea what it might be.

I remember you started seeing this after an upgrade of current. Do you
happen to have dates (or rNNNNNN revisions) for the old working version
vs. the one that broke this?
(All I can think to do is scan the commits that seem to somehow relate
 to the buffer cache or copying to userland or ???)

rick

> harti
> 
> -----Original Message-----
> From: Rick Macklem [mailto:rmacklem@uoguelph.ca]
> Sent: Tuesday, May 14, 2013 2:50 PM
> To: Brandt, Hartmut
> Cc: current@freebsd.org
> Subject: Re: files disappearing from ls on NFS
> 
> Hartmut Brandt wrote:
> > On Mon, 13 May 2013, Rick Macklem wrote:
> >
> > RM>Hartmut Brandt wrote:
> > RM>> On Sun, 12 May 2013, Rick Macklem wrote:
> > RM>>
> > RM>> RM>Hartmut Brandt wrote:
> > RM>> RM>> Hi,
> > RM>> RM>>
> > RM>> RM>> I've updated one of my -current machines this week (the
> > RM>> RM>> previous update was in February). Now I see a strange
> > RM>> RM>> effect (it seems only on NFS mounts): ls or even echo *
> > RM>> RM>> will list only some files (strangely enough, the first
> > RM>> RM>> files from the normal, alphabetically ordered list). If I
> > RM>> RM>> change something in the directory (delete a file or create
> > RM>> RM>> a new one), the complete listing will appear for some
> > RM>> RM>> time, but after some time (seconds to a minute or so)
> > RM>> RM>> again only part of the files is listed.
> > RM>> RM>>
> > RM>> RM>> A ktrace on ls /usr/src/lib/libc/gen shows that
> > RM>> RM>> getdirentries is called only once (returning 4096). For a
> > RM>> RM>> full listing getdirentries is called 5 times with the last
> > RM>> RM>> returning 0.
> > RM>> RM>>
> > RM>> RM>> I can still open files that are not listed if I know
> > RM>> RM>> their name, though.
> > RM>> RM>>
> > RM>> RM>> The NFS server is a Windows 2008 server with an OpenText
> > RM>> RM>> NFS Server, which works without problems for all the other
> > RM>> RM>> FreeBSD machines.
> > RM>> RM>>
> > RM>> RM>> So what could that be?
> > RM>> RM>>
> > RM>> RM>I've attached a patch that might be worth trying. It is a
> > RM>> RM>"shot in the dark", but brings the new NFS client's readdir
> > RM>> RM>closer to the old one (which you mentioned still works ok).
> > RM>> RM>
> > RM>> RM>Please let me know how it goes, if you have a chance to
> > RM>> RM>test it, rick
> > RM>>
> > RM>> Hi Rick,
> > RM>>
> > RM>> the patch doesn't help.
> > RM>>
> > RM>> I wrote a small test program which opens a directory, calls
> > RM>> getdents(2) in a loop and dumps the result. I figured out that
> > RM>> the return of the system call depends on the buffer size I pass
> > RM>> to it. The directory has a block size of 4k according to
> > RM>> fstat(2). If I use that, I get some 300 of the almost 500
> > RM>> directory entries. If I use 8k, I get just around 200, and if I
> > RM>> use 16k I get a handful. If I dump the buffer in this case I
> > RM>> see 0x200 bytes filled with directory entries, then a lot of
> > RM>> zeros, and starting from 0x1000 data again. This is of course
> > RM>> ignored because of the zeros before.
> > RM>And for this case getdents(2) returned 16K? It is normal for
> > RM>getdents(2) to return less than requested, and when end of
> > RM>directory occurs, it should return 0.
> > RM>
> > RM>But if it returns 16K, there shouldn't be zeroed space in the
> > RM>middle of it.
> > RM>
> > RM>And does this always occur, or only after you wait a while? (You
> > RM>noted in the above description that it would be ok for a little
> > RM>while after a directory change and then would break, which
> > RM>suggests some kind of caching problem.)
> >
> > This morning everything was fine. After waiting 5 minutes, again
> > only partial directories. When I do a read with an 8k buffer size,
> > getdents(2) returns 8k, but from 0x200 until 0x1000 the buffer is
> > filled with zeros. The entry just before the zeros ends exactly at
> > 0x200 (that would be the first byte of the next entry), and at
> > 0x1000 a new entry starts. The rest of the buffer is fine. The next
> > read returns only 4k and seems to be fine - although it contains
> > some junk non-zero bytes in the padding.
> >
> Directory entries should never cross DIRBLKSIZ boundaries (512 or
> 0x200), so it makes sense that one ends at 0x200 and one starts at
> 0x1000. What doesn't make sense are the 0 bytes in between.
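
If it helps, a small untested helper like this could be bolted onto a
dump loop like the one sketched above to flag exactly which DIRBLKSIZ
blocks come back zeroed:

    #include <stdio.h>
    #include <string.h>

    #define DIRBLKSIZ 512

    /*
     * Untested helper: after getdents(2) returns n bytes into buf,
     * report every DIRBLKSIZ-sized block that is entirely zero.
     */
    static void
    report_zero_blocks(const char *buf, int n)
    {
        static const char zeros[DIRBLKSIZ];
        int off;

        for (off = 0; off + DIRBLKSIZ <= n; off += DIRBLKSIZ)
            if (memcmp(buf + off, zeros, DIRBLKSIZ) == 0)
                printf("  block at 0x%04x is all zeros\n", off);
    }

That would make it easy to see whether it is always the same range of
the returned buffer that comes back zeroed.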
> 
> One difference between the old and new NFS clients (and the one the
> patch I sent you reverts to the old behaviour) is how the last block
> is filled in.
> The old NFS client just leaves the block short and depends on
> n_direofoffset to recognize that it is the last block, with b_resid
> indicating where it ends.
> The new client (unless you've applied the patch I emailed you) fills
> the rest of the last block in with "empty" directory entries. This
> was in the OpenBSD code when I did the original NFSv4 stuff and port.
> I left it in because I thought it might avoid problems if
> n_direofoffset was ever bogus. That is why there might be "different
> junk" at the end of the directory, but it shouldn't matter.
> 
> It almost sounds like something else is bzero()ing out part of the
> buffer cache block. Unless the directory has changed, the getdents()
> after 5 minutes would just return the same buffer cache block that was
> read in 5 minutes earlier (unless the buffer fell out of the cache and
> had to be re-read from the server, which would only happen if there
> was a lot of other file I/O going on during those 5 minutes).
> 
> A couple of comments:
> - You can run "nfsstat -m" as root to see what the mount it actually
> configured to use. This might be worth looking at, to see if any
> of the values are "weird".
> - One other difference between the old and new NFS clients is the
> value of NFS_DIRBLKSIZ. For the new one, it is 8K instead of 4K.
> You could change this in fs/nfs/nfsport.h, where it is defined,
> and then rebuild the sources to see if it has any effect (a rough
> sketch of the change is just after this list). I can't see why it
> should matter, but??
> - Maybe you could post your system configuration. Someone might spot
> something that changed in Feb.->Mar. related to your hardware/setup?
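
To be concrete about the NFS_DIRBLKSIZ suggestion, the change would be
along these lines; the exact definition in fs/nfs/nfsport.h may be
written differently (e.g. as a multiple of DIRBLKSIZ), so treat this as
a sketch:

    /*
     * In fs/nfs/nfsport.h -- illustration only; the real definition may
     * be expressed differently.  This takes the new client's directory
     * block size from 8K back to the old client's 4K.  Rebuild the
     * kernel (or the nfs modules) afterwards.
     */
    #define NFS_DIRBLKSIZ   4096    /* was 8192 */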
> 
> > Ten minutes later everything is fine again. I tried to spy on the
> > NFS communication with tcpdump, but it seems unwilling to display
> > anything useful about the NFS traffic. Is it able to decode the
> > readdir stuff?
> >
> To look at NFS packets you need wireshark. You can capture the packets
> with tcpdump using the -w option. Something like:
> # tcpdump -s 0 -w file.pcap host server
> Then look at file.pcap in wireshark. (Capturing with tcpdump and
> viewing the capture elsewhere is often more convenient than installing
> wireshark on that particular machine.) If you'd like, you can email me
> the file.pcap and I can look at it.
> 
> rick
> 
> > harti
> >


