Date: Sat, 6 Jul 2002 15:54:19 -0700 (PDT)
From: John Polstra <jdp@polstra.com>
To: stable@freebsd.org
Subject: Re: NFS errors at high hz values with TCP mounts
Message-ID: <200207062254.g66MsJPj000565@vashon.polstra.com>
In-Reply-To: <200207062205.g66M5n1X000473@vashon.polstra.com>
References: <XFMail.20020706145448.jdp@polstra.com> <200207062205.g66M5n1X000473@vashon.polstra.com>

In article <200207062205.g66M5n1X000473@vashon.polstra.com>,
John Polstra <jdp@polstra.com> wrote:
> In article <XFMail.20020706145448.jdp@polstra.com>,
> John Polstra <jdp@polstra.com> wrote:
> > Here's what happens when I try to copy a 512 kbyte file from the
> > hz=10000 client to a server that is NFS-mounted:
> >
> > thin$ dd if=/dev/zero of=/mnt/foo count=1000
> > dd: /mnt/foo: Resource temporarily unavailable
> > 61+0 records in
> > 60+0 records out
> > 30720 bytes transferred in 0.000996 secs (30843571 bytes/sec)
>
> I forgot to mention that this message appears in the dmesg output on
> the client machine:
>
> nfs send error 35 for server strings:/usr/home/jdp
>
> It comes from sys/nfs/nfs_socket.c line 499.

Sorry for the extended conversation with myself. :-) I think I
found the bug. In nfs_connect() at line 300 of sys/nfs/nfs_socket.c
we have this code:

    so->so_rcv.sb_timeo = (5 * hz);
    so->so_snd.sb_timeo = (5 * hz);

But sb_timeo has type "short", which overflows when hz is 10000:
5 * 10000 = 50000, which is more than the 32767 that a 16-bit short
can hold. The field is a member of struct sockbuf. I don't think it
would break binary compatibility with existing 3rd party modules to
change it to type "long". Are there any contrary opinions?

John
--
John Polstra
John D. Polstra & Co., Inc. Seattle, Washington USA
"Disappointment is a good sign of basic intelligence." -- Chögyam Trungpa
