From owner-freebsd-net@FreeBSD.ORG  Thu Jul  8 08:11:42 2010
Return-Path: <owner-freebsd-net@FreeBSD.ORG>
Delivered-To: freebsd-net@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 605F11065674
	for <freebsd-net@freebsd.org>; Thu,  8 Jul 2010 08:11:42 +0000 (UTC)
	(envelope-from andre@freebsd.org)
Received: from c00l3r.networx.ch (c00l3r.networx.ch [62.48.2.2])
	by mx1.freebsd.org (Postfix) with ESMTP id 975E28FC1F
	for <freebsd-net@freebsd.org>; Thu,  8 Jul 2010 08:11:41 +0000 (UTC)
Received: (qmail 33874 invoked from network); 8 Jul 2010 06:47:56 -0000
Received: from localhost (HELO [127.0.0.1]) ([127.0.0.1])
	(envelope-sender <andre@freebsd.org>)
	by c00l3r.networx.ch (qmail-ldap-1.03) with SMTP
	for <kostikbel@gmail.com>; 8 Jul 2010 06:47:56 -0000
Message-ID: <4C358843.5000001@freebsd.org>
Date: Thu, 08 Jul 2010 10:11:47 +0200
From: Andre Oppermann <andre@freebsd.org>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
	rv:1.9.1.10) Gecko/20100512 Thunderbird/3.0.5
MIME-Version: 1.0
To: Kostik Belousov <kostikbel@gmail.com>
References: <7C3D15DD6E8F464998CA1470D8A322F302BB9F72@ES02CO.wgti.net>
	<20100707205041.GO13238@deviant.kiev.zoral.com.ua>
In-Reply-To: <20100707205041.GO13238@deviant.kiev.zoral.com.ua>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: freebsd-net@freebsd.org, bz@freebsd.org, Ming Fu <Ming.Fu@watchguard.com>,
	lstewart@freebsd.org
Subject: Re: kern/123095 kern/131602 sendfile
X-BeenThere: freebsd-net@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Networking and TCP/IP with FreeBSD <freebsd-net.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-net>
List-Post: <mailto:freebsd-net@freebsd.org>
List-Help: <mailto:freebsd-net-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-net>,
	<mailto:freebsd-net-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Thu, 08 Jul 2010 08:11:42 -0000

On 07.07.2010 22:50, Kostik Belousov wrote:
> On Wed, Jul 07, 2010 at 10:24:41AM -0700, Ming Fu wrote:
>> Hi,
>>
>>
>> I was trying to use sendfile and hit a problem very similar to
>> kern/123095 and kern/131602.
>> It seems that when the file is large enough (several megabytes), it can
>> be corrupted even if it is opened read-only and exists on disk as a
>> read-only file, though the filesystem is mounted read-write.
>>
>> I have a small program to reliably reproduce the problem.
>>
>> ---------- corrupt.c -----------------
>>
>> #include<sys/types.h>
>> #include<sys/socket.h>
>> #include<sys/uio.h>
>> #include<fcntl.h>
>> #include<netinet/in.h>
>> #include<sys/select.h>
>> #include<sys/stat.h>
>> #include<strings.h>
>> #include<stdio.h>
>> #include<err.h>
>> #include<arpa/inet.h>
>> #include<unistd.h>
>> int main(void) {
>>          int s, f;
>>          struct sockaddr_in addr;
>>          int flags;
>>          char str[32]="\r\n800\r\n";
>>          char *p = str;
>>          struct stat sb;
>>          int n;
>>          fd_set wset;
>>          int64_t size;
>>          off_t sbytes;
>>          off_t sent = 0;
>>          int chunk;
>>
>>          s = socket(AF_INET, SOCK_STREAM, 0);
>>          bzero(&addr, sizeof(addr));
>>          addr.sin_family = AF_INET;
>>          addr.sin_port = htons(7000);
>>          addr.sin_addr.s_addr = inet_addr("10.1.19.16");
>>
>>          n = connect(s, (struct sockaddr *)&addr, sizeof (addr));
>>          if (n<  0)
>>                  warn ("fail to connect");
>>          flags = fcntl(s, F_GETFL);
>>          flags |= O_NONBLOCK;
>>          fcntl(s, F_SETFL, flags);
>>
>>          f = open("large", O_RDONLY);
>>          if (f<0)
>>                  warn("fail to open file");
>>          n = fstat(f,&sb);
>>          if (n<0)
>>                  warn("fstat failed");
>>
>>          size = sb.st_size;
>>          chunk = 0;
>>          while (size>  0) {
>>                  FD_ZERO(&wset);
>>                  FD_SET(s,&wset);
>>                  n = select(f+1, NULL,&wset, NULL, NULL);
>>                  if (n<  0)
>>                          continue;
>>                  if (chunk>  0) {
>>                          sbytes = 0;
>>                          n = sendfile(f, s, sent, chunk, NULL,
>>                              &sbytes, 0);
>>                          if (n<  0)
>>                                  continue;
>>                          chunk -= sbytes;
>>                          size -= sbytes;
>>                          sent += sbytes;
>>                          continue;
>>                  }
>>                  if (size>  2048)
>>                          chunk = 2048;
>>                  else
>>                          chunk = size;
>>                  n = sprintf(str, "\r\n%x\r\n", chunk);
>>                  p = str;
>>                  write(s, p, n);
>>          }
>> }
>>
>> ------------- end ---------------------------------------------
>>
>> Run nc to receive the sendfile
>> $ nc -l 7000
>>
>> Copy a large file for sendfile to send
>> $ cp /usr/lib/libc_pic.a large
>>
>> $ md5 large
>> MD5 (large) = 252def82f9d75df11df7123e9fd376f6
>>
>> $ cc -o co corrupt.c
>> $ ./co
>> $ md5 large
>> MD5 (large) = 81ee84e55f4611434459f637c83b892e
>>
>> I ran this on 8.0-RELEASE. The same happens on 7.2 and 6.3. The disks
>> are SATA IDE. I ran all these commands under an unprivileged user
>> account. I also ran the same program on several different hardware
>> configurations, and the result is the same, although the corrupted file
>> is not the same. The corruption looks random to me.
>>
>> I know a bit of the network side of the FreeBSD kernel code, but I have
>> no idea how the filesystem side works. I can dig a bit further if someone
>> gives me a hint as to where to look.
>
> Right, thank you for the easy way to reproduce it. I was able to
> trigger this as well. The patch at
> http://people.freebsd.org/~kib/vm/sf_buf_readonly.patch
> maps some sf buffers read-only, in particular those for the pages that
> are mapped by sendfile(2).
>
> Sure enough, it triggered the panic immediately, with backtrace
> bcopy
> sbappendstream_locked
> tcp_do_segment
> tcp_input
> ip_input
> swi_net
> (swi because I tested over loopback). To be clear: the backtrace above
> points to the code path that causes modifications to the file object
> pages inserted (?) into the mbuf that are supposed to be immutable.
>
> Any help from tcp-clueful people is appreciated.

In this context the loopback case is special because the mbuf stays intact
and is directly used for input in the receiver.  The panic looks to be caused
by sbcompress(), which tries to optimize mbuf and memory use in the receive
buffer.  It may get something wrong with sf buffers.  Can you give a more
complete backtrace?

Can you check whether your patch fixes the bug when you go over a real
network?

-- 
Andre