From owner-freebsd-stable Wed May 24 16: 1:41 2000
Delivered-To: freebsd-stable@freebsd.org
Received: from chmls06.mediaone.net (chmls06.mediaone.net [24.128.1.71])
    by hub.freebsd.org (Postfix) with ESMTP id 92C3737B86B
    for ; Wed, 24 May 2000 16:01:35 -0700 (PDT)
    (envelope-from davep@who.net)
Received: from h0000f806dfda.ne.mediaone.net (h0000f806dfda.ne.mediaone.net [24.147.250.67])
    by chmls06.mediaone.net (8.8.7/8.8.7) with ESMTP id TAA06967
    for ; Wed, 24 May 2000 19:01:32 -0400 (EDT)
Received: from h0000f806dfda.ne.mediaone.net (localhost [127.0.0.1])
    by h0000f806dfda.ne.mediaone.net (8.9.3/8.9.3) with ESMTP id TAA01371
    for ; Wed, 24 May 2000 19:01:32 -0400 (EDT)
    (envelope-from davep@who.net)
Message-Id: <200005242301.TAA01371@h0000f806dfda.ne.mediaone.net>
To: freebsd-stable@FreeBSD.ORG
Subject: netinet bug in RELENG_4
Reply-To: "David A. Panariti"
X-Attribution: davep
Date: Wed, 24 May 2000 19:01:32 -0400
From: "David A. Panariti"
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

I believe I have found a bug in netinet.  After a cvsup and make
world/kernel + mergemaster, I started getting the following panics:

delayed m_pullup, m->len: 40  off: 23040  p: 6

Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x8
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01b10a8
stack pointer           = 0x10:0xcd069ae4
frame pointer           = 0x10:0xcd069b10
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5740 (itnd)
interrupt mask          =
trap number             = 12
panic: page fault

syncing disks... done
Uptime: 10h22m23s

I cannot remember exactly when I cvsup'd.  It was either May 17 or
May 20.  (Any easy way to tell?)

Unfortunately, I can only reproduce the panics using an old version of
the AltaVista tunnel.  The tunnel worked perfectly up to 4-RELEASE.
I only have the binary of the tunnel code, and it was compiled for
FreeBSD 2.2.  The fact that it ran perfectly up to 4R is a testament to
backward compatibility!

Anyway, after some investigation, it looks like the m_pullup() is
failing inside in_delayed_cksum().  The mbuf is then NULL and we panic
when we set the csum.  It looks like m_pullup() is failing because the
offset is very big.  Some prints I added yield this:

(IP_VHL_HL(ip->ip_vhl) << 2): 0, csum_data: 23040
off too big, skipping csum

(I added code to return w/o setting the csum if I see a bogus offset.
With that I no longer panic, and the ftp which was failing now works.)

Further investigation shows csum_data being mangled here in ip_output():

                ip = mtod(m, struct ip *);
                /*
                 * Fill in IP header.
                 */
                if ((flags & (IP_FORWARDING|IP_RAWOUTPUT)) == 0) {
                        ip->ip_vhl = IP_MAKE_VHL(IPVERSION, hlen >> 2);
                        ip->ip_off &= IP_DF;
>>>>>>>>>>>             ip->ip_id = htons(ip_id++);
                        ipstat.ips_localout++;
                } else {
                        hlen = IP_VHL_HL(ip->ip_vhl) << 2;
                        dp_ck_csum_data(m, "a-7.1");    /* davep */
                }

More prints show:

off too big @ a-7.3, off: 0x14, csum_data: 0x5a00
ip: 0xc0a91920, m: 0xc0a91900, &csum_data: 0xc0a91924

Where:
ip is ip header inside mbuf
m is mbuf pointer
&csum_data = &m->m_pkthdr.csum_data

csum_data is inside the IP header!  And, coincidentally (NOT), ip_id is
4 bytes inside the struct ip, thus overlaying csum_data.

So it looks like the m_data is pointing at M_databuf, which should imply
(as the comment states)
        /* !M_PKTHDR, !M_EXT */
And yet the code is using fields from
        struct pkthdr MH_pkthdr;        /* M_PKTHDR set */

This is where I leave it for those more familiar with the code to
pursue.  Hopefully someone who knows the code can use this info to find
and fix the bug quickly.  It's taken me ~6 hrs just to find out this
much.

thanks,
davep

--
David Panariti                   /   I can't complain,
davep@who.net                        but sometimes I still do.
(see also http://www.four11.com) /        -- Joe Walsh
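
The overlap reported in the message can be checked in user space with a
simplified model of the mbuf layout.  This is only a sketch under stated
assumptions: the field lists and sizes below are reconstructed so that
csum_data lands at m + 0x24, as in the printed addresses, with 32-bit
pointers modeled as uint32_t.  The model_* names and
csum_data_after_ip_id_store() are hypothetical and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed i386 layout; pointers are modeled as uint32_t (hypothetical model). */
struct model_m_hdr {
    uint32_t mh_next;
    uint32_t mh_nextpkt;
    uint32_t mh_data;
    int32_t  mh_len;
    int16_t  mh_type;
    int16_t  mh_flags;
};                                  /* 20 bytes */

struct model_pkthdr {
    uint32_t rcvif;
    int32_t  len;
    uint32_t header;
    int32_t  csum_flags;
    int32_t  csum_data;             /* 16 bytes into the packet header */
    uint32_t aux;
};                                  /* 24 bytes */

struct model_mbuf {
    struct model_m_hdr m_hdr;
    union {
        struct {
            struct model_pkthdr MH_pkthdr;      /* M_PKTHDR set */
            unsigned char MH_databuf[84];
        } MH;
        unsigned char M_databuf[108];           /* !M_PKTHDR, !M_EXT */
    } M_dat;
};                                  /* 128 bytes in this model */

/* From the prints: ip - m = 0xc0a91920 - 0xc0a91900 = 0x20. */
#define MODEL_IP_OFFSET     0x20
/* ip_id follows ip_vhl (1 byte), ip_tos (1 byte) and ip_len (2 bytes). */
#define MODEL_IP_ID_OFFSET  4

/*
 * Store a 16-bit id in network byte order where ip_id would live, then
 * return csum_data decoded as a little-endian 32-bit value (as on i386).
 */
uint32_t csum_data_after_ip_id_store(uint16_t id)
{
    struct model_mbuf m;
    unsigned char *base = (unsigned char *)&m;
    const size_t csum_off =
        offsetof(struct model_mbuf, M_dat.MH.MH_pkthdr.csum_data);
    const size_t id_off = MODEL_IP_OFFSET + MODEL_IP_ID_OFFSET;
    const unsigned char *p;

    /* The claim under test: both fields occupy the same bytes. */
    assert(csum_off == id_off);

    memset(&m, 0, sizeof(m));
    base[id_off]     = (unsigned char)(id >> 8);    /* htons(): high byte first */
    base[id_off + 1] = (unsigned char)(id & 0xff);

    p = base + csum_off;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

In this model, csum_data_after_ip_id_store(0x5a) returns 0x5a00 (23040),
the same csum_data value that appears in the prints.  That would be
consistent with ip_id being 90 when the store happened, though this is an
inference from the model and not something the prints state directly.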