From owner-freebsd-hackers Fri Jun 28 19:24:01 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id TAA26113 for hackers-outgoing; Fri, 28 Jun 1996 19:24:01 -0700 (PDT)
Received: from badboy.wisetech.com (badboy.wisetech.com [205.231.232.76]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id TAA26108 for ; Fri, 28 Jun 1996 19:23:55 -0700 (PDT)
Received: from badboy.wisetech.com (localhost [127.0.0.1]) by badboy.wisetech.com (8.6.12/8.6.9) with SMTP id WAA26183 for ; Fri, 28 Jun 1996 22:11:07 -0400
Message-ID: <31D490BA.446B9B3D@wisetech.com>
Date: Fri, 28 Jun 1996 22:11:06 -0400
From: Rick Weldon
Organization: Weldon Internet SEcurity Technologies
X-Mailer: Mozilla 2.0 (X11; I; FreeBSD 2.1.0-RELEASE i386)
MIME-Version: 1.0
To: freebsd-hackers@freebsd.org
Subject: BPF implementation questions
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

Hello Hackers,

I am working on an application that allows one to capture packet data, save it off, analyze it, and play back TCP sessions. I have run into a couple of problems with the BPF implementation, and I am hoping that someone here might be able to help me with one of them. This will take some explainin' to get to the problem, so bear with me please.

I am using FreeBSD 2.1 Release on a Pentium with an NE2000 clone Ethernet card. I capture the data from the net using BPF and shove it into a data file. I write out the header first and then all of the Ethernet frame. Pretty straightforward. The data file format looks like this:

bpf_hdr|ether|ip|tcp|tcp data|

The problem I am running into is that when I try to get the absolute length of the TCP data portion, I can't. The bpf_hdr is lying to me, in a roundabout sort of way, about exactly how much TCP data is there.
For instance, here is a dump of the data file with breakouts of the fields:

        DATE          CAPLEN     DATLEN    BPF HDRLEN
|------------------||---------||---------||----------|
2f18 31d4 1938 0006 003c 0000  003c 0000  0012

          ETHERNET HEADER
|---------------------------------|
4000 1405 140a 0000 2ac0 7772 0008

                    IP HEADER
|------------------------------------------------|
0045 2c00 13b8 0040 06ff a255 e7cd 22e9 e7cd 23e9

                        TCP HEADER                        TCP DATA
|----------------------------------------------------------||---|
0a80 1700 70c7 008a 0000 0000 0260 3822 4636 0000 0402 b405 2a3a
                                                       ^^^^ ^^^^

NEXT BPF HEADER IN THE FILE
|-------------------------------------------
2f18 31d4 253b 0006 003c 0000 003c 0000 0012 ...

Okay, if we do some math and walk through the headers you will see what I mean. CAPLEN and DATLEN are the same because I don't set a snap limit. DATLEN (0x3c) is 60 bytes. If you count the bytes, they do indeed total 60, which has us pointing at the next bpf header to start the next read. No problem.

To calculate the actual TCP data length you would use:

tcp_datalen = bh_datalen - ((sizeof(struct ether_header) + (p_ip->ip_hl * 4) + (tcp->th_off * 4)));

Filling in the numbers you get:

tcp_datalen = 60 - (14 + 20 + 24) or 60 - 58

With the above packet this works out to 2 bytes. This could be correct, except this is a SYN packet and there shouldn't be any data. If you translate the 2a3a it is ":*".

What my application does is stream all of the data portions of the TCP packets together so that you can see everything the user typed, saw, and so on. The rub is that this extra cargo munges up the output streams with meaningless characters. On ACK packets I notice that there are 6 bytes of TCP data that have no meaning. I could understand it if the BPF code were padding out to a word boundary, but that would only give me 2 extra bytes, not 6. This would not be a problem if the BPF header told me exactly how long the data portion of the packet was. I thought that was what bh_datalen was supposed to do.
But as I mentioned, ACK packets can have as much as 6 bytes of nothing hanging around. Under these conditions one cannot work out exactly what is meaningful in the data portion of the packet.

Can anyone tell me why BPF does this, and is there a solution to this problem? The only thing I can think of would be to go in and tweak the BPF code to make sure that bh_datalen reflects exactly what was captured. Another would be to check for the ACK, SYN, SYN/ACK, and FIN flags and not attempt to look at the TCP data portion of those packets. The only thing I have run into with this is that when the PUSH flag is set there may be meaningful data in the packet. Sometimes yes, sometimes no.

One thing to note is that utilities like tcpdump couldn't care less about the data portions of packets. They are only concerned with the headers, and hence this problem would not arise except under conditions where you really do need the TCP data.

If someone out there can solve this problem I sure would like to know about it. Thanks for any help anyone can provide.

Rick Weldon
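
The length arithmetic in this message can be cross-checked against the IP header's own total-length field. In the dump, the second word of the IP header is 2c00, i.e. bytes 00 2c, which reads as an ip_len of 44. One reading, not confirmed anywhere in this message, is that the trailing bytes are link-layer padding up to the 60-byte minimum Ethernet frame rather than anything BPF adds. A minimal sketch under that assumption follows. The helper name tcp_payload_len and the array sample_syn_frame are hypothetical, the frame bytes are the dumped words with each 16-bit pair swapped back into wire order, and the caller is assumed to already know the frame is Ethernet/IPv4/TCP.

```c
#include <stddef.h>
#include <stdint.h>

#define ETHER_HDR_BYTES 14   /* dst MAC + src MAC + ethertype */
#define MIN_IP_HDR      20
#define MIN_TCP_HDR     20

/* The SYN packet from the dump, in wire byte order (60 bytes). */
const uint8_t sample_syn_frame[60] = {
    /* Ethernet header */
    0x00, 0x40, 0x05, 0x14, 0x0a, 0x14, 0x00, 0x00,
    0xc0, 0x2a, 0x72, 0x77, 0x08, 0x00,
    /* IP header: version/ihl 0x45, total length 0x002c = 44 */
    0x45, 0x00, 0x00, 0x2c, 0xb8, 0x13, 0x40, 0x00,
    0xff, 0x06, 0x55, 0xa2, 0xcd, 0xe7, 0xe9, 0x22,
    0xcd, 0xe7, 0xe9, 0x23,
    /* TCP header: data offset 6 (24 bytes), flags 0x02 (SYN) */
    0x80, 0x0a, 0x00, 0x17, 0xc7, 0x70, 0x8a, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x60, 0x02, 0x22, 0x38,
    0x36, 0x46, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4,
    /* two trailing bytes, ":*" */
    0x3a, 0x2a
};

/*
 * Hypothetical helper (not from the original post): TCP payload length
 * taken from the IP total-length field instead of bh_datalen.
 * Returns -1 if the capture is too short or the lengths are inconsistent.
 */
long tcp_payload_len(const uint8_t *frame, size_t caplen)
{
    if (caplen < ETHER_HDR_BYTES + MIN_IP_HDR)
        return -1;

    const uint8_t *ip = frame + ETHER_HDR_BYTES;
    size_t ip_hl  = (size_t)(ip[0] & 0x0f) * 4;    /* IP header bytes */
    size_t ip_len = ((size_t)ip[2] << 8) | ip[3];  /* big-endian total length */

    if (ip_hl < MIN_IP_HDR || caplen < ETHER_HDR_BYTES + ip_hl + MIN_TCP_HDR)
        return -1;

    const uint8_t *tcp = ip + ip_hl;
    size_t th_off = (size_t)(tcp[12] >> 4) * 4;    /* TCP header bytes */

    if (th_off < MIN_TCP_HDR || ip_len < ip_hl + th_off)
        return -1;

    return (long)(ip_len - ip_hl - th_off);
}
```

Called as tcp_payload_len(sample_syn_frame, sizeof sample_syn_frame), this computes 44 - 20 - 24 = 0 for the SYN packet, where the bh_datalen formula gives 2. On the same assumption, a bare ACK with a 40-byte IP datagram occupies 14 + 40 = 54 bytes, and padding to 60 would account for the 6 stray bytes.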