From owner-freebsd-net@FreeBSD.ORG Fri Nov 24 08:01:07 2006 Return-Path: X-Original-To: freebsd-net@freebsd.org Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 2ECD716A407 for ; Fri, 24 Nov 2006 08:01:07 +0000 (UTC) (envelope-from hemanth.thummala@wipro.com) Received: from wip-ec-mm01.wipro.com (wip-ec-mm01.wipro.com [203.91.193.25]) by mx1.FreeBSD.org (Postfix) with ESMTP id 020A943D5A for ; Fri, 24 Nov 2006 08:00:22 +0000 (GMT) (envelope-from hemanth.thummala@wipro.com) Received: from wip-ec-mm01.wipro.com (localhost.localdomain [127.0.0.1]) by localhost (Postfix) with ESMTP id 27C341F61CB for ; Fri, 24 Nov 2006 13:31:00 +0530 (IST) Received: from wipro.tcpn.com (adm.wipro.tcpn.com [172.31.41.11]) by wip-ec-mm01.wipro.com (Postfix) with ESMTP id DE39A1F6093 for ; Fri, 24 Nov 2006 13:30:59 +0530 (IST) Received: from EC4HP131244 ([172.31.44.83]) by wipro.tcpn.com (8.9.3/8.9.3) with ESMTP id NAA16199 for ; Fri, 24 Nov 2006 13:23:32 +0530 (IST) Message-Id: <200611240753.NAA16199@wipro.tcpn.com> From: "Hemanth" To: Date: Fri, 24 Nov 2006 13:34:37 +0530 Organization: Wipro MIME-Version: 1.0 X-Priority: 1 (Highest) X-MSMail-Priority: High X-Mailer: Microsoft Office Outlook, Build 11.0.5510 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 Thread-Index: AccPnyxp/ycRC9gFRLKQGbQWjPiihg== Importance: High Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: problem: TCPIP process loops and priority reduced to 1. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: hemanth.thummala@wipro.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Nov 2006 08:01:07 -0000 Hi All, As mentioned in the subject the TCPIP is looping and its priority is reducing to 1 by the kernal (BSD kind) >From the dump, we were able to identify the inpcbs that are causing the problem. inpcb:0x222f1a0c, lport:10143, laddr:0x7f000001 (127.0.0.1) fport: 1378 faddr:0x7f000001 (127.0.0.1) tcpcb:0x2270760c inpcb:0x222f1a0c flags:0x01e0 and inpcb:0x224a850c, lport: 1378, laddr:0x7f000001 (127.0.0.1) fport:10143 faddr:0x7f000001 (127.0.0.1) tcpcb:0x2268d10c inpcb:0x224a850c flags:0x01e0 flags:0x0000 If we see the tcpcb structure for both structures $5 = { seg_next_md = 0xfffc0000, t_state = 0x5,====>>> CLOSE_WAIT state t_rxtshift = 0x0, t_rxtcur = 0x3fc, t_dupacks = 0x0, t_maxseg = 0x7d00, t_force = 0x0, t_flags = 0x1e0, t_template = 0x22776bd4, t_inpcb = 0x224a850c, snd_una = 0xbfaeb6d9, snd_nxt = 0xbfaeb6d9, snd_up = 0xbfaeb6d9, snd_wl1 = 0xbfb19e13, snd_wl2 = 0xbfaeb6d9, iss = 0xbfaeb678, snd_wnd = 0x7d00, rcv_wnd = 0xf96, rcv_nxt = 0xbfb19e15, rcv_up = 0xbfb19e13, irs = 0xbfb0a0a2, rcv_adv = 0xbfb19daa, snd_max = 0xbfaeb6d9, inv_SYN_seqbase = 0xffaeb678, snd_cwnd = 0x17f10, snd_ssthresh = 0x3fffc000, t_idle = 0x0003934ba0b407c6, t_rtt = 0x0, t_rtseq = 0xbfaeb6c8, t_srtt = 0xba2, t_rttvar = 0x288, max_rcvd = 0x7d6a, max_sndwnd = 0x7d00, t_oobflags = 0x0, t_iobc = 0x0, tm_running = 0x0, No timer running tm_q = 0x8003378, tm_wait = 0x3fc, tm_exp_time = 0x0003934b7cb490dc, saved_rtt_clock = 0x0003934b7ca5007c, conn_exp_time = 0x0003934b811903db, tm_prev = 0x0, tm_next = 0x0, on_delay_ack_q = 0x0, dack_next = 0x0, dack_prev = 0x0, snd_scale = 0x0, rcv_scale = 0x0, request_r_scale = 0x0, requested_s_scale = 0x0, ts_recent = 0x6e49637, ts_recent_age = 0x6e49638, last_ack_sent = 0xbfb19e15 } And $8 = { seg_next_md = 0xfffc0000, t_state = 0x6, ==========>>> FIN_WAIT_1 t_rxtshift = 0x2, t_rxtcur = 0x3eb, t_dupacks = 0x0, t_maxseg = 0x7d00, t_force = 0x0, t_flags = 0x1e0, t_template = 0x22a1e1d4, t_inpcb = 0x222f1a0c, snd_una = 0xbfb19e13, snd_nxt = 0xbfb19e13, snd_up = 0xbfb19e13, snd_wl1 = 0xbfaeb6d9, snd_wl2 = 0xbfb19e13, iss = 0xbfb0a0a2, snd_wnd = 0x0, rcv_wnd = 0x7d00, rcv_nxt = 0xbfaeb6d9, rcv_up = 0xbfaeb6d9, irs = 0xbfaeb678, rcv_adv = 0xbfaf33d9, snd_max = 0xbfb19e14, inv_SYN_seqbase = 0xffaeb678, snd_cwnd = 0xffff, snd_ssthresh = 0x3fffc000, t_idle = 0x0003934ba0b407c6, t_rtt = 0x0, t_rtseq = 0xbfb17daa, t_srtt = 0x1f40, t_rttvar = 0x3, max_rcvd = 0x0, max_sndwnd = 0x7d00, t_oobflags = 0x0, t_iobc = 0x0, tm_running = 0x2,==== >> PERSIST TIMER is running on this end. tm_q = 0x8004460, tm_wait = 0x1388, tm_exp_time = 0x0003934ba1005255, saved_rtt_clock = 0x0003934b7df4342c, conn_exp_time = 0x0003934b811903de, tm_prev = 0x0, tm_next = 0x0, on_delay_ack_q = 0x0, dack_next = 0x0, dack_prev = 0x0, snd_scale = 0x0, rcv_scale = 0x0, request_r_scale = 0x0, requested_s_scale = 0x0, ts_recent = 0x6e4917e, ts_recent_age = 0x6e4917e, last_ack_sent = 0xbfaeb6d9 } If we closely look into this structure, the snd_max of one end (which is in FIN_WAIT_1 state) is less than the rcv_nxt of another end (which is in CLOSE_WAIT). When the end (present in CLOSE_WAIT) sends any frame it will put rcv_nxt field in the ti_ack field. Because of the below code, a check SEQ_GT(ti->ti_ack, tp->snd_max) at the end which is in FIN_WAIT_1 will fail resulting in it to drop the packed and send an ack with ack sequence number it is expecting. Snippet from tcp_input() 1523 tp->t_dupacks = 0; 1524 if (SEQ_GT(ti->ti_ack, tp->snd_max)) { 1525 tcpstat.tcps_rcvacktoomuch++; 1526 goto dropafterack; 1527 } If we dump the tcpstat information from the online cpu dump: $2 = { ... . . . . . . tcps_connattempt = 382657891, tcps_rcvtotal = 272941912, tcps_sndtotal = 272848321, tcps_noport = 13535, tcps_rcvurp = 0, . . . tcps_sndacks = -394474409, tcps_rcvacktoomuch = 34255849, tcps_rcvackpack = 425941141, tcps_rcvackbyte = 603776015, tcps_rcvwinupd = 651289, tcps_predack = 2280435060, .. tcps_sc_aborted = 0, tcps_sc_dupesyn = 0, tcps_sc_dropped = 0 and the tcpstat captured in the saveabend tcpstat = .. . . . . tcps_connattempt = 382657891 tcps_rcvtotal = 421811861 tcps_sndtotal = 421718270 tcps_noport = 13535 tcps_rcvurp = 0 ... ... tcps_sndacks = -245604460 tcps_rcvacktoomuch = 108690823 tcps_rcvackpack = 425941141 tcps_rcvackbyte = 603776015 tcps_rcvwinupd = 651289 tcps_predack = 2280435060 ... tcps_sc_aborted = 0 tcps_sc_dupesyn = 0 tcps_sc_dropped = 0 We see lot of difference in three values, tcps_rcvtotal,tcps_sndtotal and tcps_rcvacktoomuch and rest of the values remains the same. These values have increased ie almost doubled between time when cpu dump was taken ( Priority was 9) and the save abend (priority is 1). We presume that this observation is important clue for the problem and will work further by looking at code in what scenarios these values will increase. This will cause another end to send ack again with incorrect sequence number. These events ie of repeated acks continue which led the tcpip proc to loop and hence priority was reduced to 1. This explains the reason behind why only tcps_rcvtotal, tcps_sndtotal, tcps_rcvduppack and tcps_rcvacktoomuch are getting incremented in tcpstats. We simulated the problem by making one end to send ACK frame with ack seq number higher than the expected. This resulted in loop and priority was reduced to one. We are now looking at the scenarios when an ACK packet can come with ack sequence number higher than expected. We are wondering in which scenarios this type of problem can occur in BSD. I am looking for urgent feedbacks. Regards, Hemanth.