From owner-freebsd-net@FreeBSD.ORG Fri Nov 24 11:44:35 2006 Return-Path: X-Original-To: freebsd-net@freebsd.org Delivered-To: freebsd-net@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 023CA16A62A for ; Fri, 24 Nov 2006 11:44:35 +0000 (UTC) (envelope-from hemanth.thummala@wipro.com) Received: from wip-ec-mm01.wipro.com (wip-ec-mm01.wipro.com [203.91.193.25]) by mx1.FreeBSD.org (Postfix) with ESMTP id E6C5C43D46 for ; Fri, 24 Nov 2006 11:43:51 +0000 (GMT) (envelope-from hemanth.thummala@wipro.com) Received: from wip-ec-mm01.wipro.com (localhost.localdomain [127.0.0.1]) by localhost (Postfix) with ESMTP id DC7F31F61D7 for ; Fri, 24 Nov 2006 17:14:30 +0530 (IST) Received: from wipro.tcpn.com (adm.wipro.tcpn.com [172.31.41.11]) by wip-ec-mm01.wipro.com (Postfix) with ESMTP id A0FC61F6231 for ; Fri, 24 Nov 2006 17:14:30 +0530 (IST) Received: from EC4HP131244 ([172.31.44.83]) by wipro.tcpn.com (8.9.3/8.9.3) with ESMTP id RAA21953 for ; Fri, 24 Nov 2006 17:07:02 +0530 (IST) Message-Id: <200611241137.RAA21953@wipro.tcpn.com> From: "Hemanth" To: Date: Fri, 24 Nov 2006 17:18:09 +0530 Organization: Wipro MIME-Version: 1.0 X-Priority: 1 (Highest) X-MSMail-Priority: High X-Mailer: Microsoft Office Outlook, Build 11.0.5510 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1441 Thread-Index: AccPnyxp/ycRC9gFRLKQGbQWjPiihgAHdyPw Importance: High Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: FW: problem: TCPIP process loops and priority reduced to 1. X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list Reply-To: hemanth.thummala@wipro.com List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Nov 2006 11:44:35 -0000 Some more info on this: from tcpcb of FIN_WAIT_1 end i have seen t_rxtshift value is 2, means 2 probe requests have gone from this side. Normally when probe request is going the snd_nxt value wont be incremented. In our case if the probe request is sent with FIN (dont know the possiblity), then the other side while this packet is processed in tcp_input.c we can have this following scenario. /* * If FIN is received ACK the FIN and let the user know * that the connection is closing. */ if (tiflags & TH_FIN) { if (TCPS_HAVERCVDFIN(tp->t_state) == 0) { socantrcvmore(so); tp->t_flags |= TF_ACKNOW; tp->rcv_nxt++; ===========>>> here the rcv_nxt is incremented if the probe contains FIN. } This may be the reason for getting the incremented rcv_nxt(XXXX13) value than the expected snd_max(XXXX14). Now my question is : In what scenarios a FIN can be set in a probe request?? looking for urgent replies.. Regards, Hemanth, _____ From: Hemanth [mailto:hemanth.thummala@wipro.com] Sent: Friday, November 24, 2006 1:35 PM To: 'freebsd-net@freebsd.org' Subject: problem: TCPIP process loops and priority reduced to 1. Importance: High Hi All, As mentioned in the subject the TCPIP is looping and its priority is reducing to 1 by the kernal (BSD kind) >From the dump, we were able to identify the inpcbs that are causing the problem. inpcb:0x222f1a0c, lport:10143, laddr:0x7f000001 (127.0.0.1) fport: 1378 faddr:0x7f000001 (127.0.0.1) tcpcb:0x2270760c inpcb:0x222f1a0c flags:0x01e0 and inpcb:0x224a850c, lport: 1378, laddr:0x7f000001 (127.0.0.1) fport:10143 faddr:0x7f000001 (127.0.0.1) tcpcb:0x2268d10c inpcb:0x224a850c flags:0x01e0 flags:0x0000 If we see the tcpcb structure for both structures $5 = { seg_next_md = 0xfffc0000, t_state = 0x5,====>>> CLOSE_WAIT state t_rxtshift = 0x0, t_rxtcur = 0x3fc, t_dupacks = 0x0, t_maxseg = 0x7d00, t_force = 0x0, t_flags = 0x1e0, t_template = 0x22776bd4, t_inpcb = 0x224a850c, snd_una = 0xbfaeb6d9, snd_nxt = 0xbfaeb6d9, snd_up = 0xbfaeb6d9, snd_wl1 = 0xbfb19e13, snd_wl2 = 0xbfaeb6d9, iss = 0xbfaeb678, snd_wnd = 0x7d00, rcv_wnd = 0xf96, rcv_nxt = 0xbfb19e15, rcv_up = 0xbfb19e13, irs = 0xbfb0a0a2, rcv_adv = 0xbfb19daa, snd_max = 0xbfaeb6d9, inv_SYN_seqbase = 0xffaeb678, snd_cwnd = 0x17f10, snd_ssthresh = 0x3fffc000, t_idle = 0x0003934ba0b407c6, t_rtt = 0x0, t_rtseq = 0xbfaeb6c8, t_srtt = 0xba2, t_rttvar = 0x288, max_rcvd = 0x7d6a, max_sndwnd = 0x7d00, t_oobflags = 0x0, t_iobc = 0x0, tm_running = 0x0, No timer running tm_q = 0x8003378, tm_wait = 0x3fc, tm_exp_time = 0x0003934b7cb490dc, saved_rtt_clock = 0x0003934b7ca5007c, conn_exp_time = 0x0003934b811903db, tm_prev = 0x0, tm_next = 0x0, on_delay_ack_q = 0x0, dack_next = 0x0, dack_prev = 0x0, snd_scale = 0x0, rcv_scale = 0x0, request_r_scale = 0x0, requested_s_scale = 0x0, ts_recent = 0x6e49637, ts_recent_age = 0x6e49638, last_ack_sent = 0xbfb19e15 } And $8 = { seg_next_md = 0xfffc0000, t_state = 0x6, ==========>>> FIN_WAIT_1 t_rxtshift = 0x2, t_rxtcur = 0x3eb, t_dupacks = 0x0, t_maxseg = 0x7d00, t_force = 0x0, t_flags = 0x1e0, t_template = 0x22a1e1d4, t_inpcb = 0x222f1a0c, snd_una = 0xbfb19e13, snd_nxt = 0xbfb19e13, snd_up = 0xbfb19e13, snd_wl1 = 0xbfaeb6d9, snd_wl2 = 0xbfb19e13, iss = 0xbfb0a0a2, snd_wnd = 0x0, rcv_wnd = 0x7d00, rcv_nxt = 0xbfaeb6d9, rcv_up = 0xbfaeb6d9, irs = 0xbfaeb678, rcv_adv = 0xbfaf33d9, snd_max = 0xbfb19e14, inv_SYN_seqbase = 0xffaeb678, snd_cwnd = 0xffff, snd_ssthresh = 0x3fffc000, t_idle = 0x0003934ba0b407c6, t_rtt = 0x0, t_rtseq = 0xbfb17daa, t_srtt = 0x1f40, t_rttvar = 0x3, max_rcvd = 0x0, max_sndwnd = 0x7d00, t_oobflags = 0x0, t_iobc = 0x0, tm_running = 0x2,==== >> PERSIST TIMER is running on this end. tm_q = 0x8004460, tm_wait = 0x1388, tm_exp_time = 0x0003934ba1005255, saved_rtt_clock = 0x0003934b7df4342c, conn_exp_time = 0x0003934b811903de, tm_prev = 0x0, tm_next = 0x0, on_delay_ack_q = 0x0, dack_next = 0x0, dack_prev = 0x0, snd_scale = 0x0, rcv_scale = 0x0, request_r_scale = 0x0, requested_s_scale = 0x0, ts_recent = 0x6e4917e, ts_recent_age = 0x6e4917e, last_ack_sent = 0xbfaeb6d9 } If we closely look into this structure, the snd_max of one end (which is in FIN_WAIT_1 state) is less than the rcv_nxt of another end (which is in CLOSE_WAIT). When the end (present in CLOSE_WAIT) sends any frame it will put rcv_nxt field in the ti_ack field. Because of the below code, a check SEQ_GT(ti->ti_ack, tp->snd_max) at the end which is in FIN_WAIT_1 will fail resulting in it to drop the packed and send an ack with ack sequence number it is expecting. Snippet from tcp_input() 1523 tp->t_dupacks = 0; 1524 if (SEQ_GT(ti->ti_ack, tp->snd_max)) { 1525 tcpstat.tcps_rcvacktoomuch++; 1526 goto dropafterack; 1527 } If we dump the tcpstat information from the online cpu dump: $2 = { ... . . . . . . tcps_connattempt = 382657891, tcps_rcvtotal = 272941912, tcps_sndtotal = 272848321, tcps_noport = 13535, tcps_rcvurp = 0, . . . tcps_sndacks = -394474409, tcps_rcvacktoomuch = 34255849, tcps_rcvackpack = 425941141, tcps_rcvackbyte = 603776015, tcps_rcvwinupd = 651289, tcps_predack = 2280435060, .. tcps_sc_aborted = 0, tcps_sc_dupesyn = 0, tcps_sc_dropped = 0 and the tcpstat captured in the saveabend tcpstat = .. . . . . tcps_connattempt = 382657891 tcps_rcvtotal = 421811861 tcps_sndtotal = 421718270 tcps_noport = 13535 tcps_rcvurp = 0 ... ... tcps_sndacks = -245604460 tcps_rcvacktoomuch = 108690823 tcps_rcvackpack = 425941141 tcps_rcvackbyte = 603776015 tcps_rcvwinupd = 651289 tcps_predack = 2280435060 ... tcps_sc_aborted = 0 tcps_sc_dupesyn = 0 tcps_sc_dropped = 0 We see lot of difference in three values, tcps_rcvtotal,tcps_sndtotal and tcps_rcvacktoomuch and rest of the values remains the same. These values have increased ie almost doubled between time when cpu dump was taken ( Priority was 9) and the save abend (priority is 1). We presume that this observation is important clue for the problem and will work further by looking at code in what scenarios these values will increase. This will cause another end to send ack again with incorrect sequence number. These events ie of repeated acks continue which led the tcpip proc to loop and hence priority was reduced to 1. This explains the reason behind why only tcps_rcvtotal, tcps_sndtotal, tcps_rcvduppack and tcps_rcvacktoomuch are getting incremented in tcpstats. We simulated the problem by making one end to send ACK frame with ack seq number higher than the expected. This resulted in loop and priority was reduced to one. We are now looking at the scenarios when an ACK packet can come with ack sequence number higher than expected. We are wondering in which scenarios this type of problem can occur in BSD. I am looking for urgent feedbacks. Regards, Hemanth.