From owner-freebsd-bugs Sun May  5 14:11:53 2002
Delivered-To: freebsd-bugs@hub.freebsd.org
Received: from freefall.freebsd.org (freefall.FreeBSD.org [216.136.204.21])
	by hub.freebsd.org (Postfix) with ESMTP id 00CB037B412
	for ; Sun, 5 May 2002 14:10:02 -0700 (PDT)
Received: (from gnats@localhost)
	by freefall.freebsd.org (8.11.6/8.11.6) id g45LA2b06435;
	Sun, 5 May 2002 14:10:02 -0700 (PDT)
	(envelope-from gnats)
Received: from nwww.freebsd.org (nwww.FreeBSD.org [216.136.204.117])
	by hub.freebsd.org (Postfix) with ESMTP id 7BDFC37B406
	for ; Sun, 5 May 2002 14:04:31 -0700 (PDT)
Received: from nwww.freebsd.org (localhost [127.0.0.1])
	by nwww.freebsd.org (8.12.2/8.12.2) with ESMTP id g45L5qhG081139
	for ; Sun, 5 May 2002 14:05:52 -0700 (PDT)
	(envelope-from nobody@nwww.freebsd.org)
Received: (from nobody@localhost)
	by nwww.freebsd.org (8.12.2/8.12.2/Submit) id g45L5qxP081138;
	Sun, 5 May 2002 14:05:52 -0700 (PDT)
Message-Id: <200205052105.g45L5qxP081138@nwww.freebsd.org>
Date: Sun, 5 May 2002 14:05:52 -0700 (PDT)
From: Brett Glass
To: freebsd-gnats-submit@FreeBSD.org
X-Send-Pr-Version: www-1.0
Subject: bin/37777: A single lost packet on a (userland) PPP connection causes long-term disruption and a "storm" of requests and acknowledgements on the CCP layer
Sender: owner-freebsd-bugs@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>Number:         37777
>Category:       bin
>Synopsis:       A single lost packet on a (userland) PPP connection causes long-term disruption and a "storm" of requests and acknowledgements on the CCP layer
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun May 05 14:10:02 PDT 2002
>Closed-Date:
>Last-Modified:
>Originator:     Brett Glass
>Release:        4.5-RELEASE-P4
>Organization:
LARIAT
>Environment:
All machines tested are using 4.5-RELEASE-P4.
>Description:
We are using PPPoE over a wireless connection which very occasionally drops packets -- typically a few an hour. However, whenever a packet is dropped, the link (which is using "deflate" compression) stalls for long periods (sometimes seconds, sometimes tens of minutes). Turning on logging of CCP (set log +ccp) reveals that, instead of resetting the compression dictionary and gracefully continuing, the two nodes exchange a "storm" of redundant reset requests and acknowledgements, along with cascading errors.

The problem does not seem to be limited to PPPoE or wireless communications; it simply shows up more readily in these situations because the underlying transport is not reliable. (Modern modems virtually always correct errors unless the connection is dropped completely.) The same problem could crop up on a congested hardwired Ethernet. PPP does not assume a reliable transport, and so should be resilient enough to deal with occasional lost packets.

When a packet is dropped and CCP logging is enabled, the PPP log shows a long stream of messages such as the following. (Note that the first message indicates the dropped packet.) The cascade of errors can last a long time; the glitch from which the messages below are excerpted continued for more than 10 minutes. (Apologies for the long log excerpt, but it helps to show the severity and longevity of the outage and the thrashing that is occurring.)
May  5 13:19:42 workhorse ppp[1067]: CCP: DeflateInput: Seq error: Got 397, expected 396
May  5 13:19:42 workhorse ppp[1067]: CCP: deflink: SendResetReq(5) state = Opened
May  5 13:19:42 workhorse ppp[1067]: CCP: deflink: SendResetReq(5) state = Opened
May  5 13:19:42 workhorse ppp[1067]: CCP: deflink: RecvResetAck(5) state = Opened
May  5 13:19:42 workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:19:42 workhorse ppp[1067]: CCP: deflink: RecvResetAck(5) state = Opened
May  5 13:19:42 workhorse ppp[1067]: CCP: deflink: Duplicate ResetAck (resetting again)
May  5 13:19:42 workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:20:05 workhorse ppp[1067]: CCP: deflink: RecvResetReq(187) state = Opened
May  5 13:20:05 workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:05 workhorse ppp[1067]: CCP: deflink: SendResetAck(187) state = Opened
May  5 13:20:05 workhorse ppp[1067]: CCP: deflink: RecvResetReq(187) state = Opened
May  5 13:20:05 workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:05 workhorse ppp[1067]: CCP: deflink: SendResetAck(187) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: DeflateInput: Seq error: Got 383, expected 382
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: SendResetReq(6) state = Opened
May  5 13:20:19 workhorse last message repeated 3 times
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetReq(188) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: SendResetAck(188) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: SendResetReq(6) state = Opened
May  5 13:20:19 workhorse last message repeated 5 times
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetReq(188) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: Deflate: Output channel reset
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: SendResetAck(188) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Duplicate ResetAck (resetting again)
May  5 13:20:19 workhorse ppp[1067]: CCP: Deflate: Input channel reset
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: RecvResetAck(6) state = Opened
May  5 13:20:19 workhorse ppp[1067]: CCP: deflink: Unexpected ResetAck (id 6) ignored
[Snip]
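For clarity, here is a minimal sketch of the mechanism behind the "Seq error" and reset messages above. This is NOT the actual ppp source; all identifiers are illustrative. RFC 1979 "deflate" places a sequence number on every compressed frame, so a single lost frame leaves the receiver's dictionary and counter out of step with the sender's, and nothing more can be decompressed until a CCP Reset-Request/Reset-Ack exchange restarts the compression state:

/*
 * Illustrative sketch only (not usr.sbin/ppp code; names are made up).
 * Shows why one lost frame forces a CCP Reset-Request and why every
 * further frame is useless until the matching Reset-Ack is processed.
 */
#include <stdio.h>
#include <stdint.h>

struct deflate_in_state {
    uint16_t expect;        /* sequence number of the next frame we expect */
};

/* Stand-in for the routine that would emit a CCP Reset-Request. */
static void send_reset_req(void)
{
    printf("CCP: deflink: SendResetReq state = Opened\n");
}

/*
 * Called for each received compressed frame.  Returns 1 if the frame
 * could be decompressed, 0 if it had to be discarded.
 */
static int deflate_input(struct deflate_in_state *s, uint16_t seq)
{
    if (seq != s->expect) {
        printf("CCP: DeflateInput: Seq error: Got %u, expected %u\n",
               (unsigned)seq, (unsigned)s->expect);
        send_reset_req();   /* the dictionary is now useless on this side */
        return 0;
    }
    s->expect = seq + 1;    /* the frame would be fed to the decompressor here */
    return 1;
}

/*
 * A Reset-Ack lets this side restart with a fresh dictionary.  (How the
 * counter is resynchronised is beside the point of this sketch.)
 */
static void recv_reset_ack(struct deflate_in_state *s, uint16_t restart_seq)
{
    printf("CCP: Deflate: Input channel reset\n");
    s->expect = restart_seq;
}

int main(void)
{
    struct deflate_in_state s = { 396 };

    deflate_input(&s, 396);   /* in sequence: decompressed normally       */
    deflate_input(&s, 398);   /* frame 397 was lost: Seq error, ResetReq  */
    deflate_input(&s, 399);   /* still out of step; another Reset-Request */
    recv_reset_ack(&s, 400);  /* peer restarted its compressor            */
    deflate_input(&s, 400);   /* back in sync                             */
    return 0;
}

Note that this sketch recovers after a single Reset-Request/Reset-Ack exchange. The log above shows the two peers continuing to exchange requests and stale acknowledgements for minutes afterwards, which is the thrashing we are reporting.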
>How-To-Repeat:
Establish a PPPoE or PPP-over-UDP link between two computers and disrupt the link for a brief period. (Alternatively, one may instrument the code so that a packet is dropped now and then on the way to an interface, to the same effect.)
>Fix:
None known at this time. At first, we theorized that the problem was specific to the (preferred) "deflate" compression algorithm. However, while connections established using the "predictor-1" algorithm were better behaved, they sometimes showed the same thrashing behavior, with multiple renegotiations after only a single dropped packet.

Turning off ALL compression (by disabling and denying all available algorithms) is a workaround, but a highly inefficient one; see the sample ppp.conf fragment at the end of this report. (One of our primary motivations for the use of PPPoE is to compress the data stream, of which approximately 30%, according to our measurements, is highly compressible HTML.)

Our tests seem to indicate that the problem is most likely in the compression portion of the code -- perhaps in the state machine(s) related to CCP. They may also suggest a need for a frame retransmission mechanism within PPP that does not require a complete reset of the compression layer.
>Release-Note:
>Audit-Trail:
>Unformatted:
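For reference, the compression-off workaround described under >Fix: amounts to a ppp.conf fragment along the following lines. This is a sketch rather than our production configuration: the profile name and the PPPoE interface are placeholders, deflate and pred1 are the algorithms negotiable on our links, and "set log +ccp" is only needed while diagnosing.

pppoe-nocomp:
 # Placeholder profile; the PPPoE interface name below is illustrative.
 set device PPPoE:xl0
 # Log CCP traffic while diagnosing (as used to capture the log above).
 set log +ccp
 # Workaround: refuse to negotiate any CCP algorithm in either direction.
 disable deflate pred1
 deny deflate pred1

With both "disable" and "deny" in place, neither side can bring up CCP, so the stalls disappear, at the cost of losing compression entirely.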