From owner-svn-src-head@freebsd.org  Tue Oct 13 17:20:49 2015
From: "Conrad E. Meyer" <cem@FreeBSD.org>
Date: Tue, 13 Oct 2015 17:20:47 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org,
    svn-src-head@freebsd.org
Subject: svn commit: r289232 - head/sys/dev/ntb/ntb_hw
Message-Id: <201510131720.t9DHKlP3099189@repo.freebsd.org>
X-SVN-Group: head
List-Id: SVN commit messages for the src tree for head/-current

Author: cem
Date: Tue Oct 13 17:20:47 2015
New Revision: 289232
URL: https://svnweb.freebsd.org/changeset/base/289232

Log:
  NTB: MFV 113bf1c9: BWD Link Recovery

  The BWD NTB device will drop the link if an error is encountered on the
  point-to-point PCI bridge.
  The link will stay down until all errors are cleared and the link is
  re-established.  On link down, check to see if the error is detected,
  if so do the necessary housekeeping to try and recover from the error
  and reestablish the link.

  There is a potential race between the 2 NTB devices recovering at the
  same time.  If the times are synchronized, the link will not recover and
  the driver will be stuck in this loop forever.  Add a random interval to
  the recovery time to prevent this race.

  Authored by:	Jon Mason
  Obtained from:	Linux
  Sponsored by:	EMC / Isilon Storage Division

Modified:
  head/sys/dev/ntb/ntb_hw/ntb_hw.c

Modified: head/sys/dev/ntb/ntb_hw/ntb_hw.c
==============================================================================
--- head/sys/dev/ntb/ntb_hw/ntb_hw.c	Tue Oct 13 17:20:05 2015	(r289231)
+++ head/sys/dev/ntb/ntb_hw/ntb_hw.c	Tue Oct 13 17:20:47 2015	(r289232)
@@ -896,6 +896,7 @@ ntb_handle_heartbeat(void *arg)
 	if (rc != 0)
 		device_printf(ntb->device,
 		    "Error determining link status\n");
+
 	/* Check to see if a link error is the cause of the link down */
 	if (ntb->link_status == NTB_LINK_DOWN) {
 		status32 = ntb_reg_read(4, SOC_LTSSMSTATEJMP_OFFSET);
@@ -995,7 +996,15 @@ recover_soc_link(void *arg)
 	uint16_t status16;
 
 	soc_perform_link_restart(ntb);
-	pause("Link", SOC_LINK_RECOVERY_TIME * hz / 1000);
+
+	/*
+	 * There is a potential race between the 2 NTB devices recovering at
+	 * the same time.  If the times are the same, the link will not recover
+	 * and the driver will be stuck in this loop forever.  Add a random
+	 * interval to the recovery time to prevent this race.
+	 */
+	status32 = arc4random() % SOC_LINK_RECOVERY_TIME;
+	pause("Link", (SOC_LINK_RECOVERY_TIME + status32) * hz / 1000);
 
 	status32 = ntb_reg_read(4, SOC_LTSSMSTATEJMP_OFFSET);
 	if ((status32 & SOC_LTSSMSTATEJMP_FORCEDETECT) != 0)
@@ -1005,12 +1014,17 @@ recover_soc_link(void *arg)
 	if ((status32 & SOC_IBIST_ERR_OFLOW) != 0)
 		goto retry;
 
+	status32 = ntb_reg_read(4, ntb->reg_ofs.lnk_cntl);
+	if ((status32 & SOC_CNTL_LINK_DOWN) != 0)
+		goto out;
+
 	status16 = ntb_reg_read(2, ntb->reg_ofs.lnk_stat);
 	width = (status16 & NTB_LINK_WIDTH_MASK) >> 4;
 	speed = (status16 & NTB_LINK_SPEED_MASK);
 	if (ntb->link_width != width || ntb->link_speed != speed)
 		goto retry;
 
+out:
 	callout_reset(&ntb->heartbeat_timer, NTB_HB_TIMEOUT * hz,
 	    ntb_handle_heartbeat, ntb);
 	return;
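
The delay arithmetic in recover_soc_link() can be checked outside the kernel.
The following is a minimal userland sketch, not the driver code. The names
RECOVERY_TIME_MS, jittered_delay_ms() and ms_to_ticks() are hypothetical, and
the value 500 is an assumption chosen for illustration, since the diff does
not show the definition of SOC_LINK_RECOVERY_TIME. The caller passes in the
random value, where the driver would use arc4random(), so the result is
reproducible.

```c
#include <stdint.h>

/*
 * Assumed base recovery time in milliseconds.  Stands in for
 * SOC_LINK_RECOVERY_TIME, whose real value is not shown in the diff.
 */
#define RECOVERY_TIME_MS 500u

/*
 * Total delay in milliseconds: the base time plus a jitter in
 * [0, RECOVERY_TIME_MS).  `rnd` stands in for the arc4random() result.
 * The result always lies in [RECOVERY_TIME_MS, 2 * RECOVERY_TIME_MS).
 */
uint32_t
jittered_delay_ms(uint32_t rnd)
{
	return (RECOVERY_TIME_MS + rnd % RECOVERY_TIME_MS);
}

/*
 * Convert milliseconds to scheduler ticks using the same expression as
 * the diff: ms * hz / 1000.  Integer division truncates toward zero.
 */
uint32_t
ms_to_ticks(uint32_t ms, uint32_t hz)
{
	return (ms * hz / 1000);
}
```

Under these assumptions two peers that draw different random values wait
between 500 and 999 ms each, so they are unlikely to retry in lockstep.  With
hz = 100, ms_to_ticks() truncates to 10 ms steps, so draws that land in the
same step still yield equal delays.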