From owner-svn-src-head@freebsd.org Wed Oct 19 23:27:58 2016
From: Alan Somers
Date: Wed, 19 Oct 2016 17:27:56 -0600
Subject: Re: svn commit: r272211 - head/sys/net
To: "Alexander V. Chernikov"
Cc: "src-committers@freebsd.org", "svn-src-all@freebsd.org", "svn-src-head@freebsd.org"
In-Reply-To: <201409271357.s8RDvmTC072149@svn.freebsd.org>
References: <201409271357.s8RDvmTC072149@svn.freebsd.org>
List-Id: SVN commit messages for the src tree for head/-current

On Sat, Sep 27, 2014 at 7:57 AM, Alexander V. Chernikov wrote:
> Author: melifaro
> Date: Sat Sep 27 13:57:48 2014
> New Revision: 272211
> URL: http://svnweb.freebsd.org/changeset/base/272211
>
> Log:
>   Use underlying ports counters to get lagg statistics instead of
>   per-packet accounting.
>   This introduce user-visible changes like aggregating error counters.
>
>   Reviewed by: asomers (prev.version), glebius
>   CR: D781
>   MFC after: 2 weeks
>   Sponsored by: Yandex LLC
>
> Modified:
>   head/sys/net/if_lagg.c
>   head/sys/net/if_lagg.h
>   head/sys/net/if_var.h
>

I think this change is causing a LOR and deadlock.  It happens if I
create a lagg and then quickly destroy it.
The deadlocked threads have these stack traces:

Tracing command ifconfig pid 7334 tid 100823 td 0xfffff8014ff34000
sched_switch() at sched_switch+0x48a/frame 0xfffffe20b3771470
mi_switch() at mi_switch+0x167/frame 0xfffffe20b37714a0
turnstile_wait() at turnstile_wait+0x3be/frame 0xfffffe20b37714f0
__mtx_lock_sleep() at __mtx_lock_sleep+0x196/frame 0xfffffe20b3771570
__mtx_lock_flags() at __mtx_lock_flags+0x10d/frame 0xfffffe20b37715c0
_rm_rlock() at _rm_rlock+0x28b/frame 0xfffffe20b3771600
_rm_rlock_debug() at _rm_rlock_debug+0x11f/frame 0xfffffe20b3771640
lagg_get_counter() at lagg_get_counter+0x4c/frame 0xfffffe20b37716c0
if_data_copy() at if_data_copy+0xa1/frame 0xfffffe20b37716e0
sysctl_rtsock() at sysctl_rtsock+0x56c/frame 0xfffffe20b3771860
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x8a/frame 0xfffffe20b37718a0
sysctl_root() at sysctl_root+0x188/frame 0xfffffe20b3771920
userland_sysctl() at userland_sysctl+0x16e/frame 0xfffffe20b37719c0
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe20b3771a70
amd64_syscall() at amd64_syscall+0x314/frame 0xfffffe20b3771bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b3771bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800fceeea, rsp = 0x7fffffffe408, rbp = 0x7fffffffe440 ---

Tracing command ifconfig pid 7331 tid 100796 td 0xfffff80066df5a00
sched_switch() at sched_switch+0x48a/frame 0xfffffe20b36ea630
mi_switch() at mi_switch+0x167/frame 0xfffffe20b36ea660
turnstile_wait() at turnstile_wait+0x3be/frame 0xfffffe20b36ea6b0
__rw_wlock_hard() at __rw_wlock_hard+0xb5/frame 0xfffffe20b36ea740
_rw_wlock_cookie() at _rw_wlock_cookie+0xbc/frame 0xfffffe20b36ea780
lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x5c/frame 0xfffffe20b36ea7c0
lagg_ioctl() at lagg_ioctl+0x115a/frame 0xfffffe20b36ea8a0
ifioctl() at ifioctl+0xdc1/frame 0xfffffe20b36ea930
kern_ioctl() at kern_ioctl+0x246/frame 0xfffffe20b36ea990
sys_ioctl() at sys_ioctl+0x171/frame 0xfffffe20b36eaa70
amd64_syscall() at amd64_syscall+0x314/frame 0xfffffe20b36eabf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b36eabf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fd417a, rsp = 0x7fffffffe228, rbp = 0x7fffffffe2a0 ---

The problem is that lagg_get_counter calls LAGG_RLOCK after
IF_ADDR_RLOCK has already been taken at rtsock.c:1717.  Meanwhile,
another thread called IF_ADDR_WLOCK at if_lagg.c:1581 after having
already called LAGG_WLOCK at if_lagg.c:1530.  I think this revision
introduced the problem because reading the lagg's counters did not
previously require the LAGG_RLOCK.  Do you have any ideas on how to
fix it?

-Alan
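
The ordering described above can be modeled with a small sketch. It is
written in Python purely for illustration and is not FreeBSD code or
the WITNESS implementation. The lock names "IF_ADDR" and "LAGG" are
shorthand for the locks behind the IF_ADDR_* and LAGG_* macros, the
helper names order_edges and find_reversals are hypothetical, and the
model assumes that read and write acquisitions of the same lock
conflict, which holds here because each path waits on a write lock or
is blocked by one.

```python
def order_edges(acquisitions):
    """Return the (held, acquired) pairs implied by one thread's ordered
    list of lock acquisitions. Assumes every lock stays held until the
    end of the sequence."""
    edges = set()
    for i, later in enumerate(acquisitions):
        for earlier in acquisitions[:i]:
            if earlier != later:
                edges.add((earlier, later))
    return edges


def find_reversals(threads):
    """threads maps a label to that thread's ordered lock names.
    Returns a sorted list of lock pairs that were taken in both orders."""
    all_edges = set()
    for acquisitions in threads.values():
        all_edges |= order_edges(acquisitions)
    reversals = set()
    for held, acquired in all_edges:
        if (acquired, held) in all_edges:
            reversals.add(tuple(sorted((held, acquired))))
    return sorted(reversals)


# The two paths from the stack traces, reduced to their lock order.
threads = {
    "sysctl path (tid 100823)": ["IF_ADDR", "LAGG"],  # IF_ADDR_RLOCK, then LAGG_RLOCK
    "ioctl path (tid 100796)": ["LAGG", "IF_ADDR"],   # LAGG_WLOCK, then IF_ADDR_WLOCK
}

print(find_reversals(threads))  # prints [('IF_ADDR', 'LAGG')]
```

A pair reported here only means the two orders exist somewhere; whether
it deadlocks in practice depends on timing, which matches the report
that it shows up when a lagg is created and then quickly destroyed.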