From owner-freebsd-net@freebsd.org Fri Nov 30 12:27:50 2018 Return-Path: Delivered-To: freebsd-net@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 88D3B11430F6 for ; Fri, 30 Nov 2018 12:27:50 +0000 (UTC) (envelope-from agapon@gmail.com) Received: from mail-lj1-f169.google.com (mail-lj1-f169.google.com [209.85.208.169]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (Client CN "smtp.gmail.com", Issuer "Google Internet Authority G3" (verified OK)) by mx1.freebsd.org (Postfix) with ESMTPS id BC7D08034C; Fri, 30 Nov 2018 12:27:49 +0000 (UTC) (envelope-from agapon@gmail.com) Received: by mail-lj1-f169.google.com with SMTP id x85-v6so4837511ljb.2; Fri, 30 Nov 2018 04:27:49 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:to:from:subject:openpgp:autocrypt:message-id :date:user-agent:mime-version:content-language :content-transfer-encoding; bh=DfnCV68m405iwEsujBBrHo5dtil9TMqFQRHztA3kjAg=; b=b3rhukObjRaUwDxMYV2WtGlzZXDy5Lab8Ey3wAOjBhtd/87n0Jei3/kFAYtPk6ijVI rIz+0L1WrksTPfoHtsMqAiIgysp6bCFRdQCHPVdQ9gbeg1T5hRl7HeZjanDSV1sr9rpT Md7hIH5HQhXyIaiieYynjvSftRBoejVwwR/KpuROc28m4Q5iEBFLw3usMbehyLR8m6KD AHG6sgkriIfNhv3YpyLf+/O3EO1CJetArtVY8fvsWrvGGJIYXgx5yTqlSNvI+YrTPT+O q1d1MPf6upKhCV8GmWoBWDht1FyXmd103JRbE42BkfTJUp9icX/k8Q+NtRH2fynU0hU1 E3UA== X-Gm-Message-State: AA+aEWa8eKwonHkEvl58oT4fzCpvm586of8WTuYWv1JusDSw42H20ihB aLw8uLUbxCJ5o3qH68b/g1D5pdfx X-Google-Smtp-Source: AFSGD/VZG7nLo5pItT2/ipjEKFNb26ZhnKtkSMuVXGjerr9VHhsasg6JXSPUnoojUQzmPP45uog5eQ== X-Received: by 2002:a2e:3803:: with SMTP id f3-v6mr3715887lja.169.1543580867819; Fri, 30 Nov 2018 04:27:47 -0800 (PST) Received: from [192.168.0.88] (east.meadow.volia.net. [93.72.151.96]) by smtp.googlemail.com with ESMTPSA id c14sm791534lfb.40.2018.11.30.04.27.45 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 30 Nov 2018 04:27:46 -0800 (PST) To: freebsd-net@FreeBSD.org From: Andriy Gapon Subject: lagg: a race between ioctl and clone_destroy Openpgp: preference=signencrypt Autocrypt: addr=avg@FreeBSD.org; prefer-encrypt=mutual; keydata= xsFNBFm4LIgBEADNB/3lT7f15UKeQ52xCFQx/GqHkSxEdVyLFZTmY3KyNPQGBtyvVyBfprJ7 mAeXZWfhat6cKNRAGZcL5EmewdQuUfQfBdYmKjbw3a9GFDsDNuhDA2QwFt8BmkiVMRYyvI7l N0eVzszWCUgdc3qqM6qqcgBaqsVmJluwpvwp4ZBXmch5BgDDDb1MPO8AZ2QZfIQmplkj8Y6Z AiNMknkmgaekIINSJX8IzRzKD5WwMsin70psE8dpL/iBsA2cpJGzWMObVTtCxeDKlBCNqM1i gTXta1ukdUT7JgLEFZk9ceYQQMJJtUwzWu1UHfZn0Fs29HTqawfWPSZVbulbrnu5q55R4PlQ /xURkWQUTyDpqUvb4JK371zhepXiXDwrrpnyyZABm3SFLkk2bHlheeKU6Yql4pcmSVym1AS4 dV8y0oHAfdlSCF6tpOPf2+K9nW1CFA8b/tw4oJBTtfZ1kxXOMdyZU5fiG7xb1qDgpQKgHUX8 7Rd2T1UVLVeuhYlXNw2F+a2ucY+cMoqz3LtpksUiBppJhw099gEXehcN2JbUZ2TueJdt1FdS ztnZmsHUXLxrRBtGwqnFL7GSd6snpGIKuuL305iaOGODbb9c7ne1JqBbkw1wh8ci6vvwGlzx rexzimRaBzJxlkjNfMx8WpCvYebGMydNoeEtkWldtjTNVsUAtQARAQABzR5BbmRyaXkgR2Fw b24gPGF2Z0BGcmVlQlNELm9yZz7CwZQEEwEIAD4WIQS+LEO7ngQnXA4Bjr538m7TUc1yjwUC WbgsiAIbIwUJBaOagAULCQgHAgYVCAkKCwIEFgIDAQIeAQIXgAAKCRB38m7TUc1yj+JAEACV l9AK/nOWAt/9cufV2fRj0hdOqB1aCshtSrwHk/exXsDa4/FkmegxXQGY+3GWX3deIyesbVRL rYdtdK0dqJyT1SBqXK1h3/at9rxr9GQA6KWOxTjUFURsU7ok/6SIlm8uLRPNKO+yq0GDjgaO LzN+xykuBA0FlhQAXJnpZLcVfPJdWv7sSHGedL5ln8P8rxR+XnmsA5TUaaPcbhTB+mG+iKFj GghASDSfGqLWFPBlX/fpXikBDZ1gvOr8nyMY9nXhgfXpq3B6QCRYKPy58ChrZ5weeJZ29b7/ QdEO8NFNWHjSD9meiLdWQaqo9Y7uUxN3wySc/YUZxtS0bhAd8zJdNPsJYG8sXgKjeBQMVGuT eCAJFEYJqbwWvIXMfVWop4+O4xB+z2YE3jAbG/9tB/GSnQdVSj3G8MS80iLS58frnt+RSEw/ psahrfh0dh6SFHttE049xYiC+cM8J27Aaf0i9RflyITq57NuJm+AHJoU9SQUkIF0nc6lfA+o JRiyRlHZHKoRQkIg4aiKaZSWjQYRl5Txl0IZUP1dSWMX4s3XTMurC/pnja45dge/4ESOtJ9R 8XuIWg45Oq6MeIWdjKddGhRj3OohsltKgkEU3eLKYtB6qRTQypHHUawCXz88uYt5e3w4V16H lCpSTZV/EVHnNe45FVBlvK7k7HFfDDkryM7BTQRZuCyIARAAlq0slcsVboY/+IUJdcbEiJRW be9HKVz4SUchq0z9MZPX/0dcnvz/gkyYA+OuM78dNS7Mbby5dTvOqfpLJfCuhaNYOhlE0wY+ 1T6Tf1f4c/uA3U/YiadukQ3+6TJuYGAdRZD5EqYFIkreARTVWg87N9g0fT9BEqLw9lJtEGDY EWUE7L++B8o4uu3LQFEYxcrb4K/WKmgtmFcm77s0IKDrfcX4doV92QTIpLiRxcOmCC/OCYuO jB1oaaqXQzZrCutXRK0L5XN1Y1PYjIrEzHMIXmCDlLYnpFkK+itlXwlE2ZQxkfMruCWdQXye syl2fynAe8hvp7Mms9qU2r2K9EcJiR5N1t1C2/kTKNUhcRv7Yd/vwusK7BqJbhlng5ZgRx0m WxdntU/JLEntz3QBsBsWM9Y9wf2V4tLv6/DuDBta781RsCB/UrU2zNuOEkSixlUiHxw1dccI 6CVlaWkkJBxmHX22GdDFrcjvwMNIbbyfQLuBq6IOh8nvu9vuItup7qemDG3Ms6TVwA7BD3j+ 3fGprtyW8Fd/RR2bW2+LWkMrqHffAr6Y6V3h5kd2G9Q8ZWpEJk+LG6Mk3fhZhmCnHhDu6CwN MeUvxXDVO+fqc3JjFm5OxhmfVeJKrbCEUJyM8ESWLoNHLqjywdZga4Q7P12g8DUQ1mRxYg/L HgZY3zfKOqcAEQEAAcLBfAQYAQgAJhYhBL4sQ7ueBCdcDgGOvnfybtNRzXKPBQJZuCyIAhsM BQkFo5qAAAoJEHfybtNRzXKPBVwQAKfFy9P7N3OsLDMB56A4Kf+ZT+d5cIx0Yiaf4n6w7m3i ImHHHk9FIetI4Xe54a2IXh4Bq5UkAGY0667eIs+Z1Ea6I2i27Sdo7DxGwq09Qnm/Y65ADvXs 3aBvokCcm7FsM1wky395m8xUos1681oV5oxgqeRI8/76qy0hD9WR65UW+HQgZRIcIjSel9vR XDaD2HLGPTTGr7u4v00UeTMs6qvPsa2PJagogrKY8RXdFtXvweQFz78NbXhluwix2Tb9ETPk LIpDrtzV73CaE2aqBG/KrboXT2C67BgFtnk7T7Y7iKq4/XvEdDWscz2wws91BOXuMMd4c/c4 OmGW9m3RBLufFrOag1q5yUS9QbFfyqL6dftJP3Zq/xe+mr7sbWbhPVCQFrH3r26mpmy841ym dwQnNcsbIGiBASBSKksOvIDYKa2Wy8htPmWFTEOPRpFXdGQ27awcjjnB42nngyCK5ukZDHi6 w0qK5DNQQCkiweevCIC6wc3p67jl1EMFY5+z+zdTPb3h7LeVnGqW0qBQl99vVFgzLxchKcl0 R/paSFgwqXCZhAKMuUHncJuynDOP7z5LirUeFI8qsBAJi1rXpQoLJTVcW72swZ42IdPiboqx NbTMiNOiE36GqMcTPfKylCbF45JNX4nF9ElM0E+Y8gi4cizJYBRr2FBJgay0b9Cp Message-ID: <8fcf453e-c326-eb14-367a-655cbee5d088@FreeBSD.org> Date: Fri, 30 Nov 2018 14:27:45 +0200 User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit X-Rspamd-Queue-Id: BC7D08034C X-Spamd-Result: default: False [-3.79 / 15.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FORGED_RECIPIENTS(0.00)[freebsd-net@FreeBSD.org,jpaetzel@freebsd.org ...]; NEURAL_HAM_MEDIUM(-1.00)[-1.000,0]; FROM_HAS_DN(0.00)[]; R_SPF_ALLOW(-0.20)[+ip4:209.85.128.0/17]; TO_MATCH_ENVRCPT_ALL(0.00)[]; MIME_GOOD(-0.10)[text/plain]; TO_DN_NONE(0.00)[]; DMARC_NA(0.00)[FreeBSD.org]; RCPT_COUNT_ONE(0.00)[1]; NEURAL_HAM_LONG(-1.00)[-1.000,0]; RCVD_COUNT_THREE(0.00)[3]; RCVD_TLS_LAST(0.00)[]; MX_GOOD(-0.01)[cached: alt3.gmail-smtp-in.l.google.com]; NEURAL_HAM_SHORT(-0.81)[-0.809,0]; RCVD_IN_DNSWL_NONE(0.00)[169.208.85.209.list.dnswl.org : 127.0.5.0]; IP_SCORE(-0.97)[ipnet: 209.85.128.0/17(-3.43), asn: 15169(-1.33), country: US(-0.09)]; FORGED_SENDER(0.30)[avg@FreeBSD.org,agapon@gmail.com]; R_DKIM_NA(0.00)[]; FREEMAIL_ENVFROM(0.00)[gmail.com]; ASN(0.00)[asn:15169, ipnet:209.85.128.0/17, country:US]; FROM_NEQ_ENVFROM(0.00)[avg@FreeBSD.org,agapon@gmail.com]; MID_RHS_MATCH_FROM(0.00)[]; TO_DOM_EQ_FROM_DOM(0.00)[] X-Rspamd-Server: mx1.freebsd.org X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 30 Nov 2018 12:27:50 -0000 I am working on analyzing a crash with the following stack trace: _sx_slock_hard lagg_media_status ifmedia_ioctl lagg_ioctl ifioctl kern_ioctl sys_ioctl It appears that the crash happened because of a destroyed sx lock. Please note that the code base is the CURRENT from just before the time when the network stack started to use the epoch mechanism. My theory is that the following race happened. lagg_clone_destroy() and lagg_ioctl() were called concurrently. lagg_clone_destroy() won a race to lock sc_sx while lagg_media_status() got blocked on it. I think that after some adaptive spinning the thread was placed on a sleep queue. Then, lagg_clone_destroy() unlocked the lock and proceeded to destroy it. After the lagg_media_status() thread was waken up it found the lock in the destroyed state and crashed on it in a typical fashion (trying to dereference a NULL pointer as a struct thread pointer). My knowledge of the network stack is rather shallow, so I have a few questions. Q1. Is the described race plausible? Q2. If yes, then has this class[*] of races been fixed by the epoch mechanism? I suspect that lagg_ioctl() can still have that race if it's called concurrently with lagg_clone_destroy(). Q3. Is there a way to protect against this type of a race in the code from before the epoch mechanism? I think that the safest thing to do would be to free/destroy the softc only after if_refcount goes to zero, but I could not find any callback for that. That is, I think that this code in if_free() ensures that ifp stays valid as long as it's referenced and it's being accessed under the epoch protection: if (refcount_release(&ifp->if_refcount)) epoch_call(net_epoch_preempt, &ifp->if_epoch_ctx, if_destroy); But the code in lagg_clone_destroy() would destroy the softc without waiting on anything: LAGG_XUNLOCK(sc); ifmedia_removeall(&sc->sc_media); ether_ifdetach(ifp); if_free(ifp); <---- ifp can still be used and valid after this LAGG_LIST_LOCK(); SLIST_REMOVE(&V_lagg_list, sc, lagg_softc, sc_entries); LAGG_LIST_UNLOCK(); LAGG_SX_DESTROY(sc); free(sc, M_DEVBUF); <---- but sc is immediately destroyed in any case } [*] My impression is that the problem is (or, at least, was) not limited to lagg. I think that other drivers can also have a similar race between a clone_destroy callback and an operation on an interface that needs to access a softc. -- Andriy Gapon