From owner-svn-src-all@FreeBSD.ORG Thu Jan 15 10:48:25 2015 Return-Path: Delivered-To: svn-src-all@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by hub.freebsd.org (Postfix) with ESMTPS id DEDFBF5B; Thu, 15 Jan 2015 10:48:25 +0000 (UTC) Received: from mail.turbocat.net (heidi.turbocat.net [88.198.202.214]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by mx1.freebsd.org (Postfix) with ESMTPS id 7F864D65; Thu, 15 Jan 2015 10:48:24 +0000 (UTC) Received: from laptop015.home.selasky.org (cm-176.74.213.204.customer.telag.net [176.74.213.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.turbocat.net (Postfix) with ESMTPSA id F140C1FE022; Thu, 15 Jan 2015 11:48:21 +0100 (CET) Message-ID: <54B79B25.3070707@selasky.org> Date: Thu, 15 Jan 2015 11:49:09 +0100 From: Hans Petter Selasky User-Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0 MIME-Version: 1.0 To: Konstantin Belousov Subject: Re: svn commit: r277199 - in head/sys: fs/devfs kern References: <201501142207.t0EM7Dfn041543@svn.freebsd.org> <20150115033109.GM42409@kib.kiev.ua> <54B76F2B.8040106@selasky.org> <20150115093841.GX42409@kib.kiev.ua> In-Reply-To: <20150115093841.GX42409@kib.kiev.ua> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org X-BeenThere: svn-src-all@freebsd.org X-Mailman-Version: 2.1.18-1 Precedence: list List-Id: "SVN commit messages for the entire src tree \(except for " user" and " projects" \)" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 15 Jan 2015 10:48:26 -0000 On 01/15/15 10:38, Konstantin Belousov wrote: > On Thu, Jan 15, 2015 at 08:41:31AM +0100, Hans Petter Selasky wrote: >> On 01/15/15 04:31, Konstantin Belousov wrote: >>> On Wed, Jan 14, 2015 at 10:07:13PM +0000, Hans Petter Selasky wrote: >>>> Author: hselasky >>>> Date: Wed Jan 14 22:07:13 2015 >>>> New Revision: 277199 >>>> URL: https://svnweb.freebsd.org/changeset/base/277199 >>>> >>>> Log: >>>> Avoid race with "dev_rel()" when using the recently added >>>> "delist_dev()" function. Make sure the character device structure >>>> doesn't go away until the end of the "destroy_dev()" function due to >>>> concurrently running cleanup code inside "devfs_populate()". >>>> >>>> MFC after: 1 week >>>> Reported by: dchagin@ >>>> >>>> Modified: >>>> head/sys/fs/devfs/devfs_devs.c >>>> head/sys/kern/kern_conf.c >>>> >>>> Modified: head/sys/fs/devfs/devfs_devs.c >>>> ============================================================================== >>>> --- head/sys/fs/devfs/devfs_devs.c Wed Jan 14 22:05:57 2015 (r277198) >>>> +++ head/sys/fs/devfs/devfs_devs.c Wed Jan 14 22:07:13 2015 (r277199) >>>> @@ -137,6 +137,12 @@ devfs_alloc(int flags) >>>> vfs_timestamp(&ts); >>>> cdev->si_atime = cdev->si_mtime = cdev->si_ctime = ts; >>>> cdev->si_cred = NULL; >>>> + /* >>>> + * Avoid race with dev_rel() by setting the initial >>>> + * reference count to 1. This last reference is taken >>>> + * by the destroy_dev() function. >>>> + */ >>>> + cdev->si_refcount = 1; >>> This is wrong. Not all devices are destroyed with destroy_dev(). >>> dev_rel() must be allowed to clean up allocated device. >>> >>> That said, I do not understand what race you are trying to solve. >>> Freeing of the accessible cdev memory cannot happen in parallel while >>> dev_mtx is owned. >>> >>> Please do not commit (to devfs) without seeking for the review first. >> >> Hi Konstantin, >> >> From my analysis there are basically three ways for a cdev to die: >> >> 1) Through dev_free_devlocked() >> 2) Through destroy_devl() which then later calls dev_free_devlocked() >> 3) Through destroy_dev_sched() which really is a wrapper around >> destroy_devl(). > You only look from the consumers PoV. Devfs cdev can be dereferenced > because e.g. clone handler decides that cdev is not valid/needed, > and now the memory is never freed due to extra reference. > > Do not assume that all cdevs go through destroy_dev(). Hi, All cdevs go through either case #2 or case #1 eventually from what I can see, including clone devices, which call destroy_devl() in the end aswell. See the "clone_destroy()" function! I did a simple test with /dev/dspX.Y which use clone devices. I did: vmstat -m | grep -i devfs1 1) Before plugging USB audio device: DEVFS1 157 79K - 189 512 2) Plug USB audio device: DEVFS1 164 82K - 196 512 3) Play something (env AUDIODEV=/dev/dsp2.4 play track01.wav) DEVFS1 165 83K - 197 512 4) Stop playing (clone device still exits): DEVFS1 165 83K - 197 512 5) Detach USB audio device: DEVFS1 157 79K - 197 512 I see no leakage in that case! Other case: 1) After "kldload if_tap" DEVFS1 158 79K - 201 512 2) After creating TAP device (cat /dev/tap99) DEVFS1 159 80K - 204 512 3) After creating TAP device (cat /dev/tap101) DEVFS1 160 80K - 207 512 5) After "kldunload if_tap": DEVFS1 158 79K - 207 512 6) After "kldload if_tap" again: DEVFS1 158 79K - 207 512 I see no leakage in that case either! Are there more cases which I don't see? > >> >> In the case of direct free through #1, the reference count is ignored >> and it doesn't matter if it is one or zero. Only in the case of >> destruction through destroy_dev() it matters. >> >> Like the comment says in destroy_devl(): >> >> /* Avoid race with dev_rel() */ >> >> The problem is that the "cdev->si_refcount" is zero when the initial >> devfs_create() is called. Then one ref is made. When we clear the >> CDP_ACTIVE flag in devfs_destroy() it instructs a !parallel! running >> process to destroy all the FS related structures and the reference count >> goes back to zero when the "cdp" is removed from the "cdevp_list". Then >> the cdev is freed too early. This happens because destroy_devl() is >> dropping the dev_lock() to sleep waiting for pending references. > Basically, this is very good explanation why your delist hack is wrong, > for one of the reason. Another reason is explained below. > You are trying to cover it with additional reference, but this is wrong > as well. > >> >> Do you see something else? > > I think that what you are trying to do with the CDP_ACTIVE hack is doomed > anyway, because you are allowing for devfs directory to have two entries > with the same name, until the populate loop cleans up the inactive one. > In the meantime, any access to the directory operates on random entry. The entry will not be random, because upon an open() call to a character device, I believe the devfs_lookup() function will be called, which always populate the devfs tree at first by calls to devfs_populate_xxx(). Any delisted devices which don't have the "CDP_ACTIVE" bit set, will never be seen by any open. Regarding leftover filedescriptors which still access the old "cdev" this is not a problem, and these will be closed when the si_refcount goes to zero after the destroy_devl() call. > > The checks for existent names in make_dev() are performed for the reason, > and you makes the rounds to effectively ignore it. > These checks are still correct and don't conflict with my patch from what I can see. Else the existing destroy_devl() would also be broken even before my patch with regard to the "random" selection of character devices at open() from userspace. --HPS