From owner-freebsd-net@FreeBSD.ORG Sat Feb 27 09:21:34 2010 Return-Path: Delivered-To: freebsd-net@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 73064106568A for ; Sat, 27 Feb 2010 09:21:34 +0000 (UTC) (envelope-from to.my.trociny@gmail.com) Received: from mail-ew0-f226.google.com (mail-ew0-f226.google.com [209.85.219.226]) by mx1.freebsd.org (Postfix) with ESMTP id 06B898FC2B for ; Sat, 27 Feb 2010 09:21:32 +0000 (UTC) Received: by ewy26 with SMTP id 26so474328ewy.3 for ; Sat, 27 Feb 2010 01:21:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:to:subject:organization:from :date:message-id:user-agent:mime-version:content-type; bh=0ou8JKVLBOl78Dq+XlfqH4sH2XYSC30RlnuS4K5U/HY=; b=heTEpJoYIEVzWf9pkQAqtFds2STrQD/Pe6LYPuEPi76U4dhBWWe0j4PPSk3HO8pJ6v gdglcA8VdzPWQ24UaRGG9N49PGvaVIP5W5HoEcGWAtuOkYVJRbGU9gM+BXc/JRywyr4l AD6OJPp6mmerFZMWBRTrSsN7Ib1LFsuZEzaOU= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=to:subject:organization:from:date:message-id:user-agent :mime-version:content-type; b=m+y7yh7MBHLzoslLaDY5h8um2Q51fKRizGBdn/iGqwLv/1xaO+c09Zt+MAn5aIwx2E CMPJHWbYAXEBXz721HlqCOqTZxL9J5YvwOnLfUKhMo772gwGFkgohejm5BuyQQm6WHd4 mpzutcuPKsx9meltrdC18NRaxhHKdEE52Vbmw= Received: by 10.213.97.85 with SMTP id k21mr716805ebn.79.1267262481064; Sat, 27 Feb 2010 01:21:21 -0800 (PST) Received: from localhost ([95.69.170.185]) by mx.google.com with ESMTPS id 10sm4456081eyd.4.2010.02.27.01.21.19 (version=TLSv1/SSLv3 cipher=RC4-MD5); Sat, 27 Feb 2010 01:21:19 -0800 (PST) To: freebsd-net@freebsd.org Organization: TOA Ukraine From: Mikolaj Golub Date: Sat, 27 Feb 2010 11:21:17 +0200 Message-ID: <86hbp3jape.fsf@kopusha.onet> User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (berkeley-unix) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Subject: kmem leakage on tun/tap device removal X-BeenThere: freebsd-net@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Networking and TCP/IP with FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Feb 2010 09:21:34 -0000 Hi, Recently I run some tests, which create/destroy tun interface in loop, and after several hours my system panicked with "kmem_map too small". It has appeared that tun (or tap) device does not free memory after the device destroy: zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1' Type InUse MemUse HighUse Requests Size(s) DEVFS1 139 35K - 157 256 zhuzha:/usr/src/sys/kern% sudo ifconfig tun0 create zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1' Type InUse MemUse HighUse Requests Size(s) DEVFS1 140 35K - 159 256 zhuzha:/usr/src/sys/kern% sudo ifconfig tun0 destroy zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1' Type InUse MemUse HighUse Requests Size(s) DEVFS1 140 35K - 159 256 And when running create/destroy in loop: Time Type InUse MemUse HighUse Requests Size(s) 09:20 DEVFS1 104 26K - 113 256 09:25 DEVFS1 8504 2126K - 16912 256 09:30 DEVFS1 31602 7901K - 63108 256 09:35 DEVFS1 54316 13579K - 108536 256 09:40 DEVFS1 77068 19267K - 154040 256 09:45 DEVFS1 99764 24941K - 199431 256 09:50 DEVFS1 122408 30602K - 244719 256 09:55 DEVFS1 144689 36173K - 289281 256 It looks like the problem is that tun/tap_clone_create() calls make_dev() and then dev_ref(dev). make_dev() calls itself dev_refl(), so after device creating we have si_refcount == 2. But on device removal (tun/tap_clone_destroy()) dev_rel() is never called, only destroy_dev(dev), which checks that si_refcount is still not zero and places the dev in dead_cdevsw.d_devs list. And running kgdb we can see the following picture: (kgdb) p *dead_cdevsw.d_devs.lh_first $2 = {__si_reserved = 0x0, si_flags = 0, si_atime = {tv_sec = 1267218482, tv_nsec = 0}, si_ctime = { tv_sec = 1267218482, tv_nsec = 0}, si_mtime = {tv_sec = 1267218482, tv_nsec = 0}, si_uid = 66, si_gid = 68, si_mode = 384, si_cred = 0x0, si_drv0 = 0, si_refcount = 1, si_list = { le_next = 0xcd183100, le_prev = 0xc0d90278}, si_clone = {le_next = 0x0, le_prev = 0xc5c83b50}, si_children = {lh_first = 0x0}, si_siblings = {le_next = 0x0, le_prev = 0x0}, si_parent = 0x0, si_name = 0xcaadc278 "tun0", si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, si_iosize_max = 0, si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, __si_namebuf = "tun0", '\0' } (kgdb) p *dead_cdevsw.d_devs.lh_first->si_list.le_next $3 = {__si_reserved = 0x0, si_flags = 0, si_atime = {tv_sec = 1267218421, tv_nsec = 0}, si_ctime = { tv_sec = 1267218421, tv_nsec = 0}, si_mtime = {tv_sec = 1267218421, tv_nsec = 0}, si_uid = 66, si_gid = 68, si_mode = 384, si_cred = 0x0, si_drv0 = 0, si_refcount = 1, si_list = { le_next = 0xcd183000, le_prev = 0xcaadc238}, si_clone = {le_next = 0x0, le_prev = 0xc5c83b50}, si_children = {lh_first = 0x0}, si_siblings = {le_next = 0x0, le_prev = 0x0}, si_parent = 0x0, si_name = 0xcd183178 "tun0", si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, si_iosize_max = 0, si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, __si_namebuf = "tun0", '\0' } ... and so on. Is dev_ref() needed in tun_clone_create() after make_dev() call? Can't it be safely removed as in the patch below? I have run some tests with the patch and it looks like it works for me. --- sys/net/if_tap.c.orig 2010-02-26 23:18:55.000000000 +0200 +++ sys/net/if_tap.c 2010-02-27 00:03:51.000000000 +0200 @@ -192,10 +192,8 @@ tap_clone_create(struct if_clone *ifc, i if (i) { dev = make_dev(&tap_cdevsw, unit | extra, UID_ROOT, GID_WHEEL, 0600, "%s%d", ifc->ifc_name, unit); - if (dev != NULL) { - dev_ref(dev); + if (dev != NULL) dev->si_flags |= SI_CHEAPCLONE; - } } tapcreate(dev); @@ -383,10 +381,8 @@ tapclone(void *arg, struct ucred *cred, *dev = make_dev(&tap_cdevsw, unit | extra, UID_ROOT, GID_WHEEL, 0600, "%s", name); - if (*dev != NULL) { - dev_ref(*dev); + if (*dev != NULL) (*dev)->si_flags |= SI_CHEAPCLONE; - } } if_clone_create(name, namelen, NULL); --- sys/net/if_tun.c.orig 2010-02-26 23:18:45.000000000 +0200 +++ sys/net/if_tun.c 2010-02-27 00:03:24.000000000 +0200 @@ -188,10 +188,8 @@ tun_clone_create(struct if_clone *ifc, i /* No preexisting struct cdev *, create one */ dev = make_dev(&tun_cdevsw, unit, UID_UUCP, GID_DIALER, 0600, "%s%d", ifc->ifc_name, unit); - if (dev != NULL) { - dev_ref(dev); + if (dev != NULL) dev->si_flags |= SI_CHEAPCLONE; - } } tuncreate(ifc->ifc_name, dev); @@ -239,10 +237,8 @@ tunclone(void *arg, struct ucred *cred, /* No preexisting struct cdev *, create one */ *dev = make_dev(&tun_cdevsw, u, UID_UUCP, GID_DIALER, 0600, "%s", name); - if (*dev != NULL) { - dev_ref(*dev); + if (*dev != NULL) (*dev)->si_flags |= SI_CHEAPCLONE; - } } if_clone_create(name, namelen, NULL); -- Mikolaj Golub