Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 27 Feb 2010 11:21:17 +0200
From:      Mikolaj Golub <to.my.trociny@gmail.com>
To:        freebsd-net@freebsd.org
Subject:   kmem leakage on tun/tap device removal
Message-ID:  <86hbp3jape.fsf@kopusha.onet>

next in thread | raw e-mail | index | archive | help
Hi,

Recently I run some tests, which create/destroy tun interface in loop, and
after several hours my system panicked with "kmem_map too small". It has
appeared that tun (or tap) device does not free memory after the device
destroy:

zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1'
         Type InUse MemUse HighUse Requests  Size(s)
       DEVFS1   139    35K       -      157  256
zhuzha:/usr/src/sys/kern% sudo ifconfig tun0 create          
zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1'
         Type InUse MemUse HighUse Requests  Size(s)
       DEVFS1   140    35K       -      159  256
zhuzha:/usr/src/sys/kern% sudo ifconfig tun0 destroy         
zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1'
         Type InUse MemUse HighUse Requests  Size(s)
       DEVFS1   140    35K       -      159  256

And when running create/destroy in loop:

 Time          Type InUse MemUse HighUse Requests  Size(s)
09:20        DEVFS1   104    26K       -      113  256
09:25        DEVFS1  8504  2126K       -    16912  256
09:30        DEVFS1 31602  7901K       -    63108  256
09:35        DEVFS1 54316 13579K       -   108536  256
09:40        DEVFS1 77068 19267K       -   154040  256
09:45        DEVFS1 99764 24941K       -   199431  256
09:50        DEVFS1 122408 30602K       -   244719  256
09:55        DEVFS1 144689 36173K       -   289281  256

It looks like the problem is that tun/tap_clone_create() calls make_dev() and
then dev_ref(dev). make_dev() calls itself dev_refl(), so after device creating
we have si_refcount == 2. But on device removal (tun/tap_clone_destroy())
dev_rel() is never called, only destroy_dev(dev), which checks that
si_refcount is still not zero and places the dev in dead_cdevsw.d_devs list.

And running kgdb we can see the following picture:

(kgdb) p *dead_cdevsw.d_devs.lh_first
$2 = {__si_reserved = 0x0, si_flags = 0, si_atime = {tv_sec = 1267218482, tv_nsec = 0}, si_ctime = {
    tv_sec = 1267218482, tv_nsec = 0}, si_mtime = {tv_sec = 1267218482, tv_nsec = 0}, si_uid = 66, 
  si_gid = 68, si_mode = 384, si_cred = 0x0, si_drv0 = 0, si_refcount = 1, si_list = {
    le_next = 0xcd183100, le_prev = 0xc0d90278}, si_clone = {le_next = 0x0, le_prev = 0xc5c83b50}, 
  si_children = {lh_first = 0x0}, si_siblings = {le_next = 0x0, le_prev = 0x0}, si_parent = 0x0, 
  si_name = 0xcaadc278 "tun0", si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, si_iosize_max = 0, 
  si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, 
  __si_namebuf = "tun0", '\0' <repeats 59 times>}
(kgdb) p *dead_cdevsw.d_devs.lh_first->si_list.le_next
$3 = {__si_reserved = 0x0, si_flags = 0, si_atime = {tv_sec = 1267218421, tv_nsec = 0}, si_ctime = {
    tv_sec = 1267218421, tv_nsec = 0}, si_mtime = {tv_sec = 1267218421, tv_nsec = 0}, si_uid = 66, 
  si_gid = 68, si_mode = 384, si_cred = 0x0, si_drv0 = 0, si_refcount = 1, si_list = {
    le_next = 0xcd183000, le_prev = 0xcaadc238}, si_clone = {le_next = 0x0, le_prev = 0xc5c83b50}, 
  si_children = {lh_first = 0x0}, si_siblings = {le_next = 0x0, le_prev = 0x0}, si_parent = 0x0, 
  si_name = 0xcd183178 "tun0", si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, si_iosize_max = 0, 
  si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, 
  __si_namebuf = "tun0", '\0' <repeats 59 times>}
... and so on.

Is dev_ref() needed in tun_clone_create() after make_dev() call? Can't it be
safely removed as in the patch below? I have run some tests with the patch and
it looks like it works for me.

--- sys/net/if_tap.c.orig	2010-02-26 23:18:55.000000000 +0200
+++ sys/net/if_tap.c	2010-02-27 00:03:51.000000000 +0200
@@ -192,10 +192,8 @@ tap_clone_create(struct if_clone *ifc, i
 	if (i) {
 		dev = make_dev(&tap_cdevsw, unit | extra,
 		     UID_ROOT, GID_WHEEL, 0600, "%s%d", ifc->ifc_name, unit);
-		if (dev != NULL) {
-			dev_ref(dev);
+		if (dev != NULL)
 			dev->si_flags |= SI_CHEAPCLONE;
-		}
 	}
 
 	tapcreate(dev);
@@ -383,10 +381,8 @@ tapclone(void *arg, struct ucred *cred, 
 
 		*dev = make_dev(&tap_cdevsw, unit | extra,
 		     UID_ROOT, GID_WHEEL, 0600, "%s", name);
-		if (*dev != NULL) {
-			dev_ref(*dev);
+		if (*dev != NULL)
 			(*dev)->si_flags |= SI_CHEAPCLONE;
-		}
 	}
 
 	if_clone_create(name, namelen, NULL);
--- sys/net/if_tun.c.orig	2010-02-26 23:18:45.000000000 +0200
+++ sys/net/if_tun.c	2010-02-27 00:03:24.000000000 +0200
@@ -188,10 +188,8 @@ tun_clone_create(struct if_clone *ifc, i
 		/* No preexisting struct cdev *, create one */
 		dev = make_dev(&tun_cdevsw, unit,
 		    UID_UUCP, GID_DIALER, 0600, "%s%d", ifc->ifc_name, unit);
-		if (dev != NULL) {
-			dev_ref(dev);
+		if (dev != NULL)
 			dev->si_flags |= SI_CHEAPCLONE;
-		}
 	}
 	tuncreate(ifc->ifc_name, dev);
 
@@ -239,10 +237,8 @@ tunclone(void *arg, struct ucred *cred, 
 		/* No preexisting struct cdev *, create one */
 		*dev = make_dev(&tun_cdevsw, u,
 		    UID_UUCP, GID_DIALER, 0600, "%s", name);
-		if (*dev != NULL) {
-			dev_ref(*dev);
+		if (*dev != NULL)
 			(*dev)->si_flags |= SI_CHEAPCLONE;
-		}
 	}
 
 	if_clone_create(name, namelen, NULL);


-- 
Mikolaj Golub



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?86hbp3jape.fsf>