Date:      Sat, 15 Sep 2012 15:50:05 +0300
From:      Mikolaj Golub <trociny@FreeBSD.org>
To:        Miroslav Lachman <000.fbsd@quip.cz>
Cc:        Hartmut Brandt <harti@FreeBSD.org>, freebsd-stable@freebsd.org
Subject:   Re: bsnmpd always died on HDD detach
Message-ID:  <20120915125003.GA91163@gmail.com>
In-Reply-To: <504D10A7.1070701@quip.cz>
References:  <504D10A7.1070701@quip.cz>

--ZGiS0Q5IWpPtfppv
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Sep 09, 2012 at 11:56:55PM +0200, Miroslav Lachman wrote:
> I am running bsnmpd with basic snmpd.config (only community and location 
> changed).
> 
> When there is a problem with HDD and disk disappeared from ATA channel 
> (e.g. disk physically removed) the bsnmpd always dumps core:
> 
> kernel: pid 1188 (bsnmpd), uid 0: exited on signal 11 (core dumped)
> 
> I see this for a long time on all releases of 7.x and 8.x branches (i386 
> and amd64). I did not test 9.x.

Ok, I was able to reproduce this under qemu by running
  
  atacontrol detach ata1

It crashes in snmp_hostres module, in

  refresh_device_tbl->refresh_disk_storage_tbl->disk_OS_get_ATA_disks

when traversing device_map list and dereferencing map->entry_p, which
is NULL here.

The device_map table is used to keep device table indexing consistent.

refresh_device_tbl(), the refresh routine for hrDeviceTable, checks the
list of available devices and calls device_entry_delete() for devices
that are gone. It does not remove the entry from the device_map table,
but just sets entry_p to NULL for it (so that the index stays reserved
and is not reused by another device).

Then refresh_disk_storage_tbl() is called, which in turn calls

 disk_OS_get_ATA_disks();
 disk_OS_get_MD_disks();
 disk_OS_get_disks();

and it crashes in disk_OS_get_ATA_disks() when the removed map entry
is dereferenced.

I am attaching the patch that fixes the issue for me.
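
The skip logic can also be shown in isolation. The following is only a
minimal sketch, not the hostres code: the struct layouts, the
is_ata_disk flag and the function names map_mark_deleted() and
tag_live_ata_disks() are made up for illustration. Only the
STAILQ_FOREACH walk and the entry_p == NULL check mirror what the
patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-ins for the hostres structures (not the real ones). */
struct device_entry {
	int is_ata_disk;		/* set when the device is tagged as a disk */
};

struct device_map_entry {
	int index;			/* index reserved for this device name */
	const char *name_key;		/* device name, e.g. "ad0" */
	struct device_entry *entry_p;	/* NULL once the device is gone */
	STAILQ_ENTRY(device_map_entry) link;
};

STAILQ_HEAD(device_map_list, device_map_entry);

/* Mark a device as gone: the map entry stays in the list, the pointer is cleared. */
static void
map_mark_deleted(struct device_map_entry *m)
{
	assert(m != NULL);
	m->entry_p = NULL;
}

/*
 * Walk the map and tag every live device whose name starts with prefix.
 * Returns the number of devices tagged.
 */
static int
tag_live_ata_disks(struct device_map_list *head, const char *prefix)
{
	struct device_map_entry *map;
	int n = 0;

	STAILQ_FOREACH(map, head, link) {
		/* Skip deleted entries; without this check the write below crashes. */
		if (map->entry_p == NULL)
			continue;
		if (strncmp(map->name_key, prefix, strlen(prefix)) != 0)
			continue;
		map->entry_p->is_ata_disk = 1;
		n++;
	}
	return (n);
}
```

With two entries "ad0" and "ad1" in the list, calling
map_mark_deleted() on "ad1" leaves its map entry and index in place,
and a later tag_live_ata_disks(&head, "ad") simply passes over it.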

I was wondering why the issue was not observed after md device
removal, as disk_OS_get_MD_disks() does the same thing. It turns out
that hostres just does not see md devices, so this function is
currently useless. hostres gets devices from devinfo(3), which does
not return md devices.

disk_OS_get_disks() reads the kern.disks sysctl to get the list of
disks, and uses device_map differently, so it is not affected.

-- 
Mikolaj Golub

--ZGiS0Q5IWpPtfppv
Content-Type: text/x-diff; charset=us-ascii
Content-Disposition: inline; filename="hostres_diskstorage_tbl.c.skip.patch"

Index: usr.sbin/bsnmpd/modules/snmp_hostres/hostres_diskstorage_tbl.c
===================================================================
--- usr.sbin/bsnmpd/modules/snmp_hostres/hostres_diskstorage_tbl.c	(revision 240529)
+++ usr.sbin/bsnmpd/modules/snmp_hostres/hostres_diskstorage_tbl.c	(working copy)
@@ -287,6 +287,9 @@ disk_OS_get_ATA_disks(void)
 
 	/* Walk over the device table looking for ata disks */
 	STAILQ_FOREACH(map, &device_map, link) {
+		/* Skip deleted entries. */
+		if (map->entry_p == NULL)
+			continue;
 		for (found = lookup; found->media != DSM_UNKNOWN; found++) {
 			if (strncmp(map->name_key, found->dev_name,
 			    strlen(found->dev_name)) != 0)
@@ -345,6 +348,9 @@ disk_OS_get_MD_disks(void)
 
 	/* Look for md devices */
 	STAILQ_FOREACH(map, &device_map, link) {
+		/* Skip deleted entries. */
+		if (map->entry_p == NULL)
+			continue;
 		if (sscanf(map->name_key, "md%d", &unit) != 1)
 			continue;
 

--ZGiS0Q5IWpPtfppv--


