From owner-freebsd-bugs@FreeBSD.ORG Wed Mar 3 11:30:02 2010
Message-Id: <201003031119.o23BJjqr083244@server.vk2pj.dyndns.org>
Date: Wed, 3 Mar 2010 22:19:45 +1100 (EST)
From: Peter Jeremy
To: FreeBSD-gnats-submit@FreeBSD.org
Reply-To: Peter Jeremy
X-Send-Pr-Version: 3.113
Subject: bin/144446: [patch] db(3) fails with large block sizes

>Number:         144446
>Category:       bin
>Synopsis:       [patch] db(3) fails with large block sizes
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Wed Mar 03 11:30:01 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Peter Jeremy
>Release:        FreeBSD 8.0-STABLE amd64
>Organization:
Alcatel-Lucent Australia
>Environment:
System: FreeBSD server.vk2pj.dyndns.org 8.0-STABLE FreeBSD 8.0-STABLE #1: Wed Jan 27 06:55:10 EST 2010 root@server.vk2pj.dyndns.org:/var/obj/usr/src/sys/server amd64
>Description:
Whilst trying to port db(3) to a Solaris system, I have identified two
issues with the existing hash(3) code.

Firstly, when creating a new hash database, the bucket size defaults
to the st_blksize of the file (hash/hash.c::init_hash()).  There is no
sanity checking to ensure that st_blksize is within valid limits
(hash/hash.h defines MAX_BSIZE as 65536).  In FreeBSD, st_blksize is
currently hardwired to PAGE_SIZE in kern/vfs_vnops.c::vn_stat(), so
this is purely a theoretical issue on FreeBSD.  Solaris exposes the
block size of the underlying filesystem and, in the case of ZFS, this
is 128KB, which exceeds MAX_BSIZE.
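
Whether a given file would trip this on a particular system can be checked with a few lines of C. The following is only a sketch: HASH_MAX_BSIZE is a local stand-in for MAX_BSIZE from hash/hash.h (65536, as quoted above), and the helper name blksize_exceeds_limit() is made up for illustration.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Local stand-in for MAX_BSIZE in hash/hash.h (65536, as quoted above). */
#define HASH_MAX_BSIZE	65536L

/*
 * Hypothetical helper: stores the st_blksize of path in *blksize and
 * returns 1 if it exceeds limit, 0 if it does not, or -1 if stat(2) fails.
 */
static int
blksize_exceeds_limit(const char *path, long limit, long *blksize)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (-1);
	*blksize = (long)sb.st_blksize;
	return (*blksize > limit);
}
```

Calling blksize_exceeds_limit(file, HASH_MAX_BSIZE, &bs) on the intended database file (or its directory) before creating the database shows whether init_hash() would pick up an oversized bucket size from st_blksize.
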
In my case, the symptoms were that when sequentially reading the
database (via DB->seq()), the returned keys were padded with 64KB of
NULs.

Secondly, when the bucket size is set to 64KB (MAX_BSIZE), non-trivial
databases crash reporting:
"HASH: Out of overflow pages.  Increase page size"
It's not clear what triggers this.

Thirdly, whilst writing this PR, I've noticed that hash(3) states that
the default hash table bucket size is 256 bytes.  The actual default
(as per DEF_BUCKET_SIZE in hash/hash.h) is 4096 bytes.
>How-To-Repeat:
The first problem can't be reproduced on FreeBSD but occurs on at least
Solaris and OpenSolaris with ZFS.

The second problem can be reproduced by extending the db test tool
(lib/libc/db/test/run.test, test20) to include a bucket size of 65536.
Based on the progression of fill factors in test20, the logical fill
factors are 2735, 3647 and 5471; however, the test fails with each of
these values as well as 8, 341 and 10001.  All other bucket sizes
appear to work successfully, which suggests there is a problem with
this particular bucket size.
>Fix:
The first patch adds a check for excessive st_blksize.  Note that this
assumes that st_blksize is a power of 2.  It might be reasonable to
verify that st_blksize isn't too small, but I'm not sure what the lower
limit is.

Index: hash/hash.c
===================================================================
RCS file: /usr/ncvs/src/lib/libc/db/hash/hash.c,v
retrieving revision 1.21.2.2
diff -u -r1.21.2.2 hash.c
--- hash/hash.c	28 Aug 2009 19:48:06 -0000	1.21.2.2
+++ hash/hash.c	3 Mar 2010 09:39:45 -0000
@@ -293,6 +293,8 @@
 		if (stat(file, &statbuf))
 			return (NULL);
 		hashp->BSIZE = statbuf.st_blksize;
+		if (hashp->BSIZE > MAX_BSIZE)
+			hashp->BSIZE = MAX_BSIZE;
 		hashp->BSHIFT = __log2(hashp->BSIZE);
 	}
 

The second patch is simply a work-around for the second problem.  I'm
not certain what the actual problem is, but this avoids it.

Index: hash/hash.h
===================================================================
RCS file: /usr/ncvs/src/lib/libc/db/hash/hash.h,v
retrieving revision 1.9.2.1
diff -u -r1.9.2.1 hash.h
--- hash/hash.h	3 Aug 2009 08:13:06 -0000	1.9.2.1
+++ hash/hash.h	3 Mar 2010 09:39:45 -0000
@@ -118,7 +118,7 @@
 /*
  * Constants
  */
-#define	MAX_BSIZE		65536		/* 2^16 */
+#define	MAX_BSIZE		32768		/* 2^15 but should be 65536 */
 #define MIN_BUFFERS		6
 #define MINHDRSIZE		512
 #define DEF_BUFSIZE		65536		/* 64 K */

The following updates the man page to contain the current default
bucket size as per DEF_BUCKET_SIZE in hash/hash.h

Index: man/hash.3
===================================================================
RCS file: /usr/ncvs/src/lib/libc/db/man/hash.3,v
retrieving revision 1.9.10.1
diff -u -r1.9.10.1 hash.3
--- man/hash.3	3 Aug 2009 08:13:06 -0000	1.9.10.1
+++ man/hash.3	3 Mar 2010 09:50:52 -0000
@@ -78,7 +78,7 @@
 element
 defines the
 .Nm
-table bucket size, and is, by default, 256 bytes.
+table bucket size, and is, by default, 4096 bytes.
 It may be preferable to increase the page size for disk-resident tables
 and tables with large data items.
 .It Va ffactor
>Release-Note:
>Audit-Trail:
>Unformatted:
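
On the note above that the st_blksize check assumes a power of 2: one possible way to drop that assumption, sketched here and not part of the submitted patches, is to round the reported size down to a power of two while applying the upper limit. The name clamp_bsize() and its limit parameter are made up for illustration, and the limit itself is assumed to be a power of two, as MAX_BSIZE is. No lower bound is applied, since the correct minimum is not known.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patches): returns the largest power
 * of two that is <= blksize and <= limit.  limit is assumed to be a power
 * of two itself.  Returns 0 when blksize is 0.
 */
static uint32_t
clamp_bsize(uint32_t blksize, uint32_t limit)
{
	uint32_t p;

	if (blksize == 0)
		return (0);
	if (blksize > limit)
		blksize = limit;
	/* Double p while the doubled value still fits within blksize. */
	for (p = 1; p <= blksize / 2; p <<= 1)
		continue;
	return (p);
}
```

With a limit of 65536, a 128KB st_blksize as reported on ZFS comes back as 65536, and an odd value such as 6000 comes back as 4096.
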