From: Peter Schuller
To: freebsd-current@freebsd.org
Date: Sat, 8 Sep 2007 17:17:04 +0200
Message-Id: <200709081717.05006.peter.schuller@infidyne.com>
Subject: Swapping on ZFS - stability issue
List-Id: Discussions about the use of FreeBSD-current

Hello,

Yesterday a machine became unresponsive (it still answered pings, but TCP connections died instantly after establishment) after someone started MySQL with cranked-up buffer sizes. When I got to the machine there was no sensible message on the console, but I did see garbled output containing words such as "swap" and "pager".

The machine had swap configured on a ZVOL, which made me suspicious, so after rebooting I tried running a simple loop:

    for (int i = 0; i < 4000; i++) {
        char *mem = (char *)malloc(1024 * 1024);
        memset(mem, 0, 1024 * 1024);  /* touch every page so it is really allocated */
    }

The machine swapped a select few megs and then died completely (no messages; couldn't kill top or the test application, though the console still worked). This was done on a machine with 4 GB of RAM and 4 GB of swap on a ZVOL, running 7-CURRENT.

Later on I tried this on my workstation too (3 GB of RAM). Swapping to glabel-on-disk behaved as you would expect: it swapped hundreds of megs to the swap device without the system ever being adversely affected beyond the obvious effects of disk activity and memory pressure. However, swapping either to a ZVOL or to a file (through md) on ZFS exhibited behavior similar to that observed on the original server: it would swap out a few megs fairly slowly, after which the system became completely unusable, with neither top nor the test app killable by CTRL-C, though virtual console switching still worked.

Swapping to ZFS was significantly slower (on the order of a factor of 10 or so), yet I do not believe slowness in and of itself is the only explanation for the difference (I would still have expected the system to recover, even if it took a bit longer).

I'll see if I can confirm this behavior on a more recent CURRENT. The above was on CURRENTs from 1-2 months ago in both cases.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller '
Key retrieval: Send an E-Mail to getpgpkey@scode.org
E-Mail: peter.schuller@infidyne.com Web: http://www.scode.org