From owner-soc-status@FreeBSD.ORG Mon Jun 4 10:54:57 2012 Return-Path: Delivered-To: soc-status@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 312E81065672 for ; Mon, 4 Jun 2012 10:54:57 +0000 (UTC) (envelope-from gpf.kira@gmail.com) Received: from mail-qa0-f49.google.com (mail-qa0-f49.google.com [209.85.216.49]) by mx1.freebsd.org (Postfix) with ESMTP id E1C468FC16 for ; Mon, 4 Jun 2012 10:54:56 +0000 (UTC) Received: by qabj40 with SMTP id j40so1755151qab.15 for ; Mon, 04 Jun 2012 03:54:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=6as02JlE5FWMlRtZsX3oKOtRaIyp+MhKD6UxbK69K54=; b=jWIP33vv3TNWf/lPTj3szrjZNzSoip0NRXt7IuXlpEqRIKM35Wdv3+4YWmjIiPQX0g artyAC1ZNv6xAVBdlpvy4234nolYqTg5NOkZj7UsxjpmRwpQaKUqVndM11wXfOu4QZu4 /rNCgNml0ReqfkHEI3IJBpU8mQfPMgvvrnrZ1x/aFlfYfBTIEMfueoOP+5IdG8PEBHP9 SejU90DHqPmtzadHwaHbIGyk/S2eIN2K2/fiY/Hl3qk3Wp1wHsTlZ9vRB48I9uOPJbVh 2m3PNetZ0R6bcWlJhdW93oiHMd63GBX05hyWIbDRKRVDcZNED44wPvlq1hMMNN8OJnlP /I3w== MIME-Version: 1.0 Received: by 10.224.105.202 with SMTP id u10mr13039528qao.54.1338807290477; Mon, 04 Jun 2012 03:54:50 -0700 (PDT) Received: by 10.229.217.74 with HTTP; Mon, 4 Jun 2012 03:54:50 -0700 (PDT) Date: Mon, 4 Jun 2012 13:54:50 +0300 Message-ID: From: Efstratios Karatzas To: soc-status@freebsd.org Content-Type: text/plain; charset=UTF-8 X-Content-Filtered-By: Mailman/MimeDel 2.1.5 Subject: Kernel Level File Integrity Checker report #2 X-BeenThere: soc-status@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Summer of Code Status Reports and Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 04 Jun 2012 10:54:57 -0000 During week #2: * sbin/pefs now uses an ioctl() to grab cipher-text for each 4k block from kernel, which is then used to generate the checksum. * A major concern of mine was to have a hash table that would allow us the fastest lookups possible since lookup is the only operation performed by the kernel fs driver. I ended up implementing cuckoo hashing for that reason; this way, worst lookup case is 2 cache read misses. With cuckoo hashing, we also end up spending 4 times less memory than with separate chaining for the index that is kept in kernel heap. Cuckoo hashing does have some drawbacks like slow insertion time that may result in an infinite loop. I used a python script to experiment and see how probable it is to fall in an infinite loop. For table sizes of next_prime(n + n * 15%) where n = total elements, there's a 1.5% chance. If that happens, we simply allocate new, larger tables and try again with the same hash functions until we succeed. The chance to fall in an infinite loop twice in a row is 1.5% * 1.5% = 0.225%. Since the hash table is created only once in userland during filesystem generation phase, we don't mind the extra time that is spent at this stage to generate the tables. Proper handling of this infinite loop case is still a todo. -- Efstratios "GPF" Karatzas