From owner-freebsd-dtrace@freebsd.org Thu Apr 6 11:26:33 2017
To: illumos Developer, freebsd-dtrace@FreeBSD.org
From: Andriy Gapon
Subject: dtrace: normalization of stddev
Message-ID: <97006cf8-369d-6649-4595-43178789feba@FreeBSD.org>
Date: Thu, 6 Apr 2017 14:25:29 +0300

It seems that currently normalization of stddev aggregation is done
incorrectly.
We divide both the sum of values and the sum of their squares by the
normalization factor.  But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
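To make the arithmetic concrete, here is a minimal C sketch of the two
normalization strategies. It is not the libdtrace code: the function names
(isqrt_u64, stddev_norm) and the `squared` flag are hypothetical, the inputs
are assumed to be non-negative, and plain 64-bit integers stand in for the
128-bit sum of squares that dt_stddev() actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Floor of the integer square root (hypothetical stand-in helper). */
static uint64_t
isqrt_u64(uint64_t v)
{
	uint64_t r = 0;
	uint64_t bit = (uint64_t)1 << 62;

	while (bit > v)
		bit >>= 2;
	while (bit != 0) {
		if (v >= r + bit) {
			v -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return (r);
}

/*
 * Normalized standard deviation from a count, a sum, and a sum of squares.
 * squared == 0: divide the sum of squares by normal once (behavior
 *               described as incorrect in this thread).
 * squared != 0: divide the sum of squares by normal twice, i.e. by
 *               normal**2 (the proposed behavior).
 */
uint64_t
stddev_norm(uint64_t count, uint64_t sum, uint64_t sumsq, uint64_t normal,
    int squared)
{
	uint64_t avg_of_squares, norm_avg, sq_of_avg;

	assert(count != 0 && normal != 0);

	avg_of_squares = sumsq / normal;
	if (squared)
		avg_of_squares /= normal;
	avg_of_squares /= count;

	norm_avg = sum / normal / count;
	sq_of_avg = norm_avg * norm_avg;

	/* Integer truncation can make the difference negative; clamp it. */
	if (avg_of_squares < sq_of_avg)
		return (0);
	return (isqrt_u64(avg_of_squares - sq_of_avg));
}
```

For the values 100, 200, ..., 900 (count 9, sum 4500, sum of squares
2850000) and a normalization factor of 10, this sketch returns 170 when the
sum of squares is divided by the factor once and 25 when it is divided
twice.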

--- lib/libdtrace/common/dt_consume.c
+++ lib/libdtrace/common/dt_consume.c
@@ -389,8 +389,10 @@ dt_stddev(uint64_t *data, uint64_t normal)
 	 * The standard approximation for standard deviation is
 	 * sqrt(average(x**2) - average(x)**2), i.e. the square root
 	 * of the average of the squares minus the square of the average.
+	 * When normalizing, we should divide the sum of x**2 by normal**2.
 	 */
 	dt_divide_128(data + 2, normal, avg_of_squares);
+	dt_divide_128(avg_of_squares, normal, avg_of_squares);
 	dt_divide_128(avg_of_squares, data[0], avg_of_squares);
 
 	norm_avg = (int64_t)data[1] / (int64_t)normal / (int64_t)data[0];


A primitive test script:

BEGIN
{
	i = 100;
	@s = avg(i);
	@v = stddev(i);

	i = 200;
	@s = avg(i);
	@v = stddev(i);

	i = 300;
	@s = avg(i);
	@v = stddev(i);

	i = 400;
	@s = avg(i);
	@v = stddev(i);

	i = 500;
	@s = avg(i);
	@v = stddev(i);

	i = 600;
	@s = avg(i);
	@v = stddev(i);

	i = 700;
	@s = avg(i);
	@v = stddev(i);

	i = 800;
	@s = avg(i);
	@v = stddev(i);

	i = 900;
	@s = avg(i);
	@v = stddev(i);

	printa("%@3d %@3d\n", @s, @v);

	normalize(@s, 10);
	normalize(@v, 10);
	printa("%@3d %@3d\n", @s, @v);

	exit(0);
}

Without the patch it produces:
500 258
 50 170

With the patch:
500 258
 50  25

-- 
Andriy Gapon

From owner-freebsd-dtrace@freebsd.org Thu Apr 6 14:11:10 2017
In-Reply-To: <97006cf8-369d-6649-4595-43178789feba@FreeBSD.org>
References: <97006cf8-369d-6649-4595-43178789feba@FreeBSD.org>
From: Bryan Cantrill
Date: Thu, 6 Apr 2017 07:11:08 -0700
Subject: Re: [developer] dtrace: normalization of stddev
To: illumos-dev
Cc: freebsd-dtrace@freebsd.org

Nice!  Let's turn this into a formalized test (see
usr/src/cmd/dtrace/test/tst/common/aggs/tst.stddev.d for the one test
that's there -- we should add this as
usr/src/cmd/dtrace/test/tst/common/aggs/tst.normalizestddev.d or something)
and then get the fix in.  Thanks for debugging this, and sorry for whatever
pain it caused!

        - Bryan

On Thu, Apr 6, 2017 at 4:25 AM, Andriy Gapon wrote:

>
> It seems that currently normalization of stddev aggregation is done
> incorrectly.
> We divide both the sum of values and the sum of their squares by the
> normalization factor.  But we should divide the sum of squares by the
> normalization factor squared to scale the original values properly.
>
> --- lib/libdtrace/common/dt_consume.c
> +++ lib/libdtrace/common/dt_consume.c
> @@ -389,8 +389,10 @@ dt_stddev(uint64_t *data, uint64_t normal)
>  	 * The standard approximation for standard deviation is
>  	 * sqrt(average(x**2) - average(x)**2), i.e. the square root
>  	 * of the average of the squares minus the square of the average.
> +	 * When normalizing, we should divide the sum of x**2 by normal**2.
>  	 */
>  	dt_divide_128(data + 2, normal, avg_of_squares);
> +	dt_divide_128(avg_of_squares, normal, avg_of_squares);
>  	dt_divide_128(avg_of_squares, data[0], avg_of_squares);
>
>  	norm_avg = (int64_t)data[1] / (int64_t)normal / (int64_t)data[0];
>
>
> A primitive test script:
>
> BEGIN
> {
> 	i = 100;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 200;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 300;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 400;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 500;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 600;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 700;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 800;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	i = 900;
> 	@s = avg(i);
> 	@v = stddev(i);
>
> 	printa("%@3d %@3d\n", @s, @v);
>
> 	normalize(@s, 10);
> 	normalize(@v, 10);
> 	printa("%@3d %@3d\n", @s, @v);
>
> 	exit(0);
>
> }
>
> Without the patch it produces:
> 500 258
>  50 170
>
> With the patch:
> 500 258
>  50  25
>
>
> --
> Andriy Gapon