From: Andriy Gapon
To: illumos Developer, freebsd-dtrace@FreeBSD.org
Subject: dtrace: normalization of stddev
Date: Thu, 6 Apr 2017 14:25:29 +0300
Message-ID: <97006cf8-369d-6649-4595-43178789feba@FreeBSD.org>

It seems that currently normalization of stddev aggregation is done
incorrectly.  We divide both the sum of values and the sum of their squares
by the normalization factor.  But we should divide the sum of squares by the
normalization factor squared to scale the original values properly.
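
A short derivation of why the factor must be squared may help. The symbols here are introduced only for illustration and do not appear in the patch: k is the normalization factor, n is the number of samples, x_i are the recorded values, and y_i are the values as they should look after normalization.

```latex
\begin{align*}
y_i &= \frac{x_i}{k}, \qquad
\sum_i y_i = \frac{1}{k}\sum_i x_i, \qquad
\sum_i y_i^2 = \frac{1}{k^2}\sum_i x_i^2, \\
\operatorname{stddev}(y)
  &= \sqrt{\frac{1}{n}\sum_i y_i^2 - \Bigl(\frac{1}{n}\sum_i y_i\Bigr)^2}
   = \sqrt{\frac{1}{k^2 n}\sum_i x_i^2 - \Bigl(\frac{1}{k n}\sum_i x_i\Bigr)^2}
   = \frac{\operatorname{stddev}(x)}{k}.
\end{align*}
```

If the sum of squares is divided by k only once, the first term under the root is k times too large, so the result is no longer stddev(x)/k.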

--- lib/libdtrace/common/dt_consume.c
+++ lib/libdtrace/common/dt_consume.c
@@ -389,8 +389,10 @@ dt_stddev(uint64_t *data, uint64_t normal)
 	 * The standard approximation for standard deviation is
 	 * sqrt(average(x**2) - average(x)**2), i.e. the square root
 	 * of the average of the squares minus the square of the average.
+	 * When normalizing, we should divide the sum of x**2 by normal**2.
 	 */
 	dt_divide_128(data + 2, normal, avg_of_squares);
+	dt_divide_128(avg_of_squares, normal, avg_of_squares);
 	dt_divide_128(avg_of_squares, data[0], avg_of_squares);
 
 	norm_avg = (int64_t)data[1] / (int64_t)normal / (int64_t)data[0];

A primitive test script:

BEGIN
{
	i = 100; @s = avg(i); @v = stddev(i);
	i = 200; @s = avg(i); @v = stddev(i);
	i = 300; @s = avg(i); @v = stddev(i);
	i = 400; @s = avg(i); @v = stddev(i);
	i = 500; @s = avg(i); @v = stddev(i);
	i = 600; @s = avg(i); @v = stddev(i);
	i = 700; @s = avg(i); @v = stddev(i);
	i = 800; @s = avg(i); @v = stddev(i);
	i = 900; @s = avg(i); @v = stddev(i);

	printa("%@3d %@3d\n", @s, @v);
	normalize(@s, 10); normalize(@v, 10);
	printa("%@3d %@3d\n", @s, @v);
	exit(0);
}

Without the patch it produces:

500 258
 50 170

With the patch:

500 258
 50  25

-- 
Andriy Gapon
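
The numbers above can be cross-checked outside DTrace with a small standalone sketch. This is not the libdtrace code: sketch_stddev() and isqrt64() are hypothetical names, plain 64-bit integers stand in for the 128-bit helpers, and the fix_squares flag exists only to switch between the old and the patched division. It assumes the sums fit in 64 bits, which holds for the nine test values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Floor of the square root, computed bit by bit (illustrative helper). */
static uint64_t
isqrt64(uint64_t x)
{
	uint64_t r = 0, bit = (uint64_t)1 << 62;

	while (bit > x)
		bit >>= 2;
	while (bit != 0) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return (r);
}

/*
 * Hypothetical stand-in for the normalized stddev computation.
 * fix_squares == 0: sum(x**2) is divided by normal once (old behaviour).
 * fix_squares != 0: sum(x**2) is divided by normal twice (patched).
 */
static uint64_t
sketch_stddev(const int64_t *vals, size_t n, uint64_t normal, int fix_squares)
{
	uint64_t sum_sq = 0, avg_of_squares;
	int64_t sum = 0, norm_avg;
	size_t i;

	assert(n > 0 && normal > 0);
	for (i = 0; i < n; i++) {
		sum += vals[i];
		sum_sq += (uint64_t)(vals[i] * vals[i]);
	}

	avg_of_squares = sum_sq / normal;
	if (fix_squares)
		avg_of_squares /= normal;
	avg_of_squares /= n;

	/* Same order of integer divisions as the norm_avg line in the diff. */
	norm_avg = sum / (int64_t)normal / (int64_t)n;

	return (isqrt64(avg_of_squares - (uint64_t)(norm_avg * norm_avg)));
}
```

Called with the values 100 through 900 and a factor of 10, sketch_stddev() returns 170 with fix_squares set to 0 and 25 with it set to 1; with a factor of 1 both variants return 258.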