From owner-freebsd-questions@FreeBSD.ORG Thu Aug 28 15:43:37 2014
Return-Path:
Delivered-To: freebsd-questions@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [8.8.178.115])
    (using TLSv1 with cipher ADH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by hub.freebsd.org (Postfix) with ESMTPS id C21BD3B6
    for ; Thu, 28 Aug 2014 15:43:37 +0000 (UTC)
Received: from be-well.ilk.org (be-well.ilk.org [23.30.133.173])
    by mx1.freebsd.org (Postfix) with ESMTP id 792731929
    for ; Thu, 28 Aug 2014 15:43:37 +0000 (UTC)
Received: from lowell-desk.lan (lowell-desk.lan [172.30.250.41])
    by be-well.ilk.org (Postfix) with ESMTP id D129D33C1E;
    Thu, 28 Aug 2014 11:43:31 -0400 (EDT)
Received: by lowell-desk.lan (Postfix, from userid 1147)
    id E7A3F39860; Thu, 28 Aug 2014 11:43:30 -0400 (EDT)
From: Lowell Gilbert
To: Chenguang Li
Subject: Re: Ask for opinion: changing rand(3) to random(3) in awk(1)
References: <44mwapn1pw.fsf@lowell-desk.lan>
Reply-To: freebsd-questions@freebsd.org
Date: Thu, 28 Aug 2014 11:43:30 -0400
In-Reply-To: (Chenguang Li's message of "Thu, 28 Aug 2014 10:10:34 +0800")
Message-ID: <44y4u8ei1p.fsf@lowell-desk.lan>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (berkeley-unix)
MIME-Version: 1.0
Content-Type: text/plain
Cc: freebsd-questions@freebsd.org
X-BeenThere: freebsd-questions@freebsd.org
X-Mailman-Version: 2.1.18-1
Precedence: list
List-Id: User questions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 28 Aug 2014 15:43:37 -0000

Chenguang Li writes:

> The problem I was trying to describe was its "one-shot" randomness,
> take these two as examples (where it matters):
>
> 1. You wrote a script[1] that simulate rolling a dice, it would
>    produce the same result if executed within, say, 5 seconds.
> [1] BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }, won't matter.

One second, not 5.
Calling srand() without a parameter seeds the random number generator
with the current time in seconds, so the value changes once per second.

> 2. You have a CGI script which will show different content based on
>    the number generated by rand().
>
> In the first situation, you can generate all the outcomes in a single
> run by using for-loop, but the first outcome will be the same. OSX's
> awk(1) will produce a reasonable number every time I run it. In the
> latter one, you could call rand() once and throw away the result, and
> call it again to get another number. Both are practical workarounds,
> but we do have a better choice: applying the modification I suggested
> before.

You are still misunderstanding the relationship between srand() and
rand(), in a way that will not be fixed by changing awk's implementation
from rand(3) to random(3).

srand() "seeds" the random number generator with a particular value, and
the sequence of numbers is completely determined afterwards. This isn't
a bug; the ability to exactly reproduce a sequence of "random" numbers
is an essential feature in a lot of simulation uses. This is also why we
refer to these algorithms as "pseudo-random" rather than just "random."

In your cases, you really do want a different sequence every time. The
way that is handled is by using a different seed each time. The normal
use of srand() uses the current time, so as long as it isn't called
twice within one second, it will always use a different sequence of
numbers. If it *is* called twice within the same second, it will produce
the same sequence of numbers (not just the same first number, but the
second, third, etc. number will be the same also). This is just as true
on OSX as on FreeBSD.

Your use of srand() in your first script is buggy because it calls
srand() for *every* call to rand(); your second version fixes this
problem.

How do we deal with the one-second window? Well, most of the time we
ignore it. For a CGI script, it won't matter.
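
The point about seeds can be checked directly by giving srand() an
explicit parameter. The following is only an illustrative sketch, not
part of the original exchange: the shell function name "roll" and the
seed value 42 are arbitrary choices. Both runs start from the same seed,
so they should print the same three rolls.

```sh
#!/bin/sh
# Illustration only: "roll" and the seed 42 are arbitrary, hypothetical choices.
# Print three die rolls from an awk process seeded with the given number.
roll() {
    awk -v seed="$1" 'BEGIN {
        srand(seed + 0)          # explicit seed instead of the current time
        for (i = 0; i < 3; i++)
            printf "%d ", int(1 + rand() * 6)
        print ""
    }'
}

first=$(roll 42)
second=$(roll 42)
echo "seed 42, run 1: $first"
echo "seed 42, run 2: $second"
```

The actual numbers depend on which awk implementation is installed, but
the two lines should match each other on any of them.
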
If you really do need to run separate copies of an awk script more
often, you'll need a better seed. Reading it from /dev/random would be
one place for your awk script to get that. An important point that you
may have missed is that when your script calls srand(), it can provide a
parameter, which will be used instead of the current time.

> If others are not affected by the problem I described above, then I am
> okay with that. The other reason why I suggest this is, I see no loss,
> only to make it better.

The problem you described is caused by your calling srand() multiple
times. This is a bug on your part, not a problem with awk that would
affect other people. Changing awk to use random(3) instead of rand(3)
will not fix your problem, because continually reseeding srandom(3) with
the same seed will give you the same values from random(3) just as much
as doing the same with srand(3) and rand(3) will.

In your example:

> BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }

the first one is broken and the second one works (try them and compare
the output).

Although it may not fix the problem you thought it would, you're right
that there's no loss in making the change, so I think it's a good idea.

Be well.
        Lowell
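
For reference, one possible way to take the seed from the kernel instead
of the clock, as suggested above, is to read it in the shell and hand it
to srand() as its parameter. This is a sketch under assumptions: it
relies on od(1) accepting the POSIX -A, -N and -t options, it reads
/dev/urandom rather than /dev/random so that the read never blocks, and
the variable names "seed" and "result" are arbitrary.

```sh
#!/bin/sh
# Illustration only: read 4 bytes of kernel randomness as one unsigned integer.
seed=$(od -An -N4 -tu4 /dev/urandom | tr -d ' \n')

# Pass the seed to srand() as its parameter, so the clock is not involved.
result=$(awk -v seed="$seed" 'BEGIN { srand(seed + 0); print int(1 + rand() * 6) }')
echo "$result"
```

Two copies started within the same second get different seeds this way,
so they will almost always begin different sequences.
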