From owner-freebsd-questions@FreeBSD.ORG Fri Aug 29 02:44:20 2014
Subject: Re: Ask for opinion: changing rand(3) to random(3) in awk(1)
From: Chenguang Li
In-Reply-To:
 <44y4u8ei1p.fsf@lowell-desk.lan>
Date: Fri, 29 Aug 2014 10:44:06 +0800
Message-Id: <69A3F8EA-3CC2-430A-AD0B-35E3D0899BE2@gmail.com>
References: <44mwapn1pw.fsf@lowell-desk.lan> <44y4u8ei1p.fsf@lowell-desk.lan>
To: freebsd-questions@freebsd.org
Cc: Peter Pentchev

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Lowell Gilbert wrote:

> Chenguang Li writes:
>
>> The problem I was trying to describe was its "one-shot" randomness, take these two as examples (where it matters):
>>
>> 1. You wrote a script[1] that simulate rolling a dice, it would
>> produce the same result if executed within, say, 5 seconds.
>> [1] BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }, won't matter.
>
> One second, not 5. Calling srand() without a parameter seeds the random
> number generator with the current time in seconds, so the value changes
> once per second. Did you actually run this line?

I will let the examples speak for me:

m1: FreeBSD 10.0-RELEASE-p6 amd64
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277292
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277300
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277302
53

m2: FreeBSD 10.0-RELEASE i386
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277368
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277374
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277379
53

m3: FreeBSD 10.0-RELEASE i386
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248690
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248697
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248700
31

m1, m2, m3 are 3 different machines I have access to. Other versions and/or architectures are not tested.

>> 2. You have a CGI script which will show different content based on the number generated by rand().
>>
>> In the first situation, you can generate all the outcomes in a single
>> run by using for-loop, but the first outcome will be the same. OSX's
>> awk(1) will produce a reasonable number every time I run it. In the
>> latter one, you could call rand() once and throw away the result, and
>> call it again to get another number. Both are practical workarounds,
>> but we do have a better choice: applying the modification I suggested
>> before.

> You are still misunderstanding the relationship between srand() and
> rand(), in a way that will not be fixed by changing awk's implementation
> from rand(3) to random(3). srand() "seeds" the random number generator
> with a particular value, and the sequence of numbers is completely
> determined afterwards. This isn't a bug; the ability to exactly
> reproduce a sequence of "random" numbers is an essential feature in a
> lot of simulation uses.
> This is also why we refer to these algorithms as
> "pseudo-random" rather than just "random."

I'm fairly confident that I have a not-so-bad understanding of the relationship between them.

> In your cases, you really do want a different sequence every time. The
> way that is handled is by using a different seed each time. The normal
> use of srand() uses the current time, so as long as it isn't called
> twice within one second, it will always use a different sequence of
> numbers. If it *is* called twice within the same second, it will produce
> the same sequence of numbers (not just the same first number, but the
> second, third, etc. number will be the same also). This is just as true
> on OSX as on FreeBSD. Your use of srand() in your first script is buggy
> because it calls srand() for *every* call to rand(); your second version
> fixes this problem.

Yes, it's buggy, but for a one-shot demonstration it makes no difference to me. And one more example:

m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278327
54
15
10
6
56
m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278335
54
20
14
73
82

Only the first number repeats, even though the two runs are 8 seconds apart. So which one is to blame: rand(), the timer, or the compiler? It's weird, and I had the same thought before: it should change every second. The fact is, it doesn't. Is it just me or ...

> How do we deal with the one-second window? Well, most of the time we
> ignore it. For a CGI script, it won't matter. If you really do need to
> run separate copies of an awk script more often, you'll need a better
> seed. Reading it from /dev/random would be one place for your awk
> script to get that. An important point that you may have missed is that
> when your script calls srand(), it can provide a parameter, which will
> be used instead of the current time.
>
>> If others are not affected by the problem I described above, then I am
>> okay with that.
>> The other reason why I suggest this is, I see no loss,
>> only to make it better.
>
> The problem you described is caused by your calling srand() multiple
> times. This is a bug on your part, not a problem with awk that would
> affect other people. Changing awk to use random(3) instead of rand(3)
> will not fix your problem, because continually reseeding srandom(3) with
> the same seed will give you the same values from random(3) just as much
> as doing the same with srand(3) and rand(3) will. In your example:
> BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }
> the first one is broken and the second one works (try them and compare
> the output).

I do know that I can provide a better seed when calling srand(). I know that I shouldn't call srand() every time I call rand(). I insist that our awk(1) should provide good randomness in every single run; based on the examples given, it's not doing its job well. Below is a locally patched version:

m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279522
59
38
84
67
8
m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279524
80
71
94
80
94

Much better.

> Although it may not fix the problem you thought it would, you're right
> that there's no loss in making the change, so I think it's a good idea.
>
> Be well.
> Lowell

I am afraid I have only touched the surface of the problem; nevertheless, the modification does fix my problem. My journey ends here.

Chenguang Li
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJT/+j9AAoJELG4cS+11lRh0SsP+wROOZIHSuA2iR+NsnrAVEM8
WH6UY/Gqyh/uxzWVDJ+FIEfgFz9GGVFfOndOhsTMYnQdLWTkrbKcAcjDUP4zBXG/
nFMxKwdVws8Q3gIRM6+ZIDiPt8Yui2w+JrPks0fJQ9LVJTtGnv7v0t+jkCag5u8G
aeseg1SQU5Z3aSoBaxBtuObjjNg+0wSMntwJDToG5AriKzB8uYvu5ljZ6tDhKb2z
q19uVcP5AUCxr7WgOoNOhVWHP+kLYMUmpiWR7rTmkKa3Bx4jbMwIJzQZ86rjyaGk
8EyKCd+K+4GsKMEvaA+yXBYwsB4rM4f0dYUfPQ7EmQX0hS78xkO7Y7cP8QAfyv1j
/ziWuecSYo0RgipU3S8gLCxt9zm9CHoTmNy81tFqJA2ZV7cqhXlx7AKwcqzoOhtI
tSW9iXimUhAxTB7pB04M/hGCooZrgW0bdyP5VeaetZHTz8TNTyOHrhCPCHBwSV3O
aXM+qMwYkRMcs3lEGzRzxoRdo0J4dg7FpORTT8mrm81vGIcuqFfidZpah2RLgD1K
JUyd+TTUAs6aqWDC+pG80dOSdA/yE5iHnApEQp6gG3egIQK893jD7Hk4Flnsem8n
RJKNTVB3ewbxwwcyJQIatFao209cvMXgsS9OsbSzvv5mYndPLhxSp7XpApvnCcCs
Ob720IJk95ixCo7/tklZ
=q2fd
-----END PGP SIGNATURE-----
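
P.S. For anyone who wants to try the workaround Lowell mentions (passing an explicit parameter to srand(), possibly taken from the random device), here is a minimal sketch. The helper name roll is made up for illustration and is not part of awk(1). The sketch also assumes that /dev/urandom exists and that od(1) accepts -An -N4 -tu4; on FreeBSD /dev/urandom should behave the same as the /dev/random Lowell refers to. The two neighbouring seeds are the timestamps from my second example above; whether their first values differ will depend on the awk build in use.

```shell
#!/bin/sh
# Sketch only: "roll" is a hypothetical helper name, not part of awk(1).
# It seeds awk's generator with an explicit value ($1) instead of the clock
# and prints five numbers in the range 1..100, one per line.
roll() {
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        for (i = 1; i <= 5; i++) print int(1 + rand() * 100)
    }'
}

# The same seed reproduces the same sequence, as described in the thread.
a=$(roll 42)
b=$(roll 42)

# Neighbouring seeds, like timestamps a few seconds apart. Whether the
# first value differs between them depends on the awk build in use.
for s in 1409278327 1409278335; do
    echo "seed $s: $(roll "$s" | tr '\n' ' ')"
done

# Assumption: /dev/urandom exists and od(1) supports -An -N4 -tu4.
# Read four bytes as one unsigned integer and use that as the seed.
seed=$(od -An -N4 -tu4 /dev/urandom | tr -d '[:space:]')
roll "$seed"
```

Seeding this way avoids the one-second window entirely, at the cost of giving up reproducible runs unless the seed is logged somewhere.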