Date: Fri, 29 Aug 2014 10:44:06 +0800
From: Chenguang Li <horus.li@gmail.com>
To: freebsd-questions@freebsd.org
Cc: Peter Pentchev <roam@ringlet.net>
Subject: Re: Ask for opinion: changing rand(3) to random(3) in awk(1)
Message-ID: <69A3F8EA-3CC2-430A-AD0B-35E3D0899BE2@gmail.com>
In-Reply-To: <44y4u8ei1p.fsf@lowell-desk.lan>
References: <CEB89C77-7426-481C-ACCA-284C86B168A6@gmail.com> <44mwapn1pw.fsf@lowell-desk.lan> <A160596E-358B-4A87-9C42-7678A436CBBA@gmail.com> <44y4u8ei1p.fsf@lowell-desk.lan>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Lowell Gilbert <freebsd-questions-local@be-well.ilk.org> wrote:

> Chenguang Li <horus.li@gmail.com> writes:
>
>> The problem I was trying to describe was its "one-shot" randomness; take these two as examples (where it matters):
>>
>> 1. You wrote a script[1] that simulates rolling a die; it would
>> produce the same result if executed within, say, 5 seconds.
>> [1] BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }, won't matter.
>
> One second, not 5. Calling srand() without a parameter seeds the random
> number generator with the current time in seconds, so the value changes
> once per second.

Did you actually run this line? I will let the examples speak for me:

m1: FreeBSD 10.0-RELEASE-p6 amd64
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277292
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277300
53
m1$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277302
53

m2: FreeBSD 10.0-RELEASE i386
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277368
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277374
53
m2$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409277379
53

m3: FreeBSD 10.0-RELEASE i386
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248690
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248697
31
m3$ date +%s ; echo | awk 'BEGIN{srand()}{print int(1+rand()*100)}'
1409248700
31

m1, m2 and m3 are three different machines I have access to. Other
versions and/or architectures have not been tested.

>> 2. You have a CGI script which will show different content based on the number generated by rand().
>>
>> In the first situation, you can generate all the outcomes in a single
>> run by using a for-loop, but the first outcome will be the same. OSX's
>> awk(1) will produce a reasonable number every time I run it. In the
>> latter one, you could call rand() once and throw away the result, and
>> call it again to get another number. Both are practical workarounds,
>> but we do have a better choice: applying the modification I suggested
>> before.
>
> You are still misunderstanding the relationship between srand() and
> rand(), in a way that will not be fixed by changing awk's implementation
> from rand(3) to random(3). srand() "seeds" the random number generator
> with a particular value, and the sequence of numbers is completely
> determined afterwards. This isn't a bug; the ability to exactly
> reproduce a sequence of "random" numbers is an essential feature in a
> lot of simulation uses. This is also why we refer to these algorithms as
> "pseudo-random" rather than just "random."

I'm fairly confident that I have a not-so-bad understanding of the
relationship between them.

> In your cases, you really do want a different sequence every time. The
> way that is handled is by using a different seed each time. The normal
> use of srand() uses the current time, so as long as it isn't called
> twice within one second, it will always use a different sequence of
> numbers. If it *is* called twice within the same second, it will produce
> the same sequence of numbers (not just the same first number, but the
> second, third, etc. number will be the same also). This is just as true
> on OSX as on FreeBSD. Your use of srand() in your first script is buggy
> because it calls srand() for *every* call to rand(); your second version
> fixes this problem.

Yes, it's buggy, but it was for one-shot demonstration purposes only, so
it makes no difference to me. And one more example:

m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278327
54
15
10
6
56
m1$ date +%s ; echo | awk 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409278335
54
20
14
73
82

Just the first number.
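Why only the first number repeats can be sketched with a small model. The sketch ASSUMES (this is not confirmed anywhere in the thread) that FreeBSD 10's rand(3) is the Park-Miller "minimal standard" generator, x' = 16807 * x mod (2^31 - 1), started from seed + 1, and that awk turns its result into a value in [0, 1) by dividing by about 2^31. Under that model the first draw after srand(t) moves by only 16807 / (2^31 - 1), roughly 0.0000078 of the range, per second of t, so int(1+rand()*100) changes only about once every 21 minutes; every further draw multiplies that spread by 16807 again, which is why the second and later numbers do change. The helper pm_roll is hypothetical, written only for this illustration.

```sh
#!/bin/sh
# Model of the Park-Miller "minimal standard" generator:
#   x' = 16807 * x mod (2^31 - 1)
# ASSUMPTIONS (not checked against FreeBSD sources here): srand(seed)
# starts the state at seed + 1, and awk's rand() yields (x - 1) / (2^31 - 1).
# pm_roll SEED DRAW SIDES (hypothetical helper) prints what
# int(1 + rand() * SIDES) would give on the DRAW-th rand() after srand(SEED).
pm_roll() {
    awk -v seed="$1" -v draw="$2" -v sides="$3" 'BEGIN {
        m = 2147483647                  # 2^31 - 1
        x = seed % (m - 1) + 1          # initial state in [1, m - 1]
        for (i = 0; i < draw; i++)
            x = (16807 * x) % m         # product < 2^53, so exact in doubles
        printf "%d\n", int(1 + (x - 1) / m * sides)
    }'
}

# Seeds are Unix timestamps printed in the transcripts above.
for t in 1409277292 1409277300 1409248690 1409278327 1409278335; do
    echo "seed $t: first=$(pm_roll "$t" 1 100) second=$(pm_roll "$t" 2 100)"
done
```

With those timestamps the model's first draws come out as 53, 53, 31, 54 and 54, and the second draws for 1409278327 and 1409278335 come out as 15 and 20, which agrees with the outputs above. That agreement supports the assumption about rand(3) but does not prove it.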
Now which one is to blame: rand(), the timer, or the compiler? It's
weird; I had the same thought before: it should change every second.
The fact is, it doesn't. Is it just me, or ...

> How do we deal with the one-second window? Well, most of the time we
> ignore it. For a CGI script, it won't matter. If you really do need to
> run separate copies of an awk script more often, you'll need a better
> seed. Reading it from /dev/random would be one place for your awk
> script to get that. An important point that you may have missed is that
> when your script calls srand(), it can provide a parameter, which will
> be used instead of the current time.
>
>> If others are not affected by the problem I described above, then I am
>> okay with that. The other reason why I suggest this is, I see no loss,
>> it would only make it better.
>
> The problem you described is caused by your calling srand() multiple
> times. This is a bug on your part, not a problem with awk that would
> affect other people. Changing awk to use random(3) instead of rand(3)
> will not fix your problem, because continually reseeding srandom(3) with
> the same seed will give you the same values from random(3) just as much
> as doing the same with srand(3) and rand(3) will. In your example:
> BEGIN { srand(); print int(1+rand()*6); } or BEGIN { srand(); } { print int(1+rand()*6); }
> the first one is broken and the second one works (try them and compare
> the output).

I do know that I can provide a better seed when calling srand(). I know
that I shouldn't call srand() every time I call rand(). I insist that
our awk(1) should provide good randomness in every single run, and,
based on the examples given, it is not doing its job well. Below is a
locally patched version:

m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279522
59
38
84
67
8
m2$ date +%s ; echo | ./a.out 'BEGIN{srand()}{for (i=1;i<=5;i++) print int(1+rand()*100)}'
1409279524
80
71
94
80
94

Much better.
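Lowell's suggestion, passing srand() an explicit seed read from the random device instead of relying on the clock, can be sketched as follows. It ASSUMES od(1) and /dev/urandom are available (on FreeBSD, /dev/random would serve equally well); roll_die is a hypothetical helper name. Because the seed no longer comes from the current time, several runs in the same second get unrelated seeds.

```sh
#!/bin/sh
# Seed awk's srand() with 32 random bits instead of the current time.
# ASSUMPTION: od(1) and /dev/urandom exist (on FreeBSD, /dev/random
# would work equally well). roll_die is a hypothetical helper name.
roll_die() {
    # Read 4 bytes as one unsigned decimal integer; strip od's padding.
    seed=$(od -An -N4 -tu4 /dev/urandom | tr -d ' ')
    echo | awk -v seed="$seed" 'BEGIN { srand(seed) } { print int(1 + rand() * 6) }'
}

# Several calls within the same second still get independent seeds.
for i in 1 2 3 4 5; do
    roll_die
done
```

The seed is passed with -v rather than spliced into the program text, so the awk script itself stays fixed and only the data changes.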
> Although it may not fix the problem you thought it would, you're right
> that there's no loss in making the change, so I think it's a good idea.
>
> Be well.
> Lowell

I am afraid I have only scratched the surface of the problem;
nevertheless, the modification does fix my problem. My journey ends here.

Chenguang Li
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJT/+j9AAoJELG4cS+11lRh0SsP+wROOZIHSuA2iR+NsnrAVEM8
WH6UY/Gqyh/uxzWVDJ+FIEfgFz9GGVFfOndOhsTMYnQdLWTkrbKcAcjDUP4zBXG/
nFMxKwdVws8Q3gIRM6+ZIDiPt8Yui2w+JrPks0fJQ9LVJTtGnv7v0t+jkCag5u8G
aeseg1SQU5Z3aSoBaxBtuObjjNg+0wSMntwJDToG5AriKzB8uYvu5ljZ6tDhKb2z
q19uVcP5AUCxr7WgOoNOhVWHP+kLYMUmpiWR7rTmkKa3Bx4jbMwIJzQZ86rjyaGk
8EyKCd+K+4GsKMEvaA+yXBYwsB4rM4f0dYUfPQ7EmQX0hS78xkO7Y7cP8QAfyv1j
/ziWuecSYo0RgipU3S8gLCxt9zm9CHoTmNy81tFqJA2ZV7cqhXlx7AKwcqzoOhtI
tSW9iXimUhAxTB7pB04M/hGCooZrgW0bdyP5VeaetZHTz8TNTyOHrhCPCHBwSV3O
aXM+qMwYkRMcs3lEGzRzxoRdo0J4dg7FpORTT8mrm81vGIcuqFfidZpah2RLgD1K
JUyd+TTUAs6aqWDC+pG80dOSdA/yE5iHnApEQp6gG3egIQK893jD7Hk4Flnsem8n
RJKNTVB3ewbxwwcyJQIatFao209cvMXgsS9OsbSzvv5mYndPLhxSp7XpApvnCcCs
Ob720IJk95ixCo7/tklZ
=q2fd
-----END PGP SIGNATURE-----