Welcome to Perl 5 Porters Weekly, a summary of the email traffic of the
perl5-porters email list.
Topics this week include:
- rand() on Windows only uses 15 bits of entropy
- RFC: Removing several undocumented functions from the Perl core
- 5.18 VT in \s
- Comment period extended for Unicode’s changing some common characters from
Punctuation to Symbol
rand() on Windows only uses 15 bits of entropy
If you ran this program on your system what would you expect the results to
be?
my %nums;
my $cnt = 0;
foreach my $i (1 .. 1_000_000) {
my $num = rand;
$cnt++ if ($nums{$num});
$nums{$num} = 1;
}
print "$cnt out of 1,000,000\n";
If you run it on Windows, you might be surprised to discover that it will
print 967232 out of 1,000,000 every time. That means there’s only 32,768
possible floats between 0 .. 1 that rand() will generate on that platform.
Other platforms don’t seem to be affected by the lack of entropy in rand().
This comes from a RT ticket opened by Brendan Byrd. There was a bunch of
discussion about this. Ricardo Signes indicated that he’s very interested in
seeing an implementation of the Mersenne Twister algorithm implemented,
which follows some suggestions by Nicholas Clark.
RFC: Removing several undocumented functions from the Perl core
Karl Williamson suggested that several undocumented functions in Perl’s API
be removed for 5.18. They are:
- is_uni_idfirst_lc
- is_utf8_idfirst
- is_utf8_xidfirst
- is_utf8_idcont
Nicholas Clark suggested the calls be flagged as deprecated in 5.18 and
removed in 5.20.
5.18 VT in \s
Karl Williamson suggested that the vertical tab character, added
experimentally to m/\s/ in 5.17 be formally released in 5.18. He notes that
this addition caused no perceptable failures in CPAN smoke testing and
doesn’t expect any failures from releasing it into the wild, partially
because the vertical tab character is so rarely used.
For the full context about vertical tab’s inclusion, read the thread.
Comment period extended for Unicode’s changing some common characters from
Punctuation to Symbol
Completing his trifeca on this week’s summary, Karl Williamson noted that
Unicode has extended its comment period to January 21, 2013 for changing
some common characters from the “punctuation” class to the “symbol” class.
You can make comments by visiting here.
David Golden asked in a followup the following questions:
* How can we better document (if we're not) the forward compatibility
risks inherent in using Unicode character classes?
* How can we let programs introspect the version of Unicode that Perl
provides?
* Is it possible to make any of this pluggable, so a program could
specify which version of Unicode classes they want to use?
Karl replied that it’s not pluggable but it is possible to compile any perl
that supports Unicode with a version specific character database by
following the instructions in README.perl inside of lib/unicore if you
wanted to downgrade a particular perl for some reason.
He also notes that these proposed changes were included last summer
experimentally and only broke 1 CPAN module. He says that the [[:punct:]]
class matches both the Unicode ASCII-range symbols plus punctuation, so most
Perl programs will not notice any of these changes.
With respect to the third question above about Unicode version inspection
(programmatically), Karl wrote:
This information has long been available through Unicode::UCD::UnicodeVersion()