On the 16th day of Advent my True Language brought to me..
Math::BigInt
 [previous] [next]

Scalars in Perl are highly magical things. Unlike much more statically typed languages like C, they can contain many different types of information. They can hold strings, integers, and floating point numbers and more. You gets so used to them holding whatever you throw at them is can be a big shock when you reach the limitations of your platform.

The problem comes when you run out of space to store the data in the slot. This isn't so much a problem with a string, but when you're storing numbers, there's only so big a number you can store before you start losing accuracy - that is, unless you switch to using a Math::BigInt object.

The Devel::Peek module can be used to inspect the inside of Perl scalars. For example:

#!/usr/bin/perl
# turn on perl's safety features
use strict;
use warnings;
use Devel::Peek qw(Dump);
my \$foo = 42;
print Dump(\$foo);

This prints out the rather cryptic looking thing

SV = IV(0x8172b74) at 0x8170efc
REFCNT = 1
IV = 42

The important bit in here is the IV, which contains the number stored within the scalar, and the IOK flag which indicates that that value is correct. In a similar fashion, we can print out a non-integer like so:

my \$bar = 2/3;
print Dump(\$bar);

We get a slightly different type of output:

SV = NV(0x817ddf8) at 0x8170efc
REFCNT = 1
NV = 0.666666666666667

The IV type can no longer hold our result...this time the scalar's NV field has been used, and the NOK flag has been set to indicate that we're using that value instead. So, we use IV for integer values and NV for floating point. Let's try something a little more complicated:

\$foo = 42;
\$foo = \$foo + 0.4077;
print Dump(\$foo);

We take an integer and add a floating point number to it. What we see is rather odd...

REFCNT = 1
IV = 42
NV = 42.4077
PV = 0

...unless you know how how to read this. Now there's two values in here, the IV (integer value) and the NV (floating point.) The important thing to read is that only the NOK flags is there - meaning that only the NV value is valid and should be used by the system (and anything else, like the old IV, should be ignored.)

The same thing happens when you get a string and add something to it. A normal string like this:

my \$baz = "42";
print Dump(\$baz);

Looks like this:

SV = PV(0x815de80) at 0x8170efc
REFCNT = 1
PV = 0x816db20 "42"\0
CUR = 2
LEN = 3

Note we've got a PV value now and a POK flag. As soon as we try and do any maths on it:

my \$baz = "42";
\$baz = \$baz + 1;
print Dump(\$baz);

It loses it's POK flag and updates the IV flag.

SV = PVIV(0x815e290) at 0x8170efc
REFCNT = 1
IV = 43
PV = 0x816db20 "42"\0
CUR = 2
LEN = 3

Fitting things in

Where, you ask, am I going with this? Well besides giving you an insight into how Perl works I have a point to make about the size of data you can store in an IV and maintain accuracy. Well we've seen how perl automatically updates it's scalars from an IV to a NV when it can't represent the number as an IV. Let's look at some other cases where this happens:

my \$buzz = "100000000000000000";
\$buzz = \$buzz + 0;
print Dump(\$buzz);

This prints the confusing hodge podge that is:

REFCNT = 1
IV = -1
NV = 1e+17
PV = 0x816de10 "100000000000000000"\0
CUR = 18
LEN = 19

What we can see here is that the string has been converted into a number, but that number was too big to fit inside an IV on my platform. The biggest number you can fit in an IV depends on your platform and the version of perl you're running (With perl 5.8.X and a 64 bit platform it's quite big indeed, but with my lowly 32 bit system it's quite small.) Once a number is bigger than the number of bits that your system has available to represent an integer, the only field that can hold the number is the NV. However this stores it as 1 plus ten to the power of 17 rather than counting each number one by one. The trouble is the representation of the number is approximate. For example, if we do this:

my \$buzz = "100000000000000001";
\$buzz = \$buzz + 0;
print \$buzz;

We get "1e+17" printed out because the floating point number is unable to accurately store 1.00000000000000001e+17 in it in much the same way as the figure was unable to be stored in the IV value.

This disturbing approximation means that this program, which we think should go on for ever, actually does terminate:

my \$c = 0;
\$c++ while (\$c+1 > \$c);
print "\$c\n";

On my computer this eventually will terminate and print

9.00719925474099e+15

(Of course, it will take it about two hundred and thirty years to do so)

So the question is, if we need to use really big numbers, how do we maintain precision?

Math::BigInt

The solution, as often is the case, is to use a module. In this case it's the Math::BigInt module that creates objects that represent numbers that can be arbitrary sized scalars. Because of the wonders of overloading these objects can be manipulated with the standard mathematical operators

#!/usr/bin/perl
# turn on perl's safety features
use strict;
use warnings;
use Math::BigInt;
# create some numbers
my \$op = Math::BigInt->new(4);
my \$aj = Math::BigInt->new(3);
# work out the hypotenuse
my \$hy = sqrt(\$op**2 + \$aj**2);
print "hy: \$hy\n";

The snazzy thing about Math::BigInt is that you can use numbers as large as you want for calculations

my \$x = Math::BigInt->new("100_000_000_000_000_000";);
my \$y = Math::BigInt->new("1");
print \$x+\$y;

This correctly prints:

100000000000000001

And it it hasn't lost the precision. Isn't Math::BigInt wonderfully simple to use? Of course, the price you pay for all this extra precision is speed. We can attempt to benchmark the comparative speed of running normal increment and using increment on a Math::BigInt object:

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(:all);
use Math::BigInt;
cmpthese(10, {
'native' => q{my \$c = 1;
\$c++ while \$c<10000;
print "\$c\n"},
'bigint' => q{my \$c = Math::BigInt->new("1");
\$c++ while \$c<10000;
print "\$c\n"}
});

This completely fails to produce any meaningful results. We simply get this out the other end:

(warning: too few iterations for a reliable count)
s/iter bigint native
bigint     3.15     --  -100%
native 4.00e-03 78700%     --

Which indicates that either the native version of the code is so fast compared to the Math::BigInt code that it's impossible to do a meaningful comparison. The other worrying thing is the memory requirements to store a Math::BigInt number compared to a normal scalar:

#!/usr/bin/perl
# turn on perl's safety features
use strict;
use warnings;
use Devel::Size qw(total_size);
use Math::BigInt;
# output
print "native is ".total_size(3)."\n";
print "bigint is ".total_size(Math::BigInt->new("3"))."\n";

This prints:

native is 16
bigint is 271

Meaning that Math::BigInt objects are 16 times bigger than normal numbers on my system.

Other Math::Big classes

If you want even more precision, and by this I mean floating point numbers, then you can use Math::BigFloat instead. This creates objects that are like Math::BigInt numbers but instead these numbers can hold floating point numbers:

use Math::BigFloat;
my \$x = Math::BigFloat->new("0.000000000000000001");
my \$y = Math::BigFloat->new("0.0000000000000000001");
print \$x + \$y;

This, rather predictably, prints out

0.0000000000000000011

Of course, this extra precision comes again at the code of yet another speed hit and Math::BigFloat's even slower to use that Math::BigInt.

One thing you can do to make your code more efficient is to use the Math::BigInt::Lite class. This class uses less storage than a Math::BigInt but can only hold values that an IV would normally hold. The cunning thing is that the Math::BigInt::Lite object will auto-promote itself to a proper Math::BigInt (in much the same way an IV will to a NV) when the object runs out of space.

use Math::BigInt::Lite;
my \$num = Math::BigInt::Lite->new("10");
print "num, '\$num' isa ".ref(\$num)."\n";
\$num = \$num**100;
print "num, '\$num' isa ".ref(\$num)."\n";

Conclusion

You can see it's possible to do some very smart things with large numbers very easily with Perl, but only if you're willing to take the massive speed an memory hits. There's no such thing as a free lunch.