Perl 2003 Advent Calendar: Math::BigInt

On the 16th day of Advent my True Language brought to me..

Math::BigInt

Scalars in Perl are highly magical things. Unlike much more statically typed languages like C, they can contain many different types of information. They can hold strings, integers, and floating point numbers and more. You gets so used to them holding whatever you throw at them is can be a big shock when you reach the limitations of your platform.

The problem comes when you run out of space to store the data in the slot. This isn't so much a problem with a string, but when you're storing numbers, there's only so big a number you can store before you start losing accuracy - that is, unless you switch to using a Math::BigInt object.

[Read the documentation for Math::BigInt on search.cpan.org]

The Devel::Peek module can be used to inspect the inside of Perl scalars. For example:

  #!/usr/bin/perl

  # turn on perl's safety features
  use strict;
  use warnings;

  # load the 'Dump' routine
  use Devel::Peek qw(Dump);

  my $foo = 42;
  print Dump($foo);

This prints out the rather cryptic looking thing

  SV = IV(0x8172b74) at 0x8170efc
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY,IOK,pIOK)
    IV = 42

The important bit in here is the IV, which contains the number stored within the scalar, and the IOK flag which indicates that that value is correct. In a similar fashion, we can print out a non-integer like so:

  my $bar = 2/3;
  print Dump($bar);

We get a slightly different type of output:

  SV = NV(0x817ddf8) at 0x8170efc
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY,NOK,pNOK)
    NV = 0.666666666666667

The IV type can no longer hold our result...this time the scalar's NV field has been used, and the NOK flag has been set to indicate that we're using that value instead. So, we use IV for integer values and NV for floating point. Let's try something a little more complicated:

  $foo = 42;
  $foo = $foo + 0.4077;
  print Dump($foo);

We take an integer and add a floating point number to it. What we see is rather odd...

  SV = PVNV(0x81adb10) at 0x8170efc
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY,NOK,pNOK)
    IV = 42
    NV = 42.4077
    PV = 0

...unless you know how how to read this. Now there's two values in here, the IV (integer value) and the NV (floating point.) The important thing to read is that only the NOK flags is there - meaning that only the NV value is valid and should be used by the system (and anything else, like the old IV, should be ignored.)

The same thing happens when you get a string and add something to it. A normal string like this:

  my $baz = "42";
  print Dump($baz);

Looks like this:

  SV = PV(0x815de80) at 0x8170efc
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY,POK,pPOK)
    PV = 0x816db20 "42"\0
    CUR = 2
    LEN = 3

Note we've got a PV value now and a POK flag. As soon as we try and do any maths on it:

  my $baz = "42";
  $baz = $baz + 1;
  print Dump($baz);

It loses it's POK flag and updates the IV flag.

  SV = PVIV(0x815e290) at 0x8170efc
  REFCNT = 1
  FLAGS = (PADBUSY,PADMY,IOK,pIOK)
  IV = 43
  PV = 0x816db20 "42"\0
  CUR = 2
  LEN = 3

Fitting things in

Where, you ask, am I going with this? Well besides giving you an insight into how Perl works I have a point to make about the size of data you can store in an IV and maintain accuracy. Well we've seen how perl automatically updates it's scalars from an IV to a NV when it can't represent the number as an IV. Let's look at some other cases where this happens:

  my $buzz = "100000000000000000";
  $buzz = $buzz + 0;
  print Dump($buzz);

This prints the confusing hodge podge that is:

  SV = PVNV(0x81adb08) at 0x8170efc
    REFCNT = 1
    FLAGS = (PADBUSY,PADMY,NOK,pNOK)
    IV = -1
    NV = 1e+17
    PV = 0x816de10 "100000000000000000"\0
    CUR = 18
    LEN = 19

What we can see here is that the string has been converted into a number, but that number was too big to fit inside an IV on my platform. The biggest number you can fit in an IV depends on your platform and the version of perl you're running (With perl 5.8.X and a 64 bit platform it's quite big indeed, but with my lowly 32 bit system it's quite small.) Once a number is bigger than the number of bits that your system has available to represent an integer, the only field that can hold the number is the NV. However this stores it as 1 plus ten to the power of 17 rather than counting each number one by one. The trouble is the representation of the number is approximate. For example, if we do this:

  my $buzz = "100000000000000001";
  $buzz = $buzz + 0;
  print $buzz;

We get "1e+17" printed out because the floating point number is unable to accurately store 1.00000000000000001e+17 in it in much the same way as the figure was unable to be stored in the IV value.

This disturbing approximation means that this program, which we think should go on for ever, actually does terminate:

  my $c = 0;
  $c++ while ($c+1 > $c);
  print "$c\n";

On my computer this eventually will terminate and print

  9.00719925474099e+15

(Of course, it will take it about two hundred and thirty years to do so)

So the question is, if we need to use really big numbers, how do we maintain precision?

Math::BigInt

The solution, as often is the case, is to use a module. In this case it's the Math::BigInt module that creates objects that represent numbers that can be arbitrary sized scalars. Because of the wonders of overloading these objects can be manipulated with the standard mathematical operators

  #!/usr/bin/perl

  # turn on perl's safety features
  use strict;
  use warnings;

  # load the extension library
  use Math::BigInt;

  # create some numbers
  my $op = Math::BigInt->new(4);
  my $aj = Math::BigInt->new(3);

  # work out the hypotenuse
  my $hy = sqrt($op**2 + $aj**2);
  print "hy: $hy\n";

The snazzy thing about Math::BigInt is that you can use numbers as large as you want for calculations

  my $x = Math::BigInt->new("100_000_000_000_000_000";);
  my $y = Math::BigInt->new("1");
  print $x+$y;

This correctly prints:

  100000000000000001

And it it hasn't lost the precision. Isn't Math::BigInt wonderfully simple to use? Of course, the price you pay for all this extra precision is speed. We can attempt to benchmark the comparative speed of running normal increment and using increment on a Math::BigInt object:

  #!/usr/bin/perl

  use strict;
  use warnings;

  use Benchmark qw(:all);
  use Math::BigInt;

  cmpthese(10, {
    'native' => q{my $c = 1;
                  $c++ while $c<10000;
                  print "$c\n"},
    'bigint' => q{my $c = Math::BigInt->new("1");
                  $c++ while $c<10000;
                  print "$c\n"}
  });

This completely fails to produce any meaningful results. We simply get this out the other end:

              (warning: too few iterations for a reliable count)
           s/iter bigint native
  bigint     3.15     --  -100%
  native 4.00e-03 78700%     --

Which indicates that either the native version of the code is so fast compared to the Math::BigInt code that it's impossible to do a meaningful comparison. The other worrying thing is the memory requirements to store a Math::BigInt number compared to a normal scalar:

  #!/usr/bin/perl

  # turn on perl's safety features
  use strict;
  use warnings;

  # load the extensions
  use Devel::Size qw(total_size);
  use Math::BigInt;

  # output
  print "native is ".total_size(3)."\n";
  print "bigint is ".total_size(Math::BigInt->new("3"))."\n";

This prints:

  native is 16
  bigint is 271

Meaning that Math::BigInt objects are 16 times bigger than normal numbers on my system.

Other Math::Big classes

If you want even more precision, and by this I mean floating point numbers, then you can use Math::BigFloat instead. This creates objects that are like Math::BigInt numbers but instead these numbers can hold floating point numbers:

  use Math::BigFloat;
  my $x = Math::BigFloat->new("0.000000000000000001");
  my $y = Math::BigFloat->new("0.0000000000000000001");
  print $x + $y;

This, rather predictably, prints out

  0.0000000000000000011

Of course, this extra precision comes again at the code of yet another speed hit and Math::BigFloat's even slower to use that Math::BigInt.

One thing you can do to make your code more efficient is to use the Math::BigInt::Lite class. This class uses less storage than a Math::BigInt but can only hold values that an IV would normally hold. The cunning thing is that the Math::BigInt::Lite object will auto-promote itself to a proper Math::BigInt (in much the same way an IV will to a NV) when the object runs out of space.

  use Math::BigInt::Lite;
  my $num = Math::BigInt::Lite->new("10");
  print "num, '$num' isa ".ref($num)."\n";
  $num = $num**100;
  print "num, '$num' isa ".ref($num)."\n";

Conclusion

You can see it's possible to do some very smart things with large numbers very easily with Perl, but only if you're willing to take the massive speed an memory hits. There's no such thing as a free lunch.

Math::BigFloat

Math::BigInt::Lite