The 2002 Perl Advent Calendar

On the 15th day of Advent my True Language brought to me..
YAML

One of the all-time most useful Perl modules is Data::Dumper. It can dump out any (well, almost any) data structure as Perl code, which lets you see what's in it. This is invaluable when you're debugging.

One of the other great uses of Data::Dumper is simple data persistence. You can eval a structure printed out with Dumper to recreate the data structure. You can use this technique for saving data between program runs, or for communicating data between Perl programs. The biggest advantage is that if you understand Perl data structures, then it's easy to edit the data by hand.

This, however, is a big disadvantage if you don't understand Perl data structures. For example, if you're using Data::Dumper to create a config file, then you'd better hope the admin can program Perl. Likewise, if you want a program to read the file, then that program had better be written in Perl. If you're trying to communicate your output to a Java program then you have a problem.

The biggest disadvantage of using Data::Dumper, however, is that you can't safely eval the data unless you trust its source: the file could contain malicious code rather than a Data::Dumper data structure.
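To make the risk concrete, here is a minimal sketch of what eval does with a tampered "data" string. The $untrusted string and the $pwned flag are made up for illustration; a real attack would do something far nastier than set a variable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# a flag that the hostile "data" will modify (hypothetical, for illustration)
our $pwned = "";

# hypothetical contents of a data file that somebody has tampered with:
# an extra statement sits in front of the data structure
my $untrusted = q{ $main::pwned = "arbitrary code ran"; [ { name => "Godfather, The" } ] };

# eval runs every statement in the string, not just the data at the end
my $films = eval $untrusted;
die "eval failed: $@" if $@;

print "first film:  $films->[0]{name}\n";
print "side effect: $pwned\n";
```

The data comes back looking perfectly normal, which is exactly why this kind of problem is easy to miss.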

Enter YAML. YAML is an alternative data structure language that is human readable, cross platform (Perl, Python, Java) and, most importantly, eval safe.
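As a rough sketch of what "eval safe" means in practice, the snippet below feeds a Perl statement to the YAML module's Load function where a film name should be. The hostile input is invented for illustration, and the sketch assumes the module's default settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use YAML;

# hypothetical hostile input: a Perl statement where a name should be
my $untrusted = "---\nname: system('echo pwned')\nyear: 1972\n";

# Load parses the text as data; it is not handed to the Perl interpreter
my $film = Load($untrusted);

# the "code" is just an ordinary string in the resulting hash
print "name: $film->{name}\n";
print "year: $film->{year}\n";
```

You still need to check that the loaded values make sense for your program, but reading the file does not by itself run anything.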

To demonstrate this let's create a data structure that represents the IMDB's top three films.

  #!/usr/bin/perl
  # turn on perl's safety procedures
  use strict;
  use warnings;
  # load the URI module
  use URI;
  # build a list of IMDB's current top films
  my @top_films;
  my %godfather =
    ( name => "Godfather, The",
      stars => { "Marlon Brando"  => "Don Vito Corleone",
                 "Al Pacino"      => "Michael 'Mike' Corleone",
                 "Diane Keaton"   => "Kay Adams-Corleone", },
      year  => 1972,
      url   => "http://us.imdb.com/Title?0068646");
  push @top_films, \%godfather;
  my %shawshank =
    ( name => "Shawshank Redemption, The",
      stars => { "Tim Robbins"    => "Andy Dufresne",
                 "Morgan Freeman" => "Ellis Boyd 'Red' Redding",
                 "Bob Gunton"     => "Warden Samuel Norton", },
      year  => 1994,
      url   => "http://us.imdb.com/Title?0111161");
  push @top_films, \%shawshank;
  my %godfather2 =
    ( name => "Godfather: Part 2, The",
      stars => { "Al Pacino"      => "Michael 'Mike' Corleone",
                 "Robert Duvall"  => "Tom Hagen",
                 "Diane Keaton"   => "Kay Adams-Corleone", },
      year  => 1974,
      url   => "http://us.imdb.com/Title?0071562");
  push @top_films, \%godfather2;

And then print it out with Data::Dumper:

  use IO::File;
  my $fh = IO::File->new("dumper",">")
    or die "Can't open 'dumper': $!";
  use Data::Dumper;
  print {$fh} Dumper \@top_films;
  # close the file so the data is flushed to disk
  $fh->close;

Which produces a file that looks like:

  $VAR1 = [
            {
              'stars' => {
                           'Diane Keaton' => 'Kay Adams-Corleone',
                           'Marlon Brando' => 'Don Vito Corleone',
                           'Al Pacino' => 'Michael \'Mike\' Corleone'
                         },
              'url' => 'http://us.imdb.com/Title?0068646',
              'name' => 'Godfather, The',
              'year' => 1972
            },
            {
              'stars' => {
                           'Morgan Freeman' => 'Ellis Boyd \'Red\' Redding',
                           'Tim Robbins' => 'Andy Dufresne',
                           'Bob Gunton' => 'Warden Samuel Norton'
                         },
              'url' => 'http://us.imdb.com/Title?0111161',
              'name' => 'Shawshank Redemption, The',
              'year' => 1994
            },
            {
              'stars' => {
                           'Diane Keaton' => 'Kay Adams-Corleone',
                           'Al Pacino' => 'Michael \'Mike\' Corleone',
                           'Robert Duvall' => 'Tom Hagen'
                         },
              'url' => 'http://us.imdb.com/Title?0071562',
              'name' => 'Godfather: Part 2, The',
              'year' => 1974
            }
          ];

That can then be read back in with eval like so:

  use IO::File;
  my $fh2 = IO::File->new("dumper","<")
    or die "Can't open 'dumper': $!";
  my @films_dumper;
  {
    # slurp in the whole file, rather than
    # a line at a time
    local $/;
    # load all the data
    my $data = <$fh2>;
    # the dumped code assigns to $VAR1, which
    # "use strict" requires us to declare
    my $VAR1;
    @films_dumper = @{ eval $data };
  }
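Because Dumper's output assigns to $VAR1, code running under "use strict" has to deal with that variable when it evals the text. One alternative, sketched below, is to set Data::Dumper's Terse option so that only the bare data structure is written. The single-film list and the in-memory string are stand-ins for the full example and its file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# a cut-down stand-in for the list of films
my @top_films = ( { name => "Godfather, The", year => 1972 } );

my $text;
{
  # Terse drops the '$VAR1 = ' prefix from the output;
  # Sortkeys makes the key order the same on every run
  local $Data::Dumper::Terse    = 1;
  local $Data::Dumper::Sortkeys = 1;
  $text = Dumper(\@top_films);
}

# the string is now a bare [ ... ] expression, so eval returns the structure
my $copy = eval $text;
die "Couldn't eval dumped data: $@" if $@;

print "$copy->[0]{name} ($copy->[0]{year})\n";
```

This only tidies up the mechanics. The text is still being run as Perl code, so it is no safer than before.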

Now let's see how that works with YAML:

  my $fh3 = IO::File->new("yaml",">")
    or die "Can't open 'yaml': $!";
  use YAML;
  print {$fh3} Dump \@top_films;
  # close the file so the data is flushed to disk
  $fh3->close;

And that produces output like this:

  --- #YAML:1.0
  - name: Godfather, The
    stars:
      Al Pacino: Michael 'Mike' Corleone
      Diane Keaton: Kay Adams-Corleone
      Marlon Brando: Don Vito Corleone
    url: http://us.imdb.com/Title?0068646
    year: 1972
  - name: Shawshank Redemption, The
    stars:
      Bob Gunton: Warden Samuel Norton
      Morgan Freeman: Ellis Boyd 'Red' Redding
      Tim Robbins: Andy Dufresne
    url: http://us.imdb.com/Title?0111161
    year: 1994
  - name: 'Godfather: Part 2, The'
    stars:
      Al Pacino: Michael 'Mike' Corleone
      Diane Keaton: Kay Adams-Corleone
      Robert Duvall: Tom Hagen
    url: http://us.imdb.com/Title?0071562
    year: 1974

I hope you'll agree with me that the output is quite easy to understand. This can be read back in like so:

  use IO::File;
  my $fh4 = IO::File->new("yaml","<")
    or die "Can't open 'yaml': $!";
  my @films_yaml;
  {
    # slurp in the whole file, rather than
    # a line at a time
    local $/;
    # load all the data
    my $data = <$fh4>;
    @films_yaml = @{ Load($data) };
  }
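If you don't need the filehandles for anything else, the YAML module can also do the file handling itself with its DumpFile and LoadFile functions, which have to be imported by name. The sketch below uses a cut-down list of films and a made-up filename, "films.yaml".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# DumpFile and LoadFile are not exported by default
use YAML qw(DumpFile LoadFile);

# a cut-down stand-in for the list of films
my @top_films = (
  { name => "Godfather, The",            year => 1972 },
  { name => "Shawshank Redemption, The", year => 1994 },
);

# write the structure straight to a file ("films.yaml" is a made-up name)
DumpFile("films.yaml", \@top_films);

# and read it straight back in again
my @films_yaml = @{ LoadFile("films.yaml") };

print "$_->{name} ($_->{year})\n" for @films_yaml;
```

This does the same job as the longer filehandle version, with the opening, slurping and closing handled by the module.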

  • Data::Dumper
  • Data::Denter - Pure Perl predecessor to YAML