Perl Advent Calendar 2010-12-01

Tangled Tidings

by Jerrad Pierce

We begin this year's calendar with a tool to help the adept Perl hacker cope with laziness, be it that of a past self or of someone else. YAPE::Regex::Explain is a package in the YAPE family which can untangle the Christmas lights of Perl… regular expressions. YAPEREE turns line noise into English explanations. The unstyled output from:

% perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr%<([^\s>]+)(?:\s+[^>]*?)?(?:/|>.*?</\1)>%s)->explain'

looks like the following:

The regular expression:

(?s-imx:<([^\s>]+)(?:\s+[^>]*?)?(?:/|>.*?</\1)>)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?s-imx:                 group, but do not capture (with . matching
                         \n) (case-sensitive) (with ^ and $ matching
                         normally) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^\s>]+                  any character except: whitespace (\n,
                             \r, \t, \f, and " "), '>' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    [^>]*?                   any character except: '>' (0 or more
                             times (matching the least amount
                             possible))
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    >                        '>'
----------------------------------------------------------------------
    .*?                      any character (0 or more times (matching
                             the least amount possible))
----------------------------------------------------------------------
    </                       '</'
----------------------------------------------------------------------
    \1                       what was matched by capture \1
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Pretty nifty, but arguably a somewhat redundant and unfortunate default, given the more useful regex mode (no P!), which you can use to create a skeleton /x-style commented regexp like the one at the end of this document.
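To show where such a skeleton ends up, here is a minimal sketch of the tag-matching pattern from above written by hand in /x style and put to work. It uses only core Perl; the comments, the first_tag helper and the sample strings are illustrative assumptions, not output of the module.

```perl
use strict;
use warnings;

# The tag-matching pattern from above, laid out by hand in /x style.
# These comments are illustrative, not YAPE::Regex::Explain output.
my $element = qr{
    <                   # opening angle bracket
    ( [^\s>]+ )         # tag name, captured as group 1
    (?: \s+ [^>]*? )?   # optional whitespace and attributes
    (?:
        /               # either a self-closing slash
      |                 # or
        > .*? </ \1     # content followed by the matching close tag
    )
    >                   # closing angle bracket
}sx;

# Hypothetical helper: return the tag name of the first element found,
# or undef when nothing matches.
sub first_tag {
    my ($text) = @_;
    return $text =~ $element ? $1 : undef;
}

print first_tag('<b class="x">bold</b>'), "\n";
print first_tag('<br/>'), "\n";
```

Because /x ignores unescaped whitespace outside character classes, the layout and comments do not change what the pattern matches.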

There's also a misleadingly named silent mode, which is a sort of regular expression pretty printer:

% perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr%<([^\s>]+)(?:\s+[^>]*?)?(?:/|>.*?</\1)>%s)->explain("silent")'

(?sx-im:

  <

  (

    [^\s>]+

  )

  (?x:

    \s+

    [^>]*?

  )?

  (?x:

    /

   |

    >

    .*?

    </

    \1

  )

  >


)

Although the POD notes that the module can parse some expressions passed as strings, this can fail, so you are better off passing everything through qr// first. Perhaps the largest drawback to the current version of the module, at least for those who can read its English output, is that it does not support syntax added since 5.6.
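One plausible reading of the qr// advice, offered as an assumption rather than a statement about the module's internals, is that a compiled pattern carries its flags with it when stringified, while a bare string does not. A small core-Perl sketch (pattern_text is a hypothetical helper):

```perl
use strict;
use warnings;

# Hypothetical helper: return the string form of a compiled pattern,
# which is what any code that stringifies a qr// object gets to see.
sub pattern_text {
    my ($re) = @_;
    return "$re";
}

my $compiled = qr/<([^\s>]+)>/s;

# The flags travel with the pattern inside a leading (?...: group.
# The exact spelling depends on the perl version; 5.14 and later
# print (?^s:...).
print pattern_text($compiled), "\n";

# A plain string has no such wrapper, so any flags would have to be
# conveyed some other way.
print '<([^\s>]+)>', "\n";
```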

Thankfully, Explain is aware of the easily confused positive/negative look-ahead/behind, e.g., /(?<!foo)bar(?=quz)/:

(?x-ims:               # group, but do not capture (disregarding
                       # whitespace and comments) (case-sensitive)
                       # (with ^ and $ matching normally) (with . not
                       # matching \n):

    (?<!                 # look behind to see if there is not:

      foo                  # 'foo'

  )                      # end of look-behind

  bar                    # 'bar'

  (?=                    # look ahead to see if there is:

      quz                  # 'quz'

  )                      # end of look-ahead


)                      # end of grouping
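
To see the two assertions at work, here is a short core-Perl sketch that runs the same pattern against a few made-up strings (the matches helper and the samples are assumptions for illustration):

```perl
use strict;
use warnings;

# 'bar' that is not preceded by 'foo' and is followed by 'quz'.
my $re = qr/(?<!foo)bar(?=quz)/;

# Hypothetical helper: 1 if the string contains a match, 0 otherwise.
sub matches {
    my ($string) = @_;
    return $string =~ $re ? 1 : 0;
}

for my $sample ('barquz', 'xbarquz', 'foobarquz', 'barbaz') {
    printf "%-10s %s\n", $sample, matches($sample) ? 'matches' : 'no match';
}
```

The first two samples match. 'foobarquz' fails the look-behind, and 'barbaz' fails the look-ahead.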