Perl Advent Calendar 2016http://perladvent.org/2016/2024-03-13T21:03:10ZMark FowlerXML::Atom::SimpleFeedHere Comes Santa Claushttp://perladvent.org/2016/2016-12-25.html<div class='pod'><p>Merry Christmas one and all. As we bring the seventeenth Perl Advent Calendar to close for another year, we extend our season's greetings and warm wishes to Perl programmers throughout the world.</p>
<h3 id="In-Numbers">In Numbers:</h3>
<p>Each year I'm amazed by the work the volunteers for the Perl Advent Calendar do without complaint or recompense. To put it in context this year we managed:</p>
<ul>
<li><p>7,002 lines (with the longest article being 1,015 lines long!)</p>
</li>
<li><p>40,052 words</p>
</li>
<li><p>341,911 characters</p>
</li>
<li><p>11 images</p>
</li>
<li><p>5 data files</p>
</li>
<li><p>7 extra example scripts</p>
</li>
<li><p>22 authors</p>
</li>
<li><p>310 commits (so far, we still haven't found <i>all</i> the typos..)</p>
</li>
</ul>
<p>That's an amazing job everyone, well done!</p>
<h3 id="With-Thanks">With Thanks</h3>
<p>Year after year the Perl Advent Calendar is our gift to you the Perl Community. The Perl Advent Calendar hopes to teach you all a little something more about Perl in a fun and playful way. Sometimes we focus more on the teaching, making things clearer and easier for every Perl programmer to understand. And sometimes we focus just on the fun...</p>
<pre><code class="code-listing">#!perl<br />eval eval '"'.<br /><br /><br /> '#'.'!'.(<br /> '['^'+').(('`')| '%').('['<br /> ^ ( ( ( ( ( ( (( ')'))))))<br /> ))) .( ( ( ( ( ( ( ( '`'))))))<br /> )|',').('!'^ ( ( ( ( ( ( ( '+')))))))).('!'^<br /> '+').('['^'+').('[' ^ ( ( ( ( ( ( ( ( ')')))))))))<br /> .('`'|')').("\`"| ( ( ( ( ( ( '.' ) )))))).''. ('['^<br /> '/').('{'^'[').''. ( ( ( ( ( ( ( ( '\\')))))) ) ) .'"'<br /> .('`'^'-').('`'| '%' ) . ( ( '[' ) ^')').('[' ^ ( ( ')'<br /> ))).+( '['^'"').('{'^'[').('`'^'#').('`' | ( ( ( ((<br /> '(')))))).('['^')').("\`"| ')').('[' ^ ( ( ( '('<br /> )))).('['^'/').("\`"| '-').('`'|'!').("\["^ '('). ( ( ( '{'<br /> ))^+ '[').('`'^'&') .(('[')^ ')'). ( ( ((<br /> '`' )))| ( '/')) .''. ( ((<br /> '`' ))|'-').('{' ^+ ( (( '[') ))) .+(<br /> '{'^'/' ).( (( (( (( '`') ))))<br /> )|"\(").( '`'| '%') . ((( '{')) ^(( '[' )))<br /> .''. +( '{'^ '+') .+( ('`')| ('%')).( '[' ^+<br /> ')' ).''. ('`' |"\,").( (( '{' )<br /> )^+ (( (( '[' )) )) ).( (<br /> '`')^ (( (( '!' )) ))).''. ('`'|'$').('['^ '-')<br /> .( '`'|('%')).( '`' |'.'). ('['^'/'). ("\{"^ '['<br /> ). ('`'^'#').("\`"| '!' ).+( '`'|',') .('`'|'%') .+( ((<br /> '`'))|'.').("\`"| '$') .+( '`' |'!') .('['^"\)").'\\'. ((<br /> '"') ).("\!"^ ('+')). ( "\""); $:='.'^'~';$~='@'|'('; $^=<br />(( (( ( ')') ))) )^"\[";$/= '`'|'.';$,='('^"\}";<br />$\= ('`')| ( '!') ;$:= ')' ^'}';$~='*'|'`';<br /> $^= (( (( (( '+' )))) ))^ (( '_'));$/='&' |'@'<br />;$,= ( (( '[')) )&(( '~' ));($\)= ','^'|';$:=<br />'.'^('~');$~= "\@"| '(' ;$^=(')')^ '[';$/='`'|"\."; $,=<br />'('^'}';$\= (( "\`"))| "\!"; $:=')'^'}';$~='*' |((<br /> "\`")); $^= '+'^"\_";$/= '&'|'@';$,="\["& '~';$\=(',')^ ((<br /> '|'));$:='.'^"\~"; $~='@'|'(';$^=')'^('[');$/= '`'|('.');$,= (<br /> '(')^'}';$\='`'| '!';$:=')'^'}';$~=('*')| '`';$^=('+')^ (<br /> '_');$/='&'|'@';$,= '['&'~';$\=','^('|');$:= '.'^('~');$~= (<br /> '@')|'(';$^= (')')^ '[';$/='`'|('.');$,= '('^('}');$\= (<br /> '`')|'!';$: =(( (( ')'))))^'}';$~ ='*'|('`');$^= (<br /> '+')^"\_"; $/ ='&'|'@';$,= '['&'~';$\=',' ^<br /> '|';$:='.' ^(( ('~')));$~= '@'|'(';$^=')'^ ((<br /> "\["));$/= '`' |'.';$,='(' ^'}';$\='`'|"\!"; $:<br /> =')'^'}' ;($~) ='*'|'`';$^ ='+'^'_';$/="\&"| ((<br /> '@')); $,='['&'~';$\ =( ',')^'|';$:='.'^'~' ;(<br /> $~)=('@')| '(';$^= ')'^ '[';$/='`'|('.');$,= ((<br /> '('))^"\}"; ($\) ='`'|'!';$:=')'^('}');$~= '*'<br /> |+ '`';$^='+'^'_';$/='&'|'@';$,='['&('~');$\= ','<br /> ^+ '|';$:='.'^'~' ; $~='@'|"\(";$^= ')'<br /> ^+ '[';$/="\`"| "\."; $,="\("^ '}';<br /> $\ ='`'|'!'; $:=')'^ "\}";<br /> $~ ="\*"| '`';$^='+'^ "\_";$/=<br /> '&' |'@';$,='['&"\~"; $\=','^'|';$:<br /> ="\."^ '~';$~='@'|'(';$^=')'^'[';$/='`'|'.';$,<br /> ='('^'}';$\='`'|'!';$:=')'^'}';$~='*'|'`';$^<br /> ='+' ^'_';$/= '&'|'@';$,='['&'~'<br /> ; $\=','^'|';$:= (<br /> ( ( (<br /> ( ( (<br /> (( (( ((<br /> '.') )))))) )))<br /> )))^"\~";$~= '@'| "\(";<br /> ( ($^)) =( ')')^"\[";$/=<br /> ( '`')| ( '.');$, =<br /> ( '(')^ ( '}');$\ =<br /> ( '`')| ( '!');$: =<br /> ')' ^'}'; ( $~)='*' |<br /> '`' ;$^='+'^ '_';$/= (<br /> '&' )|('@');$,= '['&'~' ;<br /> $\= ','^'|'; $:= '.'^'~' ;(<br /> $~)= '@'|'(' ;$^=')' ^+<br /> '[';$/="\`"| '.'; ($,)<br /> =('(')^ '}';$\='`'<br /> |'!'</code></pre>
<p>Until next year. Happy Holidays!</p>
</div>2016-12-25T00:00:00ZMark FowlerThe Grinch's Well-Tested Second Attempthttp://perladvent.org/2016/2016-12-24.html<div class='pod'><p>The Grinch's earlier attempt to steal Christmas failed. This was documented by Dr. Seuss (PhD, Rhymeology) many years ago. However, Seuss's account, while correct insofar as it went, does not tell the full story. While The Grinch's heart did grow three sizes, the effect was temporary. It has now returned to its prior (two sizes too small) size, and The Grinch is determined to steal Christmas properly this time.</p>
<p>This time, however, The Grinch is much better armed. The Grinch is going to steal Christmas using modern tools, and he's hired me to help him!</p>
<h3 id="l33t-hax0r-7oolz">l33t hax0r 7oolz</h3>
<p>With so much commerce taking place online these days, it's natural to focus on stealing Christmas by hacking. Why steal all the presents by hand when he can simply divert the shipments to him directly? It's simpler and more elegant.</p>
<p>So The Grinch and I have been working on a suite of cybercrime tools. The target is the online mega-retailer Nile.com. We're following good OO design principles, and we've broken up these tools up into well-defined components. Each component is a stage in the hacking process, with the ultimate goal being to divert all orders for children's presents to The Grinch.</p>
<p>We'll take a look at the <code>Grinch::Netfiltrator</code> class, which implements the logic necessary to break into the Nile.com internal network, where additional tools can then be deployed.</p>
<h3 id="Testing-with-Test2::Suite">Testing with Test2::Suite</h3>
<p>We want to make sure this all works. After all, you only get one chance each year to steal Christmas. That means writing tests for all of our code. We're using <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a>, which provides a rich set of tools for testing data structures, objects, and more. Even better, it provides really nice output when a test fails.</p>
<h4 id="Bundles">Bundles</h4>
<p>The <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a> distribution ships with several bundle modules. Each bundle exports a set of testing subroutines. The <a href="https://metacpan.org/module/Test2::Bundle::More">Test2::Bundle::More</a> module exports subroutines that are almost a drop-in replacement for <a href="https://metacpan.org/module/Test::More">Test::More</a>, including <code>ok</code>, <code>is</code>, <code>like</code>, <code>diag</code>, etc. The <a href="https://metacpan.org/module/Test2::Bundle::Extended">Test2::Bundle::Extended</a> module exports even more functions, as well as loading some useful plugins. You get all the familiar subroutines like <code>ok</code> and <code>is</code>, plus many more.</p>
<p>However, under the hood, familiar subs like <code>is</code> and <code>like</code> are much more powerful. These work a lot more like <code>cmp_deeply</code> from <a href="https://metacpan.org/module/Test::Deep">Test::Deep</a>. The Extended bundle also gives you a lot of special comparison tools, similar to what <a href="https://metacpan.org/module/Test::Deep">Test::Deep</a> provides. The biggest difference is that with this bundle most of the comparisons are defined using a DSL rather than complex data structures.</p>
<h4 id="Writing-Some-Tests">Writing Some Tests</h4>
<p>Let's get back to our hacking tools. We'll start with some tests for <code>Grinch::Netfiltrator</code>. It has a method named <code>find_nile_servers</code> that scans the entire Internet to find servers owned by Nile. Let's not worry about how it does that, we'll just look at the tests for the method's return value.</p>
<p>Our first version of this method returned an array reference of hash references. With some mocking in place, the test code looked like this:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Test2::Bundle::Extended</span><span class="structure">;</span><br /><br /><span class="comment"># Mocking goes here ...<br /></span><br /><span class="keyword">my</span> <span class="symbol">$servers</span> <span class="operator">=</span> <span class="word">Grinch::Netfiltrator</span><span class="operator">-></span><span class="word">new</span><span class="operator">-></span><span class="word">find_nile_servers</span><span class="structure">;</span><br /><span class="word">is</span><span class="structure">(</span><br /> <span class="symbol">$servers</span><span class="operator">,</span><br /> <span class="word">array</span> <span class="structure">{</span><br /> <span class="word">item</span> <span class="word">hash</span> <span class="structure">{</span><br /> <span class="word">field</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="single">'www.nile.com'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ip</span> <span class="operator">=></span> <span class="single">'1.2.3.4'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ssh_port</span> <span class="operator">=></span> <span class="number">443</span><span class="structure">;</span><br /> <span class="structure">};</span><br /> <span class="word">item</span> <span class="word">hash</span> <span class="structure">{</span><br /> <span class="word">field</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="single">'www2.nile.com'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ip</span> <span class="operator">=></span> <span class="single">'1.2.3.5'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ssh_port</span> <span class="operator">=></span> <span class="number">447</span><span class="structure">;</span><br /> <span class="structure">};</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'got the expected servers back'</span><br /><span class="structure">);</span></code></pre>
<p>The subroutines <code>array</code>, <code>item</code>, <code>hash</code>, and <code>field</code> are all exported from <a href="https://metacpan.org/module/Test2::Bundle::Extended">Test2::Bundle::Extended</a>. We put these all together to declaratively define what we expect a complex data structure to look like.</p>
<p>The <code>array</code> sub takes a code reference which provides further details of the expected array. Inside that sub, a call to <code>item</code> takes an optional index and a value check. If we don't provide an index, it just uses the next index (starting at 0).</p>
<p>The value check can be many things. If we give it a plain scalar (including <code>undef</code>), we're asking it to check for that literal value. So if we wrote <code>item 42</code> we'd be saying that the next item in the arrayref should be the value <code>42</code>.</p>
<p>We can also give it a check defined by further calls to subroutines provided by <code>Test2::Bundle::Extended</code>. We could write <code>item T()</code>. The <code>T</code> subroutine matches any Perlishly true value. Or we could write <code>item F()</code>, where <code>F</code> matches any Perlishly false value.</p>
<p>In our case, we're using <code>hash</code> to define the hash we expect to see. Just like with <code>array</code>, the <code>hash</code> sub takes a coderef that defines the hash contents. We use <code>field</code> to name each of the fields we expect to see, along with their values. The values are just like those passed to <code>item</code>. They can be literals, checks exported by the bundle, or even complex validators that you define on the fly.</p>
<p>So what happens if a check fails? The <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a> tools give us very detailed information on the failure:</p>
<pre><code> # Failed test 'got the expected servers back'
# at t/netfiltrator.t line 46.
# +---------+------------------+----+---------+--------+
# | PATH | GOT | OP | CHECK | LNs |
# +---------+------------------+----+---------+--------+
# | | ARRAY(0x1502660) | | <ARRAY> | 38, 45 |
# | [1] | HASH(0x18c12b0) | | <HASH> | 44 |
# | [1]{ip} | 1.2.3.5 | eq | 1.2.3.6 | 42 |
# +---------+------------------+----+---------+--------+</code></pre>
<p>We can see that the diagnostics show the exact path to the failure, including the checks that succeeded before the failure, the failed check, and the lines where all of these things were defined. This makes debugging test failures <i>much</i> easier!</p>
<p>In this particular case we can see that the <code>$servers->[1]{ip}</code> contains the value "1.2.3.5" when we expected "1.2.3.6". The failing check was defined at line 42 in our test file.</p>
<p>But there's something missing here. What if one of the hashes has other, unexpected keys? And what if the arrayref being returned has more than the two elements we're testing? Right now we won't catch that at all. That's no good. In fact, <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a> will warn you about this and suggest some ways to fix it.</p>
<p>In our case we want to fix this by adding calls to <code>end</code> in the appropriate spots:</p>
<pre><code class="code-listing"><span class="word">is</span><span class="structure">(</span><br /> <span class="symbol">$servers</span><span class="operator">,</span><br /> <span class="word">array</span> <span class="structure">{</span><br /> <span class="word">item</span> <span class="word">hash</span> <span class="structure">{</span><br /> <span class="word">field</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="single">'www.nile.com'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ip</span> <span class="operator">=></span> <span class="single">'1.2.3.4'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ssh_port</span> <span class="operator">=></span> <span class="number">443</span><span class="structure">;</span><br /> <span class="word">end</span><span class="structure">();</span><br /> <span class="structure">};</span><br /> <span class="word">item</span> <span class="word">hash</span> <span class="structure">{</span><br /> <span class="word">field</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="single">'www2.nile.com'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ip</span> <span class="operator">=></span> <span class="single">'1.2.3.5'</span><span class="structure">;</span><br /> <span class="word">field</span> <span class="word">ssh_port</span> <span class="operator">=></span> <span class="number">447</span><span class="structure">;</span><br /> <span class="word">end</span><span class="structure">();</span><br /> <span class="structure">};</span><br /> <span class="word">end</span><span class="structure">();</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'got the expected servers back'</span><br /><span class="structure">);</span></code></pre>
<p>The <code>end</code> subroutine can be used inside the <code>array</code> and <code>hash</code> subs to say that we only expect the defined fields or items, not more. And if that fails we get this:</p>
<pre><code> # Failed test 'got the expected servers back'
# at t/netfiltrator.t line 67.
# +---------------+------------------+---------+------------------+--------+
# | PATH | GOT | OP | CHECK | LNs |
# +---------------+------------------+---------+------------------+--------+
# | | ARRAY(0x17b3670) | | <ARRAY> | 57, 66 |
# | [1] | HASH(0x1b722a0) | | <HASH> | 64 |
# | [1]{username} | admin | !exists | <DOES NOT EXIST> | |
# +---------------+------------------+---------+------------------+--------+</code></pre>
<p>So we can see that the <code>$servers->[1]</code> hashref contains a <code>username</code> key that we did not expect.</p>
<h4 id="I-Object-Mr.-Grinch">I Object, Mr. Grinch</h4>
<p>I wasn't really happy with the way this class returns raw data structures. It was obvious that these data structures would be better off as objects. That way we could hack each server simply by writing <code>$server->hack</code>. Simple and elegant! I talked to The Grinch and he agreed. Once I'd implemented that change I needed to update the tests as well. Fortunately, <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a> has tools for testing objects as well. Here's what our test looks like now:</p>
<pre><code class="code-listing"><span class="word">is</span><span class="structure">(</span><br /> <span class="symbol">$servers</span><span class="operator">,</span><br /> <span class="word">array</span> <span class="structure">{</span><br /> <span class="word">item</span> <span class="word">object</span> <span class="structure">{</span><br /> <span class="word">prop</span> <span class="word">blessed</span> <span class="operator">=></span> <span class="single">'Grinch::Server'</span><span class="structure">;</span><br /> <span class="word">call</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="single">'www.nile.com'</span><span class="structure">;</span><br /> <span class="word">call</span> <span class="word">ip</span> <span class="operator">=></span> <span class="single">'1.2.3.4'</span><span class="structure">;</span><br /> <span class="word">call</span> <span class="word">ssh_port</span> <span class="operator">=></span> <span class="number">443</span><span class="structure">;</span><br /> <span class="structure">};</span><br /> <span class="word">item</span> <span class="word">object</span> <span class="structure">{</span><br /> <span class="word">prop</span> <span class="word">blessed</span> <span class="operator">=></span> <span class="single">'Grinch::Server'</span><span class="structure">;</span><br /> <span class="word">call</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="single">'www2.nile.com'</span><span class="structure">;</span><br /> <span class="word">call</span> <span class="word">ip</span> <span class="operator">=></span> <span class="single">'1.2.3.5'</span><span class="structure">;</span><br /> <span class="word">call</span> <span class="word">ssh_port</span> <span class="operator">=></span> <span class="number">446</span><span class="structure">;</span><br /> <span class="structure">};</span><br /> <span class="word">end</span><span class="structure">();</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'got the expected servers back'</span><br /><span class="structure">);</span></code></pre>
<p>We've replaced our use of <code>hash</code> with <code>object</code>. Inside the sub we pass to <code>object</code>, we can call a number of other subs, including <code>prop</code> and <code>call</code>. The <code>prop</code> sub is used to check meta-information about the object. We're checking what class it's blessed into here. The <code>call</code> sub calls the named method and looks for the named result.</p>
<p>And here's another example of what the failure diagnostics look like:</p>
<pre><code> # Failed test 'got the expected servers back'
# at t/netfiltrator.t line 108.
# +---------------+--------------------------------+----+------------------------+---------+
# | PATH | GOT | OP | CHECK | LNs |
# +---------------+--------------------------------+----+------------------------+---------+
# | | ARRAY(0xf49560) | | <ARRAY> | 98, 107 |
# | [1] | Grinch::Server=HASH(0x1481708) | | <OBJECT> | 105 |
# | [1] <blessed> | Grinch::Server | eq | Grinch::Server::Hacked | 101 |
# +---------------+--------------------------------+----+------------------------+---------+</code></pre>
<p>This tells us that our second object was expected to be a <code>Grinch::Server::Hacked</code> object but is instead just a <code>Grinch::Server</code>.</p>
<h4 id="Shorthand-for-Common-Cases">Shorthand for Common Cases</h4>
<p>For simple cases involving array and hash reference values, you don't need to write everything out using <code>array</code> and <code>hash</code>. Let's assume that our <code>ssh_port</code> method from above returns an arrayref. We can check that like:</p>
<pre><code> call ssh_port => [ 443, 444 ];</code></pre>
<p>Rather than writing this out with <code>array</code>, we can just use a literal array reference that contains the expected value. You can do the same thing with a hash ref. If the check fails, we get output that looks like this:</p>
<pre><code> # Failed test 'got the expected servers back'
# at t/netfiltrator.t line 144.
# +----------------------+--------------------------------+----+------------------------+----------+
# | PATH | GOT | OP | CHECK | LNs |
# +----------------------+--------------------------------+----+------------------------+----------+
# | | ARRAY(0x1385598) | | <ARRAY> | 134, 143 |
# | [1] | Grinch::Server=HASH(0x1743e18) | | <OBJECT> | 141 |
# | [1] <blessed> | Grinch::Server | eq | Grinch::Server::Hacked | 137 |
# | [1]->ssh_port()->[4] | <DOES NOT EXIST> | | 447 | |
# +----------------------+--------------------------------+----+------------------------+----------+</code></pre>
<p>If our method returns a list rather than an arrayref, that's easy to handle as well:</p>
<pre><code class="code-listing"><span class="word">call_list</span> <span class="word">ssh_ports</span> <span class="operator">=></span> <span class="structure">[</span> <span class="number">443</span><span class="operator">,</span> <span class="number">444</span> <span class="structure">];</span></code></pre>
<p>The <code>call_list</code> sub calls the method in list context, turns the return value into an arrayref, and compares it to the right hand side value. There is a hash version as well called <code>call_hash</code>.</p>
<h4 id="Regex-Checks">Regex Checks</h4>
<p>Maybe we don't want to check for a specific hostname. Instead, let's just check that this is any valid hostname. There is, of course, a module to do that, but for the sake of example we'll whip up a quick regex:</p>
<pre><code class="code-listing"><span class="word">call</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="word">matches</span> <span class="regexp">qr/\A\w+(?:\.\w+)+\z/</span><span class="structure">;</span></code></pre>
<p>If the regex check fails we get output that looks like this:</p>
<pre><code> # Failed test 'got the expected servers back'
# at t/netfiltrator.t line 184.
# +-----------------+--------------------------------+----+------------------------+----------+
# | PATH | GOT | OP | CHECK | LNs |
# +-----------------+--------------------------------+----+------------------------+----------+
# | | ARRAY(0x11f7598) | | <ARRAY> | 174, 183 |
# | [1] | Grinch::Server=HASH(0x16fe418) | | <OBJECT> | 181 |
# | [1]->hostname() | www2.nile.com!@$!@ | =~ | (?^:\A\w+(?:\.\w+)+\z) | 178 |
# +-----------------+--------------------------------+----+------------------------+----------+</code></pre>
<h4 id="Arbitrary-Checks">Arbitrary Checks</h4>
<p>That regex is terrible. Let's use <a href="https://metacpan.org/module/Data::Validate::Domain">Data::Validate::Domain</a> instead. We can wrap its <code>is_hostname</code> sub to provide much better hostname checking:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Data::Validate::Domain</span> <span class="words">qw( is_hostname )</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$hostname_check</span> <span class="operator">=</span> <span class="word">validator</span><span class="structure">(</span> <span class="word">is_hostname</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">is_hostname</span><span class="structure">(</span><span class="magic">$_</span><span class="structure">)</span> <span class="structure">}</span> <span class="structure">);</span><br /><br /><span class="operator">...</span><br /><span class="word">call</span> <span class="word">hostname</span> <span class="operator">=></span> <span class="symbol">$hostname_check</span><span class="structure">;</span></code></pre>
<p>The <code>validator</code> sub call can take a number of forms. In this case we've given it a name (used in diagnostic output) and a subroutine that implements the check, returning true or false based on the value in <code>$_</code>.</p>
<p>If that validator fails we get output that looks like this:</p>
<pre><code> # Failed test 'got the expected servers back'
# at t/netfiltrator.t line 207.
# +-----------------+--------------------------------+-----------+-------------+----------+
# | PATH | GOT | OP | CHECK | LNs |
# +-----------------+--------------------------------+-----------+-------------+----------+
# | | ARRAY(0x2a8c670) | | <ARRAY> | 197, 206 |
# | [1] | Grinch::Server=HASH(0x2f82638) | | <OBJECT> | 204 |
# | [1]->hostname() | www2.nile.com!@$!@ | CODE(...) | is_hostname | 188 |
# +-----------------+--------------------------------+-----------+-------------+----------+</code></pre>
<h3 id="More-Tools">More Tools</h3>
<p>This is just a small sample of the test comparisons supported by <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a>. This distribution has a variety of helpers for checking definedness, whether elements of arrays and hashes exist or not, and much, much more. And you can extend it simply by writing your own class which inherits from <a href="https://metacpan.org/module/Test2::Compare::Base">Test2::Compare::Base</a>.</p>
<p>My experience when using <a href="https://metacpan.org/module/Test2::Suite">Test2::Suite</a> to test The Grinch's hacking tools has been great. The expressive declarative testing syntax, combined with the excellent output on failures, has helped me find and fix dozens of bugs. I think that The Grinch is going to steal Christmas in a big way this year!</p>
<h3 id="References">References</h3>
<p>To see the actual test code, go to <a href="https://github.com/autarch/perl-advent-calendar-2016-test2">https://github.com/autarch/perl-advent-calendar-2016-test2</a>. You can fiddle with the mocked values to produce different kinds of failure output if you're curious.</p>
</div>2016-12-24T00:00:00ZDave Rolskyspeeding up the inter-webhttp://perladvent.org/2016/2016-12-23.html<div class='pod'><p>So, after years using mainframes to get the job done and later using monolithic applications, it became time to have a more lean architecture. Service Oriented Architectures or Reactive Microservices or ...</p>
<p>Tourmaline the newbie Elf got all excited with the new task and had been jumping for days and started to build some nice REST-api with Dancer2. And as good developers do, first a working proof of concept... and then make it fast.</p>
<p>However, the biggest speed increase was not in writing optimised code, it was in making sure that the services wouldn't get overloaded with repeating request when nothing had changed!</p>
<h3 id="Let-caches-temporarily-store-the-responses">Let caches temporarily store the responses!</h3>
<p>"Caching?!" shouted the the old grumpy Elves, "that is evil magic that get things go kaput!". But it didn't stop the keen young devoted Elf. They watched the <i>'Talking Heads'</i> presentation, <a href="https://www.youtube.com/watch?v=E7OO83xvWPY">REST API's don't need to be your 'psycho-killer'</a>. Was it not just a matter of knowing what goes on inside those heads of HTTP requests and HTTP responses? But which header and how should Tourmaline the Elf do it? Was it enough to add <code>Expires: on Christmas Eve</code> to the response?</p>
<h3 id="RFC-7234---HTTP-Caching">RFC 7234 - HTTP Caching</h3>
<p>Tourmaline decided he should just sit down and read the official documentation in the form of RFC7234. A hefty document, it attempts to describe HTTP caching - essentially the ability to re-use a document you'd previously downloaded rather than fetch it again.</p>
<p>Like most RFCs it starts with introduction, specification, definition, credits, IANA concerns, copyright table of contents and a lot more blah blah. The remaining of the 41 pages is still a nightmare to go through, but basically is about three things:</p>
<dl>
<dt>Storing responses in a cache</dt>
<dd>
<p>When can a cache store a response, what to do with errors, and is it safe to store it in intermediate responses on public servers?</p>
</dd>
<dt>Reusing responses</dt>
<dd>
<p>If a request comes in, can it just reuse the response, or are there more checks that need to be done?</p>
</dd>
<dt>HTTP Header fields</dt>
<dd>
<p>There are only a few headers that really play a role in a response: the validation header field <code>Last-Modified</code> and <code>ETag</code>, and then two others, <code>Vary</code> and C <Cache-Control>.</p>
</dd>
</dl>
<p>Those validation headers are needed by the cache to revalidate the stored responses and they are used with a GET request to conditionally return a new response. In other words as part of the GET request header the client passes meta information about what it's already got cached to the server and the server will decide to respond based on this metadata with either an updated copy of the file or simply a response that indicates that they cached copy is okay to reuse. This metadata is the same data the server sent to the client in the response header when it delivered the original content that is now in its cache. The server will conditionally return based on the <code>If-Modified- Since</code> header, the previously returned <code>Last-Modified</code> date or <code>If-None-Match</code> from the given <code>ETag</code>s. An ETag uniquely differentiates two different versions of a resource and the forms it can take are varied; Some servers use a MD5 hash over the values, but a incremental serial number will do too.</p>
<h3 id="Dancer2::Plugin::HTTP::ConditionalRequest---no-no-no">Dancer2::Plugin::HTTP::ConditionalRequest - no no no</h3>
<p>Dancer2 can make such requests conditionally by using a small plugin:</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="word">Dancer2::Plugin::HTTP::ConditionalRequest</span><span class="structure">;</span><br /><br /><span class="operator">...</span><br /><br /><span class="word">get</span> <span class="single">'/catalog'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /><br /> <span class="word">http_conditional</span> <span class="structure">{</span><br /> <span class="word">last_modified</span> <span class="operator">=></span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">http_last_modified</span><span class="structure">(</span><span class="single">'Catalog'</span><span class="structure">)</span><br /> <span class="structure">};</span><br /><br /> <span class="operator">...</span><br /><br /><span class="structure">}</span></code></pre>
<p>Not only will it set the values for a new response, more importantly, it will check with the request header fields if to continue or not. If the pre-conditions are met then the Dancer application will continue. If not - in case of a GET request - if the resource is NOT Not-Modified-Since, or NOT None-Match - If the pre-conditions are not met, the Dancer app will stop here returning a status code <code>304</code> (<code>Not modified</code>) back to the cache or the client when it was a request with a 'safe' method.</p>
<h3 id="Dancer2::Plugin::HTTP::ContentNegotiation---your-representation-can-vary">Dancer2::Plugin::HTTP::ContentNegotiation - your representation can vary</h3>
<p>Everybody at the north pole has their favourite data representation. Santa loves YAML. The new kids around the block think that the world only knows JSON. But Santa just relaxes while the JSON usage constantly changes, new specs, new ideas about data-type, new this new that (Actually, Santa looooooves XML).</p>
<p>Dancer2 does already come with some add-ons that make it possible to let the client (and Santa) decide what <code>Content-Type</code> the response they would like returned to them. But, as shown during that <i>Talking Heads</i>, it is nothing REST like:</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="word">Dancer2::Plugin::HTTP::ContentNegotiation</span><span class="structure">;</span><br /><br /><span class="operator">...</span><br /><br /><span class="word">get</span> <span class="single">'/catalog'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /><br /> <span class="operator">...</span><br /><br /> <span class="word">my</span> <span class="symbol">@list</span> <span class="operator">=</span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">list</span><span class="structure">(</span> <span class="word">Catalog</span> <span class="operator">=></span> <span class="single">'all'</span> <span class="structure">);</span><br /><br /> <span class="word">http_choose_media_type</span> <span class="structure">(</span><br /> <span class="single">'application/json'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">to_json</span> <span class="cast">\</span><span class="symbol">@list</span><span class="operator">,</span> <span class="structure">{</span><span class="word">canonical</span> <span class="operator">=></span> <span class="number">1</span><span class="structure">}}</span><span class="operator">,</span><br /> <span class="single">'application/x-yaml'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">to_yaml</span> <span class="cast">\</span><span class="symbol">@list</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">default</span> <span class="operator">=></span> <span class="core">undef</span> <span class="structure">}</span><span class="operator">,</span> <span class="comment"># default is 406: Not Acceptable</span><br /> <span class="structure">);</span><br /><span class="structure">}</span></code></pre>
<p>By using in the request the <code>Accept</code> header field, one can choose what output one will get. Setting the default to <code>undef</code> dictates the client to specify one rather then defaulting to the first in the list.</p>
<p>But take note of what also happens... remember that our dedicate Elf Tourmaline was concerned about caching! Not only does <code>http_choose_media_type</code> set the C <Content-Type> response header, as a bonus it sets the <code>Vary</code> header too. This <code>Vary</code> header informs the cache that there are more variants of this resource. Each of those variants must be saved separately and the cache should figure out which of the variants shall be use to hand to the client.</p>
<h3 id="Dancer2::Plugin::HTTP::Caching---doesnt-do-any-caching">Dancer2::Plugin::HTTP::Caching - doesn't do any caching!</h3>
<p>Elf D. wasn't yet so sure about the whole caching thing. She knows that those caches can be naughty and she just wants to be in control about what they do and do not do. She was not happy with confidential information being scattered in these caches all around the world. Private sensitive information about kids (and parents) can not be trusted to the wide open Internet! She wanted to be in control about how long the caches can keep their data and when should be checked if it's still valid.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="word">Dancer2::Plugin::HTTP::Caching</span><span class="structure">;</span><br /><br /><span class="operator">...</span><br /><br /><span class="word">get</span> <span class="single">'/catalog'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /><br /> <span class="operator">...</span><br /><br /> <span class="word">http_cache_max_age</span> <span class="number">3600</span><span class="structure">;</span> <span class="comment"># one hour</span><br /> <span class="word">http_cache_private</span><span class="structure">;</span><br /><br /> <span class="operator">...</span><br /><br /><span class="structure">}</span></code></pre>
<p>The only thing the plugin provides are a bunch of keywords and do some sanity checks on the parameters following them.</p>
<h3 id="Ready-to-roll">Ready to roll!</h3>
<p>Yes, it magically all works and the sound of jingling bells can fill the world!</p>
<p><a href="https://metacpan.org/module/Dancer2::Plugin::HTTP::Bundle">Dancer2::Plugin::HTTP::Bundle</a> does it all together. And ol' Dave, the wise Elf has suggested some very useful improvements.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Santa</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Dancer2</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Dancer2::Plugin::HTTP::Bundle</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Santa::Helper</span><span class="structure">;</span><br /><br /><span class="word">get</span> <span class="single">'/catalog'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /><br /> <span class="word">http_cache_max_age</span> <span class="number">30</span><span class="structure">;</span> <span class="comment"># half a minute</span><br /> <span class="word">http_cache_private</span><span class="structure">;</span><br /><br /> <span class="word">http_conditional</span> <span class="structure">{</span><br /> <span class="word">last_modified</span> <span class="operator">=></span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">http_last_modified</span><span class="structure">(</span> <span class="word">Catalog</span> <span class="operator">=></span> <span class="core">undef</span> <span class="structure">)</span><span class="operator">,</span><br /><br /><br /> <span class="word">my</span> <span class="symbol">@list</span> <span class="operator">=</span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">list</span><span class="structure">(</span> <span class="word">Catalog</span> <span class="operator">=></span> <span class="single">'all'</span> <span class="structure">);</span><br /> <span class="word">http_choose_media_type</span> <span class="structure">(</span><br /> <span class="single">'application/json'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">to_json</span> <span class="cast">\</span><span class="symbol">@list</span><span class="operator">,</span> <span class="structure">{</span><span class="word">canonical</span> <span class="operator">=></span> <span class="number">1</span><span class="structure">}}</span><span class="operator">,</span><br /> <span class="single">'application/x-yaml'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">to_yaml</span> <span class="cast">\</span><span class="symbol">@list</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">default</span> <span class="operator">=></span> <span class="core">undef</span> <span class="structure">}</span><span class="operator">,</span> <span class="comment"># default is 406: Not Acceptable</span><br /> <span class="structure">);</span><br /><span class="structure">};</span><br /><br /><span class="operator">...</span><br /><br /><span class="word">get</span> <span class="single">'/catalog/:uuid'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$uuid</span> <span class="operator">=</span> <span class="word">route_parameters</span><span class="operator">-></span><span class="word">get</span><span class="structure">(</span><span class="single">'uuid'</span><span class="structure">);</span><br /> <span class="keyword">unless</span> <span class="structure">(</span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">does_exists</span><span class="structure">(</span> <span class="word">Catalog</span> <span class="operator">=></span> <span class="symbol">$uuid</span> <span class="structure">)</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">status</span> <span class="single">'Not Found'</span><span class="structure">;</span><br /> <span class="keyword">return</span><br /> <span class="structure">}</span><br /><br /> <span class="word">http_cache_max_age</span> <span class="number">3600</span><span class="structure">;</span> <span class="comment"># a full hour</span><br /><br /> <span class="word">http_conditional</span> <span class="structure">{</span><br /> <span class="word">etag</span> <span class="operator">=></span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">http_etag</span><span class="structure">(</span> <span class="word">Catalog</span> <span class="operator">=></span> <span class="symbol">$uuid</span> <span class="structure">)</span><span class="operator">,</span><br /> <span class="structure">};</span><br /><br /> <span class="word">response_header</span> <span class="single">'Content-Type'</span> <span class="operator">=></span> <span class="single">'application/json'</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@languages_available</span> <span class="operator">=</span><br /> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">lang_available</span><span class="structure">(</span> <span class="word">Catalog</span> <span class="operator">=></span> <span class="symbol">$uuid</span> <span class="structure">);</span><br /> <span class="word">http_choose_language</span> <span class="structure">(</span><br /> <span class="cast">\</span><span class="symbol">@languages_available</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$data</span> <span class="operator">=</span> <span class="word">Santa::Helper</span><span class="operator">-></span><span class="word">find_in_language</span><span class="structure">(</span><br /> <span class="word">Catalog</span> <span class="operator">=></span> <span class="symbol">$uuid</span><span class="operator">,</span> <span class="word">http_chosen_language</span><br /> <span class="structure">);</span><br /> <span class="word">to_json</span><span class="structure">(</span> <span class="symbol">$data</span><span class="operator">,</span> <span class="structure">{</span><span class="word">canonical</span> <span class="operator">=></span> <span class="number">1</span><span class="structure">}</span> <span class="structure">)</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">default</span> <span class="operator">=></span> <span class="single">'en'</span> <span class="structure">}</span><br /> <span class="structure">)</span><br /><span class="structure">};</span></code></pre>
<h3 id="Kaput">Kaput!</h3>
<p>The grumpy ol' Elves were right. Things were not working at all! When Santa is checking the catalog, the application still makes request to the origin server and the Elves from the NorthPole Operation Center scratching their heads what went wrong with their Microservices.</p>
<p>Quickly the root of the problem was found... Since <a href="https://metacpan.org/module/LWP::UserAgent">LWP::UserAgent</a> the application uses does not know about caching, it just makes request, <code>GET</code>s results, <code>POST</code>s new stuff or <code>DELETE</code>s things directly on the server. But surely, there must be a way to cache the responses!</p>
<h3 id="RTFRC">RTFRC</h3>
<p>And surely, some Elf wrote <code>LWP::UserAgent::Cache</code> that only stores responses and gets it when the URL is the same, fast, but dirty and wrong if one talks to a REST-api.</p>
<p>Then another Elf wrote <code>LWP::UserAgent::WithCache</code>, and yeah, it knows about <code>If- Modified-Since</code></p>
<p>After that yet another Elf, wrote <code>LWP::UserAgent::CHICaching</code>, and while he was doing quite well obviously did think a lot about the spec.</p>
<p>And more and more and more... all written for what the Elves needed for them at that time for a specific reason...</p>
<p>None of these various modules properly respects the <code>Vary</code> header and makes content negotiation impossible, <code>Cache-Control</code> directives are ignored. But worst of all, none of these caches are invalidating the stored responses after an unsafe method like <code>POST</code>, <code>PUT</code> or <code>DELETE</code>. One can not imagine what will happen when serving old REST resources that have just been update or even deleted...</p>
<p>The sad Tourmaline Elf went back and thought about <i>TIMTOWTDI</i>. He thought about the warnings from the old Elves when quoting Phil Karlton about the two hard things in computer science.</p>
<h3 id="Making-the-first-hard-thing-...-easy">Making the first hard thing ... easy!</h3>
<p>Next morning Tourmaline woke up with a brilliant plan... for once and for all, make a UserAgent that does get it right! Simple replace LWP::UserAgent with something else!</p>
<p>He read the RFC over and over, studied it, front to back and the other way around and wrote a nasty piece of software that no one ever should use... <a href="https://metacpan.org/module/HTTP::Caching">HTTP::Caching</a> <i>The RFC 7234 compliant brains to do caching right</i>. It does know (almost) all about the RFC.</p>
<p>Of course, that's not the end of the story. HTTP::Cachine wasn't well written for <i>clients</i>. And it seems to be under continuous development and it might break stuff. So what we need is a something more LWP like that can make use of HTTP::Caching but use a familiar interface.....something like <a href="https://metacpan.org/module/LWP::UserAgent::Caching">LWP::UserAgent::Caching</a> instead.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="word">LWP::UserAgent::Caching</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">CHI</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$cache</span> <span class="operator">=</span> <span class="word">CHI</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">driver</span> <span class="operator">=></span> <span class="single">'File'</span><span class="operator">,</span><br /> <span class="word">root_dir</span> <span class="operator">=></span> <span class="single">'/tmp/LWP_UserAgent_Caching'</span><span class="operator">,</span><br /> <span class="word">file_extension</span> <span class="operator">=></span> <span class="single">'.cache'</span><span class="operator">,</span><br /> <span class="word">l1_cache</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">driver</span> <span class="operator">=></span> <span class="single">'Memory'</span><span class="operator">,</span><br /> <span class="word">global</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">max_size</span> <span class="operator">=></span> <span class="number">1024</span><span class="operator">*</span><span class="number">1024</span><br /> <span class="structure">}</span><br /><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$ua</span> <span class="operator">=</span> <span class="word">LWP::UserAgent::Caching</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="word">http_caching</span> <span class="operator">=></span> <span class="structure">{</span><span class="word">cache</span><span class="operator">=></span><span class="symbol">$cache</span><span class="structure">}</span> <span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$rqst</span> <span class="operator">=</span> <span class="word">HTTP::Request</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="word">GET</span> <span class="operator">=></span> <span class="single">'http://northpole.xxx/catalog'</span><span class="structure">);</span><br /><span class="symbol">$rqst</span><span class="operator">-></span><span class="word">header</span><span class="structure">(</span> <span class="word">Accept</span> <span class="operator">=></span> <span class="single">'application/x-yaml'</span> <span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$resp</span> <span class="operator">=</span> <span class="symbol">$ua</span><span class="operator">-></span><span class="word">request</span><span class="structure">(</span> <span class="symbol">$rqst</span> <span class="structure">);</span></code></pre>
<h3 id="Perl-to-make-easy-things-easy-and-making-hard-things-simple">Perl, to make easy things easy and making hard things simple!</h3>
<p>If even that is all too complicated, then Tourmaline made it really simple for you guys...with <a href="https://metacpan.org/module/LWP::UserAgent::Caching::Simple">LWP::UserAgent::Caching::Simple</a></p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="word">LWP::UserAgent::Caching::Simple</span> <span class="words">qw/get_from_json/</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$data</span> <span class="operator">=</span> <span class="word">get_from_json</span> <span class="structure">(</span><span class="single">'http://northpole.xxx/catalog'</span><span class="structure">);</span></code></pre>
<p>And yes, it does respect the rules written down in the RFC 7234 - so it will be a Merry Christmas after all!</p>
<p>Next time when you want to write a REST-api, please consider the <a href="https://metacpan.org/module/Dancer2::Plugin::HTTP::Bundle">Dancer2::Plugin::HTTP::Bundle</a>. And if you ever want to write an application to plan a pub-crawl near your hotel where you want to crash down after Christmas Eve... no need to build your own caches on top of your app. No more extra databases and figuring out what to keep and how long... Just keep it simple and let the slow HTTP stack handle itself with <a href="https://metacpan.org/module/LWP::UserAgent::Caching::Simple">LWP::UserAgent::Caching::Simple</a></p>
<p>Finally... Merry Christmas All! Ho ho ho!</p>
</div>2016-12-23T00:00:00ZTheo van HoeselAll I want for Christmas...Is Statistically Calcuablehttp://perladvent.org/2016/2016-12-22.html<div class='pod'><p>Elladryl was always looking for ways to improve customer satisfaction, and for years she'd been stumped by a particularly difficult to please group of children. Well, actually, they weren't difficult <i>per se</i>, more like inscrutable. Some children were so polite or meek, that they never told Santa what they wanted! Elladryl had tried everything: give them a random toy, give them the most popular toy that year, give them what their siblings or neighbors got to avoid envy… none of it worked. And since these kids were perpetually underwhelmed, some of the <i>scrooges</i> in accounting were proposing that they simply be given leftover stock from previous years! There had to be a better way.</p>
<p>One day, while sitting in the break room staring at a bit of corporate kitsch — a poster proclaiming "Every child is as unique as a snowflake "— Elladryl muttered "poppycock." She was an avid macro-photographer in the off-season, and more than once she'd snapped a picture of very similar looking snowflakes. Of course, that was it! She simply had to figure out which children were similar to those that hadn't made a request, and she'd have an idea of what they might like to receive. After a little digging around on CPAN, searching for various keywords, she found <a href="https://metacpan.org/module/Algorithm::KMeans">Algorithm::KMeans</a>: a tool for clustering multidimensional data. Perfect! In short, the module groups data points (read children) together based on as many criteria as one cares to use, such that the items in one group are more like one another than they are anything else, a sort of <a href="http://wias-berlin.de/software/tetgen/figs/s164-vd-bnd.gif">multidimensional Voronoi diagram</a>.</p>
<p>It's well known that Santa keeps psychological profiles of all the world's children — naughty or nice, <i>et cetera et cetera</i> — but that data is closely guarded, and Elladryl would have to demonstrate the need for and effectiveness of her idea. Fortunately, <a href="https://metacpan.org/module/Algorithm::KMeans">Algorithm::KMeans</a> has the ability to create synthetic data sets for testing purposes. It also has the ability determine the optimum number of clusters for a data set; a somewhat slow process in which it assesses the quality of fit for dividing the up into 2 through 16 clusters.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Algorithm::KMeans</span><span class="structure">;</span><br /><br /><span class="word">Algorithm::KMeans</span><span class="operator">-></span><span class="word">cluster_data_generator</span><span class="structure">(</span><br /> <span class="word">input_parameter_file</span> <span class="operator">=></span> <span class="single">'Big5.param'</span><span class="operator">,</span><br /> <span class="word">output_datafile</span> <span class="operator">=></span> <span class="single">'Big5-Data.csv'</span><span class="operator">,</span><br /> <span class="word">number_data_points_per_cluster</span> <span class="operator">=></span> <span class="number">128</span><span class="operator">,</span><br /><span class="structure">);</span><br /><br /><span class="separator">__DATA__</span><br /><span class="data">#Big5.param<br />#open conscientious extrovert agreeable neurotic<br /><br />#artsy<br />cluster<br />80 40 60 40 90<br /><br />1 0 0 0 0<br />0 1 0 0 0<br />0 0 1 0 0<br />0 0 0 1 0<br />0 0 0 0 1<br /><br />#sporty<br />cluster<br />50 60 90 80 20<br /><br />1 0 0 0 0<br />0 1 0 0 0<br />0 0 1 0 0<br />0 0 0 1 0<br />0 0 0 0 1<br /><br />#adventurous<br />cluster<br />80 70 20 50 20<br /><br />1 0 0 0 0<br />0 1 0 0 0<br />0 0 1 0 0<br />0 0 0 1 0<br />0 0 0 0 1<br /><br />#sciencey<br />cluster<br />60 90 40 40 60<br /><br />1 0 0 0 0<br />0 1 0 0 0<br />0 0 1 0 0<br />0 0 0 1 0<br />0 0 0 0 1</span></code></pre>
<p>After inputting some plausible parameters for the <a href="https://en.wikipedia.org/wiki/Big_Five_personality_traits">big five personality traits</a>, Elladryl ran the auto-clusterer over her synthetic sample of 512 children and was disappointed with the results: two clusters of 509 and 3 tots respectively. What did she do wrong? After racking her brain and trying everything, on her umpteenth run the clusterer created two similarly-sized groups. Success, of a sort. There were clearly more than two types of children in the world, nerd jokes about binary aside, so what had changed? It turns out that in the most recent run, while playing with the data, she had renormalized the variables from 0 to 10 rather than 0 to 1. Interesting, for some reason the algorithm seemed to be sensitive to the magnitude of the inputs. She rescaled the data again, from 0 to 100, and ended up with a much more plausible result of four clusters. Huzzah!</p>
<pre><code> data dimensionality: 9
Value of Kmax is: 16
Clustering for K = 2
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 3
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 4
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 5
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 6
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 7
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 8
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 9
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 10
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 11
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 12
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 13
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 14
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 15
. . . . . . . . . . . . . . . . . . . . .
Clustering for K = 16
. . . . . . . . . . . . . . . . . . . . .
Displaying final clusters for best K (= 4) :
Cluster 0 (128 records):
0 10 105 108 112 113 114 123 124 126 131 133
135 137 138 139 141 148 150 152 154 156 162 167
168 178 179 18 180 181 182 191 192 195 197 198
2 20 200 206 207 210 213 241 242 245 251 261
262 269 272 275 276 278 279 28 285 288 292 293
294 299 3 306 308 316 326 328 333 335 339 34
343 344 350 354 356 358 365 368 372 384 385
387 392 398 402 410 413 414 419 424 43 444 445
448 453 457 459 464 469 470 473 474 48 483 484
489 497 498 50 501 503 508 510 58 62 65 73 74
79 8 81 82 90 94 96 98
Cluster 1 (128 records):
1 107 110 118 119 128 13 132 142 151 153 159
165 169 17 171 183 185 187 188 193 196 212 218
22 221 223 226 228 234 246 248 25 250 252 255
256 259 26 260 264 27 274 277 280 284 287 290
297 313 314 315 318 319 330 334 337 340 341 342
346 351 353 357 361 369 370 38 381 382 386 39
396 4 40 401 404 405 408 409 412 418 421 422
431 434 439 440 441 442 443 449 451 452 454 455
460 461 462 465 467 471 476 479 485 486 487 491
496 5 505 509 511 52 53 55 57 6 63 64 68 69 71
72 83 88 89 97
Cluster 2 (128 records):
102 103 104 11 111 115 117 121 122 130 134 136 14
143 144 146 147 149 155 157 158 16 166 184 186 19
194 199 203 205 214 216 217 225 227 229 230 231 233
235 237 238 243 257 263 265 266 268 270 286 289 298
300 302 303 305 307 309 311 317 32 327 33 331 345
347 349 35 352 355 359 36 360 362 363 367 371 374
376 377 379 380 389 391 393 397 403 416 42 425 426
427 433 436 44 447 45 450 46 466 468 47 472 475 478
481 488 49 493 494 495 507 51 54 56 61 66 67 7 70
78 80 85 86 91 92 93 95
Cluster 3 (128 records):
100 101 106 109 116 12 120 125 127 129 140 145 15
160 161 163 164 170 172 173 174 175 176 177 189 190
201 202 204 208 209 21 211 215 219 220 222 224 23
232 236 239 24 240 244 247 249 253 254 258 267 271
273 281 282 283 29 291 295 296 30 301 304 31 310
312 320 321 322 323 324 325 329 332 336 338 348 364
366 37 373 375 378 383 388 390 394 395 399 400 406
407 41 411 415 417 420 423 428 429 430 432 435 437
438 446 456 458 463 477 480 482 490 492 499 500 502
504 506 59 60 75 76 77 84 87 9 99
cluster 0 (128 records):
cluster center 0: 75.2109 66.1172 16.0859 45.5781 15.1250
1.8125 1.4141 1.8125 1.4609
cluster 1 (128 records):
cluster center 1: 44.9375 56.1250 85.8594 75.9141 15.4922
1.4531 1.5234 1.6875 1.4688
cluster 2 (128 records):
cluster center 2: 75.4297 35.1953 55.5000 35.0938 85.9844
1.4297 1.4766 1.6328 1.4531
cluster 3 (128 records):
cluster center 3: 55.8516 85.4375 36.2109 35.7188 55.7891
1.8594 1.3906 1.8594 1.4531
Best clustering achieved for K=4 with QoC = 0.28300341334325
QoC values array (the smaller the value, the better it is)
for different K starting with K=2:
0.339081457952933
0.311405776061022
0.28300341334325
0.357635728839801
0.424281399187201
0.498892089419827
0.587438662856472
0.629084161222652
0.687506829650931
0.744921591912404
0.803628373753345
0.842131262698576
0.889647660373095
0.945119696311851
1.03367867739993
Writing cluster 0 to file cluster0.txt
Writing cluster 1 to file cluster1.txt
Writing cluster 2 to file cluster2.txt
Writing cluster 3 to file cluster3.txt
cluster0
agreeable 75.8031496062992
conscientious 66.6377952755905
extravert 16.2125984251969
neurotic 45.9370078740157
open 15.244094488189
subject 1.82677165354331
sex 1.4251968503937
siblings 1.82677165354331
birth_order 1.47244094488189
cluster1
agreeable 45.2913385826772
conscientious 56.5669291338583
extravert 86.5354330708661
neurotic 76.511811023622
open 15.6141732283465
subject 1.46456692913386
sex 1.53543307086614
siblings 1.7007874015748
birth_order 1.48031496062992
cluster2
agreeable 76.0236220472441
conscientious 35.4724409448819
extravert 55.9370078740157
neurotic 35.3700787401575
open 86.6614173228347
subject 1.44094488188976
sex 1.48818897637795
siblings 1.64566929133858
birth_order 1.46456692913386
cluster3
agreeable 56.2913385826772
conscientious 86.1102362204724
extravert 36.496062992126
neurotic 36
open 56.2283464566929
subject 1.8740157480315
sex 1.40157480314961
siblings 1.8740157480315
Birth_Order 1.46456692913386</code></pre>
<p>It's not clear why the clusterer should care about anything more than the relative values of variables, but the important thing is that it appears to work. Elladryl could now make a case to the head elf for access to more data, but she was sure he would agree, and that many more children would be happy this holiday. Of course, even with this break through, she still had plenty of experimenting left to perform. Should the silent children receive the same toy as a randomly selected child in the same cluster? The most popular toy in the cluster? Or perhaps the clusters could be sub-divided even further? After all, since her tool will be processing hundreds of millions of record, she'll have to be somewhat frugal in what variables she can use. But perhaps a coarse clusterer using some high-level psychosocial data could be followed up with another pass using additional variables to create finer sub-divisions.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Algorithm::KMeans</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">List::Util</span> <span class="single">'sum'</span><span class="structure">;</span><br /><br /><span class="comment">#Instantiate clusterer and load data<br /></span><span class="keyword">my</span> <span class="symbol">$clusterer</span> <span class="operator">=</span> <span class="word">Algorithm::KMeans</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">datafile</span> <span class="operator">=></span> <span class="single">'data.csv'</span><span class="operator">,</span><br /> <span class="word">mask</span> <span class="operator">=></span> <span class="single">'N111111111'</span><span class="operator">,</span><br /> <span class="word">K</span> <span class="operator">=></span> <span class="number">0</span><span class="operator">,</span><br /> <span class="word">cluster_seeding</span> <span class="operator">=></span> <span class="single">'random'</span><span class="operator">,</span><br /> <span class="word">terminal_output</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">write_clusters_to_files</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /><span class="structure">);</span><br /><span class="symbol">$clusterer</span><span class="operator">-></span><span class="word">read_data_from_file</span><span class="structure">();</span><br /><br /><span class="comment">#Group the data, optimally<br /></span><span class="keyword">my</span><span class="structure">(</span><span class="symbol">$clusters_hash</span><span class="operator">,</span> <span class="symbol">$cluster_centers_hash</span><span class="structure">)</span> <span class="operator">=</span> <span class="symbol">$clusterer</span><span class="operator">-></span><span class="word">kmeans</span><span class="structure">();</span><br /><br /><span class="comment">#Display the results, and the mean values of variables for each group<br />#to compare with cluster centers<br /></span><span class="keyword">my</span> <span class="symbol">@vars</span> <span class="operator">=</span> <span class="words">qw/agreeable conscientious extravert neurotic open<br /> subject sex siblings birth_order/</span><span class="structure">;</span><br /><span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$cluster_id</span> <span class="structure">(</span><span class="word">sort</span> <span class="word">keys</span> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$clusters_hash</span><span class="structure">})</span> <span class="structure">{</span><br /><span class="comment"> #print "\n$cluster_id => @{$clusters_hash->{$cluster_id}}\n";<br /></span> <span class="word">print</span> <span class="double">"\n$cluster_id\n"</span><span class="structure">;</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$i</span> <span class="structure">(</span> <span class="number">0</span> <span class="operator">..</span> <span class="arrayindex">$#vars</span> <span class="structure">){</span><br /> <span class="keyword">my</span> <span class="symbol">@F</span> <span class="operator">=</span> <span class="word">map</span><span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="structure">[</span><span class="symbol">$i</span><span class="structure">]</span> <span class="structure">}</span> <span class="cast">@</span><span class="structure">{</span><br /> <span class="symbol">$clusterer</span><span class="operator">-></span><span class="structure">{</span><span class="word">_original_data</span><span class="structure">}}{</span> <span class="cast">@</span><span class="structure">{</span><span class="symbol">$clusters_hash</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$cluster_id</span><span class="structure">}</span> <span class="structure">}</span><br /> <span class="structure">};</span><br /> <span class="word">print</span> <span class="symbol">$vars</span><span class="structure">[</span><span class="symbol">$i</span><span class="structure">]</span><span class="operator">,</span> <span class="double">"\t"</span><span class="operator">,</span> <span class="word">sum</span><span class="structure">(</span><span class="symbol">@F</span><span class="structure">)</span><span class="operator">/</span><span class="arrayindex">$#F</span><span class="operator">,</span> <span class="double">"\n"</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">}</span></code></pre>
<p><p>The <a href="data.csv">source datafile</a> is here, though you will probably want to remove the first header line before running the above code (since it causes lots of complaints from Algorithm::KMeans)</p>.
</p>
</div>2016-12-22T00:00:00ZJerrad PierceOrganizing catalogues and wishlists with Dancer::SearchApphttp://perladvent.org/2016/2016-12-21.html<div class='pod'><h2 id="Finding-presents-instead-of-searching">Finding presents instead of searching</h2>
<p>Many children are under the assumption that Santa needs to work only one day a year, when he distributes all the presents across the world. But there is a lot of work that has to go into the preparation for this day. Especially the choice of presents is an issue - while Santa knows who's been naughty and who's been nice, he also has a list of the wishes of each child. Santa also needs to monitor closely the demand and availability of presents to match the wishes to the gifts.</p>
<center><img src="dancer-searchapp-jungle.png"></center>
<p>Of course, Santas suppliers provide even the most festive descriptions of the items. The sheer supply of possible gifts makes it difficult even for Santa and his elves to remember the numbering schemes and descriptions of the various items. And as the wishes and potential presents come in at a time when the preparations are already in full swing and Christmas is close, there is little time to dedicate elves to categorizing the gifts.</p>
<h3 id="Finding-things">Finding things</h3>
<p>Ideally, any of the elves could use some kind of search engine to find an appropriate present for a child. Of course, companies like Google already provide a solution to finding the most appropriate thing in the stack of proposed gifts for a wish of a child. But Santa is peculiar about sharing his knowledge with others. While he knows who's going to get their wish and who's going to get coal, he feels that sharing that knowledge with other parties is not a good approach. What is known at the North Pole stays at the North Pole.</p>
<p>So his approach is to build his own search engine.</p>
<h3 id="Parts">Parts</h3>
<p>The search engine consists of three parts, the Crawler, the Index and the Searchform.</p>
<h4 id="Crawler">Crawler</h4>
<p>Most catalogues come in as files on the filesystem, one file per potential present. The letters and wishlists all get stored in a large IMAP server. The crawler is then responsible for reading all the information about gifts and storing it in the index. Currently, the module comes with two separate crawlers, one for the filesystem and one for an IMAP store. Both crawlers take all items, read them and extract the text from them and write them to the search index.</p>
<p>Santa wants to store some key facts as metadata with every document. For example the author and sent date are extracted for every wishlist. Imagine giving a gift for a 40-year old to their 8 year old self.</p>
<h4 id="Index">Index</h4>
<p>The search index is provided by the Elasticsearch search index, with <a href="https://metacpan.org/module/Search::Elasticsearch">Search::Elasticsearch</a> as the Perl interface. The index is a data structure that allows for quick retrieval of documents given some words that are associated with the documents. When storing the document information, Elasticsearch also adds more data like synonyms so that the wish for <i>a tricycle</i> can also find <i>ASIN20161225, vehicle, three wheeled</i> or other synonyms.</p>
<h4 id="Searchform">Searchform</h4>
<p>The front end is provided by a Dancer application. It consists mostly of a form field where elves enter the wish and submit the query. The <a href="https://metacpan.org/module/Dancer">Dancer</a> application ties together the Elasticsearch index and search functionality and returns word completion and matching documents for each query.</p>
<h3 id="Making-presents">Making presents</h3>
<p>Santa himself cannot make his presence known, but a reimplementation of the search engine has been released on CPAN as <a href="https://metacpan.org/module/Dancer::SearchApp">Dancer::SearchApp</a>. It includes the two crawler programs and the search form. You still need to add the special spice that is Elasticsearch.</p>
<h4 id="Installation-and-setup">Installation and setup</h4>
<p>Before you can search your documents, you have to install the prerequisites and then import some documents. The upside is that there are importers that read from the filesystem and IMAP mail stores. The downside is that you will need to install some software that is not available via CPAN.</p>
<ul>
<li><p>Install Java JRE 8</p>
<p>That's what Elasticsearch needs.</p>
</li>
<li><p>Install Elasticsearch 5.x</p>
<p>Installing Elasticsearch is as easy as downloading the latest release from <a href="https://www.elastic.co/downloads/elasticsearch">https://www.elastic.co/downloads/elasticsearch</a>. For this document, we'll assume that you the Elasticsearch configuration directory is at <code>/opt/elasticsearch/config</code>.</p>
</li>
<li><p>Configure Elasticsearch</p>
<p>The search engine needs an English dictionary of synonyms. A good English synonym dictionary can be found at <a href="https://sites.google.com/site/kevinbouge/synonyms-lists">https://sites.google.com/site/kevinbouge/synonyms-lists</a>. Download the file for English from there and save it to</p>
<pre><code> /opt/elasticsearch/config/synonyms/synonyms_en.txt</code></pre>
<p>If you don't want to install one, create an empty file.</p>
</li>
<li><p>Download <a href="https://metacpan.org/module/Dancer::SearchApp">Dancer::SearchApp</a> from CPAN</p>
<p>These instructions download the distribution into a temporary directory for this test run. If you are satisfied with the configuration, copy the complete tree to a more permament location.</p>
<pre><code> cpanm --look Dancer::SearchApp
cpanm --installdeps .</code></pre>
<p>Note the current directory, as that is where the configuration will happen.</p>
</li>
<li><p>Install Apache Tika</p>
<p>Download the Tika server from <a href="https://tika.apache.org/download.html">https://tika.apache.org/download.html</a></p>
<p>The current version is <a href="http://www.apache.org/dyn/closer.cgi/tika/tika-server-1.14.jar">http://www.apache.org/dyn/closer.cgi/tika/tika-server-1.14.jar</a></p>
<p>Copy the JAR file into the directory <code> jar/</code> of the distribution.</p>
</li>
<li><p>Launch Elasticsearch</p>
<p>Before starting to save data in the index, Elasticsearch needs to be running. It doesn't need additional configuration beyond the synonym file.</p>
</li>
<li><p>Launch the web interface</p>
<p>Launching the web interface is done through the following command:</p>
<pre><code> plackup -Ilib bin/app.pl</code></pre>
<p>You can then access the web interface at <a href="http://0:5000/">http://0:5000/</a>.</p>
</li>
</ul>
<center><img src="dancer-searchapp-homepage.png"></center>
<h4 id="Indexing-content">Indexing content</h4>
<p>Indexing content is done from the installation directory where you unpacked the CPAN distribution into. As Elasticsearch updates its index live there is no need to wait for an index run to finish before you can start searching your data through the web interface.</p>
<ul>
<li><p>Indexing files on the disk</p>
<p>Files on disk are indexed by invoking <code>bin/index-filesystem.pl</code> and giving it one or more directories.</p>
<pre><code> perl -Ilib -w bin/index-filesystem.pl -f t/documents</code></pre>
<p>For better control, you can give it a configuration file. This allows inclusion and exclusion of specific directories and file patterns. There is an example configuration file in the <code>config-examples/</code> directory of the distribution.</p>
</li>
<li><p>Indexing an IMAP account</p>
<p>If you want to be able to search your email, you have to import it from the IMAP server into the search index. Copy the config file from <code> config-examples/imap-import.yml </code> and edit the username, password, server and folders to index. Then run the IMAP indexer. If you are like me and have most of your email history available, this may take a while.</p>
<pre><code> perl -Ilib -w bin/index-imap.pl -c my-imap-import.yml</code></pre>
</li>
</ul>
<h2 id="POD-ERRORS">POD ERRORS</h2>
<p>Hey! <b>The above document had some coding errors, which are explained below:</b></p>
<dl>
<dt>Around line 10:</dt>
<dd>
<p>Cannot have multiple =encoding directives</p>
<p>Invalid =encoding syntax: utf8</p>
</dd>
</dl>
</div>2016-12-21T00:00:00ZCorionPDF-Reuse + Reduce & Recyclehttp://perladvent.org/2016/2016-12-20.html<div class='pod'><p>One thing Santa should be glad about is that he doesn't have to deal with billing. Can you imagine having to invoice all the parents of the world? Sadly in the commercial sector we have to worry about sending invoice to clients. And as an overhead - you can't invoice for invoicing after all - we want to automate it all away. What better language to use than Perl?</p>
<p>While invoices these days don't actually have to be printed on paper anymore, they remain simulated paper documents: It's time to get Perl to create a bunch of PDFs for us.</p>
<p>We're building a billing solution for Advertising Agency. It has the usual Estimate → Purchase Order → Invoice cycle, with Detail, Rate, Qty and Amount format. Just to make this more complicated it has varied column information for handling Print Media & Audio/Video Media with more detail on print media position & A/V media program slot and timing. In some cases, each media type will be handled by different legal entities for manage tax related issues. Finally, the requirements were:</p>
<ol>
<li>6 work segments (Production, Production with PO, Print Media, Audio/Video, Production with TAX)</li>
<li>Upto 6 different legal entities for each segment</li>
<li>3 type of estimates</li>
<li>Purchase Order</li>
<li>3 type of Release Orders for handle media communication</li>
<li>3 type of invoices</li>
</ol>
<p>Wow! That's a lot of stuff to consider.</p>
<h3 id="Challenges">Challenges</h3>
<p>We went in search of suitable PDF production module. We mainly looked for:</p>
<ol>
<li>Ability to tailor build the document</li>
<li>A module that gives production control in pixel level </li>
<li>Flexible options to set page properties</li>
<li>Option to implement signatures</li>
<li>Position control to handle page continuation</li>
</ol>
<p>After a lengthy search for few days, we finally zeroed in <a href="https://metacpan.org/module/PDF::Reuse">PDF::Reuse</a>. It did not have user friendly function wraps for graphical output, but it offered a developer friendly core functions to control the graphical output in a pixel level. It helped to wrap our custom functions to handle in generic way.</p>
<h3 id="Map-to-Reuse-Reduce-Recycle">Map to Reuse, Reduce & Recycle</h3>
<p>One of PDF::Reuse's main advantage is that it can take an existing PDF - something that our designers can create with their standard PDF tools - and then all we have to do is add the various per invoice details to it.</p>
<p>The code to re-use a PDF with PDF::Reuse is really simple:</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="word">PDF::Reuse</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="comment"># The file we're going to make<br /></span><span class="word">prFile</span><span class="structure">(</span><span class="single">'myFile.pdf'</span><span class="structure">);</span><br /><br /><span class="comment"># Take page one from a PDF we already have we're going to reuse<br /></span><span class="word">prForm</span><span class="structure">(</span><span class="single">'source.pdf'</span><span class="structure">);</span><br /><br /><span class="comment"># Add some custom text<br /></span><span class="word">prText</span><span class="structure">(</span><span class="number">150</span><span class="operator">,</span> <span class="number">700</span><span class="operator">,</span> <span class="single">'Customer Data'</span><span class="structure">);</span><br /><br /><span class="comment"># And finally write it our<br /></span><span class="word">prEnd</span><span class="structure">();</span></code></pre>
<p>The feature of reusing the existing PDF and writing content over that, helped to reuse the existing legal entity letter pads in specific company document production. It's avoided manual graphic production work.</p>
<h3 id="A-Range-of-Functions">A Range of Functions</h3>
<p>The PDF::Reuse module provides a bunch of primative functions for adding data to our document. For example, we've already seen using the <code>prText()</code> function to add text to the document. But PDF::Reuse provides a plethora of other functions that can do all kinds of things.</p>
<p>These can be as simple as <code>prPage</code></p>
<pre><code class="code-listing"><span class="comment"># add a new page to the document<br /></span><span class="word">prPage</span><span class="structure">();</span></code></pre>
<p>Or as powerful as <code>prImage</code>:</p>
<pre><code class="code-listing"><span class="comment"># insert an existing PDF into the document at the x,y coords:<br /></span><span class="word">prImage</span><span class="structure">({</span><br /> <span class="single">'file'</span> <span class="operator">=></span> <span class="symbol">$filename</span><span class="operator">,</span><br /> <span class="single">'x'</span> <span class="operator">=></span> <span class="symbol">$c_v</span><span class="structure">{</span><span class="word">x</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'y'</span> <span class="operator">=></span> <span class="symbol">$c_v</span><span class="structure">{</span><span class="word">y_data</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'size'</span> <span class="operator">=></span> <span class="symbol">$scale_factor</span><span class="operator">,</span><br /><span class="structure">});</span></code></pre>
<p>The problem with these functions is that they're all <i>primative</i>. If we want to create our document in a sensbile fashion we need to build up our own libray on top of these simplistic functions.</p>
<h3 id="Abstracting">Abstracting</h3>
<p>We classified the document we wanted to write into three major parts:</p>
<p><ol> <li>The page layout and page background</li> <li>Header</li> <li>Content</li> </ol>
</p>
<p>It looks like a core wrapper function we need is some way to write out a row of row structure to meet individual lines and table like outputs. The wrapper function constructed based on <code>PrText()</code>,<code>PrAdd()</code>,<code>PrImage()</code></p>
<pre><code class="code-listing"><span class="comment"># Data structure input for produce one liner output<br /># base structure<br /></span><br /><span class="keyword">my</span> <span class="symbol">$print_data</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="number">0</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">left_xaxis</span> <span class="operator">=></span> <span class="number">250</span><span class="operator">,</span> <span class="comment"># print start x position</span><br /> <span class="word">line_height</span> <span class="operator">=></span> <span class="number">25</span><span class="operator">,</span><br /><br /><span class="comment"> # column definition<br /></span> <span class="word">header</span> <span class="operator">=></span> <span class="structure">{</span><br /><span class="comment"> # define the first (and only) column<br /></span> <span class="number">0</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">font</span> <span class="operator">=></span> <span class="single">'Helvetica-Bold'</span><span class="operator">,</span><br /> <span class="word">font_size</span> <span class="operator">=></span> <span class="number">12</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /><br /><span class="comment"> # insert the data<br /></span> <span class="word">data</span> <span class="operator">=></span> <span class="structure">[[</span><span class="single">'Production Estimate'</span><span class="structure">]]</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">};</span></code></pre>
<p>It will produce a title <b>Production Estimate</b> in page center. A data structure that produces several rows look like this:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$row_data</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="number">0</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">line_height</span> <span class="operator">=></span> <span class="number">12</span><span class="operator">,</span><br /><br /><span class="comment"> # format definition for 3 columns<br /></span> <span class="word">header</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="number">0</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">font</span> <span class="operator">=></span> <span class="single">'Helvetica'</span><span class="operator">,</span><br /> <span class="word">font_size</span> <span class="operator">=></span> <span class="number">9</span><span class="operator">,</span><br /> <span class="word">d_width</span> <span class="operator">=></span> <span class="number">50</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="number">1</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">font</span> <span class="operator">=></span> <span class="single">'Helvetica'</span><span class="operator">,</span><br /> <span class="word">font_size</span> <span class="operator">=></span> <span class="number">9</span><span class="operator">,</span><br /> <span class="word">d_width</span> <span class="operator">=></span> <span class="number">10</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="number">2</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">font</span> <span class="operator">=></span> <span class="single">'Helvetica-Bold'</span><span class="operator">,</span><br /> <span class="word">font_size</span> <span class="operator">=></span> <span class="number">9</span><span class="operator">,</span><br /> <span class="word">d_width</span> <span class="operator">=></span> <span class="number">90</span><span class="operator">,</span><br /> <span class="word">align</span> <span class="operator">=></span> <span class="single">'left'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /><br /><span class="comment"> # data for 3 columns<br /></span> <span class="word">data</span><span class="operator">=></span> <span class="structure">[</span><br /> <span class="structure">[</span><span class="single">'Client:'</span><span class="operator">,</span><span class="single">''</span><span class="operator">,</span><span class="single">'<Client Address>'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span><span class="single">'Address:'</span><span class="operator">,</span><span class="single">''</span><span class="operator">,</span><span class="single">'<Address 1>'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span><span class="single">''</span><span class="operator">,</span><span class="single">''</span><span class="operator">,</span><span class="single">'<Address 2>'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span><span class="single">''</span><span class="operator">,</span><span class="single">''</span><span class="operator">,</span><span class="single">'<Address 3>'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span><span class="single">'Estimate No.'</span><span class="operator">,</span><span class="single">''</span><span class="operator">,</span><span class="single">'<estimate no.>'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span><span class="single">'Date:'</span><span class="operator">,</span><span class="single">''</span><span class="operator">,</span><span class="single">'<estimate date>'</span> <span class="structure">]</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">};</span></code></pre>
<p>Writing the code to make this work is actually quite fiddly: We have to do a lot of accounting of how much space each column takes up, and keep track of where we need to place the next line.</p>
<p>Rather than showing you pages and pages of our code, here's an example of a small snippet of code that we use to create the header and shows some of these challenges:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">header_text</span><span class="structure">{</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$data</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$page</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$width</span><span class="operator">,</span> <span class="symbol">$height</span> <span class="structure">)</span> <span class="operator">=</span> <span class="word">checking</span><span class="structure">(</span> <span class="symbol">$page</span> <span class="structure">);</span><br /> <span class="keyword">my</span> <span class="symbol">$y</span> <span class="operator">=</span> <span class="symbol">$height</span> <span class="operator">-</span> <span class="symbol">$page</span><span class="operator">-></span><span class="structure">{</span><span class="word">margin</span><span class="structure">}{</span><span class="word">top</span><span class="structure">};</span><br /><br /><span class="comment"> # each data item<br /></span> <span class="keyword">for</span> <span class="keyword">my</span> <span class="symbol">$key</span> <span class="structure">(</span> <span class="word">sort</span> <span class="structure">{</span> <span class="symbol">$a</span> <span class="operator"><=></span> <span class="symbol">$b</span> <span class="structure">}</span> <span class="word">keys</span> <span class="cast">%</span><span class="structure">{</span> <span class="symbol">$data</span> <span class="structure">}</span> <span class="structure">){</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$chunk</span> <span class="operator">=</span> <span class="symbol">$data</span><span class="structure">{</span> <span class="symbol">$key</span> <span class="structure">};</span><br /> <span class="keyword">my</span> <span class="symbol">$start_x</span> <span class="operator">=</span> <span class="symbol">$chunk</span><span class="operator">-></span><span class="structure">{</span><span class="word">left_xaxis</span><span class="structure">}</span> <span class="operator">||</span> <span class="symbol">$page</span><span class="operator">-></span><span class="structure">{</span><span class="word">margin</span><span class="structure">}{</span><span class="word">left</span><span class="structure">};</span><br /><br /><span class="comment"> # each row<br /></span> <span class="keyword">for</span> <span class="keyword">my</span> <span class="symbol">$row_data</span> <span class="structure">(</span><span class="cast">@</span><span class="structure">{</span> <span class="symbol">$chunk</span><span class="operator">-></span><span class="structure">{</span><span class="word">data</span><span class="structure">}</span> <span class="structure">})</span> <span class="structure">{</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$counter</span> <span class="operator">=</span> <span class="number">0</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$x</span> <span class="operator">=</span> <span class="symbol">$start_x</span><span class="structure">;</span><br /><br /> <span class="keyword">for</span> <span class="keyword">my</span> <span class="symbol">$column_data</span> <span class="structure">(</span><span class="cast">@</span><span class="structure">{</span> <span class="symbol">$row_data</span> <span class="structure">})</span> <span class="structure">{</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$meta</span> <span class="operator">=</span> <span class="symbol">$chunk</span><span class="operator">-></span><span class="structure">{</span><span class="word">header</span><span class="structure">}{</span> <span class="symbol">$counter</span> <span class="structure">};</span><br /><br /> <span class="word">prFont</span><span class="structure">(</span> <span class="symbol">$meta</span><span class="operator">-></span><span class="structure">{</span><span class="word">font</span><span class="structure">}</span> <span class="operator">||</span> <span class="symbol">$page</span><span class="operator">-></span><span class="structure">{</span><span class="word">font</span><span class="structure">}</span> <span class="structure">);</span><br /> <span class="word">prFontSize</span><span class="structure">(</span> <span class="symbol">$meta</span><span class="operator">-></span><span class="structure">{</span><span class="word">font_size</span><span class="structure">}</span> <span class="operator">||</span> <span class="symbol">$page</span><span class="operator">-></span><span class="structure">{</span><span class="word">font_size</span><span class="structure">}</span> <span class="structure">);</span><br /> <span class="word">prText</span><span class="structure">(</span> <span class="symbol">$x</span><span class="operator">,</span> <span class="symbol">$y</span><span class="operator">,</span> <span class="symbol">$column_data</span> <span class="structure">);</span><br /><br /> <span class="symbol">$x</span> <span class="operator">+=</span> <span class="symbol">$meta</span><span class="operator">-></span><span class="structure">{</span><span class="word">d_width</span><span class="structure">};</span><br /><br /> <span class="operator">++</span><span class="symbol">$counter</span><span class="structure">;</span><br /><br /> <span class="structure">}</span> <span class="comment"># each column</span><br /><br /><span class="comment"> # reflect current y position<br /></span> <span class="symbol">$y</span> <span class="operator">-=</span> <span class="symbol">$chunk</span><span class="operator">-></span><span class="structure">{</span><span class="word">line_height</span><span class="structure">}</span> <span class="operator">||</span> <span class="symbol">$page</span><span class="operator">-></span><span class="structure">{</span><span class="word">line_height</span><span class="structure">};</span><br /><br /> <span class="structure">}</span> <span class="comment"># each row</span><br /> <span class="structure">}</span> <span class="comment"># each item</span><br /> <span class="keyword">return</span> <span class="symbol">$y</span><span class="structure">;</span><br /><span class="structure">}</span> <span class="comment"># end</span></code></pre>
<p>You can see the techniques that we leverage: Keeping track of the changes <code>$x</code> and <code>$y</code> as we render each section, rendering column by column and looking up the corresponding header information for each data section, and using per page defaults when the meta data doesn't have settings in each section.</p>
<p>By abstracting this logic into sections we can deal with the complexity and build up a powerful library to easily create our own pages, making PDF::Reuse very powerful.</p>
<h3 id="In-Summary">In Summary</h3>
<p>With our library of functions the creator will produce the PDF document based on given data on page properties & page content. In implementation case different structures are predefined first. During runtime, the bill information dynamically added to the structure. A basic structure created first, then it cloned and blended to different account needs.</p>
<p>PDF:Reuse core functions <code>PrFile</code>, <code>PrText</code>, <code>PrForm</code>, <code>PrImage</code> itself helped to produce business quality documents with scalability and reusability. The modules simple and straight core functions helped us to achieve our custom document generation with ease and control. Even though we've been using it for six year, its continuously satisfying the evolving needs with the basic functions.</p>
<p>Now the documents become a part of our clients business communication also it gave us a business. <b>Thanks PDF::Reuse</b>.</p>
<h3 id="Example-Output">Example Output</h3>
<img src="pdf_estimate.png">
<br>
<h2>Sample Invoice Document</h2>
<img src="pdf_print_media.png">
<h3 id="SEE-ALSO">SEE ALSO</h3>
<ul>
<li><p><a href="https://metacpan.org/release/PDF-Reuse">PDF-Reuse</a></p>
</li>
<li><p><a href="https://github.com/WebstarsCG/Kangiten-PDF-Writer">Example</a></p>
</li>
</ul>
</div>2016-12-20T00:00:00ZRaja Renga BashyamUsing containers with Linuxhttp://perladvent.org/2016/2016-12-19.html<div class='pod'><p>Linux containers are a hot topic these days. This is why I decided to share with you, how you can interact with containers directly from Perl.</p>
<h3 id="Linux-Namespaces">Linux Namespaces</h3>
<p>Linux containers are an OS-level virtualization technique to help processes co- exist on the same machine regardless of what other processes are already running on that machine (for example, having two processes listening on port 80 on the same physical hardware). Rather than starting a whole new virtual machine, including a new operating system and system daemons, Linux containers are a way to simply and sufficiently isolate a process from other processes running on the same operating system beyond the normal process management Linux provides.</p>
<p>Currently in Linux there are six namespaces which we can control to allow processes to:</p>
<ul>
<li><p>UTS - have different hostname and domain name</p>
</li>
<li><p>PID - have its own view of the PIDs of the machine. This allows the process to create processes with PIDs that are already existing on the machine.</p>
</li>
<li><p>NET - see different, limited version of the network infrastructure of the kernel. As if the process has its own network.</p>
</li>
<li><p>IPC - have a separate SHM, SEM and MQ identifiers</p>
</li>
<li><p>USER - have separate user and group identifiers</p>
</li>
<li><p>MOUNT - see different view of the mounted filesystems</p>
</li>
</ul>
<p>For the purpose of this article I will assume that a "container" is a processes or group of processes that has one or more namespaces different from the initial namespaces.</p>
<p>There are 3 modules on the CPAN for working with "containers":</p>
<ul>
<li><p><a href="https://metacpan.org/module/Linux::Clone">Linux::Clone</a> - to create a new container by creating a new process.</p>
</li>
<li><p><a href="https://metacpan.org/module/Linux::Unshare">Linux::Unshare</a> - to create a new container without forking.</p>
</li>
<li><p><a href="https://metacpan.org/module/Linux::Setns">Linux::Setns</a> - to change your current container.</p>
</li>
</ul>
<h3 id="Linux::Clone">Linux::Clone</h3>
<p>So let's start with <a href="https://metacpan.org/module/Linux::Clone">Linux::Clone</a>. This module is basically wrapper to the glibc <a href="http://man.he.net/man2/clone">clone(2)</a> wrapper function. I will cover only the parts of it related to Linux namespaces. These are <code>CLONE_NEWNET</code>, <code>CLONE_NEWPID</code>, <code>CLONE_NEWIPC</code>, <code>CLONE_NEWUTS</code>, <code>CLONE_NEWNS</code>(mount namespace), <code>CLONE_NEWUSER</code>.</p>
<p>Namespaces are used to create isolated environment where your process have only limited visibility to the system.</p>
<p>For example if you want to make sure that only processes you have started can use the SHM (shared memory) you created, you can use <a href="https://metacpan.org/module/Linux::Clone">Linux::Clone</a> to create a new process with its own IPC namespace and after that create your SHM. This way you are both protecting your SHM from others on the machine and protecting everyone else from your process.</p>
<p>This is how the code will look in your Perl application:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Linux::Clone</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">POSIX</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">child</span> <span class="structure">{</span><br /> <span class="word">print</span> <span class="double">"In the child\n"</span><span class="structure">;</span><br /> <span class="word">system</span><span class="structure">(</span><span class="double">"ipcs"</span><span class="structure">);</span><br /><span class="structure">}</span><br /><span class="word">Linux::Clone::clone</span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">child</span><span class="structure">;</span> <span class="structure">}</span><span class="operator">,</span> <span class="number">0</span><span class="operator">,</span> <span class="word">Linux::Clone::NEWIPC</span> <span class="operator">||</span> <span class="word">POSIX::SIGCHILD</span><span class="structure">;</span> </code></pre>
<p>Normally running the <code>ipcs</code> command on a normal laptop/desktop machine will print out a few entries listing all the ipc facilities that your process normally has access to. However, when <code>ipcs</code> is called in the above script it prints nothing - the new namespace the process is in has nothing in it.</p>
<h3 id="Linux::Unshare">Linux::Unshare</h3>
<p>If you don't want to create a new process, but you want to change some of the namespaces for your current process you can use <a href="https://metacpan.org/module/Linux::Unshare">Linux::Unshare</a>. This module implements the glibc <a href="http://man.he.net/man2/unshare">unshare(2)</a> wrapper function, which does exactly this.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Linux::Unshare</span> <span class="words">qw(unshare CLONE_NEWIPC)</span><span class="structure">;</span><br /><br /><span class="word">unshare</span><span class="structure">(</span><span class="word">CLONE_NEWIPC</span><span class="structure">);</span><br /><span class="word">system</span><span class="structure">(</span><span class="double">"ipcs"</span><span class="structure">);</span></code></pre>
<p>Now the above example will create a new IPC namespace without creating new process and it will replace your current IPC namespace with the newly created one. However, if you want to return to your previous IPC namespace, you would need to make sure you have a file descriptor from that IPC namespace and use <a href="https://metacpan.org/module/Linux::Setns">Linux::Setns</a> (which I'll cover next) to switch your current IPC namespace to the previous one.</p>
<h3 id="Linux::Setns">Linux::Setns</h3>
<p>Usually you would not create containers directly from your application, instead you probably would use something like <a href="https://linuxcontainers.org/lxc/">LXC</a>, <a href="https://linuxcontainers.org/lxd/">LXD</a>, <a href="https://coreos.com/blog/rocket/">Rocket</a> or <a href="https://www.docker.com/">Docker</a> to create and mange your containers. So most likely what you would want to do is attach/enter into these containers and do some work there.</p>
<p>For this you can use the <a href="https://metacpan.org/module/Linux::Setns">Linux::Setns</a> module, a wrapper for the <a href="http://man.he.net/man2/setns">setns(2)</a> glibc function.</p>
<p>In order to identify a namespace you would use the files in <code>/proc/PID/ns/{ipc,mnt,net,pid,user,uts}</code>. With the setns function, you can join one or more of these namespaces or all of them. This way your process can actually enter in each namespace and do the rest of its work within the confines of that namespace.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Linux::Setns</span> <span class="words">qw(setns CLONE_NEWIPC)</span><br /><br /><span class="word">setns</span><span class="structure">(</span><span class="double">"/proc/213/ns/ipc"</span><span class="operator">,</span> <span class="word">CLONE_NEWIPC</span><span class="structure">);</span><br /><span class="word">system</span><span class="structure">(</span><span class="double">"ipcs"</span><span class="structure">);</span></code></pre>
<p>When you are entering a single namespace, <a href="http://man.he.net/man2/setns">setns(2)</a> requires that you give it a file descriptor from that precise namespace. So if we look at the example above, we are entering the IPC namespace (by using <code>CLONE_NEWIPC</code>) and we are supplying the path to the <code>ipc</code> file descriptor (<code>/proc/$PID/ns/ipc</code>). For example, if we want to enter the network namespace, we would use <code>CLONE_NEWNET</code> and <code>/proc/$PID/ns/net</code>.</p>
<h3 id="Additional">Additional</h3>
<p>For any kind of container manipulation (creation or entering) you will need root (<code>CAP_SYS_ADMIN</code>) privileges.</p>
<p>In addition to namespaces, the Linux Kernel also offer resource limits/isolation in the form of Control Groups. Currently Perl lacks module for managing them, but I'm going to present my proposal for such a module at <a href="https://fosdem.org">FOSDEM 2017</a>.</p>
</div>2016-12-19T00:00:00ZMarian HackMan MarinovWarm and Fuzzy Insidehttp://perladvent.org/2016/2016-12-18.html<div class='pod'><pre><code> He knows when you're asleep
He knows when you're awake
He knows when you've been bad or good
So be good for goodness sake</code></pre>
<p>Sounds pretty nefarious, right? Jolly old elf indeed. Just how does Nick know so much about everyone? Truth be told, he has many proprietary mechanisms of collecting data in house but as is so often the case with large endeavours, he subcontracts much of the work out to other parties. In this case, he purchases commercial data sets from market research firms which you would think would lighten the load on his data mining elves however most of his contractors aren't as rigorous about data cleaning and normalization as they are.</p>
<p>Fortunately, the North Pole Crew have some experience with imprecise matching and <a href="http://perladvent.org/2008/20/">due to Santa's poor spelling</a>. In fact, earlier this month we saw <a href="http://perladvent.org/2016/2016-12-03.html">an article</a> on how they were testing modules for calculating <a href="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein distance</a> - the minimum number of single-character edits (i.e. insertions, deletions or substitutions) required to change one word into the other - to help work out how close the two strings are together. With this could the Elves splice their data and the commercial data sets together?</p>
<h3 id="Bad-Santa">Bad Santa</h3>
<p>The elves did some testing and sadly it didn't look good. The problem the Elves were facing is that Levenshtein distance alone wasn't being smart enough for them. Consider the following corruptions:</p>
<p><table class="pretty-table"> <tr> <th>Original</th> <th>Corrupted</th> <th>Levenshtein Distance</th> </tr> <tr class="alt"> <td>santa</td> <td>snata</td> <td>2</td> </tr> <tr> <td>santa</td> <td>sandy</td> <td>2</td> </tr> </table>
</p>
<p>In the first example the <code>an</code> from Santa's name simply got swapped to become <code>na</code>, whereas the second example requires two whole new characters replaced two old characters. However: these two examples have identical Levenshtein distance scores - how could the elves get their code to prioritize one over the other?</p>
<h3 id="Enter-Text::Fuzzy">Enter Text::Fuzzy</h3>
<p>Text::Fuzzy is a fuzzy text matcher that can not only calculate Levenshtein distance, but is also able to calculate Damerau-Levenshtein distance - counting character swaps as just one step. This gives us this much better looking result.</p>
<p><table class="pretty-table"> <tr> <th>Original</th> <th>Corrupted</th> <th>Damerau-Levenshtein Distance</th> </tr> <tr class="alt"> <td>santa</td> <td>snata</td> <td>1</td> </tr> <tr> <td>santa</td> <td>sandy</td> <td>2</td> </tr> </table>
</p>
<p>As a bonus the interface for Text::Fuzzy is also surprisingly simple:</p>
<pre><code class="code-listing"><span class="word">say</span> <span class="word">Text::Fuzzy</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="symbol">$corrupted</span><span class="operator">,</span> <span class="word">trans</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">)</span><br /> <span class="operator">-></span><span class="word">distance</span><span class="structure">(</span> <span class="symbol">$original</span> <span class="structure">);</span></code></pre>
<p>(Note the use of the <code>trans</code> option there to enable Damerau-Levenshtein scores.)</p>
<p>Having Text::Fuzzy pick the one of the closest matches from a list is also really easy:</p>
<pre><code class="code-listing"><span class="word">say</span> <span class="word">scalar</span> <span class="word">Text::Fuzzy</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="single">'blixen'</span><span class="structure">)</span><span class="operator">-></span><span class="word">nearestv</span><span class="structure">([</span><span class="words">qw(<br /> dasher dancer prancer vixen comet<br /> cupid donner blitzen rudolph<br />)</span><span class="structure">]);</span></code></pre>
<h3 id="Putting-It-All-Together">Putting It All Together</h3>
<p>With this tool in in mind, the Elves knocked together a proof-of-concept for combining the two data sets. The code takes two disparate CSVs and combines them into a single CSV - fuzzily!</p>
<pre><code class="code-listing"><span class="comment">#!perl -l<br /></span><span class="keyword">use</span> <span class="word">Text::Fuzzy</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Text::xSV</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Inline::Files</span> <span class="word">-backup</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">List::Util</span> <span class="single">'max'</span><span class="structure">;</span><br /><br /><span class="comment"># Minimum match level<br /></span><span class="keyword">my</span> <span class="symbol">$threshold</span> <span class="operator">=</span> <span class="number">85</span><span class="structure">;</span><br /><br /><span class="comment"># Configure to read the two CSVs embedded at the bottom of the<br /># example code. One representing the Polar data and another<br /># representing the third party vendor data<br /></span><span class="keyword">my</span> <span class="symbol">$PDB</span> <span class="operator">=</span> <span class="word">Text::xSV</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="word">fh</span><span class="operator">=></span><span class="symbol">*POLARDB</span><span class="structure">);</span> <span class="symbol">$PDB</span><span class="operator">-></span><span class="word">read_header</span><span class="structure">();</span><br /><span class="keyword">my</span> <span class="symbol">$VDB</span> <span class="operator">=</span> <span class="word">Text::xSV</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="word">fh</span><span class="operator">=></span><span class="symbol">*VENDORDB</span><span class="structure">);</span> <span class="symbol">$VDB</span><span class="operator">-></span><span class="word">read_header</span><span class="structure">();</span><br /><br /><span class="comment"># Print combined header row<br /></span><span class="keyword">my</span> <span class="symbol">@phead</span><span class="operator">,</span> <span class="symbol">@vhead</span><span class="structure">;</span><br /><span class="word">print</span> <span class="word">join</span><span class="structure">(</span><span class="single">','</span><span class="operator">,</span> <span class="symbol">@phead</span><span class="operator">=</span><span class="cast">@</span><span class="structure">{</span><span class="symbol">$PDB</span><span class="operator">-></span><span class="structure">{</span><span class="word">row</span><span class="structure">}}</span><span class="operator">,</span> <span class="single">'Match'</span><span class="operator">,</span> <span class="symbol">@vhead</span><span class="operator">=</span><span class="cast">@</span><span class="structure">{</span><span class="symbol">$VDB</span><span class="operator">-></span><span class="structure">{</span><span class="word">row</span><span class="structure">}}</span> <span class="structure">);</span><br /><br /><span class="comment"># Load all the vendor data into memory<br /></span><span class="keyword">my</span> <span class="symbol">@VDB</span><span class="structure">;</span><br /><span class="word">push</span><span class="structure">(</span><span class="symbol">@VDB</span><span class="operator">,</span> <span class="magic">$_</span><span class="structure">)</span> <span class="word">while</span><span class="structure">(</span> <span class="magic">$_</span> <span class="operator">=</span> <span class="symbol">$VDB</span><span class="operator">-></span><span class="word">fetchrow_hash</span><span class="structure">()</span> <span class="structure">);</span><br /><br /><span class="comment"># Process all the polar CSV lines and look for something we're reasonably<br /># confident is a match<br /></span><span class="keyword">while</span><span class="structure">(</span> <span class="keyword">my</span> <span class="symbol">$polarRecord</span> <span class="operator">=</span> <span class="symbol">$PDB</span><span class="operator">-></span><span class="word">fetchrow_hash</span><span class="structure">()</span> <span class="structure">){</span><br /> <span class="keyword">my</span> <span class="symbol">$key</span> <span class="operator">=</span> <span class="word">serializePolarKey</span><span class="structure">(</span><span class="symbol">$polarRecord</span><span class="structure">);</span><br /> <span class="keyword">my</span> <span class="symbol">$keyLen</span> <span class="operator">=</span> <span class="word">length</span><span class="structure">(</span><span class="symbol">$key</span><span class="structure">);</span><br /> <span class="symbol">$TF</span> <span class="operator">=</span> <span class="word">Text::Fuzzy</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="symbol">$key</span><span class="operator">,</span> <span class="word">trans</span><span class="operator">=></span><span class="number">1</span> <span class="structure">);</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@matches</span><span class="structure">;</span><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$vendorRecord</span> <span class="structure">(</span> <span class="symbol">@VDB</span> <span class="structure">){</span><br /> <span class="keyword">my</span> <span class="symbol">$dist</span> <span class="operator">=</span> <span class="symbol">$TF</span><span class="operator">-></span><span class="word">distance</span><span class="structure">(</span> <span class="word">serializeVendorKey</span><span class="structure">(</span><span class="symbol">$vendorRecord</span><span class="structure">)</span> <span class="structure">);</span><br /> <span class="keyword">my</span> <span class="symbol">$confidence</span> <span class="operator">=</span> <span class="word">int</span><span class="structure">(</span> <span class="number">100</span> <span class="operator">*</span> <span class="structure">(</span><span class="symbol">$keyLen</span><span class="operator">-</span><span class="symbol">$dist</span><span class="structure">)</span><span class="operator">/</span><span class="symbol">$keyLen</span><span class="structure">);</span><br /> <span class="word">push</span> <span class="symbol">@matches</span><span class="operator">,</span> <span class="symbol">$confidence</span><span class="structure">;</span><br /><br /><span class="comment"> # Alternatively, stash match, and only return highest match after loop?<br /></span> <span class="word">print</span> <span class="word">join</span><span class="structure">(</span><span class="single">','</span><span class="operator">,</span><br /> <span class="cast">@</span><span class="structure">{</span><span class="symbol">$polarRecord</span><span class="structure">}{</span><span class="symbol">@phead</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="symbol">$confidence</span><span class="operator">,</span><br /> <span class="cast">@</span><span class="structure">{</span><span class="symbol">$vendorRecord</span><span class="structure">}{</span><span class="symbol">@vhead</span><span class="structure">}</span> <span class="structure">)</span> <span class="word">if</span> <span class="symbol">$confidence</span> <span class="operator">>=</span> <span class="symbol">$threshold</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="keyword">unless</span><span class="structure">(</span> <span class="word">max</span><span class="structure">(</span><span class="symbol">@matches</span><span class="structure">)</span> <span class="operator">>=</span> <span class="symbol">$threshold</span> <span class="structure">){</span><br /> <span class="word">print</span> <span class="word">join</span><span class="structure">(</span><span class="single">','</span><span class="operator">,</span> <span class="cast">@</span><span class="structure">{</span><span class="symbol">$polarRecord</span><span class="structure">}{</span><span class="symbol">@phead</span><span class="structure">}</span><span class="operator">,</span> <span class="word">max</span><span class="structure">(</span><span class="symbol">@matches</span><span class="structure">));</span><br /> <span class="structure">}</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">serializePolarKey</span><span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">sprintf</span><span class="structure">(</span><span class="double">"%s %s %s,%s, %s, %s %s, %s"</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">FirstName</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">MiddleName</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">LastName</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">Street</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">City</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">Province</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">PostCode</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">Country</span><span class="structure">}</span> <span class="structure">);</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">serializeVendorKey</span><span class="structure">{</span><br /> <span class="keyword">return</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">FullName</span><span class="structure">}</span> <span class="operator">.</span> <span class="single">','</span><span class="operator">.</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">Address</span><span class="structure">};</span><br /><span class="structure">}</span><br /><br /><span class="word">__POLARDB__</span><br /><span class="word">ID</span><span class="operator">,</span><span class="word">FirstName</span><span class="operator">,</span><span class="word">MiddleName</span><span class="operator">,</span><span class="word">LastName</span><span class="operator">,</span><span class="word">Street</span><span class="operator">,</span><span class="word">City</span><span class="operator">,</span><span class="word">Province</span><span class="operator">,</span><span class="word">PostCode</span><span class="operator">,</span><span class="word">Country</span><br /><span class="number">1</span><span class="operator">,</span><span class="word">John</span><span class="operator">,</span><span class="word">Jacob</span><span class="operator">,</span><span class="word">Jingleheimer-Schmidt</span><span class="operator">,</span><span class="number">1320</span> <span class="word">Needham</span> <span class="word">Drive</span><span class="operator">,</span><span class="word">Centreville</span><span class="operator">,</span><span class="word">NV</span><span class="operator">,</span><span class="number">89119</span><span class="operator">,</span><span class="word">USA</span><br /><span class="number">2</span><span class="operator">,</span><span class="word">Bartholomew</span><span class="operator">,</span><span class="word">Jay</span><span class="operator">,</span><span class="word">Simpson</span><span class="operator">,</span><span class="number">1322</span> <span class="word">Evergreen</span> <span class="word">Terrace</span><span class="operator">,</span><span class="word">Springfield</span><span class="operator">,,,</span><span class="word">USA</span><br /><span class="number">3</span><span class="operator">,</span><span class="word">Marie</span> <span class="word">Claire</span><span class="operator">,,</span><span class="word">Dubois</span><span class="operator">,</span><span class="number">62</span> <span class="word">Clos</span> <span class="word">du</span> <span class="word">Bois</span> <span class="word">Rossia</span><span class="operator">,</span><span class="word">Namur</span><span class="operator">,,</span><span class="number">5017</span><span class="operator">,</span><span class="word">BE</span><br /><span class="word">__VENDORDB__</span><br /><span class="word">FullName</span><span class="operator">,</span><span class="word">Address</span><span class="operator">,</span><span class="word">ExtraData</span><br /><span class="word">Jon</span> <span class="word">Jacob</span> <span class="word">Jingleheimmer</span> <span class="word">Schmitt</span><span class="operator">,</span><span class="double">"1320 Needham Dr, Centerville, NV 89119, USA"</span><span class="operator">,</span><span class="word">Tm90aGluZyB0byBzZWUgaGVyZS4</span><span class="operator">=</span><br /><span class="word">Bart</span> <span class="word">Simpson</span><span class="operator">,</span><span class="double">"1322 Evergreen Ter., Springfield USA"</span><span class="operator">,</span><span class="word">SSdtIGdldHRpbicgbm91dHRpbicgZm9yIENocmlzdG1hcw</span><span class="operator">==</span><br /><span class="word">Mary</span> <span class="word">Claire</span> <span class="word">Dubois</span><span class="operator">,</span><span class="double">"62 Clos duBois Russia,Namur,,5017,BE"</span><span class="operator">,</span><span class="word">dGl0aSB0b3RvIHR1dHUgdGF0YQ</span><span class="operator">==</span><br /><span class="word">Marie</span> <span class="word">Claire</span> <span class="word">du</span> <span class="word">Bois</span><span class="operator">,</span><span class="double">"26 Clos du Bois Rossia,Namur,,5017,BE"</span><span class="operator">,</span><span class="word">SidhaW1lIGJpbmUgbGVzIGhpc3RvaXJlcyBkZSBHYXN0b24gTGFnYWZmZQ</span><span class="operator">==</span></code></pre>
<h4 id="Output">Output</h4>
<pre><code> ID,FirstName,MiddleName,LastName,Street,City,Province,PostCode,Country,Match,FullName,Address,ExtraData
1,John,Jacob,Jingleheimer-Schmidt,1320 Needham Drive,Centreville,NV,89119,USA,89,Jon Jacob Jingleheimmer Schmitt,1320 Needham Dr, Centerville, NV 89119, USA,Tm90aGluZyB0byBzZWUgaGVyZS4=
2,Bartholomew,Jay,Simpson,1322 Evergreen Terrace,Springfield,,,USA,71
3,Marie Claire,,Dubois,62 Clos du Bois Rossia,Namur,,5017,BE,85,Mary Claire Dubois,62 Clos duBois Russia,Namur,,5017,BE,dGl0aSB0b3RvIHR1dHUgdGF0YQ==
3,Marie Claire,,Dubois,62 Clos du Bois Rossia,Namur,,5017,BE,85,Marie Claire du Bois,26 Clos du Bois Rossia,Namur,,5017,BE,SidhaW1lIGJpbmUgbGVzIGhpc3RvaXJlcyBkZSBHYXN0b24gTGFnYWZmZQ==</code></pre>
</div>2016-12-18T00:00:00ZJerrad PierceWriting command line tools made easyhttp://perladvent.org/2016/2016-12-17.html<div class='pod'><p>Dear Santa,</p>
<p>I want to eliminate programming. Well, the boring kind of programming, at least.</p>
<p>Ok, that's a huge wish. Let's talk about commandline tools for a start.</p>
<h3 id="Command-line-options">Command line options</h3>
<p>There's really good support in Perl for reading options. For example, see the well known modules <a href="https://metacpan.org/module/Getopt::Long">Getopt::Long</a>, <a href="https://metacpan.org/module/Getopt::Long::Descriptive">Getopt::Long::Descriptive</a>, <a href="https://metacpan.org/module/Getopt::Long::DescriptivePod">Getopt::Long::DescriptivePod</a>, <a href="https://metacpan.org/module/Pod::Usage">Pod::Usage</a> and several more on the CPAN. In fact, there are so many modules for processing command line options in Perl that this year perlancar is writing a whole <a href="https://perlancar.wordpress.com/2016/12/">advent calendar</a> just about them!</p>
<p>Do any of them do exactly what I want though? I actually want subcommands, nested. And named parameters. And validation. And shell completion. And still be able to define it all in one place.</p>
<p>Let's imagine writing a hypothetical command line weather application that can be used to look up and predict the weather around the world. How would we like our application to function? And what would we like the corresponding code to look like?</p>
<h3 id="Desired-Feature:-Subcommands">Desired Feature: Subcommands</h3>
<p>So in addition to being able to pass simple commands to our application like <code>forcast</code>:</p>
<pre><code> % weather forecast</code></pre>
<p>I want to have subcommands - passing a top level command like <code>list</code> and then having that take another command to tell it to list <code>countries</code> or <code>cities</code>:</p>
<pre><code> % weather list countries
% weather list cities</code></pre>
<p>And I want each of those three things to have different options and parameters:</p>
<pre><code> % weather forecast [(--show-temperature | -T)] \
[--celsius|--fahrenheit] <country> <city>
% weather list countries
% weather list cities [(--country | -c) <country>]</code></pre>
<p>How would we like each of those commands to look like in the App::Weather class? How about a subroutine for each command:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">forecast</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="operator">,</span> <span class="symbol">$run</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$country</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">country</span><span class="structure">};</span><br /> <span class="keyword">my</span> <span class="symbol">$city</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">city</span><span class="structure">};</span><br /> <span class="keyword">my</span> <span class="symbol">$show_temp</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">options</span><span class="operator">-></span><span class="structure">{</span><span class="double">"show-temperature"</span><span class="structure">};</span><br /><br /><span class="comment"> # While you can use print directly, using C<out> makes<br /> # it easier to test the app, and give plugins the possibility<br /> # to modify the output<br /></span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">out</span><span class="structure">(</span><span class="double">"Snow in $city, $country"</span><span class="structure">);</span><br /><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$show_temp</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$symbol</span> <span class="operator">=</span> <span class="double">"\N{DEGREE SIGN}C"</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$temperature</span> <span class="operator">=</span> <span class="word">forecast</span><span class="structure">(</span><span class="operator">...</span><span class="structure">);</span><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$run</span><span class="operator">-></span><span class="word">options</span><span class="operator">-></span><span class="structure">{</span><span class="word">fahrenheit</span><span class="structure">})</span> <span class="structure">{</span><br /> <span class="symbol">$symbol</span> <span class="operator">=</span> <span class="double">"\N{DEGREE SIGN}F"</span><span class="structure">;</span><br /> <span class="symbol">$temperature</span> <span class="operator">=</span> <span class="word">c2f</span><span class="structure">(</span><span class="symbol">$temperature</span><span class="structure">);</span><br /> <span class="structure">}</span><br /> <span class="symbol">$run</span><span class="operator">-></span><span class="word">out</span><span class="structure">(</span><span class="double">"Temperature: $temperature$symbol"</span><span class="structure">);</span><br /> <span class="structure">}</span><br /><span class="structure">}</span></code></pre>
<h3 id="Desired-Feature:-One-place-for-specification-and-documentation">Desired Feature: One place for specification and documentation</h3>
<p>But how would we like to specify which subroutine mapped to which command or subcommand? With a simple YAML spec file:</p>
<pre><code class="code-listing"><span class="synIdentifier">name</span><span class="synSpecial">:</span> weather<br /><span class="synIdentifier">appspec</span><span class="synSpecial">:</span> <span class="synSpecial">{</span> <span class="synIdentifier">version</span><span class="synSpecial">:</span> <span class="synConstant">'0.001'</span> <span class="synSpecial">}</span><br /><span class="synIdentifier">title</span><span class="synSpecial">:</span> Weather forecast<br /><span class="synIdentifier">class</span><span class="synSpecial">:</span> App::Weather<br /><span class="synComment"># no global options; -h|--help will be there automatically</span><br /><span class="synIdentifier">options</span><span class="synSpecial">:</span> <span class="synSpecial">[]</span><br /><span class="synIdentifier">subcommands</span><span class="synSpecial">:</span><br /> <span class="synIdentifier">forecast</span><span class="synSpecial">:</span><br /> <span class="synIdentifier">summary</span><span class="synSpecial">:</span> Show forecast for a city<br /> <span class="synIdentifier">op</span><span class="synSpecial">:</span> forecast <span class="synComment"> # the method in App::Weather</span><br /> <span class="synIdentifier">parameters</span><span class="synSpecial">:</span><br /> <span class="synStatement">- </span><span class="synIdentifier">spec</span><span class="synSpecial">:</span> country=s --Country name<br /> <span class="synStatement">- </span><span class="synIdentifier">spec</span><span class="synSpecial">:</span> city=s --City name<br /> <span class="synIdentifier">options</span><span class="synSpecial">:</span><br /> <span class="synStatement">- </span><span class="synIdentifier">spec</span><span class="synSpecial">:</span> show-temperature|T --Display temperature<br /> <span class="synStatement">- </span><span class="synIdentifier">spec</span><span class="synSpecial">:</span> fahrenheit --Temperature in Fahrenheit<br /> <span class="synStatement">- </span><span class="synIdentifier">spec</span><span class="synSpecial">:</span> celsius --Temperature in Celsius<br /> <span class="synIdentifier">list</span><span class="synSpecial">:</span><br /> <span class="synIdentifier">subcommands</span><span class="synSpecial">:</span><br /> <span class="synIdentifier">countries</span><span class="synSpecial">:</span><br /> <span class="synIdentifier">summary</span><span class="synSpecial">:</span> List countries<br /> <span class="synIdentifier">op</span><span class="synSpecial">:</span> weather_countries<br /> <span class="synIdentifier">cities</span><span class="synSpecial">:</span><br /> <span class="synIdentifier">summary</span><span class="synSpecial">:</span> List cities<br /> <span class="synIdentifier">op</span><span class="synSpecial">:</span> weather_cities<br /> <span class="synIdentifier">options</span><span class="synSpecial">:</span><br /> <span class="synComment"> # The first element of the spec here is actually very similar</span><br /> <span class="synComment"> # to the syntax for Getopt::Long</span><br /> <span class="synStatement">- </span><span class="synIdentifier">spec</span><span class="synSpecial">:</span> country|c=s --Country name</code></pre>
<p>There are many advantages in having a seperate specification. It's the same idea as having an OpenAPI or similar specification for a REST API where everything is specified in one place and multiple tools can make use of the information to do things with it.</p>
<p>As we look at other features we'll see how having this specification is a really powerful idea.</p>
<h3 id="Desired-Feature:-Validation">Desired Feature: Validation</h3>
<p>I want to specify a type or other constraints in the spec for options and parameters. If validation fails, the error message and usage should be generated for me by the framework. Ideally the usage output will color the invalid/missing item in red.</p>
<pre><code> % multiply foo 23
Parameter x: invalid integer</code></pre>
<p>I also want the possibility to callback the app itself for validation where it's not possible ahead of time to know all the options in a fixed specification:</p>
<pre><code> % weather forecast Romania Cluj
...
% weather forecast Northpole Santa
...
% weather forecast Moon Darkside
Sorry, we don't have Darkside, Moon in our database</code></pre>
<p>In our hypotheticaly module the app command in my Perl program could be called with the information that the parameter <code>country</code> is about to be validated. The app should then return the list of possible countries, which the framework could then automatically compare to the parameter passed in.</p>
<p>The same happens for the parameter <code>city</code>. Now the app takes the <code>country</code> parameter and returns the list of cities in that country.</p>
<pre><code class="code-listing"><span class="comment"># in validation mode<br /></span><span class="keyword">my</span> <span class="symbol">$country</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">country</span><span class="structure">};</span><br /><br /><span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$param_to_validate</span> <span class="operator">eq</span> <span class="single">'country'</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="structure">[</span> <span class="word">country_list</span><span class="structure">()</span> <span class="structure">];</span><br /><span class="structure">}</span><br /><span class="keyword">elsif</span> <span class="structure">(</span><span class="symbol">$param_to_validate</span> <span class="operator">eq</span> <span class="single">'city'</span><span class="structure">)</span> <span class="structure">{</span><br /><span class="comment"> # Currently there's no way to add a custom error message<br /> # like this:<br /> # Sorry, we don't have Darkside, Moon in our database<br /></span> <span class="keyword">return</span> <span class="structure">[</span> <span class="word">city_list</span><span class="structure">(</span><span class="symbol">$country</span><span class="structure">)</span> <span class="structure">];</span><br /><span class="structure">}</span></code></pre>
<p>Of course, I could do that validation also myself, when the actual command is called, but this way I save the code for comparing the list with the given parameter, and for the error message.</p>
<p>Some of the modules on the CPAN already support features like this: <a href="https://metacpan.org/module/App::Cmd">App::Cmd</a> and <a href="https://metacpan.org/module/MooseX::App">MooseX::App</a> both support types in their own way, though I don't know about such callbacks though.</p>
<h3 id="Desired-Feature:-Shell-Tab-Completion">Desired Feature: Shell Tab Completion</h3>
<p>Tab is probably my most used key when working on the commandline. Even more since I switched from bash to zsh a couple of years ago.</p>
<p>Here are some simple things I want to have supported out of the box:</p>
<pre><code> # Static completion
% weather <TAB>
forecast -- Show forecast
list -- List countries or cities
% weather list <TAB>
cities -- List cities
countries -- List countries
% weather list cities --<TAB>
--country -- country name(s)
--help -h -- help</code></pre>
<p>This gets a bit more complicated:</p>
<pre><code> # Dynamic completion, calls back the app from the shell.
% weather list cities --country <TAB>
Romania Spain Netherlands
% weather forecast <TAB>
Romania Spain Netherlands
% weather forecast Netherlands <TAB>
Echt Amsterdam Rotterdam</code></pre>
<p>I want to be able to specify some static values for completion and validation in the spec, but also be able to call back the app, like in the previous examples.</p>
<p>Like in validation mode, the app is called with the information that a certain parameter is about to be completed. For example when completing the city in the last example. I have access to the country parameter and now return the list of cities. Completion code will then be generated and returned to the shell.</p>
<p>Additionally here I can also return a list of hashrefs so that the completion will be shown with a description.</p>
<p>I can even output some dynamic information in the completion description. As an example see the convert command which takes a unit type, a source unit, a value and a target unit.</p>
<pre><code> % convert distance foot 23 <TAB>
inch -- 276.000in
meter -- 7.010m</code></pre>
<p>So the convert app already calculates the corresponding values when doing completion.</p>
<pre><code class="code-listing"><span class="comment"># in completion mode<br /></span><span class="keyword">my</span> <span class="symbol">$type</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">type</span><span class="structure">};</span> <span class="comment"># distance or temperature</span><br /><span class="keyword">my</span> <span class="symbol">$source</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">source</span><span class="structure">};</span> <span class="comment"># source unit (meter, inch, ...)</span><br /><span class="keyword">my</span> <span class="symbol">$value</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">value</span><span class="structure">};</span><br /><span class="keyword">my</span> <span class="symbol">$target</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">parameters</span><span class="operator">-></span><span class="structure">{</span><span class="word">target</span><span class="structure">};</span> <span class="comment"># target unit</span><br /><br /><span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$param_to_complete</span> <span class="operator">eq</span> <span class="single">'target'</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="structure">[</span> <span class="word">target_units_for</span><span class="structure">(</span><span class="symbol">$type</span><span class="operator">,</span> <span class="symbol">$source</span><span class="structure">)</span> <span class="structure">];</span><br /><br /><span class="comment"> # alternatively here you can return a list of hashrefs with<br /> # descriptions or any dynamic value<br /> # depending on $type, $source and $value<br /> # return [<br /> # { name => "inch", "276.000in" },<br /> # ...<br /> # ];<br /></span><span class="structure">}</span></code></pre>
<h3 id="The-new-wheel">The new wheel</h3>
<p>Is there anything on the CPAN that can already do what we want? When searching for existing modules I found <a href="https://metacpan.org/module/App::Cmd">App::Cmd</a>, <a href="https://metacpan.org/module/MooseX::App">MooseX::App</a>, <a href="https://metacpan.org/module/MooseX::App::Cmd">MooseX::App::Cmd</a>, <a href="https://metacpan.org/module/MouseX::App::Cmd">MouseX::App::Cmd</a>, <a href="https://metacpan.org/module/MooX::Cmd">MooX::Cmd</a> and many more.</p>
<p><a href="https://metacpan.org/module/App::Cmd">App::Cmd</a> has some nice ideas: For the options it uses <a href="https://metacpan.org/module/Getopt::Long::Descriptive">Getopt::Long::Descriptive</a>. However, it doesn't support named parameters. I find the mix of writing pod and using methods it uses a bit confusing, and although it has the advantage of keeping the spec near the code it limits reuse of the specification in different ways. Shell tab completion integration is a bit complicated; I think I got it working for dzil and bash, no zsh, and still the completion seemed to do only basic things.</p>
<p><a href="https://metacpan.org/module/MooseX::App">MooseX::App</a> is also very nice and I stole some ideas from there also. I like the colorized output. Specification of options and parameters is of course very moosish. Disadvantage is that it's quite heavy. There was bash completion support and I wrote the port for zsh.</p>
<h4 id="Introducing-App::Spec">Introducing App::Spec</h4>
<p>So, to make the long story short, Santa said, there is no such module. I would have to write it myself.</p>
<p>I called it <a href="https://metacpan.org/module/App::Spec">App::Spec</a>. If this sounds interesting and useful, please have a look.</p>
<p>The examples here are variations of the <code>myapp</code> example command included in the distribution. I also use it for testing.</p>
<p>The things I described are already working.</p>
<h4 id="The-appspec-command">The <code>appspec</code> command</h4>
<p>For generating a quick start app, completion, pod and schema validation, look at <a href="https://metacpan.org/module/appspec">appspec</a> and <a href="https://metacpan.org/module/App::AppSpec">App::AppSpec</a>.</p>
<p>This is the core advantage in having the YAML spec file - the same file can then be used by several tools. So with the <code>appspec</code> tool I can simply give a spec file and generate completion and pod, or validate my file against the schema.</p>
<pre><code> % appspec validate myapp.yaml
% appspec completion myapp.yaml --zsh > dir/_myapp
% appspec pod myapp.yaml > myapp.pod</code></pre>
<p>If this framework is ported to another language, these things don't have to be ported, because there is already this Perl tool. (Of course, validation might include language specific restrictions, though.)</p>
<p>Also, if I have an existing command which lacks completion, I can write a spec for it and generate the completion files without needing to touch the app itself!</p>
<h4 id="Future-Improvements">Future Improvements</h4>
<p>There are many things that aren't fixed yet, but I hope most future changes will mostly concern the internals.</p>
<dl>
<dt>Types</dt>
<dd>
<p>I don't know how to define complex types for validation yet. For now, there's flag, string, integer, file, dir. <code>file</code> automatically checks if the file exists. I want to have some kind of alternation <code>file|integer</code>. Maybe I can use <a href="https://metacpan.org/module/Params::Validate">Params::Validate</a> somehow, like <a href="https://metacpan.org/module/Getopt::Long::Descriptive">Getopt::Long::Descriptive</a> does?</p>
</dd>
<dt>Classes and subcommands</dt>
<dd>
<p>Currently an app consists of one class and one method per subcommand, and you have to specify the method name. Other frameworks use one class per subcommand.</p>
<p>Both can make sense, so I want to suppprt both. I have to figure out how configuration would look like</p>
</dd>
<dt>Plugins</dt>
<dd>
<p>I started to work on plugins by converting the help subcommand to a plugin. I think I have to do some refactoring here.</p>
</dd>
</dl>
<p>The spec itself will have versioning, so that you can write a spec in an old format, and if there are changes, it will make the necessary conversions.</p>
<h3 id="Final-Desired-Feature:-Generating-whole-apps">Final Desired Feature: Generating whole apps!</h3>
<p>So it turns out I had one more desired feature, which turned out to be related and was one of the reasons why I really had to reinvent the wheel.</p>
<p>I like the command line, like you could have guessed by now, and I would like to be able to query an API from there.</p>
<p>I don't want to remember and type all the endpoints and possible options. I want to do:</p>
<pre><code> % githubcl <TAB>
DELETE -- DELETE call
GET -- GET call
PATCH -- PATCH call
POST -- POST call
PUT -- PUT call
help -- Show command help
% githubcl GET /<TAB>
zsh: do you wish to see all 568 possibilities (143 lines)? n
% githubcl GET /users/:username<TAB>
/users/:username -- Get a single user.
/users/:username/events -- If you are authenticated as the given user, you wi...
/users/:username/events/orgs/:org -- This is the user's organization dashboard. You mus...
...
% githubcl GET /issues --<TAB>
--q-direction
--q-sort
--q-labels -- String list of comma separated Label names.
--q-filter -- Issues assigned to you / created by you / mentioning you / ...
--q-since -- Optional string of a timestamp in ISO 8601 format: ...
% githubcl GET /issues --q-filter <TAB>
all assigned created mentioned subscribed</code></pre>
<p>As it turns out, there is an unofficial github OpenAPI spec.</p>
<p>So, I have a document which describes the API very well. I can write a script to turn that into an App::Spec commandline app!</p>
<p>When I played with MooseX::App, I tried to generate an app from an OpenAPI file. Every endpoint should be a separate subcommand, because the possible options and parameters depend on the endpoint.</p>
<p>So I would have generated over 500 Moose classes for this example. That didn't seem right.</p>
<p>In App::Spec, I can have a number of nested subcommands, but the command to be called can be the same for all. That's possible by defining the name of the op at the top command and leave the subcommands' op fields empty.</p>
<pre><code> # appspec
name: githubcl
...
subcommands:
GET:
op: request
subcommands:
/issues:
summary: List issues
# no op defined here
options: ...
/user:
summary: Info about the current authenticated user
...
POST:
op: request
options:
- spec: data-file= +file --File with the input for the post request
subcommands:
/gists:
summary: Create a gist
options: ...
PATCH:
op: request
...</code></pre>
<p>This way the <code>request</code> method of the app will be called, with additional information which subcommands were called.</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">request</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="operator">,</span> <span class="symbol">$run</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$commands</span> <span class="operator">=</span> <span class="symbol">$run</span><span class="operator">-></span><span class="word">commands</span><span class="structure">;</span><br /><span class="comment"> # [ 'POST', '/gists' ]<br /></span><span class="structure">}</span></code></pre>
<p>With this, I now have a generic REST API CLI framework: <a href="https://metacpan.org/module/API::CLI">API::CLI</a>.</p>
<p>It's still very experimental. There are some problems with completion under bash (probably caused by <code>:</code> in endpoints)</p>
<h3 id="SEE-ALSO">SEE ALSO</h3>
<dl>
<dt><a href="https://metacpan.org/module/App::Spec">App::Spec</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/App::AppSpec">App::AppSpec</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/App::Spec::Tutorial">App::Spec::Tutorial</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/API::CLI">API::CLI</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/Getopt::Long">Getopt::Long</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/Getopt::Long::Descriptive">Getopt::Long::Descriptive</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/Getopt::Long::DescriptivePod">Getopt::Long::DescriptivePod</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/MooseX::Getopt">MooseX::Getopt</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/Pod::Usage">Pod::Usage</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/Applify">Applify</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/App::Cmd">App::Cmd</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/MooseX::App">MooseX::App</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/MooseX::App::Cmd">MooseX::App::Cmd</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/MouseX::App::Cmd">MouseX::App::Cmd</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/MooX::Cmd">MooX::Cmd</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/CLI::Framework">CLI::Framework</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/CLI::Dispatch">CLI::Dispatch</a></dt>
<dd>
</dd>
<dt><a href="https://metacpan.org/module/Term::ShellUI">Term::ShellUI</a></dt>
<dd>
</dd>
<dt>OpenAPI specs <a href="https://github.com/APIs-guru/openapi-directory">https://github.com/APIs-guru/openapi-directory</a></dt>
<dd>
</dd>
</dl>
</div>2016-12-17T00:00:00ZTina MüllerA Geo Parser for vast amounts of Text http://perladvent.org/2016/2016-12-16.html<div class='pod'><h2 id="Building-Santas-Treasure-Map-with-Geo::Parser::Text">Building Santa's Treasure Map with Geo::Parser::Text</h2>
<h3 id="Or-finding-jewels-hidden-in-big-data">Or, finding jewels hidden in big data</h3>
<pre><code> I know there is a treasure
of rare pearls somewhere
I wish I could measure
I wish I knew where
treasures are fun to have
and even great to possess
but, the greatest fun to be had
is by mapping out exactly where.</code></pre>
<h2 id="Geoparsing-a-very-difficult-problem">Geoparsing, a very difficult problem</h2>
<p>According to Directions Magazine in the article <a href="http://www.directionsmag.com/entry/geoparsing-maps-the-future-of-text-documents/122487">"Geoparsing Maps The Future Of Text Documents"</a> geoparsing is an almost magical, complex technological process that relies on data to put geo-information into context.</p>
<p>The problem of geoparsing has been attempted many times over. The now defunct Yahoo Placenames was used to extract and disambiguate place names from text. Other notable defunct projects include BioGeomancer from Berkley and NGA GEOnet Names Server.</p>
<p>There are also a few other current open projects from both academia and the developer community: geolocate from Tulane University, CLAVIN - Cartographic Location And Vicinity INdexer, and <a href="https://github.com/openeventdata/mordecai">mordecai</a>.</p>
<p>Commercial heavyweights like MetaCarta extract information about place and time, while others like Digital Reasoning (GeoLocator), Lockheed Martin (AeroText), and SRA (NetOwl) extract place and person names.</p>
<p>All of current applications in this field focus on extracting place names from a piece of text, geotag, disambiguate, resolve them to the correct place, and return their coordinates and structured geographic information.</p>
<p>I don't know of any geoparser that goes beyond place names, by that I mean one that geoparses street addresses and street intersections too. Just like Santa, maybe it does exist. Maybe not. Either way, I will try to build it.</p>
<p><a href="https://metacpan.org/module/Geo::Parser::Text">Geo::Parser::Text</a> interfaces with <a href="http://geocode.xyz">http://geocode.xyz</a>, a geoparser written in Perl, to disambiguate and extract complete location information from text (locations expressed as combinations of street names and place names.)</p>
<h3 id="The-Motivation">The Motivation?</h3>
<p>I'd like to get a more detailed map of every location ever mentioned in literature. So, I'm currently geoparsing the gutenberg project (and a few others kind enough to offer free books for download)</p>
<p>Books are rich with actual location information (unless they are fantasy books mentioning fantasy locations, such as Santa's lair in the North Pole) - so are tweets, microblogs, chats, and more.</p>
<p>If I search for "Which books mention Rue Saint-Jacques in Paris?" I'll get a mixed bag of answers, some good some irrelevant. I will have the complete list soon, but from the books I have parsed this location is mentioned in the Count of Monte Cristo.</p>
<p>Nothing so far on the North Pole. Stay tuned.</p>
<h3 id="Finding-Where-in-What">Finding Where in What</h3>
<p>People keep reporting on what they do and sometimes we need to know where.</p>
<p>Knowing for certain, now that is a problem nobody quite knows how to solve. That includes me. For that I have a probabilistic model that computes a 'confidence score' which takes values between 0 and 1, with 1 being the highest confidence score. (The confidence score must be more than 0 for a result to be returned.)</p>
<p><a href="https://metacpan.org/module/Geo::Parser::Text">Geo::Parser::Text</a> is just an interface. The NLP module behind geocode.xyz does the heavy lifting.</p>
<h2 id="A-quick-overview">A quick overview</h2>
<pre><code class="code-listing"><span class="comment">#!/usr/bin/perl<br /></span><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Geo::Parser::Text</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Data::Dumper</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$g</span> <span class="operator">=</span> <span class="word">Geo::Parser::Text</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="single">'http://geocode.xyz'</span><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$str</span> <span class="operator">=</span> <span class="heredoc"><<'TEXT'</span><span class="structure">;</span><br /><span class="heredoc_content">"Why, my dear boy, when a man has been proscribed by the mountaineers,<br />has escaped from Paris in a hay-cart, been hunted over the plains of<br />Bordeaux by Robespierre's bloodhounds, he becomes accustomed to most things.<br />But go on, what about the club in the Rue Saint-Jacques?"<br />"Why, they induced General Quesnel to go there, and General Quesnel, who<br />quitted his own house at nine o'clock in the evening, was found the nextday<br />in the Seine."<br /></span><span class="heredoc_terminator">TEXT<br /></span><br /><span class="comment"># context aware parsing (may be slower)<br /></span><span class="keyword">my</span> <span class="symbol">$ref</span> <span class="operator">=</span> <span class="symbol">$g</span><span class="operator">-></span><span class="word">geocode</span><span class="structure">(</span><span class="word">scantext</span><span class="operator">=></span><span class="symbol">$str</span><span class="structure">);</span><br /><span class="word">print</span> <span class="word">Dumper</span> <span class="symbol">$ref</span><span class="structure">;</span></code></pre>
<p>And the response comes out as:</p>
<pre><code class="code-listing"><span class="symbol">$VAR1</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="single">'match'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'2.343'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.7'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'258,89,84'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'48.8463'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'RUE SAINT-JACQUES, PARIS, FR'</span><br /> <span class="structure">}</span><br /> <span class="structure">};</span></code></pre>
<p>This response provides latitude and longitude (latt, longt), the confidence score (0.7 in this case), where in text the match was made (MentionIndices) and the location.</p>
<p>There is only one location returned in this example (strict mode is default). It could however match many more in <code>nostrict</code> mode:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$ref</span> <span class="operator">=</span> <span class="symbol">$g</span><span class="operator">-></span><span class="word">geocode</span><span class="structure">(</span><span class="word">scantext</span><span class="operator">=></span><span class="symbol">$str</span><span class="operator">,</span><span class="word">nostrict</span> <span class="operator">=></span> <span class="number">1</span><span class="structure">);</span></code></pre>
<p>The ambiguity rich response might be:</p>
<pre><code class="code-listing"><span class="symbol">$VAR1</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="single">'match'</span> <span class="operator">=></span> <span class="structure">[</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'1'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'142,84'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'44.84122467294317'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'-0.5819759504279641'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'Bordeaux, FR'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.9'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'2.321780662038148'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'Paris, FR'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'48.8585406663244'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'89,84'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'89,131'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'Paris, PL'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'18.623269999999998'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'53.01594'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'13.6377402411685'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'50.50466656250016'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'most, CZ'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'206'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.15'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.1'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'206,9'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'51.767255'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'12.27308'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'most, DE'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.1'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'206'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'15.14741'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'most, SI'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'45.96118'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'104,340'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'11.93452'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'46.04092'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'cart, IT'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'122'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'52.69396936842102'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'-1.2728069473684216'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'over, UK'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'34'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'7.42547'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'46.94972'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'been, CH'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'131'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'plains, UK'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'-3.9319304347826085'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'55.88144239130435'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'363,76'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'-8.851603846153846'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'42.67637230769231'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'nine, ES'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'nine, PT'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'-8.54254'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'41.47052'</span><span class="operator">,</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'363'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'84'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'-2.32211'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'from, UK'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'51.22834'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'MentionIndices'</span> <span class="operator">=></span> <span class="single">'246,340'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'11.50476'</span><span class="operator">,</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'44.58116'</span><span class="operator">,</span><br /> <span class="single">'location'</span> <span class="operator">=></span> <span class="single">'club, IT'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.05'</span><br /> <span class="structure">}</span><br /> <span class="structure">]</span><br /> <span class="structure">};</span></code></pre>
<p>But then again, that's probably not what you want. (did you notice there is a town named "club" in Italy? Or a city named "From" in the UK? Is that where you are from?)</p>
<p>I want no ambiguity. So I've made strict mode the default, but if you must insist, pass a parameter named <code>nostrict.</code></p>
<h2 id="Geocoding">Geocoding</h2>
<p>And if I want only one result (geocoding is just a specific case of geoparsing) try:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$ref</span> <span class="operator">=</span> <span class="symbol">$g</span><span class="operator">-></span><span class="word">geocode</span><span class="structure">(</span><span class="word">locate</span><span class="operator">=></span><span class="symbol">$str</span><span class="structure">);</span></code></pre>
<p>So a geocoding answer comes back with a bit more info on the location:</p>
<pre><code class="code-listing"><span class="symbol">$VAR1</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="single">'latt'</span> <span class="operator">=></span> <span class="single">'48.84630'</span><span class="operator">,</span><br /> <span class="single">'longt'</span> <span class="operator">=></span> <span class="single">'2.34300'</span><span class="operator">,</span><br /> <span class="single">'standard'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'prov'</span> <span class="operator">=></span> <span class="single">'FR'</span><span class="operator">,</span><br /> <span class="single">'city'</span> <span class="operator">=></span> <span class="single">'Paris'</span><span class="operator">,</span><br /> <span class="single">'addresst'</span> <span class="operator">=></span> <span class="single">'RUE SAINT JACQUES'</span><span class="operator">,</span><br /> <span class="single">'confidence'</span> <span class="operator">=></span> <span class="single">'0.10'</span><span class="operator">,</span><br /> <span class="single">'postal'</span> <span class="operator">=></span> <span class="single">'75005'</span><br /> <span class="structure">}</span><br /> <span class="structure">};</span></code></pre>
<p>It gives you a breakdown of the elements (street, city, country) and sometimes provides the post code.</p>
<p>This is probably what you want to do when you want to parse locations out of short text or twitter feeds. Or if you want to parse out locations out of Santa's letters.</p>
<h2 id="Batch-Geocoding">Batch Geocoding</h2>
<p>As exciting as geoparsing and geocoding are, nothing is more useful than batch geocoding. Just supply a file of locations (one per line) and get those locations plotted on a map.</p>
<p>Santa gets a lot of letters, such as <a href="http://cuyahogafallshistory.com/2014/12/1930s-dear-santa-claus/">this one</a></p>
<pre><code> Dear Santa,
I am a little boy, six years old and I go to school. Please bring me a
motor boat, a gun, a go and stop sign, a tractor, football, pair of
roller skates, scooter, and house slippers and oranges and nuts. I am
a real good little boy. Your little friend,
Junior Hall - 1084 Stroman Ave Akron, O</code></pre>
<p>Suppose Santa keeps a text file of all these letters. All it needs to do is upload the file on a batch geocoding service to see where they come from.</p>
<pre><code> Junior Hall - 1084 Stroman Ave Akron
Bobbie Mitchell - RFD1 Cuyahoga Falls
3057 W. Bailey Road
... etc</code></pre>
<p>Save them on a text file, say letters.txt, then use this bash one liner to batch geocode them (If Santa knows Perl, I bet he also knows Bash)</p>
<pre><code> #!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
echo $line,`curl -X POST -d locate="$line" -d geoit="csv" https://geocoder.ca`;
done < "$1"</code></pre>
<p>Save this bash file as locate.sh then</p>
<pre><code> chmod a+rx locate.sh
./locate.sh letters.txt>letters.txt.out</code></pre>
<p>And, there you go:</p>
<pre><code> cat letters.txt.out
Junior Hall - 1084 Stroman Ave Akron,200,0.8,41.050184,-81.488269
Bobbie Mitchell - RFD1 Cuyahoga Falls,200,0.6,41.15102692522035,-81.4960373662322
3057 W. Bailey Road,200,0.5,43.50055337491875,-92.59065981245</code></pre>
<p>With Maps Thunderforest, Data OpenStreetMap contributors, Santa can visualize the results on a map:</p>
<center><img src="kids.jpg" width="284" height="331"/></center>
<center><img src="map.png" width="284" height="331"/></center>
<h2 id="LIMITATIONS">LIMITATIONS</h2>
<p>It may be slow sometimes, I've squizzed geocode.xyz into a small t1.micro AWS server (which is free) with 1G of RAM and 1vCPU (YES Perl runs reasonably fast on that). There is no rate limiting or throttling either, so that may be a factor too (depending on how people are abusing the API.) If that does not cut it for you, get your own server on AWS with the provided server image. (I suspect a P2 GPU instance will perform quite well)</p>
<p>Also, geocode.xyz is limited to about 50 European countries. geocoder.ca covers North America.</p>
<h3 id="FUTURE-PLANS">FUTURE PLANS</h3>
<p>The day of a geoparser with worldwide coverage is not far. There is a pretty good chance that will happen before the New Year with a high confidence score. Could be one of Santa's gifts along with a faster server to run it on.</p>
<p>And, That's where it's at!</p>
<h2 id="SEE-ALSO">SEE ALSO</h2>
<p>* <a href="https://metacpan.org/module/Geo::Coder::OpenCage">Geo::Coder::OpenCage</a> * <a href="https://metacpan.org/module/Text::NLP">Text::NLP</a> * <a href="https://geocode.xyz">Geocode.xyz</a> * <a href="https://geocoder.ca">Geocoder.ca</a></p>
</div>2016-12-16T00:00:00ZErvin RuciMaking Perl Functionalhttp://perladvent.org/2016/2016-12-15.html<div class='pod'><p>"So you see, I simply have too many nice children to sort from the naughty ones. There's only a few days until its time to take presents to only the nice ones," Santa said sadly. "To make things worse, our database provider got DDoS'd by the naughty children's botnets, and all I have is this CSV backup!"</p>
<p>"Santa! Don't worry!" said Isabella the elf cheerily. "I think I have a solution to your problem!"</p>
<p>Santa looked at Isabella with keen interest. "Well, what's that then?" he asked curiously.</p>
<h3 id="Using-functional-techniques-on-Perl-lists">Using functional techniques on Perl lists</h3>
<p>Perl is not often thought of as a functional programming language, but it has the key list evaluation primitives of <code>filter</code> known as <code>grep</code> in Perl, <code>map</code>, and <code>reduce</code> (sometimes called <code>fold</code> in other languages.)</p>
<p>Perl also supports functions as first-class variables, so it has all of the necessary ingredients to be a fruitful functional programming environment. (If this idea intrigues you, I very highly recommend Mark Jason Dominus' excellent book <a href="http://hop.perl.plover.com/book/">Higher Order Perl</a> which you may read and download for free from his website.)</p>
<p><code>grep</code> takes a list and applies a "predicate" - a code block that returns a true or false value. If the predicate returns true, the list item is passed into a new list. If the predicate returns false, the list item is discarded.</p>
<p>Here's part of Santa's CSV file:</p>
<pre><code> id, name, nice, siblings, present
1, Hermione, 1, 0, Wand
2, Delores, 0, 1, Coal
3, Draco, 0, 0, Dirty sock
4, Ronald, 1, 6, Scarf</code></pre>
<h3 id="Using-grep">Using grep</h3>
<p>Our task here is to filter the naughty names from the nice ones. We're going to use <a href="https://metacpan.org/module/Text::CSV">Text::CSV</a> to parse the data, because friends don't let friends parse CSV using <code>split</code>.</p>
<p>In this particular case, our predicate is very simple. We can check to see if the <code>nice</code> field contains a 1 or a 0. If it has a 1, we add the child to our list of nice children.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="float">5.014</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Text::CSV</span><span class="structure">;</span> <span class="comment"># could use Text::CSV_XS instead</span><br /><span class="keyword">use</span> <span class="word">IO::String</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$data</span> <span class="operator">=</span> <span class="heredoc"><<'DATA'</span><span class="structure">;</span><br /><span class="heredoc_content">id, name, nice, siblings, present<br />1, Hermione, 1, 0, Wand<br />2, Delores, 0, 1, Coal<br />3, Draco, 0, 0, Dirty sock<br />4, Ronald, 1, 6, Scarf<br /></span><span class="heredoc_terminator">DATA<br /></span><br /><span class="keyword">my</span> <span class="symbol">$fh</span> <span class="operator">=</span> <span class="word">IO::String</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="symbol">$data</span><span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">$csv</span> <span class="operator">=</span> <span class="word">Text::CSV</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="structure">{</span> <span class="word">allow_whitespace</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">}</span> <span class="structure">);</span><br /><br /><span class="symbol">$csv</span><span class="operator">-></span><span class="word">column_names</span><span class="structure">(</span> <span class="symbol">$csv</span><span class="operator">-></span><span class="word">getline</span><span class="structure">(</span><span class="symbol">$fh</span><span class="structure">)</span> <span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">@nice</span> <span class="operator">=</span> <span class="word">grep</span> <span class="structure">{;</span> <span class="magic">$_</span><span class="operator">-></span><span class="structure">{</span><span class="word">nice</span><span class="structure">}</span> <span class="structure">}</span> <span class="cast">@</span><span class="structure">{</span> <span class="symbol">$csv</span><span class="operator">-></span><span class="word">getline_hr_all</span><span class="structure">(</span><span class="symbol">$fh</span><span class="structure">)</span> <span class="structure">};</span><br /><span class="word">close</span> <span class="symbol">$fh</span><span class="structure">;</span></code></pre>
<h3 id="Why-do-I-use-as-the-first-character-in-my-code-block">Why do I use <code>;</code> as the first character in my code block?</h3>
<p>I want to make it abundantly clear to Perl that I am using a code block. There are various ways to ensure that curly braces are interpreted as a code block but I never remember them. Using a semi-colon (<code>;</code>) as the very first token in my block ensures that Perl knows that whatever follows should be considered a code block (and not, for example a hash operation.)</p>
<h3 id="Using-map">Using map</h3>
<p>Santa has a database with latitude and longitude for each child's home. We need to add this information to each nice child entry.</p>
<p>To do this we can use a <code>map</code> function. A map takes each element of a list and applies a function to it, and returns the transformed item in a new list.</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">%locations</span> <span class="operator">=</span> <span class="structure">(</span><br /> <span class="single">'Wrigley Field'</span> <span class="operator">=></span> <span class="structure">[</span><span class="float">41.94757</span><span class="operator">,</span> <span class="float">-87.6562</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="single">'The Getty'</span> <span class="operator">=></span> <span class="structure">[</span><span class="float">34.07905</span><span class="operator">,</span> <span class="float">-118.4744</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="single">'Austin'</span> <span class="operator">=></span> <span class="structure">[</span><span class="float">30.26759</span><span class="operator">,</span> <span class="float">-97.74299</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="single">'Honolulu'</span> <span class="operator">=></span> <span class="structure">[</span><span class="float">21.30485</span><span class="operator">,</span> <span class="float">-157.8578</span><span class="structure">]</span><span class="operator">,</span><br /><span class="structure">);</span></code></pre>
<p>We also need to do some calculations of how much distance Santa expects to cover when he's delivering the gifts, so instead of just using the latitude and longitude as given in the database, we're going to represent these points using <a href="https://metacpan.org/module/Geo::Calc::XS">Geo::Calc::XS</a></p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Geo::Calc::XS</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">@locations</span> <span class="operator">=</span> <span class="word">map</span> <span class="structure">{;</span><br /> <span class="keyword">my</span> <span class="symbol">$loc</span> <span class="operator">=</span> <span class="symbol">$locations</span><span class="structure">{</span><span class="magic">$_</span><span class="structure">};</span><br /> <span class="word">Geo::Calc::XS</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">lat</span> <span class="operator">=></span> <span class="symbol">$loc</span><span class="operator">-></span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="word">lon</span> <span class="operator">=></span> <span class="symbol">$loc</span><span class="operator">-></span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="word">units</span> <span class="operator">=></span> <span class="single">'mi'</span><br /> <span class="structure">)</span><br /><span class="structure">}</span> <span class="symbol">@nice</span><span class="structure">;</span></code></pre>
<h3 id="Using-reduce">Using reduce</h3>
<p>Now we can compute the total distance traveled between each location using a reduction. This is a list operation that performs a function on each list element and adds its results into an accumulator variable. When the list is exhausted, the function returns the accumulator's value.</p>
<p>In Perl, <code>reduce</code> is found in <a href="https://metacpan.org/module/List::Util">List::Util</a> and has been in core since 5.7.3, so it should (almost) always be available, the same as <code>map</code> and <code>grep</code>.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">List::Util</span> <span class="words">qw(reduce)</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$miles_traveled</span> <span class="operator">=</span> <span class="word">reduce</span> <span class="structure">{;</span><br /> <span class="symbol">$a</span><span class="operator">-></span><span class="word">distance_to</span><span class="structure">(</span> <span class="symbol">$b</span> <span class="structure">)</span><br /><span class="structure">}</span> <span class="symbol">@deliveries</span><span class="structure">;</span><br /><br /><span class="word">say</span> <span class="symbol">$miles_traveled</span><span class="structure">;</span></code></pre>
<p>I hope Santa gets plenty of cocoa and cookies for that long trip.</p>
<h3 id="Pipelines">Pipelines</h3>
<p>A really powerful technique is to build a pipeline of successive map and grep operations.</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$miles_traveled</span> <span class="operator">=</span> <span class="word">reduce</span> <span class="structure">{;</span><br /> <span class="symbol">$a</span><span class="operator">-></span><span class="word">distance_to</span><span class="structure">(</span> <span class="symbol">$b</span> <span class="structure">)</span><br /><span class="structure">}</span> <span class="word">map</span> <span class="structure">{;</span><br /> <span class="keyword">my</span> <span class="symbol">$loc</span> <span class="operator">=</span> <span class="symbol">$locations</span><span class="structure">{</span><span class="magic">$_</span><span class="structure">};</span><br /> <span class="word">Geo::Calc::XS</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">lat</span> <span class="operator">=></span> <span class="symbol">$loc</span><span class="operator">-></span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="word">lon</span> <span class="operator">=></span> <span class="symbol">$loc</span><span class="operator">-></span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">,</span><br /> <span class="word">units</span> <span class="operator">=></span> <span class="single">'mi'</span><br /> <span class="structure">)</span><br /><span class="structure">}</span> <span class="word">grep</span> <span class="structure">{;</span><br /> <span class="magic">$_</span><span class="operator">-></span><span class="structure">{</span><span class="word">nice</span><span class="structure">}</span><br /><span class="structure">}</span> <span class="cast">@</span><span class="structure">{</span> <span class="symbol">$csv</span><span class="operator">-></span><span class="word">getline_hr_all</span><span class="structure">(</span><span class="symbol">$fh</span><span class="structure">)</span> <span class="structure">};</span></code></pre>
<p>This is a compact and expressive way to signal your intention of how to modify data which comes in a list form. Some areas where this is especially useful Perl programmers are likely to encounter in everyday use are DBI results and JSON input and/or transformation.</p>
<p>I hope you enjoyed this exploration of <code>grep</code>, <code>map</code>, and <code>reduce</code>. I find them tremendously useful and I use them frequently instead of foreach or while or other such loops.</p>
<h3 id="SEE-ALSO">SEE ALSO</h3>
<ul>
<li><p><code>perldoc -f grep</code></p>
</li>
<li><p><code>perldoc -f map</code></p>
</li>
<li><p><a href="https://metacpan.org/module/List::Util">List::Util</a></p>
</li>
<li><p><a href="http://hop.perl.plover.com/book/">Higher Order Perl</a></p>
</li>
<li><p><a href="https://gist.github.com/mrallen1/f95cfc7ae9fc89020e574fc979faad05">Complete code listing and cpanfile</a></p>
</li>
</ul>
</div>2016-12-15T00:00:00ZMark AllenUsing PPI for static analysishttp://perladvent.org/2016/2016-12-14.html<div class='pod'><p>This article is based on a talk I presented this year in Orlando at The Perl Conference 2016 (formerly known as YAPC::NA 2016).</p>
<p>The talk itself was oriented on how and why to find dead code in a program. It's very common that a program loads many many modules where actually in its life cycle only a few of them are either used or partially used.</p>
<p>All of this 'unused' code could impact memory and performance of your program. If your program runs for a long time as a daemon then it can use up more memory than it should, and if your program runs for a short period of time loading all this extra code can slow it down significantly.</p>
<p>Finding 'dead code' by static analysis is vowed to failure and comes with its own caveats due to the dynamic nature of perl... which can create/call functions at run time... so you cannot know for sure that a specific function will never be called.</p>
<p>Rather than being a duplicate of the talk which you can find online (view links at the end of the article). I'm going to show some PPI usages after introducing the basics of INC.</p>
<h3 id="INC-Basics">INC Basics</h3>
<p>Perl uses two versions of INC: @INC as an array and %INC as a hash. As for any other variables sharing the same name but using a different 'type' these two are different.</p>
<ul>
<li><p><code>@INC</code>: is an array which contains path where to look for new modules</p>
</li>
<li><p><code>%INC</code>: is a hash which saved the location of each already loaded module</p>
</li>
</ul>
<h4 id="Sample-usage-of-INC-and-INC">Sample usage of @INC and %INC</h4>
<p>Note that you can check at anytime the default value of <code>@INC</code>, by running <code>perl -V</code> command. This is depending on your environment variables as the custom <code>PERL5LIB</code> path are taken into account.</p>
<pre><code> > echo $PERL5LIB
/Users/nicolas/.dotfiles/perl-must-have/lib:/Users/nicolas/perl5/lib/perl5/
> perl -V
...
@INC:
/Users/nicolas/.dotfiles/perl-must-have/lib
/Users/nicolas/perl5/lib/perl5/
/Users/nicolas/perl5/perlbrew/perls/perl-5.22.1/lib/site_perl/5.22.1/darwin-2level
/Users/nicolas/perl5/perlbrew/perls/perl-5.22.1/lib/site_perl/5.22.1
/Users/nicolas/perl5/perlbrew/perls/perl-5.22.1/lib/5.22.1/darwin-2level
/Users/nicolas/perl5/perlbrew/perls/perl-5.22.1/lib/5.22.1
.</code></pre>
<p>You can also display the content of <code>@INC</code> and <code>%INC</code> directly from your program</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><span class="keyword">use</span> <span class="version">v5.022</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Simple::Accessor</span><span class="structure">;</span><br /><br /><span class="word">say</span> <span class="literal">q{# @INC:}</span><span class="structure">;</span><br /><span class="word">say</span> <span class="word">foreach</span> <span class="symbol">@INC</span><span class="structure">;</span><br /><br /><span class="word">say</span> <span class="interpolate">qq{\n}</span><span class="operator">,</span> <span class="literal">q{# %INC:}</span><span class="structure">;</span><br /><span class="keyword">foreach</span> <span class="structure">(</span> <span class="word">sort</span> <span class="word">keys</span> <span class="symbol">%INC</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">say</span> <span class="magic">$_</span><span class="operator">,</span> <span class="single">' => '</span><span class="operator">,</span> <span class="symbol">$INC</span><span class="structure">{</span><span class="magic">$_</span><span class="structure">};</span><br /><span class="structure">}</span></code></pre>
<p>When run on my local system the output looks like this, it should be similar on yours.</p>
<pre><code class="code-listing"><span class="comment"># @INC:<br /></span><span class="match">/Users/nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="word">site_perl</span><span class="operator">/</span><span class="version">5.22.1</span><span class="operator">/</span><span class="word">darwin</span><span class="number">-2</span><span class="word">level</span><br /><span class="operator">/</span><span class="word">Users</span><span class="operator">/</span><span class="word">nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="word">site_perl</span><span class="operator">/</span><span class="version">5.22.1</span><br /><span class="operator">/</span><span class="word">Users</span><span class="operator">/</span><span class="word">nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="version">5.22.1</span><span class="operator">/</span><span class="word">darwin</span><span class="number">-2</span><span class="word">level</span><br /><span class="operator">/</span><span class="word">Users</span><span class="operator">/</span><span class="word">nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="version">5.22.1</span><br /><span class="operator">.</span><br /><br /><span class="comment"># %INC:<br /></span><span class="word">Simple</span><span class="operator">/</span><span class="word">Accessor</span><span class="operator">.</span><span class="word">pm</span> <span class="operator">=></span> <span class="match">/Users/nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="word">site_perl</span><span class="operator">/</span><span class="version">5.22.1</span><span class="operator">/</span><span class="word">Simple</span><span class="operator">/</span><span class="word">Accessor</span><span class="operator">.</span><span class="word">pm</span><br /><span class="word">strict</span><span class="operator">.</span><span class="word">pm</span> <span class="operator">=></span> <span class="match">/Users/nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="version">5.22.1</span><span class="operator">/</span><span class="word">strict</span><span class="operator">.</span><span class="word">pm</span><br /><span class="word">warnings</span><span class="operator">.</span><span class="word">pm</span> <span class="operator">=></span> <span class="match">/Users/nicolas</span><span class="operator">/</span><span class="word">perl5</span><span class="operator">/</span><span class="word">perlbrew</span><span class="operator">/</span><span class="word">perls</span><span class="operator">/</span><span class="word">perl</span><span class="version">-5.22.1</span><span class="operator">/</span><span class="word">lib</span><span class="operator">/</span><span class="version">5.22.1</span><span class="operator">/</span><span class="word">warnings</span><span class="operator">.</span><span class="word">pm</span></code></pre>
<p>Note that the keys of %INC are not using the module name as <code>Foo::Bar</code> but the short path version as <code>Foo/Bar.pm</code> (even on operating systems like Windows that do not use this path specification normally). Remember that a single file can provide multiple packages.</p>
<h4 id="Messing-around-with-INC-and-INC">Messing around with <code>@INC</code> and <code>%INC</code></h4>
<p>Both <code>@INC</code> and <code>%INC</code> are read/write... which means you can cheat and lie to perl at run time by customizing them.</p>
<dl>
<dt>Customizing @INC</dt>
<dd>
<p>By tweaking <code>@INC</code> you can change the behavior of a program by forcing it to load modules from a different path... or even do not look in a generic path at all.</p>
<p>The most common use case is to add one known path to it as in this sample code where a custom path is added in first position of <code>@INC</code>.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">BEGIN</span> <span class="structure">{</span> <span class="word">unshift</span> <span class="symbol">@INC</span><span class="operator">,</span> <span class="double">"/search/in/this/path/first"</span> <span class="structure">}</span><br /><br /><span class="keyword">use</span> <span class="word">My::Module</span> <span class="structure">();</span></code></pre>
<p>We need to wrap the modification to <code>@INC</code> in a <code>BEGIN { ... }</code> block to ensure it happens at <i>compile time</i>. This is the time where perl first scans over the source code and loads all the modules - and if we're changing the contents of <code>@INC</code> we need to use <code>BEGIN</code> to ensure it happens during compile time before perl processes any <code>use</code> statements that might be effected by it.</p>
<p>Adding things to <code>@INC</code> is common enough in Perl that there's the familiar syntactic sugar for it:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="pragma">lib</span> <span class="words">qw(/search/in/this/path/first)</span><span class="structure">;</span></code></pre>
</dd>
<dt>Customizing %INC</dt>
<dd>
<p>Updating %INC is less common but also provides some great value - only use it with care - but this is an easy way to avoid loading a useless (not used during the life of your program) module.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">BEGIN</span> <span class="structure">{</span> <span class="symbol">$INC</span><span class="structure">{</span><span class="double">"Useless/Module.pm"</span><span class="structure">}</span> <span class="operator">=</span> <span class="single">'__FAKE__'</span><span class="structure">;</span> <span class="structure">}</span><br /><br /><span class="comment"># this use or require will not try to load the module<br /></span><br /><span class="keyword">use</span> <span class="word">Useless::Module</span><span class="structure">;</span><br /><span class="word">require</span> <span class="word">Useless::Module</span><span class="structure">;</span></code></pre>
<p>Because perl now has a value in <code>%INC</code> for <code>Useless::Module</code> it won't try and actually load it whenever it finds a <code>use Useless::Module</code>, either directly in the script itself or indirectly any of the modules loaded by the script.</p>
</dd>
</dl>
<h3 id="Listing-dependencies-of-a-script">Listing dependencies of a script</h3>
<p>We now know that modules loaded by a script are advertised in %INC. If we could read %INC just before the program starts, this could gives a picture of all required modules at this time. (note that some module might be lazy loaded later...)</p>
<p>This is exactly what we can achieve using a CHECK block.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Devel::ListDeps</span><span class="structure">;</span><br /><br /><span class="keyword">CHECK</span> <span class="structure">{</span><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$module</span> <span class="structure">(</span> <span class="word">sort</span> <span class="word">keys</span> <span class="symbol">%INC</span> <span class="structure">)</span> <span class="structure">{</span><br /><span class="comment"> # exclude ourself: could also use __PACKAGE__<br /></span> <span class="word">next</span> <span class="word">if</span> <span class="symbol">$module</span> <span class="operator">eq</span> <span class="single">'Devel/ListDeps.pm'</span><span class="structure">;</span><br /> <span class="word">print</span> <span class="interpolate">qq{$module\n}</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>Let's now use it on a very simple program which is just using strict and warnings.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="word">print</span> <span class="interpolate">qq{hello World\n}</span><span class="structure">;</span> <span class="comment"># or say</span></code></pre>
<p>We are here using <code>-d</code> to load our Devel package (if the module is not in your <code>@INC</code> you might need to use <code>-I</code> to modify <code>@INC</code> from the command line so Perl knows where to load it from) )</p>
<pre><code> > perl -c -d:ListDeps samples/hello-world.pl
strict.pm
warnings.pm
samples/hello-world.pl syntax OK</code></pre>
<p>As we can see we can get the list of dependencies from a script without altering it. Also note that the program was not executed as we just asked perl to run with <code>-c</code> to stop executing after the CHECK block. <code>Hello World</code> is not printed.</p>
<p>We can also check that this is listing recursive dependencies by using something little more complex.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Test::More</span><span class="structure">;</span><br /><br /><span class="word">print</span> <span class="interpolate">qq{hello World\n}</span><span class="structure">;</span> <span class="comment"># or say</span></code></pre>
<p>We can see that Test::More is listed as part of dependencies but other modules coming from Test::More are also now loaded.</p>
<pre><code> > perl -Ilib -c -d:ListDeps samples/hello-world.pl
Config.pm
Exporter.pm
Exporter/Heavy.pm
PerlIO.pm
Test/Builder.pm
Test/Builder/Module.pm
Test/More.pm
strict.pm
vars.pm
warnings.pm
warnings/register.pm
samples/hello-world.pl syntax OK</code></pre>
<p>As shown in the talk we could then wrap this in a script that could remove the last line and convert the short path to a package namespace to be more human friendly.</p>
<p>Then we could use it this way:</p>
<pre><code> > ./perl-dependencies samples/hello-world.pl
Config
Exporter
Exporter::Heavy
PerlIO
Test::Builder
Test::Builder::Module
Test::More
strict
vars
warnings
warnings::register</code></pre>
<h3 id="Memory-profiling">Memory profiling</h3>
<p>Now that we are able to list all dependencies from a script. It would be nice to know the memory required by each module.</p>
<p>For this purpose I'm going to use a one liner, which mainly works on Linux system. This needs some adjustment on macOS or other operating systems.</p>
<pre><code> > perl -e 'print qx{grep VmRSS /proc/$$/status}'
VmRSS: 1836 kB</code></pre>
<p>On Linux systems this metric is globally stable</p>
<pre><code> > for i in $(seq 1 4); do perl -e 'print qx{grep VmRSS /proc/$$/status}'; done
VmRSS: 1836 kB
VmRSS: 1836 kB
VmRSS: 1832 kB
VmRSS: 1836 kB</code></pre>
<h4 id="Tracking-memory-usage-for-modules">Tracking memory usage for modules</h4>
<p>We can now track memory usage for each individual module</p>
<pre><code> > perl -MCarp -e 'print qx{grep VmRSS /proc/$$/status}'
VmRSS: 2872 kB
> perl -MData::Dumper -e 'print qx{grep VmRSS /proc/$$/status}'
VmRSS: 3728 kB
> perl -MMoose -e 'print qx{grep VmRSS /proc/$$/status}'
VmRSS: 16596 kB</code></pre>
<p>We can also check the memory used by more than a single module using the same oneliner.</p>
<pre><code> > perl -MCarp -MData::Dumper -e 'print qx{grep VmRSS /proc/$$/status}'
VmRSS: 3752 kB</code></pre>
<p>When loading Carp as well as Data::Dumper the total memory footprint stays same as when using Data::Dumper by itself. Why? The reason is that Data::Dumper itself is also using Carp</p>
<pre><code> > perl -MData::Dumper -E 'say $INC{"Carp.pm"}'
/usr/local/cpanel/3rdparty/perl/524/lib64/perl5/5.24.1/Carp.pm</code></pre>
<p>Similarly adding Moose to Data::Dumper, we can notice that the memory used is higher than Moose by itself but still below the sum</p>
<pre><code> > perl -MData::Dumper -MMoose -e 'print qx{grep VmRSS /proc/$$/status}'
VmRSS: 17132 kB</code></pre>
<p>We cannot do the simple math addition to know the memory used by multiple modules.</p>
<pre><code> Memory(Moose) < Memory( Moose & Data::Dumper ) < Memory(Moose) + Memory(Data::Dumper)</code></pre>
<p>Perl itself as its own startup memory cost, and modules could also used shared dependencies so we cannot count them more than once... as once a module is loaded, it's in and can be used at no additional cost by any other module.</p>
<h4 id="Tracking-memory-increase">Tracking memory increase</h4>
<p>This leads to an idea .. what about running the same oneliner each time a module is loaded? Is it possible? Yes we can use a custom debugger function which would be able to track all calls.</p>
<p>We could use either a change of state of <code>%INC</code> or also caller to know which file was just loaded. This script is pretty long and still need some tweaks but this is the main idea:</p>
<p>This is the current version</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Devel::ListDepsDetails</span><span class="structure">;</span><br /><br /><span class="keyword">BEGIN</span> <span class="structure">{</span><br /> <span class="keyword">sub</span> <span class="word">get_memory</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$m</span><span class="structure">;</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="operator">-e</span> <span class="literal">q{/proc}</span> <span class="structure">)</span> <span class="structure">{</span> <span class="comment"># unix</span><br /> <span class="symbol">$m</span> <span class="operator">=</span> <span class="command">qx{grep VmRSS /proc/$$/status}</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="keyword">else</span> <span class="structure">{</span> <span class="comment"># macOS (not consistent)</span><br /> <span class="symbol">$m</span> <span class="operator">=</span> <span class="command">qx{ps -o rss -p $$ | tail -1}</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="keyword">return</span> <span class="word">int</span> <span class="symbol">$m</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@inc</span> <span class="operator">=</span> <span class="word">sort</span> <span class="structure">{</span> <span class="word">length</span> <span class="symbol">$b</span> <span class="operator"><=></span> <span class="word">length</span> <span class="symbol">$a</span> <span class="operator">or</span> <span class="symbol">$a</span> <span class="operator">cmp</span> <span class="symbol">$b</span> <span class="structure">}</span> <span class="symbol">@INC</span><span class="structure">;</span><br /> <span class="keyword">sub</span> <span class="word">short</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$s</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$in</span> <span class="structure">(</span> <span class="symbol">@inc</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">next</span> <span class="word">unless</span> <span class="symbol">$s</span> <span class="operator">=~</span> <span class="substitute">s{^$in/?}{}</span><span class="structure">;</span><br /><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$s</span> <span class="operator">=~</span> <span class="regexp">qr{\.pm$}</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$s</span> <span class="operator">=~</span> <span class="substitute">s{\.pm$}{}</span><span class="structure">;</span><br /> <span class="symbol">$s</span> <span class="operator">=~</span> <span class="substitute">s{/+}{::}g</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="word">last</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">return</span> <span class="symbol">$s</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">my</span> <span class="symbol">%seen</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$total_mem</span> <span class="operator">=</span> <span class="number">0</span><span class="structure">;</span><br /> <span class="keyword">sub</span> <span class="word">DB::DB</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$package</span><span class="operator">,</span> <span class="symbol">$file</span><span class="operator">,</span> <span class="symbol">$line</span> <span class="structure">)</span> <span class="operator">=</span> <span class="word">caller</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="symbol">$file</span> <span class="operator">eq</span> <span class="single">'-e'</span> <span class="operator">||</span> <span class="symbol">$file</span> <span class="operator">eq</span> <span class="single">'-E'</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="symbol">$file</span> <span class="operator">=~</span> <span class="regexp">qr{^\(eval}</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="symbol">$seen</span><span class="structure">{</span><span class="symbol">$file</span><span class="structure">}</span><span class="operator">++</span><span class="structure">;</span><br /><br /> <span class="symbol">$file</span> <span class="operator">||=</span> <span class="single">''</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$mem</span> <span class="operator">=</span> <span class="word">get_memory</span><span class="structure">();</span><br /> <span class="keyword">my</span> <span class="symbol">$delta</span> <span class="operator">=</span> <span class="symbol">$mem</span> <span class="operator">-</span> <span class="symbol">$total_mem</span><span class="structure">;</span><br /> <span class="symbol">$total_mem</span> <span class="operator">=</span> <span class="symbol">$mem</span><span class="structure">;</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="word">keys</span> <span class="symbol">%seen</span> <span class="operator">==</span> <span class="number">1</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">print</span> <span class="double">"# [delta => total RSS in kB] module name (or eval)\n"</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /><span class="comment"> # try to guess where it comes from (manual longmess :-)<br /></span> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$frompkg</span><span class="operator">,</span> <span class="symbol">$fromfile</span><span class="operator">,</span> <span class="symbol">$fromline</span> <span class="structure">)</span> <span class="operator">=</span> <span class="word">caller</span><span class="structure">();</span><br /> <span class="keyword">my</span> <span class="symbol">$max</span> <span class="operator">=</span> <span class="number">1_000</span><span class="structure">;</span><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$level</span> <span class="structure">(</span> <span class="number">0</span> <span class="operator">..</span> <span class="symbol">$max</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$package</span><span class="operator">,</span> <span class="symbol">$filename</span><span class="operator">,</span> <span class="symbol">$line</span> <span class="structure">)</span> <span class="operator">=</span> <span class="word">caller</span><span class="structure">(</span><span class="symbol">$level</span><span class="structure">);</span><br /> <span class="word">last</span> <span class="word">unless</span> <span class="core">defined</span> <span class="symbol">$filename</span><span class="structure">;</span><br /><br /><span class="comment"> # when the filename differs, we know where it comes from<br /></span> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$fromfile</span> <span class="operator">ne</span> <span class="symbol">$filename</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="structure">(</span> <span class="symbol">$frompkg</span><span class="operator">,</span> <span class="symbol">$fromfile</span><span class="operator">,</span> <span class="symbol">$fromline</span> <span class="structure">)</span><br /> <span class="operator">=</span> <span class="structure">(</span> <span class="symbol">$package</span><span class="operator">,</span> <span class="symbol">$filename</span><span class="operator">,</span> <span class="symbol">$line</span> <span class="structure">);</span><br /> <span class="word">last</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$level</span> <span class="operator">==</span> <span class="symbol">$max</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="structure">(</span> <span class="symbol">$frompkg</span><span class="operator">,</span> <span class="symbol">$fromfile</span><span class="operator">,</span> <span class="symbol">$fromline</span> <span class="structure">)</span><br /> <span class="operator">=</span> <span class="structure">(</span> <span class="single">'????'</span><span class="operator">,</span> <span class="single">'????'</span><span class="operator">,</span> <span class="single">'?'</span> <span class="structure">);</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /><br /> <span class="word">print</span> <span class="word">sprintf</span><span class="structure">(</span><br /> <span class="double">"[%5s => %8d] %-50s from %-30s at line %d\n"</span><span class="operator">,</span><br /> <span class="structure">(</span> <span class="symbol">$delta</span> <span class="operator">></span> <span class="number">0</span> <span class="operator">?</span> <span class="single">'+'</span> <span class="operator">:</span> <span class="single">''</span> <span class="structure">)</span> <span class="operator">.</span> <span class="symbol">$delta</span><span class="operator">,</span><br /> <span class="symbol">$mem</span><span class="operator">,</span><br /> <span class="structure">(</span> <span class="word">short</span><span class="structure">(</span><span class="symbol">$file</span><span class="structure">)</span> <span class="operator">||</span> <span class="single">'undef'</span> <span class="structure">)</span><span class="operator">,</span><br /> <span class="word">short</span><span class="structure">(</span><span class="symbol">$fromfile</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="symbol">$fromline</span><br /> <span class="structure">);</span><br /><br /> <span class="keyword">return</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /><span class="structure">}</span><br /><br /><span class="keyword">CHECK</span> <span class="structure">{</span> <span class="word">exit</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>We can then use it this way for example, either loading a script or directly on a module and each time we see a new file we check the memory and check how much it was increased.</p>
<pre><code> > perl -Ilib -d:ListDepsDetails -e 'require "./samples/use-modules.pl"'
# [delta => total RSS in kB] module name (or eval)
[+2052 => 2052] samples/use-modules.pl from e at line 1
[ +172 => 2224] strict from samples/use-modules.pl at line 3
[ +356 => 2580] warnings from samples/use-modules.pl at line 4
[ +40 => 2620] Carp from samples/use-modules.pl at line 8
[ +516 => 3136] Exporter from Carp at line 99
[ 0 => 3136] Config from samples/use-modules.pl at line 9
[ 0 => 3136] vars from Config at line 11
[ +32 => 3168] warnings::register from vars at line 7
[ +104 => 3272] Data::Dumper from samples/use-modules.pl at line 10
[ +92 => 3364] XSLoader from Data::Dumper at line 33
[ +280 => 3644] constant from Data::Dumper at line 277
[+1028 => 4672] bytes from Data::Dumper at line 754
[ +24 => 4696] overload from Data::Dumper at line 20
[ +4 => 4700] overloading from overload at line 83
[ +76 => 4776] Digest from samples/use-modules.pl at line 11
[ +28 => 4804] Encode from samples/use-modules.pl at line 12
[ +8 => 4812] Encode::Alias from Encode at line 47
[ +536 => 5348] Encode::Config from Encode at line 52
[ +84 => 5432] Encode::Encoding from Encode at line 265
[ +24 => 5456] FindBin from samples/use-modules.pl at line 13
[ 0 => 5456] Cwd from FindBin at line 83
[ +448 => 5904] File::Basename from FindBin at line 84
[ +148 => 6052] File::Spec from FindBin at line 85
[ +12 => 6064] File::Spec::Unix from File::Spec at line 22
[ +268 => 6332] MyPackage from samples/use-modules.pl at line 15
[ +4 => 6336] MultiplePackages from samples/use-modules.pl at line 16
Use some CORE modules</code></pre>
<p>From there we can do some manual checking on the top modules and see if these are really required in the context of our program or if we can replace it with a less intrusive module... Most of the time you probably only care about modules used by your own code.</p>
<h3 id="Finding-unused-subroutines-using-PPI">Finding unused subroutines using PPI</h3>
<p>Now that we know that a module can brings dependencies, and these could use some extra memory, it would be nice to be able to detect if some of them are useless.</p>
<p>After some googling, I could quickly find a solution from brian d. foy: <a href="http://blogs.perl.org/users/brian_d_foy/2012/07/finding-unused-subroutines-but-with-ppi.html">Finding Unused Subroutines</a> which uses <a href="https://metacpan.org/module/PPI">PPI</a>.</p>
<p>PPI is a perl parser which tokenizes a source code to convert it to a list of 'tokens' making any code manipulation easier than playing with regexp and other complex methods. After analyzing the document tree, you can add/remove/update some tokens (like for example comments, pods...), then later safely render it as a source code using the Lexer.</p>
<p>You can read more about PPI from its perldoc itself, where you could learn why it's called what it is: The two meanings for PPI are "Parse::Perl::Isolated" but also 'I Parse Perl' (if you read it from right to left.)</p>
<p>Here is the main idea behind this PPI analysis:</p>
<p><ol> <li>Parse a script using PPI <li>get the list of all defined functions <li>get the list of all function used <ol> <li>can be a reference to the function: \&foo <li>direct call to function: foo() <li>find functions used as bareword: foo </ol> <li>do the diff between the two list to guess the unused functions </ol>
</p>
<p>brian d foy's solution is very smart, and works pretty well on small scripts.</p>
<p>Its main limitation comes from the fact that the analyze is only performed in the scope of your program. This can be solved by doing a fatpacking of your script. The second problem is that it should loop after removing functions as this could result in some extra optimizations. One way to do this is to perform the removal using PPI then perform a new analysis.</p>
<p>For example in this case, <code>c</code> is the function called which requires only <code>d</code> The algorithm described above will detect that <code>b</code> as a function is declared and never used but will not be able to do the same for <code>a</code> as it's used inside b.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">a</span> <span class="structure">{}</span><br /><span class="keyword">sub</span> <span class="word">b</span> <span class="structure">{</span> <span class="word">a</span><span class="structure">()</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">c</span> <span class="structure">{</span> <span class="word">d</span><span class="structure">()</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">d</span> <span class="structure">{</span> <span class="number">1</span> <span class="structure">}</span><br /><br /><span class="word">c</span><span class="structure">();</span></code></pre>
<p>We could simply fix this by making iterative changes to the program, each time modifying it until we cannot remove any extra function or we reach a stable state, where nothing new needs to be removed.</p>
<p>Each iteration we re-perform an analysis on the updated document. This can be performed without writing the updated file to disk, but this allows an easier way to debug to write it each time. Since we write it out we can also add an extra check like a <code>perl -c</code> between each iteration to be sure we are not accidentally doing something bad.</p>
<pre><code class="code-listing"><span class="comment">#!perl<br /></span><br /><span class="comment"># this is a pseudo code<br /></span><br /><span class="keyword">my</span> <span class="symbol">$doc</span> <span class="operator">=</span> <span class="word">PPI::Document</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="symbol">$script</span> <span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">$step</span><span class="structure">;</span><br /><span class="keyword">while</span> <span class="structure">(</span> <span class="number">1</span> <span class="structure">)</span> <span class="structure">{</span> <span class="comment"># analyze required</span><br /> <span class="keyword">my</span> <span class="symbol">@remove_subs</span> <span class="operator">=</span> <span class="symbol">$doc</span><span class="operator">-></span><span class="word">remove_unused_subs</span><span class="structure">();</span><br /> <span class="word">last</span> <span class="word">unless</span> <span class="word">scalar</span> <span class="symbol">@removed_subs</span><br /> <span class="operator">or</span> <span class="operator">!</span><span class="word">cmp_bag</span><span class="structure">(</span><span class="cast">\</span><span class="symbol">@removed_subs</span><span class="operator">,</span> <span class="cast">\</span><span class="symbol">@previous_state</span><span class="structure">);</span><br /> <span class="symbol">@previous_state</span> <span class="operator">=</span> <span class="symbol">@remove_subs</span><span class="structure">;</span><br /> <span class="operator">++</span><span class="symbol">$step</span><span class="structure">;</span><br /> <span class="symbol">$doc</span><span class="operator">-></span><span class="word">update_and_write_to</span><span class="structure">(</span> <span class="double">"$script.step-$step"</span> <span class="structure">);</span><br /><span class="comment"> # reread the file<br /></span> <span class="symbol">$doc</span> <span class="operator">=</span> <span class="word">PPI::Document</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="double">"$script.step-$step"</span> <span class="structure">);</span><br /><span class="structure">}</span></code></pre>
<h4 id="basic-modulino-to-play-with-PPI">basic modulino to play with PPI</h4>
<p>Let's try to package this in Object Oriented way, and give it a try with a program that could strip comments and pods from a script</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Analyze</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">PPI</span><span class="structure">;</span><br /><span class="comment"># similar to using Moo here, but in one line with no other<br /># deps (just need lazy builders and accessors)<br /></span><span class="keyword">use</span> <span class="word">Simple::Accessor</span> <span class="words">qw{<br /> Document content subs symbols list barewords<br /> methods packages<br />}</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">_build_Document</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="word">die</span> <span class="word">unless</span> <span class="word">ref</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">content</span> <span class="operator">eq</span> <span class="single">'SCALAR'</span><br /> <span class="operator">or</span> <span class="operator">-f</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">content</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$Document</span> <span class="operator">=</span> <span class="word">PPI::Document</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">content</span> <span class="structure">);</span><br /> <span class="word">die</span> <span class="double">"Could not create PDOM!"</span> <span class="word">unless</span> <span class="word">ref</span> <span class="symbol">$Document</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="symbol">$Document</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">stringify</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">serialize</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="comment"># PPI::Token::Comment<br /></span><br /><span class="keyword">sub</span> <span class="word">remove_pods</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="word">remove_tokens</span><span class="structure">(</span><span class="single">'PPI::Token::Pod'</span><span class="structure">)</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">remove_comments</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="word">remove_tokens</span><span class="structure">(</span><span class="single">'PPI::Token::Comment'</span><span class="structure">)</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">remove_tokens</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$self</span><span class="operator">,</span> <span class="symbol">$token</span> <span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$pods</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><span class="symbol">$token</span><span class="structure">)</span> <span class="operator">||</span> <span class="structure">[];</span><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$pod</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$pods</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$pod</span><span class="operator">-></span><span class="word">delete</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">return</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="keyword">package</span> <span class="word">main</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="version">v5.014</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">File::Slurp</span> <span class="words">qw{read_file write_file}</span><span class="structure">;</span><br /><br /><span class="word">exit</span> <span class="word">run</span><span class="structure">(</span><span class="symbol">@ARGV</span><span class="structure">)</span> <span class="word">unless</span> <span class="word">caller</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">run</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$script</span> <span class="operator">=</span> <span class="core">shift</span> <span class="operator">or</span> <span class="word">die</span> <span class="double">"Missing argument script name to analyze"</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$content</span> <span class="operator">=</span> <span class="word">read_file</span><span class="structure">(</span><span class="symbol">$script</span><span class="structure">)</span> <span class="operator">or</span> <span class="word">die</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$analyze</span> <span class="operator">=</span> <span class="word">Analyze</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="word">content</span> <span class="operator">=></span> <span class="cast">\</span><span class="symbol">$content</span> <span class="structure">);</span><br /><br /> <span class="symbol">$analyze</span><span class="operator">-></span><span class="word">remove_pods</span><span class="structure">();</span><br /> <span class="symbol">$analyze</span><span class="operator">-></span><span class="word">remove_comments</span><span class="structure">();</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$updated</span> <span class="operator">=</span> <span class="double">"$script.updated"</span><span class="structure">;</span><br /> <span class="word">write_file</span><span class="structure">(</span> <span class="symbol">$updated</span><span class="operator">,</span> <span class="symbol">$analyze</span><span class="operator">-></span><span class="word">stringify</span> <span class="structure">);</span><br /> <span class="word">say</span> <span class="double">"Write updated version to '$updated"</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="number">0</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>When used like this, a new file is going to be written on disk where all comments and pod are stripped. You can test it by yourself.</p>
<pre><code> > perl strip-comments-and-pods.pl script.pl
Write updated version to 'script.pl.updated</code></pre>
<h4 id="PPI-how-to-get-the-list-of-functions-defined">PPI how to get the list of functions defined</h4>
<p>A function is described in PPI by a 'PPI::Statement::Sub', we should exclude from this list the reserved one like BEGIN, CHECK...</p>
<p>Note: this code is extending the previous packages described above.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Analyze</span><span class="structure">;</span><br /><br /><span class="operator">...</span><br /><span class="comment"># Get all of the subroutine definitions<br /></span><span class="keyword">sub</span> <span class="word">_build_subs</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">%subs</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$all_ppi_subs</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><br /> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Statement::Sub'</span><span class="structure">)</span><br /><span class="comment"> # not a BEGIN, CHECK, UNITCHECK, INIT and END,<br /></span> <span class="operator">&&</span> <span class="operator">!</span><span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">reserved</span><span class="structure">()</span><br /> <span class="structure">}</span><br /> <span class="structure">)</span> <span class="operator">||</span> <span class="structure">[];</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$sub</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$all_ppi_subs</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$name</span> <span class="operator">=</span> <span class="symbol">$sub</span><span class="operator">-></span><span class="word">name</span><span class="structure">;</span><br /> <span class="symbol">$subs</span><span class="structure">{</span><span class="symbol">$name</span><span class="structure">}</span> <span class="operator">=</span> <span class="symbol">$sub</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="word">debug</span> <span class="double">"* All sub definitions: "</span><span class="operator">,</span> <span class="word">sort</span> <span class="word">keys</span> <span class="symbol">%subs</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">%subs</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>This is good but the problem here is that these two functions will have the same name...</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Foo</span><span class="structure">;</span> <span class="keyword">sub</span> <span class="word">hello</span> <span class="structure">{</span> <span class="structure">}</span><br /><span class="keyword">package</span> <span class="word">Bar</span><span class="structure">;</span> <span class="keyword">sub</span> <span class="word">hello</span> <span class="structure">{</span> <span class="structure">}</span></code></pre>
<p>So we would like to know in which package the function is defined, then use its fullname Foo::hello, Bar::hello or main::hello.</p>
<h4 id="PPI-getting-the-package-name-of-a-function">PPI getting the package name of a function</h4>
<p>We need a way to get the package name from a PPI element</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$namespace</span> <span class="operator">=</span> <span class="word">eval</span> <span class="structure">{</span><br /> <span class="symbol">$elt</span><span class="operator">-></span><span class="word">parent</span><br /> <span class="operator">-></span><span class="word">find_first</span><span class="structure">(</span><span class="single">'PPI::Statement::Package'</span><span class="structure">)</span><br /> <span class="operator">-></span><span class="word">namespace</span><br /><span class="structure">};</span></code></pre>
<p>In some cases this unfortunately does not provide the information we would expect. In order to get this to work we need to create a workaround to save the begin and end line for each packages so it will be easy to know where a function is located if we can know on which line it was defined.</p>
<p>Let's add an attribute 'packages' to the Analyze class and save the start and end line for each package. (note that a package namespace can be use more than once )</p>
<pre><code class="code-listing"><span class="comment"># do not cache the value as when removing a doc,<br /># cache needs to be cleared<br /></span><span class="keyword">sub</span> <span class="word">_build_packages</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /><span class="comment"> # use an array and not a hash as a package (like main or<br /> # any other) can be defined multiple times<br /></span> <span class="keyword">my</span> <span class="symbol">@packages</span><span class="structure">;</span><br /><br /><span class="comment"> # find return the elements sorted<br /></span> <span class="keyword">my</span> <span class="symbol">$search</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><span class="single">'PPI::Statement::Package'</span><span class="structure">)</span> <span class="operator">||</span> <span class="structure">[];</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$pkg</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$search</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="word">scalar</span> <span class="symbol">@packages</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$packages</span><span class="structure">[</span><span class="number">-1</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="single">'to'</span><span class="structure">}</span> <span class="operator">=</span> <span class="symbol">$pkg</span><span class="operator">-></span><span class="word">line_number</span> <span class="operator">-</span> <span class="number">1</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="word">push</span> <span class="symbol">@packages</span><span class="operator">,</span> <span class="structure">{</span><br /> <span class="word">name</span> <span class="operator">=></span> <span class="symbol">$pkg</span><span class="operator">-></span><span class="word">namespace</span><span class="operator">,</span><br /> <span class="word">from</span> <span class="operator">=></span> <span class="symbol">$pkg</span><span class="operator">-></span><span class="word">line_number</span><span class="operator">,</span><br /> <span class="word">to</span> <span class="operator">=></span> <span class="number">0</span><span class="operator">,</span><br /> <span class="word">file_scoped</span> <span class="operator">=></span> <span class="symbol">$pkg</span><span class="operator">-></span><span class="word">file_scoped</span><span class="operator">,</span><br /> <span class="structure">};</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">@packages</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>Now that we have that information it becomes easy to know in which namespace the function was declared. Rather than using <code>$sub->name</code> we could use <code>get_package_for($sub)</code> with the following</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">get_package_for</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$self</span><span class="operator">,</span> <span class="symbol">$elt</span> <span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$line</span> <span class="operator">=</span> <span class="symbol">$elt</span><span class="operator">-></span><span class="word">line_number</span><span class="structure">;</span><br /><br /><span class="comment"> # coming from previous function _build_packages<br /></span> <span class="keyword">my</span> <span class="symbol">$all_packages</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">packages</span><span class="structure">;</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$v</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$all_packages</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$pkg</span> <span class="operator">=</span> <span class="symbol">$v</span><span class="operator">-></span><span class="structure">{</span><span class="word">name</span><span class="structure">};</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$v</span><span class="operator">-></span><span class="structure">{</span><span class="word">from</span><span class="structure">}</span> <span class="operator"><</span> <span class="symbol">$line</span><br /> <span class="operator">&&</span> <span class="structure">(</span> <span class="symbol">$v</span><span class="operator">-></span><span class="structure">{</span><span class="word">to</span><span class="structure">}</span> <span class="operator">==</span> <span class="number">0</span> <span class="operator">||</span> <span class="symbol">$line</span> <span class="operator"><=</span> <span class="symbol">$v</span><span class="operator">-></span><span class="structure">{</span><span class="word">to</span><span class="structure">}</span> <span class="structure">)</span><br /> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="symbol">$pkg</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">return</span> <span class="single">'main'</span><span class="structure">;</span> <span class="comment"># default</span><br /><span class="structure">}</span></code></pre>
<h4 id="PPI-get-list-of-methods-used-as-function-call">PPI get list of methods used as function call</h4>
<p>If your script is using object oriented style, then you will quickly have function calls instead of method calls: <code>$object->foo()</code> rather than <code>foo()</code></p>
<p>We cannot do anything for something like <code>$object->$foo()</code>, but we can try to check all static function calls trying to find the PPI::Token::Operator <code>-></code></p>
<pre><code class="code-listing"><span class="comment"># list of methods with the scope where they might be used<br /></span><span class="keyword">sub</span> <span class="word">_build_methods</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@methods</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$search</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Token::Operator'</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">content</span> <span class="operator">eq</span> <span class="single">'->'</span><span class="structure">;</span><br /> <span class="structure">})</span> <span class="operator">||</span> <span class="structure">[];</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$op</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$search</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">next</span> <span class="word">unless</span> <span class="word">eval</span> <span class="structure">{</span><br /> <span class="symbol">$op</span><span class="operator">-></span><span class="word">snext_sibling</span><span class="operator">-></span><span class="word">class</span> <span class="operator">eq</span> <span class="single">'PPI::Token::Word'</span><br /> <span class="structure">};</span><br /><br /><span class="comment"> # maybe something special for nw ?<br /></span> <span class="word">push</span> <span class="symbol">@methods</span><span class="operator">,</span> <span class="symbol">$op</span><span class="operator">-></span><span class="word">snext_sibling</span><span class="operator">-></span><span class="word">content</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="word">debug</span> <span class="double">"* All methods: "</span><span class="operator">,</span> <span class="word">sort</span> <span class="symbol">@methods</span><span class="structure">;</span><br /><br /><span class="comment"> #note explain $all_statements;<br /></span> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">@methods</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<h4 id="PPI-detect-function-used-as-reference-or-stash">PPI detect function used as reference or stash</h4>
<p>We can get the list of symbols, functions not called with parens but with &foo, \&foo, *foo...</p>
<p>The code is not much more complex than finding defined functions.</p>
<pre><code class="code-listing"><span class="comment"># # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #<br /># find the sub calls that use &<br /># &foo<br /># &foo()<br /># \&foo<br /># *foo<br /></span><span class="keyword">sub</span> <span class="word">_build_symbols</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@symbols</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$search</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><br /> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Token::Symbol'</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="structure">(</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">symbol_type</span> <span class="operator">eq</span> <span class="single">'&'</span><br /> <span class="operator">||</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">symbol_type</span> <span class="operator">eq</span> <span class="single">'*'</span> <span class="structure">);</span><br /> <span class="structure">}</span><br /> <span class="structure">)</span> <span class="operator">||</span> <span class="structure">[];</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$elt</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$search</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$name</span> <span class="operator">=</span> <span class="symbol">$elt</span><span class="operator">-></span><span class="word">content</span> <span class="operator">=~</span> <span class="substitute">s/\A[&*]//r</span><span class="structure">;</span> <span class="comment"># /</span><br /><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$name</span> <span class="operator">!~</span> <span class="regexp">qr{::}</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$name</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">get_package_for</span><span class="structure">(</span><span class="symbol">$elt</span><span class="structure">)</span> <span class="operator">.</span> <span class="single">'::'</span> <span class="operator">.</span> <span class="symbol">$name</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="word">push</span> <span class="symbol">@symbols</span><span class="operator">,</span> <span class="symbol">$name</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="symbol">@symbols</span> <span class="operator">=</span> <span class="word">sort</span> <span class="symbol">@symbols</span><span class="structure">;</span><br /> <span class="word">debug</span> <span class="double">"* All symbols: @symbols"</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">@symbols</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<h4 id="PPI-find-the-sub-calls-that-use-parens">PPI find the sub calls that use parens</h4>
<p>Building the list of function called with some parens isn't more complex than listing all defined functions. We simply need to find a 'Token::Word' followed by an open paren.</p>
<p>The following code tries to get the fullname of the function depending if it's called as <code>foo()</code> or <code>Bar::foo()</code></p>
<pre><code class="code-listing"><span class="comment"># # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #<br /># find the sub calls that use parens<br /># foo()<br /># foo( @args )<br /></span><span class="keyword">sub</span> <span class="word">_build_list</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@list</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$search</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Token::Word'</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">snext_sibling</span><br /> <span class="operator">&&</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">snext_sibling</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Structure::List'</span><span class="structure">);</span><br /> <span class="structure">})</span> <span class="operator">||</span> <span class="structure">[];</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$elt</span> <span class="structure">(</span><span class="cast">@</span><span class="symbol">$search</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$name</span> <span class="operator">=</span> <span class="symbol">$elt</span><span class="operator">-></span><span class="word">literal</span><span class="structure">;</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$name</span> <span class="operator">!~</span> <span class="regexp">qr{::}</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$name</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">get_package_for</span><span class="structure">(</span><span class="symbol">$elt</span><span class="structure">)</span> <span class="operator">.</span> <span class="single">'::'</span> <span class="operator">.</span> <span class="symbol">$name</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="word">push</span> <span class="symbol">@list</span><span class="operator">,</span> <span class="symbol">$name</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="word">debug</span> <span class="double">"* All list: @list"</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">@list</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<h4 id="PPI-find-the-sub-calls-that-are-barewords">PPI find the sub calls that are barewords</h4>
<p>In a very similar way we can also build the list of functions used as barewords.</p>
<pre><code class="code-listing"><span class="comment"># # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #<br /># find the sub calls that are barewords<br /># foo<br /># foo + bar<br /># but not<br /># use vars qw( baz );<br /># sub quux { ... }<br /></span><span class="keyword">sub</span> <span class="word">_build_barewords</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">%reserved</span> <span class="operator">=</span> <span class="word">map</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">,</span> <span class="magic">$_</span> <span class="structure">}</span> <span class="words">qw(<br /> use vars sub my BEGIN INIT new<br /> )</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">@barewords</span> <span class="operator">=</span> <span class="word">map</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">literal</span> <span class="structure">}</span><br /> <span class="word">grep</span> <span class="structure">{</span><br /><span class="comment"> # Take out the Words that are preceded by 'sub'<br /> # That is, take out the subroutine definitions<br /> # I couldn't get this to work inside the find()<br /></span> <span class="keyword">my</span> <span class="symbol">$previous</span> <span class="operator">=</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">previous_sibling</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$sprevious</span> <span class="operator">=</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">sprevious_sibling</span><span class="structure">;</span><br /><br /> <span class="operator">!</span><span class="structure">(</span> <span class="word">blessed</span><span class="structure">(</span><span class="symbol">$previous</span><span class="structure">)</span> <span class="operator">&&</span> <span class="word">blessed</span><span class="structure">(</span><span class="symbol">$sprevious</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="symbol">$previous</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Token::Whitespace'</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="symbol">$sprevious</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Token::Word'</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="symbol">$sprevious</span><span class="operator">-></span><span class="word">literal</span> <span class="operator">eq</span> <span class="single">'sub'</span> <span class="structure">)</span><br /> <span class="structure">}</span> <span class="cast">@</span><span class="structure">{</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">Document</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span><br /> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><span class="single">'PPI::Token::Word'</span><span class="structure">)</span><br /> <span class="operator">&&</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">next_sibling</span><span class="operator">-></span><span class="word">isa</span><span class="structure">(</span><br /> <span class="words">qw(<br /> PPI::Token::Whitespace<br /> PPI::Token::Structure<br /> PPI::Token::List<br /> PPI::Token::Operator<br /> )</span><br /> <span class="structure">)</span> <span class="operator">&&</span> <span class="structure">(</span> <span class="operator">!</span><span class="word">exists</span> <span class="symbol">$reserved</span><span class="structure">{</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">1</span><span class="structure">]</span><span class="operator">-></span><span class="word">literal</span> <span class="structure">}</span> <span class="structure">);</span><br /> <span class="structure">}</span><br /> <span class="structure">)</span><br /> <span class="operator">||</span> <span class="structure">[]</span><br /> <span class="structure">};</span><br /><br /> <span class="word">debug</span> <span class="double">"* All barewords: @barewords"</span><span class="structure">;</span><br /><br /> <span class="word">push</span> <span class="symbol">@barewords</span><span class="operator">,</span> <span class="word">sort</span> <span class="word">keys</span> <span class="symbol">%reserved</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">@barewords</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<h3 id="Combining-everything-togeteher">Combining everything togeteher</h3>
<p>While we can now list used and defined functions, we still need a few extra helpers around it to make it useful.</p>
<h4 id="Get-the-list-of-used-functions">Get the list of used functions</h4>
<p>Getting the list of all used functions then become very easy, we just need to combine, symbols, list and barewords.</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">get_used_sub</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$symbols</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">symbols</span> <span class="operator">//</span> <span class="structure">[];</span><br /> <span class="keyword">my</span> <span class="symbol">$list</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">list</span> <span class="operator">//</span> <span class="structure">[];</span><br /> <span class="keyword">my</span> <span class="symbol">$barewords</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">barewords</span> <span class="operator">//</span> <span class="structure">[];</span><br /><br /> <span class="keyword">my</span> <span class="symbol">%used</span> <span class="operator">=</span> <span class="word">map</span> <span class="structure">{</span> <span class="magic">$_</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">}</span> <span class="structure">(</span> <span class="cast">@</span><span class="symbol">$symbols</span><span class="operator">,</span> <span class="cast">@</span><span class="symbol">$list</span><span class="operator">,</span> <span class="cast">@</span><span class="symbol">$barewords</span> <span class="structure">);</span><br /> <span class="word">debug</span> <span class="double">"* All used:"</span><span class="operator">,</span> <span class="word">map</span> <span class="structure">{</span> <span class="structure">(</span> <span class="single">' '</span><span class="operator">,</span> <span class="magic">$_</span> <span class="structure">)</span> <span class="structure">}</span> <span class="word">sort</span> <span class="cast">@</span><span class="structure">{</span> <span class="structure">[</span> <span class="word">keys</span> <span class="symbol">%used</span> <span class="structure">]</span> <span class="structure">};</span><br /><br /> <span class="keyword">return</span> <span class="cast">\</span><span class="symbol">%used</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>We also need an additional helper to check if a function is used as a method call</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">is_used_method</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$self</span><span class="operator">,</span> <span class="symbol">$sub</span> <span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$methods</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">methods</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="word">scalar</span> <span class="word">grep</span> <span class="structure">{</span> <span class="symbol">$sub</span> <span class="operator">=~</span> <span class="regexp">qr{::$_$}</span> <span class="structure">}</span> <span class="cast">@</span><span class="symbol">$methods</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<h4 id="Removing-unused-functions">Removing unused functions</h4>
<p>So all we need to do after creating a PPI doc using the Analyze module is ask:</p>
<ul>
<li><p>what are the defined functions</p>
</li>
<li><p>what are the used functions</p>
</li>
<li><p>remove any defined function which is not used or not used a as a method</p>
</li>
<li><p>delete the function from the PPI tree (except if it's blacklisted)</p>
</li>
</ul>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">remove_unused_subs</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$subs</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">subs</span><span class="structure">;</span> <span class="comment"># all subs</span><br /> <span class="keyword">my</span> <span class="symbol">$used</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">get_used_sub</span><span class="structure">;</span><br /><br /><span class="comment"> # # # # # # # # # # # # # # # # # # # # # # # # # # # # # #<br /> # The unused have to be the left over ones<br /> # exception for methods:<br /> # if a method is called on any object do not remove the<br /> # function (can be improve for new & co)<br /></span> <span class="keyword">my</span> <span class="symbol">@unused</span> <span class="operator">=</span> <span class="word">sort</span> <span class="word">grep</span> <span class="structure">{</span><br /> <span class="operator">!</span><span class="word">exists</span> <span class="symbol">$used</span><span class="operator">-></span><span class="structure">{</span><span class="magic">$_</span><span class="structure">}</span> <span class="operator">&&</span> <span class="operator">!</span><span class="symbol">$self</span><span class="operator">-></span><span class="word">is_used_method</span><span class="structure">(</span><span class="magic">$_</span><span class="structure">)</span><br /> <span class="structure">}</span> <span class="word">keys</span> <span class="cast">%</span><span class="symbol">$subs</span><span class="structure">;</span><br /><br /> <span class="word">debug</span> <span class="double">"* All unused: @unused"</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@removed</span><span class="structure">;</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$sub</span> <span class="structure">(</span><span class="symbol">@unused</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">next</span> <span class="word">if</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">is_blacklist_sub</span><span class="structure">(</span><span class="symbol">$sub</span><span class="structure">);</span><br /><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="operator">!</span><span class="core">defined</span> <span class="symbol">$subs</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$sub</span><span class="structure">}</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">debug</span> <span class="double">"error: sub '$sub' not defined"</span><span class="structure">;</span><br /> <span class="word">next</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="symbol">$subs</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$sub</span><span class="structure">}</span><span class="operator">-></span><span class="word">delete</span><span class="structure">;</span><br /> <span class="word">push</span> <span class="symbol">@removed</span><span class="operator">,</span> <span class="symbol">$sub</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /><span class="comment"> # return removed sub list<br /></span> <span class="keyword">return</span> <span class="symbol">@removed</span><span class="structure">;</span><br /><span class="structure">}</span> </code></pre>
<h3 id="In-Conclusion">In Conclusion</h3>
<p>We've seen in the first part that each module comes with its own memory cost. In the second part we can now remove unused functions from a script, after fatpacking it. Even if in some cases the result script works this should probably still mainly be used as advice on how to refactor your code.</p>
<p>Detecting dead code with static analysis is prone to errors due to the dynamic nature of perl. <code>$function()</code> or <code>$object->$method()...</code> As this could potentially come from the current ENV we have no accurate way to perform a safe removal.</p>
<p>This method comes with its own limits, as it's performed as a static analysis. Another very interesting approach to this problem would be to analyze what happens at run time! and see what code paths are triggered in the life (minutes, hours, weeks...) of a program. And this is exactly what is done by Gonzalo via Devel::QuickCover.</p>
<p>You can play with the scripts described here available from the github repository. They are in a quick&dirty mode, and could be improved in many ways but can still provide a first approach to play with PPI.</p>
<h2 id="SEE-ALSO">SEE ALSO</h2>
<ul>
<li><p><a href="https://metacpan.org/module/PPI">PPI</a></p>
</li>
<li><p><a href="http://blogs.perl.org/users/brian_d_foy/2012/07/finding-unused-subroutines-but-with-ppi.html">brian d foy, Finding Unused Subroutines, but with PPI</a></p>
</li>
<li><p><a href="https://www.youtube.com/watch?v=7EbLC3M5n6g">Gonzalo Diethelm, Finding dead code, the quick and easy way, with Devel::QuickCover</a></p>
</li>
<li><p><a href="https://youtu.be/AMa1JWgG4yY">@atoomic, TPC 2016 Video on YouTube</a></p>
</li>
<li><p><a href="http://bit.do/b7N79">TPC 2016 slides</a></p>
</li>
<li><p><a href="https://github.com/atoomic/yapc-na-2016">github repo</a></p>
</li>
</ul>
<h2 id="POD-ERRORS">POD ERRORS</h2>
<p>Hey! <b>The above document had some coding errors, which are explained below:</b></p>
<dl>
<dt>Around line 533:</dt>
<dd>
<p>Unterminated L<...> sequence</p>
</dd>
</dl>
</div>2016-12-14T00:00:00ZNicolas R.Getting Drunk with Mojolicious and Memoizehttp://perladvent.org/2016/2016-12-13.html<div class='pod'><p>I've always wondered what Santa Claus and the elves do after the Christmas Eve's work. Perhaps they'd go get smashed afterwards with a pub crawl finding the best egg milk punch all over the world, then perhaps spend the rest of the night recovering in a nearby hotel? If they ever do though, I think I could help them a bit, being the naughty boy and all...</p>
<p>You see, last October, a friend and I joined a <a href="http://www.booking.com/promotions/hah-manila.en-gb.html">hackathon</a> that let me play around a couple of hotel <i>and</i> microbrewery information APIs: the result is The Drunkery, a webapp that Santa (and everyone else) can use to (hopefully) get inebriated. For this, I used the awesome <a href="https://metacpan.org/module/Mojolicious">Mojolicious</a> framework for the backend API, powering a <a href="http://reactjs.org">React</a> frontend showing a Google Map with pins to hotels in a given city, with the nearest pubs or breweries. We ran out of time trying to set up paths between the pins to visualize a pub crawl, but nevertheless we won, and hopefully we can get to improve it on the next stage :D</p>
<p>In this story, let me tell you about a couple of things that we learned and used for this app:</p>
<h3 id="A-matter-of-search">A matter of search</h3>
<p>To start, writing the backend part with Mojolicious was the <i>easiest</i> part, but only so when <i>much thought</i> was put into the design of the backend's interface. I ended up with providing only <i>two</i> API endpoints for my partner's frontend:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Mojolicious::Lite</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Mojo::URL</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Drunkery::Search</span><span class="structure">;</span><br /><br /><span class="word">helper</span> <span class="word">search</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="word">state</span> <span class="symbol">$search</span> <span class="operator">=</span> <span class="word">Drunkery::Search</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="word">ua</span> <span class="operator">=></span> <span class="core">shift</span><span class="operator">-></span><span class="word">ua</span> <span class="structure">);</span><br /><span class="structure">};</span><br /><br /><span class="word">get</span> <span class="single">'/search_by_city'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="operator">...</span><span class="structure">;</span><br /><span class="structure">};</span><br /><br /><span class="word">get</span> <span class="single">'/search_by_endpoint'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="operator">...</span><span class="structure">;</span><br /><span class="structure">};</span><br /><br /><span class="word">app</span><span class="operator">-></span><span class="word">start</span><span class="structure">;</span></code></pre>
<p>These routes provide my JavaScript frontend a way to search for hotels and and breweries in either a given city, or a given geolocation endpoint, by making simple AJAX-style GET requests and receiving JSON back. Both these routes emit an array containing a city object, a list of nearby breweries, and a list of nearby hotels, and are powered via a <code>search</code> helper that uses the logic in <code>Drunkery::Search</code>.</p>
<p>Well, <i>logic</i> might not be the right word. At the time of the hackathon, the package was just as simple as this:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Drunkery::Search</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Mojo::Base</span> <span class="word">-base</span><span class="structure">;</span><br /><br /><span class="word">has</span> <span class="words">qw(ua)</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">fetch</span> <span class="structure">{</span> <span class="core">shift</span><span class="operator">-></span><span class="word">get</span><span class="structure">(</span><span class="core">shift</span><span class="structure">)</span><span class="operator">-></span><span class="word">res</span><span class="operator">-></span><span class="word">json</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>Yep! The search helper is just simply making a HTTP request to the Booking.com API, and then feeding the parsed results back unchanged to the caller.</p>
<h3 id="A-matter-of-caching">A matter of caching</h3>
<p>One of the problems I dealt with during this hackathon was the issue of making the Perl backend respond faster to the frontend. The first implementation I showed above simply had the backend fetch the hotel and brewery information every time it was requested: in short, we needed to have caching. I was loathe to set up another service like SQLite or Redis though, as I thought I didn't have enough time to wire those to the backend...</p>
<p>Enter the real star of this story, <a href="https://metacpan.org/module/Memoize">Memoize</a>!</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Drunkery::Search</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Memoize</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">normalize_url</span> <span class="structure">{</span> <span class="core">shift</span><span class="operator">-></span><span class="word">to_string</span> <span class="structure">}</span><br /><br /><span class="word">memoize</span><span class="structure">(</span> <span class="single">'fetch;, NORMALIZER => '</span><span class="word">normalize_url</span><span class="single">' );</span></code></pre>
<p>This effectively gave me caching at nearly no cost (thanks, <a href="http://hop.perl.plover.com/">Higher Order Perl</a> for reminding me!) Granted, it was imperfect (for starters, it was only an in-memory cache,) but at the time, it made sense.</p>
<p>As you might remember Memoize is a module that can replace any function with a function that does the same thing, but every time you call it remembers the result for any given parameters and if it's called again with those same parameters returns the same result without having to re-run the main body of the original function - i.e. if you call the function with the same URL, it doesn't have to go use the Booking.com API a second time!</p>
<p>The slight wrinkle is that Memoize has to be able to recognize that the arguments the function is being called with are the same - and our <code>fetch</code> function is being called with a URI object which will look to memoize to be a different object - and therefore different argument - each time it's called even if that URI object represents the same underlying URL each time. To address this we use a <code>NORMALIZER</code> function to turn our arguments - our URI object - into something that memoize can compare: We turn the URI objects into strings so that two function calls arguments for the same URL can be compared as identical.</p>
<h3 id="But-why-or-what-more-could-be-done">But why? (or, what more could be done?)</h3>
<p>I joined this hackathon on a whim, with nary an idea for what to build at all, so instead of going all-out serious, I decided to wing this the <a href="http://o-fun.github.io/">-Ofun</a> way. Indeed, it was very funny (and easy) to build a Perl backend in such a short time, with still room for improvements.</p>
<h4 id="Saving-Memoized-results">Saving Memoized results</h4>
<p>Fast forward to late November, where a succeeding Booking hackathon also had me use Memoize yet again for keeping API responses. This time, I tried to keep saved responses to disk by making use of <a href="https://metacpan.org/module/DB_File">DB_File</a>:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">DB_File</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Memoize</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">fetch</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$c</span><span class="operator">,</span> <span class="symbol">$url</span> <span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="symbol">$c</span><span class="operator">-></span><span class="word">ua</span><span class="operator">-></span><span class="word">get</span><span class="structure">(</span><span class="symbol">$url</span><span class="structure">)</span><span class="operator">-></span><span class="word">res</span><span class="operator">-></span><span class="word">body</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">normalize_url</span> <span class="structure">{</span> <span class="core">shift</span><span class="structure">;</span> <span class="core">shift</span><span class="operator">-></span><span class="word">to_string</span> <span class="structure">}</span><br /><br /><span class="word">tie</span> <span class="word">my</span> <span class="symbol">%cache</span> <span class="operator">=></span> <span class="single">'DB_File'</span><span class="operator">,</span> <span class="single">'cache'</span><span class="operator">,</span> <span class="word">O_RDWR</span> <span class="operator">|</span> <span class="word">O_CREAT</span><span class="operator">,</span> <span class="octal">0666</span><span class="structure">;</span><br /><span class="word">memoize</span><span class="structure">(</span><br /> <span class="single">'fetch'</span><span class="operator">,</span><br /> <span class="word">NORMALIZER</span> <span class="operator">=></span> <span class="single">'normalize_url'</span><span class="operator">,</span><br /> <span class="word">SCALAR_CACHE</span> <span class="operator">=></span> <span class="structure">[</span> <span class="word">HASH</span> <span class="operator">=></span> <span class="cast">\</span><span class="symbol">%cache</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="word">LIST_CACHE</span> <span class="operator">=></span> <span class="single">'FAULT'</span><br /><span class="structure">);</span></code></pre>
<p>By having <code>memoize</code> cache the results into a tied hash that saves any changes to the hash to file on disk the cache lasts longer than just the runtime of the demo application.</p>
<h2 id="SEE-ALSO">SEE ALSO</h2>
<ul>
<li><p><a href="https://metacpan.org/module/Mojolicious">Mojolicious</a></p>
</li>
<li><p><a href="https://metacpan.org/module/Memoize">Memoize</a></p>
</li>
<li><p><a href="http://reactjs.org">React</a></p>
</li>
</ul>
</div>2016-12-13T00:00:00ZZak B. ElepTrying for a Happy Christmashttp://perladvent.org/2016/2016-12-12.html<div class='pod'><p>Every year around this time Santa has some presents to deliver. And every year, try as he might, sometimes things don't go very well. There's always someone whose chimney is just too small for him to fit down, or the fire's still lit, or maybe even someone doesn't have a chimney so he'll have to come in the door like everybody else does.</p>
<p>If Santa is going to succeed at delivering as many presents as possible, he can't just stop at the first failure. He'll have to carry on past those, making as best an effort as possible.</p>
<p>In Perl, there's a number of ways we can handle failed attempts to call a function. We could use <code>eval</code>, though this has a number of non-ideal properties and can lead to code that doesn't read very well. Better is to use one of the CPAN modules that wrap this in some nicer syntax.</p>
<p>One nice module for doing this with is <a href="https://metacpan.org/module/Syntax::Keyword::Try">Syntax::Keyword::Try</a>, which provides a neat syntax similar to that used by a number of other languages, being marked by two new keywords <code>try</code> and <code>catch</code>:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Syntax::Keyword::Try</span> <span class="single">'try'</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">attempt_delivery</span><br /><span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$present</span> <span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="word">try</span> <span class="structure">{</span><br /> <span class="word">deliver_via_chimney</span><span class="structure">(</span> <span class="symbol">$present</span> <span class="structure">);</span><br /> <span class="structure">}</span><br /> <span class="word">catch</span> <span class="structure">{</span><br /> <span class="word">print</span> <span class="double">"We couldn't deliver it because $@"</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">}</span><br /><br /><span class="word">attempt_delivery</span><span class="structure">(</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">present</span> <span class="structure">)</span> <span class="word">for</span><br /> <span class="word">grep</span> <span class="structure">{</span> <span class="operator">not</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">is_naughty</span> <span class="structure">}</span> <span class="symbol">@children</span><span class="structure">;</span></code></pre>
<p>As compared to simple <code>eval</code> syntax and checking the value of <code>$@</code> afterwards, we can see this looks a lot neater. Instead of looking at the truth of <code>$@</code> (which already is a buggy antipattern), or testing the truth of the return value of <code>eval</code> itself, we simply use the <code>catch</code> keyword to provide the code to handle a failure. Because it's using the syntax plugin system, the keyword already acts like a full statement and not an expression, so no semicolon is needed at the end of it.</p>
<p>So far our error handling hasn't been very good though, because all we did was print that a failure happened. Perhaps we can do better. If Santa can't deliver the present through the chimney, he'll just have to come in the door instead.</p>
<pre><code class="code-listing"><span class="word">try</span> <span class="structure">{</span><br /> <span class="word">deliver_via_chimney</span><span class="structure">(</span> <span class="symbol">$present</span> <span class="structure">);</span><br /><span class="structure">}</span><br /><span class="word">catch</span> <span class="structure">{</span><br /> <span class="word">try</span> <span class="structure">{</span><br /> <span class="word">deliver_via_door</span><span class="structure">(</span> <span class="symbol">$present</span> <span class="structure">);</span><br /> <span class="structure">}</span><br /> <span class="word">catch</span> <span class="structure">{</span><br /> <span class="word">print</span> <span class="double">"We couldn't deliver it at all, because $@"</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">}</span></code></pre>
<p>A neater way to write this code, and more extensible in case we find even more ways to deliver presents, is to use a <code>return</code> statement inside a <code>try</code> block. This is another useful ability that <code>Syntax::Keyword::Try</code> has that regular <code>eval</code> does not.</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$failure</span><span class="structure">;</span><br /><br /><span class="word">try</span> <span class="structure">{</span><br /> <span class="word">deliver_via_chimney</span><span class="structure">();</span><br /> <span class="keyword">return</span><span class="structure">;</span><br /><span class="structure">}</span><br /><span class="word">catch</span> <span class="structure">{</span> <span class="symbol">$failure</span> <span class="operator">//=</span> <span class="magic">$@</span> <span class="structure">}</span><br /><br /><span class="word">try</span> <span class="structure">{</span><br /> <span class="word">deliver_via_door</span><span class="structure">();</span><br /> <span class="keyword">return</span><span class="structure">;</span><br /><span class="structure">}</span><br /><span class="word">catch</span> <span class="structure">{</span> <span class="symbol">$failure</span> <span class="operator">//=</span> <span class="magic">$@</span> <span class="structure">}</span><br /><br /><span class="operator">...</span><br /><br /><span class="word">print</span> <span class="double">"We couldn't deliver at all, because $failure"</span><span class="structure">;</span></code></pre>
<p>We also now have the advantage that it's now the first failure message that we print at the end. If one of the later attempts succeeds, it doesn't really matter any more what the earlier failure was.</p>
<p>Well, to a point. It's not particularly nice to ignore any possible error, because it could have been something unrelated - an unexpected type of data passed in, a missing module dependency, all sorts of things.</p>
<p>Some languages have typed exceptions, but in Perl we generally make do with string messages and testing them with regexps. If we <code>die</code> an exception from a catch block it re-throws it, effectively acting like we didn't catch it in the first place.</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">attempt_delivery</span><br /><span class="structure">{</span><br /> <span class="word">try</span> <span class="structure">{</span><br /> <span class="word">deliver_via_chimney</span><span class="structure">();</span><br /> <span class="keyword">return</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="word">catch</span> <span class="structure">{</span><br /> <span class="word">die</span> <span class="magic">$@</span> <span class="word">if</span> <span class="magic">$@</span> <span class="operator">!~</span> <span class="match">m/^Cannot fit in the chimney/</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<h3 id="Comparing-against-other-try-catch-modules">Comparing against other try / catch modules</h3>
<p>As we previously mentioned there is more than one way to do error handling on the CPAN. Let's see how Syntax::Keyword::Try holds up against the possible solutions.</p>
<h4 id="Comparing-Syntax">Comparing Syntax</h4>
<p>First let's look at an example written with each of the techniques. We've got a subroutine that refills the sack Santa's carrying with presents from the main store on the sleigh. We want our subroutine to return immediately if there's already more than ten things in Santa's sack, and it's possible that counting the items might throw an exception if Santa's not using a sack this trip (sometimes he just throws a bike over his shoulder.) We want to be careful to re-throw any error that isn't to do with Santa not using a sack.</p>
<dl>
<dt><b>eval</b></dt>
<dd>
<p>The inbuilt eval syntax is the basis for exception handling, and being a native keyword is very quick. However, it can't return from the subroutine from within the eval block. The eval-as-statement-not-as-block instead of a try block and no dedicated catch syntax make it confusing. Worse still, you have to check the return value of the eval statement rather than checking $@ if you want to avoid potential bugs (on some versions of perl <code>$@</code> can be accidentally unset during custom object destruction which might happen between the time the error is thrown and you check for it.)</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">refill_sack</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$should_return</span><span class="structure">;</span><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="word">eval</span> <span class="structure">{</span><br /> <span class="symbol">$should_return</span> <span class="operator">=</span> <span class="word">sack_item_count</span><span class="structure">()</span> <span class="operator">></span> <span class="number">10</span><span class="structure">;</span><br /> <span class="number">1</span><span class="structure">;</span><br /> <span class="structure">})</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="symbol">$should_return</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="magic">$@</span> <span class="operator">=~</span> <span class="match">/not using sack this trip/</span><span class="structure">;</span><br /> <span class="word">die</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>Writing it properly is verbose and error prone.</p>
</dd>
<dt><b>Try::Tiny</b></dt>
<dd>
<p>Try::Tiny is a simple pure Perl solution that fairs very well on the CPAN. It's two main drawbacks is that it's very slow compared to the other techniques described here, and that it's not possible to return from the subroutine in either the try or catch blocks (as they're just syntactic sugar for anonymous subroutines.)</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">refill_sack</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$should_return</span> <span class="operator">=</span> <span class="number">1</span><span class="structure">;</span><br /> <span class="word">try</span> <span class="structure">{</span><br /> <span class="symbol">$should_return</span> <span class="operator">=</span> <span class="word">sack_item_count</span><span class="structure">()</span> <span class="operator">></span> <span class="number">10</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="word">catch</span> <span class="structure">{</span><br /> <span class="word">die</span> <span class="magic">$_</span> <span class="word">unless</span> <span class="match">/not using sack this trip/</span><span class="structure">;</span><br /> <span class="structure">};</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="symbol">$should_return</span><span class="structure">;</span><br /><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>With its simplistic approach Try::Tiny also has a few oddities that niggle - it uses <code>$_</code> instead of <code>$@</code> (meaning you can't just use a bare <code>die</code> statement to re-throw the current error) and it requires that annoying semicolon at the end of the blocks.</p>
</dd>
<dt><b>TryCatch</b></dt>
<dd>
<p>TryCatch is a module based on Devel::Declare, which is module to subvert the Perl parser to allow new syntax. By swapping out the Perl parser with a custom parser when keywords are detected, and then calling the Perl parser back again to parse the code within the blocks, new try syntax is created.</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">refill_sack</span> <span class="structure">{</span><br /> <span class="word">try</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="word">sack_item_count</span><span class="structure">()</span> <span class="operator">></span> <span class="number">10</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="word">catch</span> <span class="structure">(</span><span class="symbol">$e</span> <span class="word">where</span> <span class="structure">{</span> <span class="match">/not using sack this trip/</span> <span class="structure">})</span> <span class="structure">{</span><br /> <span class="keyword">return</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>TryCatch is the only module listed here that allows conditional checking of the return value with explicit syntax; We don't need to explicitly have to re-throw the unhanded error, if the regular expression doesn't match it'll automatically re-thrown for us.</p>
<p>The main problem with TryCatch is that the custom parser technology that it uses isn't considered as reliable as the pluggable keyword technology provided by modern perls that Syntax::Keyword::Try makes use of.</p>
</dd>
<dt><b>Syntax::Keyword::Try</b></dt>
<dd>
<p>For completeness, here's the same example written for Syntax::Keyword::Try:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">refill_sack</span> <span class="structure">{</span><br /> <span class="word">try</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="word">sack_item_count</span><span class="structure">()</span> <span class="operator"><</span> <span class="number">10</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="word">catch</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">if</span> <span class="magic">$@</span> <span class="operator">=~</span> <span class="match">/not using sack this trip/</span><span class="structure">;</span><br /> <span class="word">die</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
</dd>
</dl>
<h4 id="Benchmarking">Benchmarking</h4>
<p><p>Some simple <a href="benchmark.pl.txt">benchmarking</a> shows that Try::Tiny is very slow, eval is fast, but the other two modules come in the same order of magnitude as eval.</p>
</p>
<p><img src="benchmark.png" width="878" height="469" alt="Benchmark Graph">
</p>
<h4 id="Feature-Comparison">Feature Comparison</h4>
<p>Finally A proper comparison would be amiss without a feature comparison chart:</p>
<p><table class="pretty-table"> <tr> <th></th> <th>eval</th> <th>Try::Tiny</th> <th>TryCatch</th> <th>Syntax::Keyword::Try</th> </tr> <tr class="alt"> <td>Requires no dependencies</td> <td class="supports">✔</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> </tr> <tr> <td>Pure Perl solution</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> </tr> <tr class="alt"> <td>Runs on perl 5.8</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="notsupports">✗</td> </tr> <tr> <td>Runs on perl 5.14</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="supports">✔</td> </tr> <tr class="alt"> <td>No Devel::Declare parser swapout</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="notsupports">✗</td> <td class="supports">✔</td> </tr> <tr> <td>Addresses $@ accidentally cleared bug</td> <td class="notsupports">✗</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="supports">✔</td> </tr> <tr class="alt"> <td>Supports try / catch like syntax</td> <td class="notsupports">✗</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="supports">✔</td> </tr> <tr> <td>Doesn't require semicolon after block</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> <td class="supports">✔</td> <td class="supports">✔</td> </tr> <tr class="alt"> <td>Can 'return' from within block</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> <td class="supports">✔</td> <td class="supports">✔</td> </tr> <tr> <td>Can 'last' from within the block</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> <td class="notsupports">✗</td> <td class="supports">✔</td> </tr> <tr class="alt"> <td>Allows rethrowing with no arg 'die'</td> <td class="supports">✔</td> <td class="notsupports">✗</td> <td class="supports">✔</td> <td class="supports">✔</td> </tr> <tr> <td>Maintance release in the last 3 years?</td> <td class="supports">✔</td> <td class="supports">✔</td> <td class="notsupports">✗</td> <td class="supports">✔</td> </tr> </table>
</p>
</div>2016-12-12T00:00:00ZPaul "LeoNerd" EvansREST-oring Christmas Tranquilityhttp://perladvent.org/2016/2016-12-11.html<div class='pod'><h2 id="REST-oring-Christmas-Tranqulity">REST-oring Christmas Tranqulity</h2>
<p>So you've been working for the last four years on the <a href="http://www.perladvent.org/2012/2012-12-24.html">Flibber API</a> your boss required that one time. Turns out that over that time you've added APIs for Jibber, Jabber, and Flubber as well. The code base has grown and you're starting to discover that you're duplicating a lot of code between the various <a href="https://metacpan.org/module/Web::Machine">Web::Machine</a> controllers you've built. Code like:</p>
<pre><code class="code-listing"><span class="word">has</span> <span class="word">json_encoder</span> <span class="operator">=></span> <span class="structure">(</span><br /> <span class="word">is</span> <span class="operator">=></span> <span class="single">'bare'</span><span class="operator">,</span><br /> <span class="word">lazy</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">builder</span> <span class="operator">=></span> <span class="single">'_build_json_encoder'</span><span class="operator">,</span><br /> <span class="word">handles</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">encode_json</span> <span class="operator">=></span> <span class="single">'encode'</span><span class="operator">,</span><br /> <span class="word">decode_json</span> <span class="operator">=></span> <span class="single">'decode'</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /><span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">to_json</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">encode_json</span><span class="structure">(</span><span class="symbol">$self</span><span class="operator">-></span><span class="word">resource</span><span class="structure">);</span><br /><span class="structure">}</span></code></pre>
<p>Which could be refactored into a common base class, but the boss is making noises that make you think things are gonna get ugly if you're not careful. You start to wonder if maybe there is a better way.</p>
<h3 id="One-for-Sorrow-Two-for-Mirth">One for Sorrow / Two for Mirth</h3>
<p><a href="https://metacpan.org/pod/Magpie">Magpie</a> is a resource oriented framework that is based on a pipelined state machine rather than a single state machine. It's been on the CPAN for a couple years now but it's mostly been used internally by the elves at Tamarou. As a warning, the documentation is a bit rough but we're hoping to work on it over the holidays.</p>
<p>Let's start by looking back to the resource we started with four years ago.</p>
<pre><code class="code-listing"> <span class="keyword">use</span> <span class="version">5.16.2</span><span class="structure">;</span><br /> <span class="keyword">use</span> <span class="word">Web::Machine</span><span class="structure">;</span><br /><br /> <span class="structure">{</span><br /> <span class="keyword">package</span> <span class="word">WasteOfTime::Resource</span><span class="structure">;</span><br /> <span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /> <span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /> <span class="keyword">use</span> <span class="pragma">parent</span> <span class="single">'Web::Machine::Resource'</span><span class="structure">;</span><br /><br /> <span class="keyword">use</span> <span class="word">JSON::XS</span> <span class="words">qw(encode_json)</span><span class="structure">;</span><br /><br /> <span class="keyword">sub</span> <span class="word">content_types_provided</span> <span class="structure">{</span> <span class="structure">[{</span> <span class="single">'application/json'</span> <span class="operator">=></span> <span class="single">'to_json'</span> <span class="structure">}]</span> <span class="structure">}</span><br /><br /> <span class="keyword">sub</span> <span class="word">to_json</span> <span class="structure">{</span> <span class="word">encode_json</span><span class="structure">({</span> <span class="word">time</span> <span class="operator">=></span> <span class="word">scalar</span> <span class="word">localtime</span> <span class="structure">})</span> <span class="structure">}</span><br /><span class="structure">}</span><br /><br /> <span class="word">Web::Machine</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="word">resource</span> <span class="operator">=></span> <span class="single">'WasteOfTime::Resource'</span> <span class="structure">)</span><span class="operator">-></span><span class="word">to_app</span><span class="structure">;</span></code></pre>
<p>Let's show what that looks like in <code>Magpie</code>:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="version">5.24.0</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">experimental</span> <span class="single">'signatures'</span><span class="structure">;</span><br /><br /><span class="structure">{</span><br /> <span class="keyword">package</span> <span class="word">WasteOfTime::Resource</span><span class="structure">;</span><br /> <span class="keyword">use</span> <span class="pragma">parent</span> <span class="words">qw(Magpie::Resource)</span><span class="structure">;</span><br /><br /> <span class="keyword">sub</span> <span class="word">GET</span> <span class="prototype">( $self, $ctxt )</span> <span class="structure">{</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">parent_handler</span><span class="operator">-></span><span class="word">resource</span><span class="structure">(</span><span class="symbol">$self</span><span class="structure">);</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">data</span><span class="structure">(</span><span class="word">scalar</span> <span class="word">localtime</span><span class="structure">);</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">response</span><span class="operator">-></span><span class="word">status</span><span class="structure">(</span><span class="number">200</span><span class="structure">);</span><br /> <span class="keyword">return</span> <span class="word">Magpie::Constants::OK</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">}</span><br /><br /><span class="keyword">use</span> <span class="word">Plack::Builder</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Plack::Middleware::Magpie</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$app</span> <span class="operator">=</span> <span class="word">builder</span> <span class="structure">{</span><br /> <span class="word">enable</span> <span class="word">Magpie</span> <span class="operator">=></span> <span class="structure">(</span><br /> <span class="word">accept_matrix</span> <span class="operator">=></span> <span class="structure">[</span> <span class="structure">[</span> <span class="word">json</span> <span class="operator">=></span> <span class="structure">[</span><span class="single">'application/json'</span><span class="structure">]</span> <span class="structure">]</span><span class="operator">,</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="word">pipeline</span> <span class="operator">=></span> <span class="structure">[</span><br /> <span class="word">machine</span> <span class="structure">{</span><br /> <span class="word">match</span> <span class="regexp">qr|^/$|</span> <span class="operator">=></span> <span class="structure">[</span><span class="single">'WasteOfTime::Resource'</span><span class="structure">];</span><br /> <span class="word">match_accept</span> <span class="single">'json'</span> <span class="operator">=></span> <span class="structure">[</span><span class="single">'Magpie::Transformer::JSON'</span><span class="structure">];</span><br /> <span class="structure">}</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><span class="structure">};</span></code></pre>
<p>So it's a little bit longer, but you probably discovered when you went to add the Jibber API that the original lacked routing for different APIs. So the extra lines are probably there in your app anyway. Let's step through and show what's going on in the new version.</p>
<p>We've updated our standard boilerplate. We want to use signatures in our code now so it looks cleaner and more modern, and the things Perl 5.24.0 brings in are nice (postfix dereferencing knocked one of our elves' socks clean off!). We also need to import Magpie Plack middleware. Unlike <code>Web::Machine</code>, <code>Magpie</code> doesn't automatically set up a PSGI application for you so we'll need <code>Plack::Builder</code>. After that we build the same <code>WasteOfTime::Resource</code> class but this time it's a <code>Magpie::Resource</code>.</p>
<p>Rather than splitting out the HTTP request cycle into the state machine that <code>Web::Machine</code> does, Plack hands everything to methods named after the HTTP Method. These methods take a copy of the instance (<code>$self</code>) and a "context object" (<code>$ctxt</code>). The context object is a holdover from Magpie's early days where it was much more generic. We also have to inform <code>Magpie</code> that *this class* is the resource, so we do that with the call to <code>parent_resource</code>. Finally, we need to respond with <code>scalar localtime</code> like we did the last time. Because we're a pipeline we can't be stateless, so we save the <code>localtime</code> to our <code>data</code> attribute. Set the response status, tell <code>Magpie</code> everything went OK, and we're done.</p>
<p>Notice at no point in the resource did we care what representations we could handle nor did we do any transformations. That's because that's handled in different stage in the pipeline by an entirely different class.</p>
<p>After the class we build the app. The last time this was handled for us by <code>Web::Machine</code>, <code>Magpie</code> however was built to handle more complex applications by default so the configuration is a bit more manual and more complex. First we use <code>Plack::Builder</code> and the <code>Magpie</code> middleware. The <code>Magpie</code> middleware gives us a little domain specific language (DSL) that is based off of <code>Plack::Builder</code>'s. We tell Plack we're enabling <code>Magpie</code>. Then we set up the content types we can accept. Like before, we look for Accept headers that match the <code>application/json</code> content type. We tell magpie to call these <code>json</code>. Next, we set up the pipeline Machine for the application. The <code>match</code> directive matches the input URL (in this case the root '/') and adds our resource accordingly. Finally, the <code>match_accept</code> header matches the accept type we setup earlier and adds the JSON transformer. In this case the JSON transformer that ships with <code>Magpie</code> is good enough for us.</p>
<h3 id="The-Magpies-Nest">The Magpie's Nest</h3>
<p>So in a more real world scenario we'd not have a single resource but would instead have multiple resources doing many things.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Plack::Builder</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Plack::Middleware::Magpie</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$app</span> <span class="operator">=</span> <span class="word">builder</span> <span class="structure">{</span><br /> <span class="word">enable</span> <span class="word">Magpie</span> <span class="operator">=></span> <span class="structure">(</span><br /> <span class="word">accept_matrix</span> <span class="operator">=></span> <span class="structure">[</span><br /> <span class="structure">[</span> <span class="word">json</span> <span class="operator">=></span> <span class="single">'application/vnd.northpole.gifts+json'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="word">xml</span> <span class="operator">=></span> <span class="single">'application/vnd.northpole.gifts+xml'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="word">html_en</span> <span class="operator">=></span> <span class="single">'text/html'</span><span class="operator">,</span> <span class="core">undef</span><span class="operator">,</span> <span class="core">undef</span><span class="operator">,</span> <span class="single">'en'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="word">html_es</span> <span class="operator">=></span> <span class="single">'text/html'</span><span class="operator">,</span> <span class="core">undef</span><span class="operator">,</span> <span class="core">undef</span><span class="operator">,</span> <span class="single">'es'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="word">html_de</span> <span class="operator">=></span> <span class="single">'text/html'</span><span class="operator">,</span> <span class="core">undef</span><span class="operator">,</span> <span class="core">undef</span><span class="operator">,</span> <span class="single">'de'</span> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="word">pipeline</span> <span class="operator">=></span> <span class="structure">[</span><br /> <span class="single">'NP::Authen::Passwd'</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">limit_user</span> <span class="operator">=></span> <span class="single">'Santa'</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">machine</span> <span class="structure">{</span><br /> <span class="word">match_template</span> <span class="single">'/TheList/{kid}'</span> <span class="operator">=></span><br /> <span class="structure">[</span><span class="single">'NP::Resource::TheList.pm'</span><span class="structure">];</span><br /> <span class="word">match_template</span> <span class="single">'/TheList/{kid}/nice'</span> <span class="operator">=></span><br /> <span class="structure">[</span> <span class="single">'NP::Resource::Nice.pm'</span><span class="structure">];</span><br /> <span class="word">match_template</span> <span class="single">'/TheList/{kid}/naughty'</span> <span class="operator">=></span><br /> <span class="structure">[</span><span class="single">'NP::Resource::Naughty.pm'</span><span class="structure">];</span><br /><br /> <span class="word">match_accept</span> <span class="single">'json'</span> <span class="operator">=></span> <span class="structure">[</span><span class="single">'Magpie::Transformer::JSON'</span><span class="structure">];</span><br /> <span class="word">match_accept</span> <span class="single">'xml'</span> <span class="operator">=></span> <span class="structure">[</span><span class="single">'NP::Transformer::GiftsXML'</span><span class="structure">];</span><br /> <span class="word">match_accept</span> <span class="single">'html_en'</span> <span class="operator">=></span><br /> <span class="structure">[</span> <span class="single">'NP::Transformer::TT2'</span><span class="operator">,</span> <span class="single">'NP::I18N::EN'</span><span class="operator">,</span> <span class="structure">];</span><br /> <span class="word">match_accept</span> <span class="single">'html_es'</span> <span class="operator">=></span><br /> <span class="structure">[</span> <span class="single">'NP::Transformer::TT2'</span><span class="operator">,</span> <span class="single">'NP::I18N::ES'</span><span class="operator">,</span> <span class="structure">];</span><br /> <span class="word">match_accept</span> <span class="single">'html_de'</span> <span class="operator">=></span><br /> <span class="structure">[</span> <span class="single">'NP::Transformer::TT2'</span><span class="operator">,</span> <span class="single">'NP::I18N::DE'</span><span class="operator">,</span> <span class="structure">];</span><br /> <span class="structure">}</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><span class="structure">};</span></code></pre>
<p>As applications begin to scale in complexity, it becomes increasingly important to keep the different pieces of complexity corralled into their own places. While you can do this with <code>Web::Machine</code> and judicious use of base classes and roles, <code>Magpie</code> was designed to give you guidance on where to put things in an increasingly more complicated application.</p>
<p>As an example of how this would work, let's take a look at two of the output classes <code>NP::Transformer::TT2</code> and <code>NP::I18N::EN</code>.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">NP::Transformer::TT2</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="version">5.24.0</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Moose</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">experimental</span> <span class="words">qw(signatures)</span><span class="structure">;</span><br /><br /><span class="word">extends</span> <span class="words">qw(Magpie::Transformer::TT2)</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">get_tt_conf</span><span class="prototype">($self, $ctxt)</span> <span class="structure">{</span><br /> <span class="core">shift</span><span class="operator">-></span><span class="word">tt_conf</span><span class="structure">({</span> <span class="word">RELATIVE</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">});</span><br /> <span class="keyword">return</span> <span class="word">OK</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">get_template</span><span class="prototype">($self, $ctxt)</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">DECLINED</span> <span class="word">if</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">parent_handler</span><span class="operator">-></span><span class="word">has_error</span><span class="structure">;</span><br /><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">template_file</span><span class="structure">(</span><span class="symbol">$ctxt</span><span class="operator">-></span><span class="structure">{</span><span class="word">template</span><span class="structure">}</span> <span class="operator">//</span> <span class="single">'error.tt2'</span><span class="structure">);</span><br /><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">response</span><span class="operator">-></span><span class="word">content_type</span><span class="structure">(</span><span class="single">'text/html'</span><span class="structure">);</span><br /> <span class="keyword">return</span> <span class="word">OK</span><span class="structure">;</span><br /><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">get_tt_vars</span><span class="prototype">($self, $ctxt)</span> <span class="structure">{</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">tt_vars</span><span class="structure">({</span><br /> <span class="word">request</span> <span class="operator">=></span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">request</span><span class="operator">,</span><br /> <span class="word">resource</span> <span class="operator">=></span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">resource</span><span class="operator">,</span><br /> <span class="structure">});</span><br /> <span class="keyword">return</span> <span class="word">OK</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span><br /><span class="separator">__END__</span></code></pre>
<p>This looks a little more complicated than it really is. First we're inheriting from <code>Magpie::Transformer::TT2</code> which handles building the <a href="https://metacpan.org/release/Template-Toolkit">Template Toolkit</a> object for us. We just need to provide some callback hooks. First <code>get_tt_conf</code> provides the configuration block, then <code>get_template</code> will look up the actual template name. We check for a <code>template</code> key in the context object. The context object is a great way to pass out-of-band data that is important to processing but really isn't resource data. If we don't have a template with the right name, we use a default error template, but we could just as easily throw an exception here.</p>
<p>Finally we set up the template variables. We pass in the request object (usually a <code>Plack::Request</code> object), and the Resource object (something similar to our <code>WasteOfTime::Resource</code> class above). This means that we can access the resource data directly and write a template like:</p>
<pre><code class="code-listing"><span class="synComment"><!DOCTYPE html></span><br /><span class="synIdentifier"><</span><span class="synStatement">html</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">head</span><span class="synIdentifier">><</span><span class="synStatement">title</span><span class="synIdentifier">></span>{The List}<span class="synIdentifier"></</span><span class="synStatement">title</span><span class="synIdentifier">></</span><span class="synStatement">head</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">body</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">h1</span><span class="synIdentifier">></span>{The List}<span class="synIdentifier"></</span><span class="synStatement">h1</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">article</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">h2</span><span class="synIdentifier">></span>{Nice}<span class="synIdentifier"></</span><span class="synStatement">h2</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">ul</span><span class="synIdentifier">></span><br /> [% FOR child IN resource.data.niceList %]<br /> <span class="synIdentifier"><</span><span class="synStatement">li</span><span class="synIdentifier">></span>[% child.name %] <span class="synSpecial">&mdash;</span> [% child.gift %]<span class="synIdentifier"></</span><span class="synStatement">li</span><span class="synIdentifier">></span><br /> [% END %]<br /> <span class="synIdentifier"></</span><span class="synStatement">ul</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"></</span><span class="synStatement">article</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">article</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">h2</span><span class="synIdentifier">></span>{Naughty}<span class="synIdentifier"></</span><span class="synStatement">h2</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"><</span><span class="synStatement">ul</span><span class="synIdentifier">></span><br /> [% FOR child IN resource.data.naughtyList %]<br /> <span class="synIdentifier"><</span><span class="synStatement">li</span><span class="synIdentifier">></span>[% child.name %] <span class="synSpecial">&mdash;</span> [% child.coalAmount %] {lumps}<span class="synIdentifier"></</span><span class="synStatement">li</span><span class="synIdentifier">></span><br /> [% END %]<br /> <span class="synIdentifier"></</span><span class="synStatement">ul</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"></</span><span class="synStatement">article</span><span class="synIdentifier">></span><br /> <span class="synIdentifier"></</span><span class="synStatement">body</span><span class="synIdentifier">></span><br /><span class="synIdentifier"></</span><span class="synStatement">html</span><span class="synIdentifier">></span></code></pre>
<p>Notice our text strings look like <code>{Nice}</code>. This is because we expect the output from our template to be sent through a localization and internationalization (I18N) filter. The <code>NP::I18N::EN</code> class looks something like this:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">NP::I18N::EN</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="version">5.24.0</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Moose</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">experimental</span> <span class="words">qw(signatures)</span><span class="structure">;</span><br /><span class="word">extends</span> <span class="single">'Magpie::Transformer'</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Magpie::Constants</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Local::Simple</span><span class="structure">;</span><br /><br /><span class="word">__PACKAGE__</span><span class="operator">-></span><span class="word">register_events</span><span class="structure">(</span> <span class="words">qw( config transform )</span><span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">load_queue</span> <span class="structure">{</span> <span class="keyword">return</span> <span class="words">qw(config transform)</span> <span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">config</span><span class="prototype">($self, $ctxt)</span> <span class="structure">{</span><br /> <span class="word">l_lang</span><span class="structure">(</span><span class="single">'en_US'</span><span class="structure">);</span><br /> <span class="word">l_dir</span><span class="structure">(</span><span class="single">'locale'</span><span class="structure">);</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">transform</span><span class="prototype">($self, $ctxt)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$html</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">resource</span><span class="operator">-></span><span class="word">data</span><span class="structure">;</span><br /> <span class="symbol">$html</span> <span class="operator">=~</span> <span class="substitute">s|\{ # open brace<br /> ([^}]+) # anything not a closing brace<br /> \} # closing brace<br /> |<br /> l(\1) # translate it<br /> |xgr</span><span class="structure">;</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">resource</span><span class="operator">-></span><span class="word">data</span><span class="structure">(</span><span class="symbol">$html</span><span class="structure">);</span><br /><span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span><br /><span class="separator">__END__</span></code></pre>
<p>Because <code>Magpie</code> doesn't currently ship with an I18N framework, <code>NP::I18N</code> classes inherit directly from the <code>Magpie::Transformer</code> class. This means they're exposed to a bit more of the low-lying mechanics of setting up the state machine. This is what the call to <code>register_events</code> and the <code>load_queue</code> methods are for. They tell <code>Magpie</code> that we have two methods in this pipeline stage and the order in which to call them.</p>
<p>Assuming we have our <code>.po</code> files and whatnot set up properly, this will take any string in curly braces and replace it with the appropriate translation. Obviously this is a vastly simplified version, and Santa's real system probably uses a much more complex parsing system for pulling out the message IDs and translation strings so that things like the coal count can be translated properly. But this illustrates how a pipeline of pieces means that as we step through each piece of the application we can focus down and work on each step individually.</p>
<h3 id="La-gazza-ladra">La gazza ladra</h3>
<p><code>Magpie</code> is heavily influenced by a number of different things. You may recognize some mod_perl and some Catalyst. We used both of those extensively before sitting down to write <code>Magpie</code>. The <code>match_template</code> directive matches on <a href="https://tools.ietf.org/html/rfc6570">URI`Templates</a>. It was also heavily influenced by the book <a href="http://restinpractice.com/book/">Rest In Practice</a> and the work of <a href="http://www.amundsen.com">Mike Amundsen</a>.</p>
<p>As I mentioned in the introduction it's still very much a work in progress but if you'd like to take it for a spin, <code>#magpie</code> on <code>irc.perl.org</code> can answer your questions.</p>
</div>2016-12-11T00:00:00ZChris PratherCombining wishlists for production and shipping planninghttp://perladvent.org/2016/2016-12-10.html<div class='pod'><div class='pod'><h3 id="New-logistics">New logistics</h3>
<p>With slavery being questioned all around the world, quite a few elves rioted and turned their back on Santa. He is now facing a downsized force of helpers and has to work even more efficient than ever before.</p>
<p>Especially the logistics department was cut down to a number that made production of toys a nightmare, as the production department has no idea how many toys to make and with what specifications.</p>
<p>The mail group already being reduced in size last year now works way more efficiently, as they are capable to control the annexes in the countries that collect the wish-lists. They now use CSV files. Or do they?</p>
<p>Due to locales and local regulations, all the country annexes use their local settings to produce the CSV files with the gathered wish-lists. These are then sent to the North Pole where the mail department collects, sorts and distributes them to the logistics department that will then steer the production.</p>
<p>That plan sounds fine, until they note that the Dutch use a semicolon for separation character, the Romanians use UTF-16LE encoding, the Spanish lists come with \r\n line endings and the Germans keep misspelling count as kount. They chose CSV as the definition was so easy nothing could go wrong. Well, now they have to think again.</p>
<p>How many Red Fire Trucks will production have to make so logistics can set up a scheme for Santa to not have to return to the Pole if does not have enough red trucks to fulfill the children's wishes.</p>
<h3 id="Data-received">Data received</h3>
<p>The master elf showed his programmers the format of the wish-list he got from the UK department, named <a href="wish-uk.csv"><code>wish-uk.csv</code></a>:</p>
<pre><code> Name,Date of Birth,Postal Code,Address,Wish,Specs
Hilbert Potter,20121220,NE66 1NQ,Alnwick Castle,Wand,"10¾"" Vine w/ dragon heartstring"
Jane Granger,20111111,NE13 6LY,"Random Edge, Warkworth Drive",Magic beaded handbag,Unlimited space
Wolff Weasly,20131211,NE13 5AX,The Grove,Wand,Crooked oak w/ Muggle blood</code></pre>
<h4 id="Basic-knowledge">Basic knowledge</h4>
<p>The logistic elves are fond of the features perl offers them, so they start out the <a href="wish-uk-1.pl">simple way</a>:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: </code><br /> </td><td class='code'><br /><code><span class="keyword">use</span> <span class="word">Text::CSV_XS</span> <span class="words">qw( csv )</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-uk.csv"</span><span class="structure">);</span></code><br /> </td></table>
<p>and they say all went well. They showed the master elf the content using</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: </code><br /> </td><td class='code'><br /><code><span class="keyword">use</span> <span class="word">Data::Peek</span><span class="structure">;</span><br /><br /><span class="word">DDumper</span> <span class="structure">(</span><span class="symbol">$aoa</span><span class="structure">);</span></code><br /> </td></table>
<p>and it showed</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: <br />4: <br />5: <br />6: <br />7: <br />8: <br />9: <br />10: <br />11: <br />12: <br />13: <br />14: <br />15: <br />16: <br />17: <br />18: <br />19: <br />20: <br />21: <br />22: <br />23: <br />24: <br />25: <br />26: <br />27: <br />28: <br />29: <br />30: </code><br /> </td><td class='code'><br /><code><span class="structure">[</span><br /> <span class="structure">[</span> <span class="single">'Name'</span><span class="operator">,</span><br /> <span class="single">'Date of Birth'</span><span class="operator">,</span><br /> <span class="single">'Postal Code'</span><span class="operator">,</span><br /> <span class="single">'Address'</span><span class="operator">,</span><br /> <span class="single">'Wish'</span><span class="operator">,</span><br /> <span class="single">'Specs'</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="single">'Hilbert Potter'</span><span class="operator">,</span><br /> <span class="number">20121220</span><span class="operator">,</span><br /> <span class="single">'NE66 1NQ'</span><span class="operator">,</span><br /> <span class="single">'Alnwick Castle'</span><span class="operator">,</span><br /> <span class="single">'Wand'</span><span class="operator">,</span><br /> <span class="single">'10¾" Vine w/ dragon heartstring'</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="single">'Jane Granger'</span><span class="operator">,</span><br /> <span class="number">20111111</span><span class="operator">,</span><br /> <span class="single">'NE13 6LY'</span><span class="operator">,</span><br /> <span class="single">'Random Edge, Warkworth Drive'</span><span class="operator">,</span><br /> <span class="single">'Magic beaded handbag'</span><span class="operator">,</span><br /> <span class="single">'Unlimited space'</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="structure">[</span> <span class="single">'Wolff Weasly'</span><span class="operator">,</span><br /> <span class="number">20131211</span><span class="operator">,</span><br /> <span class="single">'NE13 5AX'</span><span class="operator">,</span><br /> <span class="single">'The Grove'</span><span class="operator">,</span><br /> <span class="single">'Wand'</span><span class="operator">,</span><br /> <span class="single">'Crooked oak w/ Muggle blood'</span><br /> <span class="structure">]</span><br /> <span class="structure">]</span></code><br /> </td></table>
<h4 id="Advanced-knowledge">Advanced knowledge</h4>
<p>The master elf suggested to <a href="wish-uk-2.pl">use the headers</a>:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: <br />4: <br />5: <br />6: <br />7: <br />8: <br />9: <br />10: <br />11: <br />12: <br />13: <br />14: <br />15: <br />16: <br />17: <br />18: <br />19: <br />20: <br />21: <br />22: <br />23: <br />24: <br />25: </code><br /> </td><td class='code'><br /><code><span class="word">DDumper</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-uk.csv"</span><span class="operator">,</span> <span class="word">headers</span> <span class="operator">=></span> <span class="double">"auto"</span><span class="structure">);</span><br /><br /><span class="structure">[</span><br /> <span class="structure">{</span> <span class="word">Address</span> <span class="operator">=></span> <span class="single">'Alnwick Castle'</span><span class="operator">,</span><br /> <span class="single">'Date of Birth'</span> <span class="operator">=></span> <span class="number">20121220</span><span class="operator">,</span><br /> <span class="word">Name</span> <span class="operator">=></span> <span class="single">'Hilbert Potter'</span><span class="operator">,</span><br /> <span class="single">'Postal Code'</span> <span class="operator">=></span> <span class="single">'NE66 1NQ'</span><span class="operator">,</span><br /> <span class="word">Specs</span> <span class="operator">=></span> <span class="single">'10¾" Vine w/ dragon heartstring'</span><span class="operator">,</span><br /> <span class="word">Wish</span> <span class="operator">=></span> <span class="single">'Wand'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">Address</span> <span class="operator">=></span> <span class="single">'Random Edge, Warkworth Drive'</span><span class="operator">,</span><br /> <span class="single">'Date of Birth'</span> <span class="operator">=></span> <span class="number">20111111</span><span class="operator">,</span><br /> <span class="word">Name</span> <span class="operator">=></span> <span class="single">'Jane Granger'</span><span class="operator">,</span><br /> <span class="single">'Postal Code'</span> <span class="operator">=></span> <span class="single">'NE13 6LY'</span><span class="operator">,</span><br /> <span class="word">Specs</span> <span class="operator">=></span> <span class="single">'Unlimited space'</span><span class="operator">,</span><br /> <span class="word">Wish</span> <span class="operator">=></span> <span class="single">'Magic beaded handbag'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">Address</span> <span class="operator">=></span> <span class="single">'The Grove'</span><span class="operator">,</span><br /> <span class="single">'Date of Birth'</span> <span class="operator">=></span> <span class="number">20131211</span><span class="operator">,</span><br /> <span class="word">Name</span> <span class="operator">=></span> <span class="single">'Wolff Weasly'</span><span class="operator">,</span><br /> <span class="single">'Postal Code'</span> <span class="operator">=></span> <span class="single">'NE13 5AX'</span><span class="operator">,</span><br /> <span class="word">Specs</span> <span class="operator">=></span> <span class="single">'Crooked oak w/ Muggle blood'</span><span class="operator">,</span><br /> <span class="word">Wish</span> <span class="operator">=></span> <span class="single">'Wand'</span><br /> <span class="structure">}</span><br /> <span class="structure">]</span></code><br /> </td></table>
<p>and the programming elves were in awe for they did not expect the master elf to know about this magic. They then started to read the manual ...</p>
<h4 id="Planning-the-trips">Planning the trips</h4>
<p>The sleigh is not of unlimited size - in contrary of common belief - and cannot hold unlimited weight (due to reindeer fatigue), so Santa must plan well in advance how many presents he can take on a single trip.</p>
<p>The aim of the logistical elves was to count presents per region, so they were not interested in the personal data. Just the postal code and the wish-list item were important (for the planning department). On their efforts they also decided that they just wanted to count on the first part of the postal code, so they amended the <a href="wish-uk-3.pl">script</a>:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: <br />4: <br />5: <br />6: <br />7: <br />8: <br />9: <br />10: <br />11: <br />12: <br />13: <br />14: </code><br /> </td><td class='code'><br /><code><span class="keyword">my</span> <span class="symbol">%count</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-uk.csv"</span><span class="operator">,</span> <span class="word">headers</span> <span class="operator">=></span> <span class="double">"auto"</span><span class="operator">,</span> <span class="word">filter</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="number">1</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="symbol">$count</span><span class="structure">{</span><span class="magic">$_</span><span class="structure">{</span><span class="word">Wish</span><span class="structure">}}{</span><span class="magic">$_</span><span class="structure">{</span><span class="double">"Postal Code"</span><span class="structure">}</span> <span class="operator">=~</span> <span class="substitute">s/\s.*//r</span><span class="structure">}</span><span class="operator">++</span><span class="structure">;</span> <span class="number">0</span><span class="structure">;</span> <span class="structure">}});</span><br /><br /><span class="word">DDumper</span> <span class="cast">\</span><span class="symbol">%count</span><span class="structure">;</span><br /><br /><span class="structure">{</span> <span class="single">'Magic beaded handbag'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">NE13</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">Wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">NE13</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">NE66</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span></code><br /> </td></table>
<p>Happy with the result they now wanted to combine that with their own knowledge on the size and the weight of the presents, which they stored as</p>
<pre><code> present,weight,size
wand,0.2,1
handbag,1,2</code></pre>
<p>They had a long session on how to match the wish description with the present data they had, and came up with <a href="wish-uk-4.pl">what they think was accurate enough</a>:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: <br />4: <br />5: <br />6: <br />7: <br />8: <br />9: <br />10: <br />11: <br />12: <br />13: <br />14: <br />15: <br />16: <br />17: <br />18: <br />19: <br />20: <br />21: <br />22: <br />23: <br />24: <br />25: <br />26: <br />27: <br />28: <br />29: <br />30: <br />31: <br />32: <br />33: <br />34: <br />35: <br />36: <br />37: <br />38: <br />39: <br />40: <br />41: <br />42: <br />43: <br />44: <br />45: <br />46: <br />47: <br />48: <br />49: <br />50: <br />51: <br />52: <br />53: <br />54: <br />55: <br />56: <br />57: <br />58: <br />59: <br />60: <br />61: <br />62: <br />63: <br />64: <br />65: <br />66: <br />67: <br />68: <br />69: <br />70: <br />71: <br />72: <br />73: <br />74: <br />75: <br />76: <br />77: <br />78: <br />79: <br />80: <br />81: <br />82: </code><br /> </td><td class='code'><br /><code><span class="keyword">use</span> <span class="word">List::Util</span> <span class="words">qw( sum first )</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Text::CSV_XS</span> <span class="words">qw( csv )</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$props</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"presents.csv"</span><span class="operator">,</span> <span class="word">key</span> <span class="operator">=></span> <span class="double">"present"</span><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">%count</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-uk.csv"</span><span class="operator">,</span> <span class="word">headers</span> <span class="operator">=></span> <span class="double">"auto"</span><span class="operator">,</span> <span class="word">filter</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="number">1</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="symbol">$count</span><span class="structure">{</span><span class="magic">$_</span><span class="structure">{</span><span class="word">Wish</span><span class="structure">}}{</span><span class="magic">$_</span><span class="structure">{</span><span class="double">"Postal Code"</span><span class="structure">}</span> <span class="operator">=~</span> <span class="substitute">s/\s.*//r</span><span class="structure">}</span><span class="operator">++</span><span class="structure">;</span><br /> <span class="number">0</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">});</span><br /><br /><span class="keyword">my</span> <span class="symbol">@presents</span> <span class="operator">=</span> <span class="word">keys</span> <span class="cast">%</span><span class="symbol">$props</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">%ship</span><span class="structure">;</span><br /><br /><span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$wish</span> <span class="structure">(</span><span class="word">keys</span> <span class="symbol">%count</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$w</span> <span class="operator">=</span> <span class="word">lc</span> <span class="symbol">$wish</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="operator">,</span> <span class="symbol">$p</span><span class="structure">);</span><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$w</span><span class="structure">})</span> <span class="structure">{</span><br /> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$p</span><span class="operator">-></span><span class="structure">{</span><span class="word">weight</span><span class="structure">}</span><span class="operator">,</span> <span class="symbol">$p</span><span class="operator">-></span><span class="structure">{</span><span class="word">size</span><span class="structure">});</span><br /> <span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$w</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="keyword">elsif</span> <span class="structure">(</span><span class="symbol">$p</span> <span class="operator">=</span> <span class="word">first</span> <span class="structure">{</span> <span class="magic">$_</span> <span class="structure">}</span> <span class="word">grep</span> <span class="match">m/\b $w \b/x</span> <span class="operator">=></span> <span class="symbol">@presents</span> <span class="operator">or</span><br /> <span class="symbol">$p</span> <span class="operator">=</span> <span class="word">first</span> <span class="structure">{</span> <span class="symbol">$w</span> <span class="operator">=~</span> <span class="match">m/\b $_ \b/x</span> <span class="structure">}</span> <span class="symbol">@presents</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$p</span><span class="structure">}{</span><span class="word">weight</span><span class="structure">}</span><span class="operator">,</span> <span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$p</span><span class="structure">}</span><span class="operator">-></span><span class="structure">{</span><span class="word">size</span><span class="structure">});</span><br /> <span class="structure">}</span><br /> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="number">1</span><span class="operator">,</span> <span class="number">1</span><span class="structure">);</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$postal</span> <span class="structure">(</span><span class="word">keys</span> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$count</span><span class="structure">{</span><span class="symbol">$wish</span><span class="structure">}})</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$n</span> <span class="operator">=</span> <span class="symbol">$count</span><span class="structure">{</span><span class="symbol">$wish</span><span class="structure">}{</span><span class="symbol">$postal</span><span class="structure">};</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">presents</span><span class="structure">}{</span><span class="symbol">$wish</span><span class="structure">}</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="symbol">$p</span> <span class="operator">//</span> <span class="double">"unknown"</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="symbol">$weight</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="symbol">$size</span><span class="operator">,</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="symbol">$n</span><span class="operator">,</span><br /> <span class="structure">};</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">count</span><span class="structure">}</span> <span class="operator">+=</span> <span class="symbol">$n</span><span class="structure">;</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">size</span><span class="structure">}</span> <span class="operator">+=</span> <span class="symbol">$n</span> <span class="operator">*</span> <span class="symbol">$size</span><span class="structure">;</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">weight</span><span class="structure">}</span> <span class="operator">+=</span> <span class="symbol">$n</span> <span class="operator">*</span> <span class="symbol">$weight</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /><br /><span class="word">DDumper</span> <span class="cast">\</span><span class="symbol">%ship</span><span class="structure">;</span><br /><br /> <span class="operator">=></span><br /><br /><span class="structure">{</span> <span class="word">NE13</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'Magic beaded handbag'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'handbag'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">Wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'wand'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">3</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'1.2'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">NE66</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">Wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'wand'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span></code><br /> </td></table>
<h4 id="Planning-production">Planning production</h4>
<p>Using the same wish-list data, they can control the production department. They just need to know how many presents they should manufacture and with what specs. They are not interested at all in where the presents will be shipped or who will be the happy child to receive the present.</p>
<p>To get the right data to production they still need the present properties in order to categorize the presents. This turned out to be very <a href="wish-uk-5.pl">easy</a>:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: <br />4: <br />5: <br />6: <br />7: <br />8: <br />9: <br />10: <br />11: <br />12: <br />13: <br />14: <br />15: <br />16: <br />17: <br />18: <br />19: <br />20: <br />21: <br />22: <br />23: <br />24: <br />25: <br />26: <br />27: <br />28: <br />29: </code><br /> </td><td class='code'><br /><code><span class="keyword">my</span> <span class="symbol">$props</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"presents.csv"</span><span class="operator">,</span> <span class="word">key</span> <span class="operator">=></span> <span class="double">"present"</span><span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">@presents</span> <span class="operator">=</span> <span class="word">keys</span> <span class="cast">%</span><span class="symbol">$props</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">%count</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-uk.csv"</span><span class="operator">,</span> <span class="word">headers</span> <span class="operator">=></span> <span class="double">"auto"</span><span class="operator">,</span> <span class="word">filter</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="number">1</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$w</span> <span class="operator">=</span> <span class="word">lc</span> <span class="magic">$_</span><span class="structure">{</span><span class="word">Wish</span><span class="structure">};</span><br /> <span class="keyword">my</span> <span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$w</span><span class="structure">}</span> <span class="operator">?</span> <span class="symbol">$w</span><br /> <span class="operator">:</span> <span class="structure">(</span><span class="word">first</span> <span class="structure">{</span> <span class="magic">$_</span> <span class="structure">}</span> <span class="word">grep</span> <span class="match">m/\b $w \b/x</span> <span class="operator">=></span> <span class="symbol">@presents</span><span class="structure">)</span> <span class="operator">||</span><br /> <span class="structure">(</span><span class="word">first</span> <span class="structure">{</span> <span class="symbol">$w</span> <span class="operator">=~</span> <span class="match">m/\b $_ \b/x</span> <span class="structure">}</span> <span class="symbol">@presents</span><span class="structure">)</span> <span class="operator">||</span> <span class="symbol">$w</span><span class="structure">;</span><br /> <span class="symbol">$count</span><span class="structure">{</span><span class="symbol">$p</span><span class="structure">}{</span><span class="symbol">$w</span><span class="structure">}{</span><span class="magic">$_</span><span class="structure">{</span><span class="word">Specs</span><span class="structure">}}</span><span class="operator">++</span><span class="structure">;</span><br /> <span class="number">0</span><span class="structure">;</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">});</span><br /><br /> <span class="operator">=></span><br /><br /><span class="structure">{</span> <span class="word">handbag</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'magic beaded handbag'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'Unlimited space'</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'10¾" Vine w/ dragon heartstring'</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="single">'Crooked oak w/ Muggle blood'</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span></code><br /> </td></table>
<h3 id="More-of-Europe">More of Europe</h3>
<p>So, with the UK covered, let's see what the Dutch and the Germans sent to Santa.</p>
<p>The <a href="wish-nl.csv">Dutch sent Santa</a>:</p>
<pre><code> naam;geboortedatum;postcode;adres;wens;specificaties
Gordon;19680706;1000AZ;Blaricum;handbag;pink size 12
Chantal Janzen;19790215;1071HW;Willemsparkweg;handbag;Dior, black</code></pre>
<p>The <a href="wish-de.csv">Germans sent Santa</a>:</p>
<pre><code> Name,Geburtstag,PostLeitzahl,Adresse,Wunsch,Spezifikationen
Kurt Weller,20131211,80331,Unsöldstraße,Tattoo,the one with the hidden message</code></pre>
<p>What they see here is that the Dutch use a semi-colon instead of a colon and both the German and the Dutch use localized headers, but thankfully still use the English spelling for the presents.</p>
<p>As for the separator, Text::CSV_XS' csv function has a ready to go no-brainer option available:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: </code><br /> </td><td class='code'><br /><code><span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-nl.csv"</span><span class="operator">,</span> <span class="word">sep_set</span> <span class="operator">=></span> <span class="structure">[</span> <span class="double">","</span><span class="operator">,</span> <span class="double">";"</span> <span class="structure">]);</span></code><br /> </td></table>
<p>problem solved :)</p>
<p>When the elves were digging into the manual pages to find that option, they also discovered that they did not need to take any special action to deal with the different line-ending that the Dutch used (<code>\r\n</code> instead of <code>\n</code> as used in the UK version). It was already dealt with.</p>
<p>When they parsed the German file, they were unable to find the <code>Name</code> column and they got an empty last row. Puzzled by that, they used several tools to view the actual content to find that the file contained a lot of <code>00</code> bytes and it started with two mystery bytes <code>FE</code> and <code>FF</code>.</p>
<p>In despair they went to Santa, who explained them the meaning of the Byte Order Mark or short BOM. Again in awe of Santa's wisdom, they reread the manual pages and found that <code>csv ()</code> did support BOM.</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: </code><br /> </td><td class='code'><br /><code><span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"wish-de.csv"</span><span class="operator">,</span> <span class="word">bom</span> <span class="operator">=></span> <span class="number">1</span><span class="structure">);</span></code><br /> </td></table>
<p>So, the only issue they needed to fix was the header names in order to combine the wish-lists.</p>
<p>Instead of trying to translate all headers to a uniform language (be that elvish or English), it is easier to just define the headers and skip the localized name scheme. If all countries keep sending their wish-lists with the same columns, as they do, it does not really matter what the header states, so we choose our own tags.</p>
<p>The csv function supports setting the header and skipping the header in the filter is also possible, so that problem can be tackled relatively easy. They can read all wish-list files in a single loop and <a href="wish-xx.pl">collect all required data</a>:</p>
<table class='code-listing'><tr><td class='line-numbers'><br /><code>1: <br />2: <br />3: <br />4: <br />5: <br />6: <br />7: <br />8: <br />9: <br />10: <br />11: <br />12: <br />13: <br />14: <br />15: <br />16: <br />17: <br />18: <br />19: <br />20: <br />21: <br />22: <br />23: <br />24: <br />25: <br />26: <br />27: <br />28: <br />29: <br />30: <br />31: <br />32: <br />33: <br />34: <br />35: <br />36: <br />37: <br />38: <br />39: <br />40: <br />41: <br />42: <br />43: <br />44: <br />45: <br />46: <br />47: <br />48: <br />49: <br />50: <br />51: <br />52: <br />53: <br />54: <br />55: <br />56: <br />57: <br />58: <br />59: <br />60: <br />61: <br />62: <br />63: <br />64: <br />65: <br />66: <br />67: <br />68: <br />69: <br />70: <br />71: <br />72: <br />73: <br />74: <br />75: <br />76: <br />77: <br />78: <br />79: <br />80: <br />81: <br />82: <br />83: <br />84: <br />85: <br />86: <br />87: <br />88: <br />89: <br />90: <br />91: <br />92: <br />93: <br />94: <br />95: <br />96: <br />97: <br />98: <br />99: <br />100: <br />101: <br />102: <br />103: <br />104: <br />105: <br />106: <br />107: <br />108: <br />109: <br />110: <br />111: <br />112: <br />113: <br />114: <br />115: <br />116: <br />117: <br />118: <br />119: <br />120: <br />121: <br />122: <br />123: <br />124: <br />125: <br />126: <br />127: <br />128: <br />129: <br />130: <br />131: <br />132: <br />133: <br />134: <br />135: <br />136: <br />137: <br />138: <br />139: <br />140: <br />141: <br />142: <br />143: <br />144: <br />145: <br />146: <br />147: <br />148: <br />149: <br />150: <br />151: <br />152: <br />153: <br />154: <br />155: <br />156: <br />157: <br />158: <br />159: <br />160: <br />161: <br />162: <br />163: <br />164: <br />165: <br />166: <br />167: <br />168: <br />169: <br />170: <br />171: <br />172: <br />173: <br />174: <br />175: <br />176: <br />177: <br />178: </code><br /> </td><td class='code'><br /><code><span class="keyword">use</span> <span class="word">Data::Peek</span><span class="structure">;</span><br /><span class="word">binmode</span> <span class="word">STDERR</span><span class="operator">,</span> <span class="double">":encoding(utf-8)"</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">List::Util</span> <span class="words">qw( sum first )</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Text::CSV_XS</span> <span class="words">qw( csv )</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$props</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><span class="word">in</span> <span class="operator">=></span> <span class="double">"presents.csv"</span><span class="operator">,</span> <span class="word">key</span> <span class="operator">=></span> <span class="double">"present"</span><span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">@presents</span> <span class="operator">=</span> <span class="word">keys</span> <span class="cast">%</span><span class="symbol">$props</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">%ship</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">@wlh</span> <span class="operator">=</span> <span class="words">qw( name birth pc address wish spec )</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">%count</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">%prod</span><span class="structure">;</span><br /><span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$wlfn</span> <span class="structure">(</span><span class="word">glob</span> <span class="double">"wish-??.csv"</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$cc</span><span class="structure">)</span> <span class="operator">=</span> <span class="word">uc</span> <span class="structure">(</span><span class="symbol">$wlfn</span><span class="structure">)</span> <span class="operator">=~</span> <span class="match">m/-(\w+)/</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$aoa</span> <span class="operator">=</span> <span class="word">csv</span> <span class="structure">(</span><br /> <span class="word">in</span> <span class="operator">=></span> <span class="symbol">$wlfn</span><span class="operator">,</span><br /> <span class="word">bom</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">headers</span> <span class="operator">=></span> <span class="cast">\</span><span class="symbol">@wlh</span><span class="operator">,</span><br /> <span class="word">filter</span> <span class="operator">=></span> <span class="structure">{</span> <span class="number">1</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /><span class="comment"> # Skip lacalized header<br /></span> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="word">record_number</span> <span class="operator">==</span> <span class="number">1</span> <span class="operator">and</span> <span class="keyword">return</span> <span class="number">0</span><span class="structure">;</span><br /><br /><span class="comment"> # For shipping/logistics<br /></span> <span class="symbol">$count</span><span class="structure">{</span><span class="magic">$_</span><span class="structure">{</span><span class="word">wish</span><span class="structure">}}{</span><span class="symbol">$cc</span><span class="structure">}{</span><span class="magic">$_</span><span class="structure">{</span><span class="word">pc</span><span class="structure">}</span> <span class="operator">=~</span> <span class="substitute">s/\s.*//r</span><span class="structure">}</span><span class="operator">++</span><span class="structure">;</span><br /><br /><span class="comment"> # For production<br /></span> <span class="keyword">my</span> <span class="symbol">$w</span> <span class="operator">=</span> <span class="word">lc</span> <span class="magic">$_</span><span class="structure">{</span><span class="word">wish</span><span class="structure">};</span><br /> <span class="keyword">my</span> <span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$w</span><span class="structure">}</span> <span class="operator">?</span> <span class="symbol">$w</span><br /> <span class="operator">:</span> <span class="structure">(</span><span class="word">first</span> <span class="structure">{</span> <span class="magic">$_</span> <span class="structure">}</span> <span class="word">grep</span> <span class="match">m/\b $w \b/x</span> <span class="operator">=></span> <span class="symbol">@presents</span><span class="structure">)</span> <span class="operator">||</span><br /> <span class="structure">(</span><span class="word">first</span> <span class="structure">{</span> <span class="symbol">$w</span> <span class="operator">=~</span> <span class="match">m/\b $_ \b/x</span> <span class="structure">}</span> <span class="symbol">@presents</span><span class="structure">)</span> <span class="operator">||</span> <span class="symbol">$w</span><span class="structure">;</span><br /> <span class="symbol">$prod</span><span class="structure">{</span><span class="symbol">$p</span><span class="structure">}{</span><span class="symbol">$w</span><span class="structure">}{</span><span class="magic">$_</span><span class="structure">{</span><span class="word">spec</span><span class="structure">}}</span><span class="operator">++</span><span class="structure">;</span><br /><br /><span class="comment"> # Don't store<br /></span> <span class="number">0</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">});</span><br /> <span class="structure">}</span><br /><br /><span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$wish</span> <span class="structure">(</span><span class="word">keys</span> <span class="symbol">%count</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$w</span> <span class="operator">=</span> <span class="word">lc</span> <span class="symbol">$wish</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="operator">,</span> <span class="symbol">$p</span><span class="structure">);</span><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$w</span><span class="structure">})</span> <span class="structure">{</span><br /> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$p</span><span class="operator">-></span><span class="structure">{</span><span class="word">weight</span><span class="structure">}</span><span class="operator">,</span> <span class="symbol">$p</span><span class="operator">-></span><span class="structure">{</span><span class="word">size</span><span class="structure">});</span><br /> <span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$w</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="keyword">elsif</span> <span class="structure">(</span><span class="symbol">$p</span> <span class="operator">=</span> <span class="word">first</span> <span class="structure">{</span> <span class="magic">$_</span> <span class="structure">}</span> <span class="word">grep</span> <span class="match">m/\b $w \b/x</span> <span class="operator">=></span> <span class="symbol">@presents</span> <span class="operator">or</span><br /> <span class="symbol">$p</span> <span class="operator">=</span> <span class="word">first</span> <span class="structure">{</span> <span class="symbol">$w</span> <span class="operator">=~</span> <span class="match">m/\b $_ \b/x</span> <span class="structure">}</span> <span class="symbol">@presents</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$p</span><span class="structure">}{</span><span class="word">weight</span><span class="structure">}</span><span class="operator">,</span> <span class="symbol">$props</span><span class="operator">-></span><span class="structure">{</span><span class="symbol">$p</span><span class="structure">}</span><span class="operator">-></span><span class="structure">{</span><span class="word">size</span><span class="structure">});</span><br /> <span class="structure">}</span><br /> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="structure">(</span><span class="symbol">$weight</span><span class="operator">,</span> <span class="symbol">$size</span><span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="number">1</span><span class="operator">,</span> <span class="number">1</span><span class="structure">);</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$cc</span> <span class="structure">(</span><span class="word">keys</span> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$count</span><span class="structure">{</span><span class="symbol">$wish</span><span class="structure">}})</span> <span class="structure">{</span><br /> <span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$postal</span> <span class="structure">(</span><span class="word">keys</span> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$count</span><span class="structure">{</span><span class="symbol">$wish</span><span class="structure">}{</span><span class="symbol">$cc</span><span class="structure">}})</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$n</span> <span class="operator">=</span> <span class="symbol">$count</span><span class="structure">{</span><span class="symbol">$wish</span><span class="structure">}{</span><span class="symbol">$cc</span><span class="structure">}{</span><span class="symbol">$postal</span><span class="structure">};</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$cc</span><span class="structure">}{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">presents</span><span class="structure">}{</span><span class="symbol">$wish</span><span class="structure">}</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="symbol">$p</span> <span class="operator">//</span> <span class="double">"unknown"</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="symbol">$weight</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="symbol">$size</span><span class="operator">,</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="symbol">$n</span><span class="operator">,</span><br /> <span class="structure">};</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$cc</span><span class="structure">}{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">count</span><span class="structure">}</span> <span class="operator">+=</span> <span class="symbol">$n</span><span class="structure">;</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$cc</span><span class="structure">}{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">size</span><span class="structure">}</span> <span class="operator">+=</span> <span class="symbol">$n</span> <span class="operator">*</span> <span class="symbol">$size</span><span class="structure">;</span><br /> <span class="symbol">$ship</span><span class="structure">{</span><span class="symbol">$cc</span><span class="structure">}{</span><span class="symbol">$postal</span><span class="structure">}{</span><span class="word">weight</span><span class="structure">}</span> <span class="operator">+=</span> <span class="symbol">$n</span> <span class="operator">*</span> <span class="symbol">$weight</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /><br /><span class="word">DDumper</span> <span class="structure">{</span> <span class="word">ship</span> <span class="operator">=></span> <span class="cast">\</span><span class="symbol">%ship</span><span class="operator">,</span> <span class="word">prod</span> <span class="operator">=></span> <span class="cast">\</span><span class="symbol">%prod</span> <span class="structure">};</span><br /><br /> <span class="operator">=></span><br /><br /><span class="structure">{</span> <span class="word">prod</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">handbag</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">handbag</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'Dior, black'</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="single">'pink size 12'</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'magic beaded handbag'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'Unlimited space'</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">tattoo</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">tattoo</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'the one with the hidden message'</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'10¾" Vine w/ dragon heartstring'</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="single">'Crooked oak w/ Muggle blood'</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">ship</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">DE</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="number">80331</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">Tattoo</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'unknown'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">NL</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'1000AZ'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">handbag</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'handbag'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'1071HW'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">handbag</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'handbag'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">UK</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">NE13</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="single">'Magic beaded handbag'</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'handbag'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">2</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">Wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'wand'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">3</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'1.2'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">NE66</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">presents</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">Wand</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="word">count</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">present</span> <span class="operator">=></span> <span class="single">'wand'</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">size</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span><br /> <span class="word">weight</span> <span class="operator">=></span> <span class="single">'0.2'</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span></code><br /> </td></table>
</div>2016-12-10T00:00:00ZH.Merijn BrandToo Many Choices For Santahttp://perladvent.org/2016/2016-12-09.html<div class='pod'><h3 id="A-history-lesson-and-a-management-problem">A history lesson and a management problem</h3>
<p><a href="http://www.perladvent.org/2002/10th/">Several years ago</a> this <strike>august</strike>December calendar told you all about <a href="https://metacpan.org/module/Params::Validate">Params::Validate</a>. It's very useful, and if you're not already using it you are in a state of sin.</p>
<p>Recently, however, Santa has run into a limitation. It's the right tool for the job if you want to validate that parameters are of the right type, but what if your requirements are more complex?</p>
<p>Santa's problem arose because of a customer complaint he got. You see, a few years ago he got an email address so that children could write to him online instead of through the post. I know, I know, it takes some of the magic out of the experience, but really, have you seen modern kids' handwriting? Santa blames modern parenting and schools' acceptance of homework done on computers and tablets. Back in the good old days, children did lots of handwriting and were birched if it was illegible. These days, we're only allowed to birch consenting adults in the sauna. Anyway, one thing led to another and Santa created a website so they could choose their presents, and because he'd read about it in an in-flight magazine, he had the website feed data into his back-end warehouse and ordering systems.</p>
<p>And that leads us to The Complaint. You see, the website had a form on it that allowed the little darlings to choose what broad categories of gifts they wanted, such as toys, food and clothing. Unfortunately Santa forgot that part of his job (which he'd delegated to a team of elves) back when he'd got individual letters and emails from the little darlings was to figure out what would be appropriate when the child was a bit vague. When he created the form, all he got was vague data, which got fed straight into his warehouse systems and sent out to the lowest cost suppliers. And then a nice Jewish kid asked for some food, and got a pork pie. Oops. Cue angry letter from his mother.</p>
<p>What was a Jewish kid doing asking for a <i>Christmas</i> gift? Well, be it from the wonders of the great melting pot, where Children are being raised in multi-faith families, or just the reality that Christmas had becoming less of a Christian festival and more of a generic cultural event, Santa didn't care - if any kid wanted a gift, he was all for providing one.</p>
<p>So he's upgraded the web form to allow lots more options. If you pick 'food' you get options like 'vegetarian', 'halal', and 'kosher'. But during testing it was noticed that you could pick both halal <i>and</i> kosher. It's perfectly possible (not that Santa is an expert, what with being a Christian bishop living in northern Finland; he has a great Sami recipe for mushroom beer which makes his head go all funny but that's about the limit of his multicultural cuisine knowledge) but his suppliers could only do one or the other, not both.</p>
<p>And there were similar problems elsewhere. You could pick a traditional electronic toy. You could pick clothing that was both a pair of socks and a hat at the same time.</p>
<p>This time, Params::Validate isn't enough. It can't check that you've only ticked one of halal and kosher, the most it can do is check that if you specify halal or kosher you must be asking for food.</p>
<h3 id="Params::Validate::Dependencies-to-the-rescue">Params::Validate::Dependencies to the rescue!</h3>
<p><a href="https://metacpan.org/module/Params::Validate::Dependencies">Params::Validate::Dependencies</a> extends Params::Validate, leaving the original module to continue to do what it's good at, and adds functionality for checking all sorts of dependencies between parameters.</p>
<p>Ignoring all the gory details (and they are very very gory indeed, almost as bad as that time a reindeer <a href="https://en.wikipedia.org/wiki/Foreign_object_damage">FODded</a> a 747 over the Pacific) you provide a subroutine reference to the <code>validate()</code> function. That reference can be anything you like, but Params::Validate::Dependencies provides several building blocks for you to use:</p>
<dl>
<dt>none_of</dt>
<dd>
<p>returns a subroutine reference that requires that none of its arguments be present in the data being checked;</p>
</dd>
<dt>one_of</dt>
<dd>
<p>requires that exactly one of its arguments be present;</p>
</dd>
<dt>any_of</dt>
<dd>
<p>requires that one <i>or more</i> of its arguments be present;</p>
</dd>
<dt>all_of</dt>
<dd>
<p>requires that all of its arguments be present</p>
</dd>
</dl>
<p>They can all take strings or further subroutine references as arguments. For example:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Params::Validate::Dependencies</span> <span class="words">qw(:all)</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">foo</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">%params</span> <span class="operator">=</span> <span class="word">validate</span><span class="structure">(</span><span class="magic">@_</span><span class="operator">,</span><br /> <span class="word">all_of</span><span class="structure">(</span><br /> <span class="single">'food'</span><span class="operator">,</span><br /> <span class="word">one_of</span><span class="structure">(</span><br /> <span class="word">none_of</span><span class="structure">(</span><span class="words">qw(halal kosher)</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="word">one_of</span><span class="structure">(</span><span class="words">qw(halal kosher)</span><span class="structure">)</span><br /> <span class="structure">)</span><br /> <span class="structure">)</span><br /> <span class="structure">);</span><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>Let's take that validator apart.</p>
<dl>
<dt>one_of(qw(halal kosher))</dt>
<dd>
<p>this is true if the parameters contain either 'halal' or 'kosher' but not both.</p>
</dd>
<dt>none_of(qw(halal kosher))</dt>
<dd>
<p>this is true if the parameters do <i>not</i> contain either 'halal' or 'kosher'.</p>
</dd>
</dl>
<p>Those are in turn contained within a 'one_of', so that says you must have either one of them or none of them. Finally, that is contained within:</p>
<dl>
<dt>all_of('food', ...)</dt>
<dd>
<p>so you must have a 'food' parameter and either zero or one of 'halal' and 'kosher'.</p>
</dd>
</dl>
<p>We can already see a problem here. You've had to say 'halal' and 'kosher' twice, which is both annoying and also a source of bugs if you misspell one of them once, but that's easily fixed. All of the various <code>*_of</code> functions just return subroutine references (although see the LIES section in the documentation), so we can make up our own reusable subroutine generator:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">zero_or_one_of</span> <span class="structure">{</span><br /> <span class="word">one_of</span><span class="structure">(</span><br /> <span class="word">none_of</span><span class="structure">(</span><span class="magic">@_</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="word">one_of</span><span class="structure">(</span><span class="magic">@_</span><span class="structure">)</span><br /> <span class="structure">)</span><br /><span class="structure">}</span></code></pre>
<p>and reduce the validation code to this:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">foo</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">%params</span> <span class="operator">=</span> <span class="word">validate</span><span class="structure">(</span><span class="magic">@_</span><span class="operator">,</span><br /> <span class="word">all_of</span><span class="structure">(</span><br /> <span class="single">'food'</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(halal kosher)</span><span class="structure">)</span><br /> <span class="structure">)</span><br /> <span class="structure">);</span><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>Right now that actually looks like more code to do the same work, but we can of course reuse the <code>zero_or_one_of</code> function many times. This becomes clear when we also allow vegetarian and vegan options, and start checking for when the kiddies want toys. To validate for toys we add another 'all_of' section just like the above, replacing parameter names where necessary, and wrap both the validator for toys and the validator for food in an 'any_of' so that the user can ask for a toy, or food, or both:</p>
<pre><code class="code-listing"><span class="word">any_of</span><span class="structure">(</span><br /> <span class="word">all_of</span><span class="structure">(</span><br /> <span class="single">'toy'</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(electronic traditional)</span><span class="structure">)</span><br /> <span class="structure">)</span><span class="operator">,</span><br /> <span class="word">all_of</span><span class="structure">(</span><br /> <span class="single">'food'</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(halal kosher)</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(vegetarian vegan)</span><span class="structure">)</span><br /> <span class="structure">)</span><br /><span class="structure">)</span></code></pre>
<p>And we could put yet another section in there for any other major category of gift like clothing or craft supplies.</p>
<p>At this point, we can pass sets of parameters like the following and everything will work:</p>
<p><ul> <li>food</li> <li>food vegetarian</li> <li>food vegetarian halal (and likewise for kosher)</li> <li>food halal</li> <li>toy</li> <li>toy electronic</li> <li>toy traditional</li> <li>food halal vegan toy electronic</li> </ul>
</p>
<p>and if we pass nonsense like this it will fail:</p>
<p><ul> <li>halal kosher food</li> <li>traditional electronic toy</li> </ul>
</p>
<p>Hurrah!</p>
<h3 id="Dont-forget-to-use-Params::Validates-functionality-as-well">Don't forget to use Params::Validate's functionality as well</h3>
<p>Unfortunately there's some other nonsense we can pass, such as:</p>
<ul>
<p><ul> <li>electronic food</li> </ul>
</p>
</ul>
<p>Santa does not yet deliver to robot children even if they've been very very good and not crushed any puny human skulls beneath their steel feet. Thankfully, plain old Params::Validate can check simple dependencies, such as that if you pass the 'electronic' parameter you must also pass the 'toy' parameter. And Params::Validate::Dependencies <i>extends</i> Params::Validate, so all of the old functionality is still available. We extend our little subroutine thus:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">foo</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">%params</span> <span class="operator">=</span> <span class="word">validate</span><span class="structure">(</span><span class="magic">@_</span><span class="operator">,</span><br /><span class="comment"> # here's the traditional Params::Validate checking<br /></span> <span class="structure">{</span><br /><span class="comment"> # these are optional<br /></span> <span class="word">food</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">toy</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">}</span><span class="operator">,</span><br /><br /><span class="comment"> # these are also optional but if present must be accompanied by one of the above<br /></span> <span class="word">electronic</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span> <span class="word">depends</span> <span class="operator">=></span> <span class="structure">[</span> <span class="single">'toy'</span> <span class="structure">]</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">traditional</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span> <span class="word">depends</span> <span class="operator">=></span> <span class="structure">[</span> <span class="single">'toy'</span> <span class="structure">]</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">kosher</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span> <span class="word">depends</span> <span class="operator">=></span> <span class="structure">[</span> <span class="single">'food'</span> <span class="structure">]</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">halal</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span> <span class="word">depends</span> <span class="operator">=></span> <span class="structure">[</span> <span class="single">'food'</span> <span class="structure">]</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">vegetarian</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span> <span class="word">depends</span> <span class="operator">=></span> <span class="structure">[</span> <span class="single">'food'</span> <span class="structure">]</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="word">vegan</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">optional</span> <span class="operator">=></span> <span class="number">1</span><span class="operator">,</span> <span class="word">depends</span> <span class="operator">=></span> <span class="structure">[</span> <span class="single">'food'</span> <span class="structure">]</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /><span class="comment"> # and now for the complex combinations that P::V can't check<br /></span> <span class="word">any_of</span><span class="structure">(</span><br /> <span class="word">all_of</span><span class="structure">(</span><br /> <span class="single">'toy'</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(electronic traditional)</span><span class="structure">)</span><br /> <span class="structure">)</span><span class="operator">,</span><br /> <span class="word">all_of</span><span class="structure">(</span><br /> <span class="single">'food'</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(halal kosher)</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="word">zero_or_one_of</span><span class="structure">(</span><span class="words">qw(vegetarian vegan)</span><span class="structure">)</span><br /> <span class="structure">)</span><br /> <span class="structure">)</span><br /> <span class="structure">);</span><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>and we're finished. The traditional Params::Validate section checks the simple dependencies to make sure that you don't try to ask for electronic food or vegetarian toys (you can also use it to check data types), and the extra Params::Validate::Dependencies section checks that you're not asking for traditional electronic toys.</p>
<h3 id="See-also">See also</h3>
<p><a href="https://metacpan.org/module/Data::Domain">Data::Domain</a> is another module that does a similar job to Params::Validate, and PVD has <a href="https://metacpan.org/module/Data::Domain::Dependencies">Data::Domain::Dependencies</a> bundled with it. Unfortunately it only works reliably on some versions of perl because of problems in one of Data::Domain's dependencies.</p>
<p>If you want to read the module's code I strongly recommend that you read version 1.0 first. Excepting subsequent bug fixes, it has all the functionality discussed above. Version 1.1 then adds some nasty tentacles to make your validation functions self-documenting. But honestly, I don't recommend looking at that without a glass of strong drink. It will make your eyes bleed.</p>
</div>2016-12-09T00:00:00ZDavid CantrellGeocoding the world at volume with open datahttp://perladvent.org/2016/2016-12-08.html<div class='pod'><h2 id="Open-Geodata-saves-Christmas">Open Geodata saves Christmas</h2>
<p>Santa stood up from the desk and shook his head in disgust.</p>
<p>"No, this just won't do at all," he mumbled. The gentle melody of carols in the background couldn't hide the sense of disgust in his voice.</p>
<p>"What's the problem?" asked the Head Elf, while studiously double checking the present list on his clipboard.</p>
<p>"As if the massive bill wasn't enough, you should see the insane terms and conditions those clowns over at MegaCorp want us to agree to to use their geocoding services. This, my little friend, is the final straw. For years we've talked about it, and this year we are going to make the switch to using open geodata. I've seen more and more good things about the OpenStreetMap project. Thousands of contributors around the world, millions of edits per day. We shouldn't just be using open-source software, we should also be using open data!!!"</p>
<p>"Umm, Santa, you know I love open-source, but Christmas is only a few weeks away, and the engineers are already completely overloaded. OpenStreetMap is great, I agree, but we really don't have time to get anyone up to speed on it." answered the Head Elf. "Much as I dislike it, I think we'll have to stay with MegaCorp and agree to their terms and fees. If only there were some service that would let us use open geodata, but also have enterprise level reliability and almost no learning curve."</p>
<p>Across the room, Rudolph the reindeer looked up from his screen.</p>
<p>"We could use the OpenCage Geocoder", he said. "OpenCage provide a simple API to query multiple open geodata backends, including OpenStreetMap. It's ace. And there's a Perl module."</p>
<p>Rudolph picked up his eggnog, wandered over to the whiteboard and started taking them through some code:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Geo::Coder::OpenCage</span><span class="structure">;</span><br /><br /><span class="comment"># get a key at https://geocoder.opencagedata.com/<br /></span><span class="keyword">my</span> <span class="symbol">$api_key</span> <span class="operator">=</span> <span class="single">'secret_key'</span><span class="structure">;</span> <br /><span class="keyword">my</span> <span class="symbol">$Geocoder</span> <span class="operator">=</span> <span class="word">Geo::Coder::OpenCage</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="word">api_key</span> <span class="operator">=></span> <span class="symbol">$api_key</span><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$response</span> <span class="operator">=</span> <span class="symbol">$Geocoder</span><span class="operator">-></span><span class="word">geocode</span><span class="structure">(</span><span class="word">location</span> <span class="operator">=></span> <span class="single">'the north pole'</span><span class="structure">);</span><br /><span class="comment"># $response is hash ref to result set<br /></span><br /><span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$response</span><span class="operator">-></span><span class="structure">{</span><span class="word">total_results</span><span class="structure">}</span> <span class="operator">></span> <span class="number">0</span><span class="structure">){</span><br /><span class="comment"> # grab the first result<br /></span> <span class="keyword">my</span> <span class="symbol">$result</span> <span class="operator">=</span> <span class="symbol">$response</span><span class="operator">-></span><span class="structure">{</span><span class="word">results</span><span class="structure">}[</span><span class="number">0</span><span class="structure">];</span><br /> <span class="word">say</span> <span class="double">"Formatted location name: "</span> <span class="operator">.</span> <span class="symbol">$result</span><span class="operator">-></span><span class="structure">{</span><span class="word">formatted</span><span class="structure">};</span><br /> <span class="word">say</span> <span class="double">"long,lat: "</span> <span class="operator">.</span> <span class="symbol">$result</span><span class="operator">-></span><span class="structure">{</span><span class="word">geometry</span><span class="structure">}{</span><span class="word">lng</span><span class="structure">}</span> <span class="operator">.</span> <span class="single">','</span> <span class="operator">.</span> <span class="symbol">$result</span><span class="operator">-></span><span class="structure">{</span><span class="word">geometry</span><span class="structure">}{</span><span class="word">lat</span><span class="structure">};</span> <br /><span class="structure">}</span> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="word">say</span> <span class="double">"no results found"</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>"That's nice, but as you may recall a few years back we stopped using addresses and switched to coordinates for finding the kids' homes. So what we really need is the opposite - to turn coordinates into locations. " said Santa.</p>
<p>"Yes, I know. I was just showing you a quick example. Have a look at this," replied Rudolph.</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$longitude</span> <span class="operator">=</span> <span class="float">2.1287224</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$latitude</span> <span class="operator">=</span> <span class="float">41.4014067</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$response</span> <span class="operator">=</span> <span class="symbol">$Geocoder</span><span class="operator">-></span><span class="word">reverse_geocode</span><span class="structure">(</span><span class="word">lng</span> <span class="operator">=></span> <span class="symbol">$longitude</span><span class="operator">,</span> <span class="word">lat</span> <span class="operator">=></span> <span class="symbol">$latitude</span><span class="structure">);</span><br /><span class="word">say</span> <span class="symbol">$longitude</span> <span class="operator">.</span> <span class="single">','</span> <span class="operator">.</span> <span class="symbol">$latitude</span> <span class="operator">.</span> <span class="double">" = "</span> <span class="operator">.</span> <span class="symbol">$response</span><span class="operator">-></span><span class="structure">{</span><span class="word">results</span><span class="structure">}[</span><span class="number">0</span><span class="structure">]</span><span class="operator">-></span><span class="structure">{</span><span class="word">formatted</span><span class="structure">};</span><br /><span class="comment"># prints '2.1287224,41.4014067 = Carrer de Calatrava, 68, 08017 Barcelona, Spain'</span></code></pre>
<p>"Wow, that's great. I love the way the address is formatted correctly. Even for a global business like ours getting little i18n details like that right is a huge pain."</p>
<p>Rudolph nodded in agreement. "Behind the scenes they use <b>Geo::Address::Formatter</b>, all the underlying templates are open source."</p>
<h3 id="But-wait-theres-more">But wait, there's more</h3>
<p>"Great stuff. This looks like it could work Rudolph. Let's go tell the engineers, there's still a lot to do. We need to use the coordinates to get all kinds of other data for the trip," said Santa.</p>
<p>"Like what?" asked Rudolph.</p>
<p>"Well, first we need to figure out the local time in each location. Then we'll need to determine when it gets dark there. Then we need to ..."</p>
<p>Rudolph let him talk for a few minutes, took a long drag of eggnog, and then said "Santa, the kids at OpenCage have solved all that. By default each result includes all kinds of information about the location already, they call them <i>annotations</i>"</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$lng</span> <span class="operator">=</span> <span class="float">2.1287224</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$lat</span> <span class="operator">=</span> <span class="float">41.4014067</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$response</span> <span class="operator">=</span> <span class="symbol">$Geocoder</span><span class="operator">-></span><span class="word">reverse_geocode</span><span class="structure">(</span><span class="word">lng</span> <span class="operator">=></span> <span class="symbol">$lng</span><span class="operator">,</span> <span class="word">lat</span> <span class="operator">=></span> <span class="symbol">$lat</span><span class="structure">);</span><br /><br /><span class="comment"># get first result<br /></span><span class="keyword">my</span> <span class="symbol">$result</span> <span class="operator">=</span> <span class="symbol">$response</span><span class="operator">-></span><span class="structure">{</span><span class="word">results</span><span class="structure">}[</span><span class="number">0</span><span class="structure">];</span><br /><span class="keyword">my</span> <span class="symbol">$annotations</span> <span class="operator">=</span> <span class="symbol">$result</span><span class="operator">-></span><span class="structure">{</span><span class="word">annotations</span><span class="structure">};</span><br /><br /><span class="comment"># timezone info<br /></span><span class="word">say</span> <span class="symbol">$annotations</span><span class="operator">-></span><span class="structure">{</span><span class="word">timezone</span><span class="structure">};</span><br /><br /><span class="comment"># information about the local currency<br /></span><span class="word">say</span> <span class="symbol">$annotations</span><span class="operator">-></span><span class="structure">{</span><span class="word">currency</span><span class="structure">};</span><br /><br /><span class="comment"># time of sunrise and sunset<br /></span><span class="word">say</span> <span class="symbol">$annotations</span><span class="operator">-></span><span class="structure">{</span><span class="word">sun</span><span class="structure">};</span><br /><br /><span class="comment"># direct link to edit in OpenStreetMap<br /></span><span class="word">say</span> <span class="symbol">$annotations</span><span class="operator">-></span><span class="structure">{</span><span class="word">OSM</span><span class="structure">};</span><br /><br /><span class="comment"># and many more including what3words code, qibla orientation, geohash<br /># see the full list: https://geocoder.opencagedata.com/api#annotations</span></code></pre>
<p>A broad grin spread across Santa's face as he said "Rudolph, this is genius! OpenCage will save us so much dev time, our most expensive resource. Wonderful."</p>
<p>"Yes. And best of all it's written in Perl. Basically they've created a thin layer of CPAN goodness on top of the massive, global OpenStreetMap community," replied Rudolph.</p>
"And if you're looking for a final stocking stuffer there's even <a href="https://github.com/tadzik/Geo-Coder-OpenCage">an OpenCage Perl 6 module</a> already. So we'll be all set when we switch over to using Perl 6 for everything."
<p>"I just revisited the project plan, Rudolph," interjected the Head Elf. "We should be good to go by Christmas. But not this Christmas."</p>
</div>2016-12-08T00:00:00ZEd FreyfogleWriting git hooks with Git::Hookshttp://perladvent.org/2016/2016-12-07.html<div class='pod'><h2 id="Lets-start-with-the-hooks">Let's start with the hooks.</h2>
<p>As a developer, you are probably using git for source control. And if you are using git, you are probably using git hooks too, even if you don't realize it! <i>hooks</i> are simply programs that git runs automatically whenever you do certain things in a git repo - for example, when you commit, or push, or merge code.</p>
<p>If you use some remote repository like GitHub or GitLab it is very likely you might actually be using git hooks installed by those services to rebuild web pages, trigger deployments or myriad other stuff. Both GitHub and GitLab allow us to configure them quite easily via the various web services' web interfaces.</p>
<p>However, the point of this article is to show you how to write them yourself, in your own local repos. With git hooks - and the power of Perl - we can do all kinds of automations making your life as a developer easier.</p>
<p>So let's first find out what kind of events can trigger a hook. The whole list is <a href="https://git-scm.com/docs/githooks">on the Git reference page</a> and, in the last git version, includes 18 different events:</p>
<p><ul> <li>applypatch-msg <li>pre-applypatch <li>post-applypatch <li>pre-commit <li>prepare-commit-msg <li>commit-msg <li>post-commit <li>pre-rebase <li>post-checkout <li>post-merge <li>pre-push <li>pre-receive <li>update <li>post-receive <li>post-update <li>push-to-checkout <li>pre-auto-gc <li>post-rewrite </ul>
</p>
<p>We can divide them in different ways; for instance, according to <i>when</i> they actually happen. You'll notice that a lot of these have names starting with <code>pre</code> and <code>post</code>. <code>pre</code> and <code>post</code> hooks are run <i>before</i> and <i>after</i> a particular git command does its normal operation respectively.</p>
<p><code>pre</code> hooks can return a value that will effect the outcome of the git command, possibly indicating that the particular action has failed, and can be used to implement policies at the client level (for example, if a commit message contains typos or fails company policy in some other way a pre commit can stop the commit being actually committed.)</p>
<p><code>post</code> hooks are the opposite: They do not affect the command itself, just the way the repository is arranged after the command is run. A <code>post-receive</code> hook, for instance, can send an email to the user or administrator when a push has been processed on a remote repository, or send a message to a continuous integration server to start a test run, or trigger a rebuild of the web site.</p>
<p>Hooks can also be divided according to the git command that triggers them. Four of them, are related to <code>commit</code>, others to <code>am</code>, and yet others to <code>push</code> and to <code>receive-pack</code>, a <i>plumbing</i> command run when a push is received in a repository.</p>
<h3 id="Now-that-we-mention-plumbing">Now that we mention plumbing</h3>
<p>Writing hooks involves diving into the depths of git, going, so to say, into the plumbing. In fact, git commands are divided, using a washroom metaphor, into two classes: <i>porcelain</i> and <i>plumbing</i>. It is <a href="http://stackoverflow.com/questions/39847781/which-are-the-plumbing-and-porcelain-commands/39848056">not too clear which is which</a>, the division being "what can be seen from outside" (porcelain) and "what is used from there" (plumbing). According to that, most of the stuff we use day to day from the command line, clone, add, commit and so on, are "porcelain". And most of the stuff you do not <i>usually</i> run is plumbing: <code>git-unpack-file</code> or <code>git-read-tree</code> is not the kind of thing you usually run from the command line. However, these are precisely the kinds of commands we are going to run from our hooks.</p>
<h3 id="And-plumbing-works-on-the-pipes">And plumbing works on the pipes.</h3>
<p>Or, actually, the trees. Let's again <a href="https://metacpan.org/module/https:#git--scm.com-book-ch1-3.html-The-Three-States">get back to basics</a> and study the three states of a project in git: the working directory (also called the <i>working tree</i>), the staging area (also called the <i>index</i>), and the <code>.git</code> directory that contains everything we normally consider to be <i>in the repository</i>. In a nutshell, the git command <code>add</code> adds files to the staging area, the git command <code>commit</code> passes files from the staging area through to the <code>.git</code> directory, and the git command <code>checkout</code> takes files back from the <code>.git</code> directory to the working directory.</p>
<p>If we poke around inside the <code>.git</code> directory we see this plumbing area is full of trees. <a href="https://git-scm.com/book/es/v2/Git-Internals-Git-Objects#Tree-Objects">Trees</a> are used to store directories with the files and other directories, also stored as trees, in them. This is an efficient, packed way of storing content; git is actually a content-addressable file system. Which is great. But takes us away from the simple concept of a-source-control-management-system-storing-changes-and-that's-it.</p>
<p>The good news about this is that git allows us to work easily, through plumbing commands, with the internal structure so we don't have to manipulate these trees directly. The bad news is that we pretty much need to know how to program our own git in order to write a good hook.</p>
<h2 id="Writing-our-First-Hook">Writing our First Hook</h2>
<p>As said above, hooks are simply scripts. They can be as simple as a shell script or as complicated as a REST client consuming an API. In general, a hook will work this way</p>
<dl>
<dt>1. Examine stuff</dt>
<dd>
<p>Look at what's going on. Examine its inputs and use them to dig a bit deeper using plumbing commands, set the stage.</p>
</dd>
<dt>2. Do Work</dt>
<dd>
<p>Do the real work. Change log messages, rearrange files or create new ones, do lots of different things.</p>
</dd>
<dt>3. Return</dt>
<dd>
<p>Return, possibly with a message, including a failure or success message if it is a <code>pre</code>-class hook.</p>
</dd>
</dl>
<p>Let us see how it works in practice. Here is an example of a simple three line script that can serve as a <code>prepare-commit-msg</code> hook to add some text to our commit message:</p>
<pre><code> lines_changed=$(git diff-index HEAD --cached -p | grep "^\+\w" | wc -l)
message="\nYou have changed $lines_changed lines"
grep -qs "^$message" "$1" || echo "$message" >> "$1"</code></pre>
<p>No Perl yet, but just wait there and you will see! To install this three line script as a hook the three lines should be saved as a file called <code>prepare-commit-msg</code>, that file needs to be <code>chmod +x</code>'ed to become executable, and finally that file must be placed into the <code>.git/hooks</code> directory.</p>
<p>By the way, you can find lots of other <a href="https://www.google.es/search?client=ubuntu&channel=fs&q=prepare-commit-message&ie=utf-8&oe=utf-8&gfe_rd=cr&ei=EdcdWM-aGc6s8wfdqozoDw#safe=off&channel=fs&q=%22prepare-commit-msg%22">examples as gists</a> of hooks to modify your commit message with Google.</p>
<p>Anyway, the first two lines of our example script compute first the number of lines changed. They use a plumbing command, <code>git-diff-index</code>. Following the <a href="https://git-scm.com/docs/git-diff-index">online documentation</a> this command</p>
<pre><code> Compare[s] a tree to the working tree or index</code></pre>
<p>Which one are we doing here? Since we are using <code>--cached</code> we</p>
<pre><code> compares the tree-ish and the index.</code></pre>
<p>In this case, <code>HEAD</code> (i.e. the last commit already previously committed) is the <i>tree-ish</i> thing we're comparing with the <i>index</i> (i.e. whatever files we have already added to this commit in the staging area). The <code>-p</code> option is putting everything in a <i>patch</i> format, something like:</p>
<pre><code> diff --git a/2016/submission/git-hooks.pod b/2016/submission/git-hooks.pod
index 1fb44ed..1c8fc79 100644
--- a/2016/submission/git-hooks.pod
+++ b/2016/submission/git-hooks.pod
@@ -95,7 +95,7 @@ general, a hook will work this way
=item Check out stuff
- Look at what's going on. Check out its inputs and use them to
+Look at what's going on. Check out its inputs and use them to
dig a bit deeper using plumbing commands, set the stage.</code></pre>
<p>You will see a <code>+</code> sign, which indicates the lines that have been added. The <code>grep</code> pipe in our first line filters out everything but those lines and <code>wc -l</code> counts them. The next line will create a message to be added to the commit message, and the last line effectively does that, after checking that it is not already there (since we would not want multiple messages like that, ever). <code>$1</code> will contain the name of the file that contains the commit message, generally <code>COMMIT_MSG</code>.</p>
<p>That's our script in its entirety. No big deal here...except that we need to know some stuff about Linux commands, its not going to be easy to debug, and as soon as we want something a bit more complex we are getting into <code>sed</code> and <code>AWK</code> terrain. And nobody wants that!</p>
<h3 id="So-lets-do-it-in-Perl">So let's do it in Perl</h3>
<p>There are many ways of doing this in Perl (You didn't expect that, did you?)</p>
<p>First version would just move whatever is a bit more complicated to Perl itself. Let us run <code>git</code> as an external command, since that's seemingly our easiest option. Besides, now that we have a sensible language we might go a bit further and consider not only lines added, but also lines taken from the file. And we can do that not by counting lines with <code>+</code> at the front, but parsing the <a href="https://www.gnu.org/software/diffutils/manual/html_node/Detailed-Unified.html#Detailed-Unified">diff format</a> which says how many lines have changed. The result is this program, a bit longish</p>
<pre><code class="code-listing"><span class="comment">#!/usr/bin/env perl<br /></span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">File::Slurp::Tiny</span> <span class="words">qw(read_file write_file)</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="version">v5.14</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$commit_msg_fn</span> <span class="operator">=</span> <span class="core">shift</span> <span class="operator">||</span> <span class="word">die</span> <span class="double">"No commit message file"</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$commit_msg</span> <span class="operator">=</span> <span class="word">read_file</span><span class="structure">(</span> <span class="symbol">$commit_msg_fn</span> <span class="structure">);</span><br /><span class="word">die</span> <span class="word">if</span> <span class="operator">!</span><span class="symbol">$commit_msg</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$diff_output</span> <span class="operator">=</span> <span class="backtick">`git diff-index HEAD --cached -p`</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">@lines_changed</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$diff_output</span> <span class="operator">=~</span> <span class="match">/-\d+,(\d+) \+\d*,?(\d+)/gs</span><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$lines_added</span><span class="operator">,</span> <span class="symbol">$lines_taken</span><span class="structure">);</span><br /><br /><span class="keyword">while</span> <span class="structure">(</span> <span class="symbol">@lines_changed</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$lines_taken</span> <span class="operator">+=</span> <span class="core">shift</span> <span class="symbol">@lines_changed</span><span class="structure">;</span><br /> <span class="symbol">$lines_added</span> <span class="operator">+=</span> <span class="core">shift</span> <span class="symbol">@lines_changed</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /><span class="keyword">my</span> <span class="symbol">$message</span><span class="operator">=</span><span class="double">"\nYou have added $lines_added and taken $lines_taken lines"</span><span class="structure">;</span><br /><br /><span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$commit_msg</span> <span class="operator">!~</span> <span class="match">/$message/</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">write_file</span><span class="structure">(</span> <span class="symbol">$commit_msg_fn</span><span class="operator">,</span> <span class="symbol">$commit_msg</span><span class="operator">.</span><span class="symbol">$message</span> <span class="structure">);</span><br /><span class="structure">}</span></code></pre>
<p>That Perl script is pretty much what we did above in the smaller shell script. The results is more or less the same, and can be seen <a href="https://github.com/JJ/git-hooks/commit/bd19c3b69f8fbcc592e2c312a15656d34317f16a">in action in the repo</a>. But we are still not where we want to be.</p>
<h3 id="Which-perl-are-we-running">Which perl are we running?</h3>
<p>There is a caveat here in our script, and that is the <code>env</code> in the first line which causes whatever perl is first in our path to be executed.</p>
<p>From the shell I use a perl created with <code>perlbrew</code>, and use the <code>perlbrew</code> command to manipulate the shell PATH to run that perl. That path is not going to be available in this spawned shell that my git hook is running, meaning it will use the system perl. There are many solutions to this - including simply installing the modules you need for your git hooks on your system perl with `cpan` / `cpanm`.</p>
<h3 id="Lets-perlify-it-even-more">Let's perlify it even more</h3>
<p>The second, and more significant, problem is that running an external command brings all kind of trouble in particular setups. It is not Pure Perl either.</p>
<p>In fact, if you've got git, you've got a Perl way of doing git using a Perl module without having to directly script the <code>git</code> command ourselves. As part of the git distribution, a <code>Git.pm</code> file is directly installed in <code>/usr/share/perl5</code> (at least in new versions of git.) <code>perldoc Git</code> will show you what to expect from it, and you can also check it out <a href="https://metacpan.org/pod/Git">on MetaCPAN</a>, although in fact it is developed (pretty much, it's kind of quiet) alongside git itself in <a href="https://github.com/git/git">its repository</a>.</p>
<p>Be that as it may, <code>Git.pm</code> provides a Perl interface for running git. After adding <code>use Git;</code> at the beginning, we will insert these new two lines</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$repo</span> <span class="operator">=</span> <span class="word">Git</span><span class="operator">-></span><span class="word">repository</span><span class="structure">();</span><br /><span class="keyword">my</span> <span class="symbol">$diff_output</span> <span class="operator">=</span> <span class="symbol">$repo</span><span class="operator">-></span><span class="word">command</span><span class="structure">(</span><span class="single">'diff-index'</span><span class="operator">,</span> <span class="single">'--cached'</span><span class="operator">,</span><span class="single">'-p'</span><span class="operator">,</span><span class="single">'HEAD'</span><span class="structure">);</span></code></pre>
<p>That is not buying us much, except, well, we are kind of less worried about running external commands from our program (which we are doing anyway, only kind of under the hood)</p>
<h2 id="But-now-we-have-the-full-power-of-Perl">But now we have the full power of Perl</h2>
<p>So let us use it for the greater good. For instance, it would be nice to prohibit anything that can't be compiled from being checked in. Many companies include <code>pre-commit</code> policies that check that, and <a href="http://codeinthehole.com/writing/tips-for-using-a-git-pre-commit-hook/">a host of other things</a>, there are <a href="http://pre-commit.com/">whole frameworks that allow management of this kind of policies, like this one written in Python by Yelp</a>. We might want to check that out later on (or not, because Not Perl), but meanwhile we can run a simple syntax check on the Perl files before they are even committed. This would do the trick:</p>
<pre><code class="code-listing"><span class="comment">#!/usr/bin/env perl<br /></span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Git</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Term::ANSIColor</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="version">v5.14</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$repo</span> <span class="operator">=</span> <span class="word">Git</span><span class="operator">-></span><span class="word">repository</span><span class="structure">();</span><br /><br /><span class="keyword">my</span> <span class="symbol">$diff_output</span> <span class="operator">=</span> <span class="symbol">$repo</span><span class="operator">-></span><span class="word">command</span><span class="structure">(</span><span class="single">'diff-index'</span><span class="operator">,</span> <span class="single">'--cached'</span><span class="operator">,</span><span class="single">'-p'</span><span class="operator">,</span><span class="single">'HEAD'</span><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">@files_changed</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$diff_output</span> <span class="operator">=~</span> <span class="match">/\++\sb\/(\S+)/gs</span><span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$syntax_ok</span> <span class="operator">=</span> <span class="number">0</span><span class="structure">;</span><br /><span class="keyword">foreach</span> <span class="keyword">my</span> <span class="symbol">$file</span> <span class="structure">(</span> <span class="symbol">@files_changed</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">next</span> <span class="word">if</span> <span class="structure">(</span> <span class="symbol">$file</span> <span class="operator">!~</span> <span class="match">/\.p[ml]/</span> <span class="structure">);</span><br /> <span class="word">print</span> <span class="double">"Checking $file "</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$output</span> <span class="operator">=</span> <span class="backtick">`perl -cw $file 2>&1`</span><span class="structure">;</span><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$output</span> <span class="operator">=~</span> <span class="match">/syntax error/</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$syntax_ok</span> <span class="operator">=</span> <span class="symbol">$syntax_ok</span> <span class="operator">||</span> <span class="number">1</span><span class="structure">;</span><br /> <span class="word">say</span> <span class="word">color</span><span class="structure">(</span><span class="double">"red"</span><span class="structure">)</span><span class="operator">,</span> <span class="double">"\N{BALLOT X}"</span><span class="operator">,</span> <span class="word">color</span><span class="structure">(</span><span class="double">"reset"</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="double">"\n\tThere's an error in $file:"</span><span class="operator">,</span> <span class="word">color</span><span class="structure">(</span><span class="double">"red"</span><span class="structure">)</span><span class="operator">,</span><br /> <span class="word">join</span><span class="structure">(</span><span class="double">""</span><span class="operator">,</span><span class="word">map</span><span class="structure">(</span> <span class="double">"\n\t$_"</span><span class="operator">,</span> <span class="word">split</span><span class="structure">(</span><span class="double">"\n"</span><span class="operator">,</span><span class="symbol">$output</span><span class="structure">)))</span><span class="operator">,</span> <span class="word">color</span><span class="structure">(</span><span class="double">"reset"</span><span class="structure">);</span><br /> <span class="structure">}</span> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="word">say</span> <span class="word">color</span><span class="structure">(</span><span class="double">"green"</span><span class="structure">)</span><span class="operator">,</span><span class="double">"\N{HEAVY CHECK MARK}"</span><span class="operator">,</span> <span class="word">color</span><span class="structure">(</span><span class="double">"reset"</span><span class="structure">);</span><br /> <span class="structure">}</span><br /><span class="structure">}</span><br /><br /><span class="word">exit</span> <span class="symbol">$syntax_ok</span><span class="structure">;</span></code></pre>
<p>This script will be copied or symlinked to <code>pre-commit</code> in the <code>hooks</code> directory and will check syntax and warnings for any Perl file that has been modified during the commit. It uses the same git command as above for obtaining the files that have been modified, extracting the filenames from the diff output. That is why it is using:</p>
<pre><code class="code-listing"><span class="structure">(</span><span class="symbol">$diff_output</span> <span class="operator">=~</span> <span class="match">/\++\sb\/(\S+)/gs</span><span class="structure">);</span></code></pre>
<p>which will get filenames from lines such as this one</p>
<pre><code> +++ b/2016/submission/git-hooks.pod</code></pre>
<p>(this will work as long as the filenames do not have any whitespace in them). Anyway for every file modified it will proceed to run <code>perl-cw</code> on it. If it passes muster, it will give the go ahead. If it does not, it will exit with a 1, which will indicate to git that the commit cannot proceed, printing at the same time the cause of the error.</p>
<p><img src="pass.png" alt="Screenshot of test pass">
</p>
<p>We also add color for niceness, using the <a href="https://metacpan.org/module/Term::ANSIColor">Term::ANSIColor</a> module. One of the good things we can use within Perl. But we have also got even better things.</p>
<h2 id="Enter-Git::Hooks">Enter <a href="https://metacpan.org/module/Git::Hooks">Git::Hooks</a></h2>
<p><a href="https://metacpan.org/module/Git::Hooks">Git::Hooks</a> is written specifically to do this kind of thing. It lays a layer of goodness over other modules that deal with Git and allows us to work easily with the information that is available to us inside the hooks. Besides, and this is the nicest thing, it allows us to unify all hooks in a single script. Let's unify the two files we have above in a single one.</p>
<p>This is the one program to rule them all:</p>
<pre><code class="code-listing"><span class="comment">#!/usr/bin/env perl<br /></span><br /><span class="keyword">use</span> <span class="pragma">strict</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="pragma">warnings</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Git::Hooks</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Term::ANSIColor</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">File::Slurp::Tiny</span> <span class="words">qw(read_file write_file)</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="version">v5.14</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">diff_output</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$git</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /> <span class="symbol">$git</span><span class="operator">-></span><span class="word">command</span><span class="structure">(</span><span class="single">'diff-index'</span><span class="operator">,</span> <span class="single">'--cached'</span><span class="operator">,</span><span class="single">'-p'</span><span class="operator">,</span><span class="single">'HEAD'</span><span class="structure">);</span><br /><span class="structure">}</span><br /><br /><span class="word">PRE_COMMIT</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$git</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@files_changed</span> <span class="operator">=</span> <span class="structure">(</span><span class="word">diff_output</span><span class="structure">(</span> <span class="symbol">$git</span><span class="structure">)</span> <span class="operator">=~</span> <span class="match">/\++\sb\/(\S+)/gs</span><span class="structure">);</span><br /><br /> <span class="operator">...</span><span class="word">the</span> <span class="word">same</span> <span class="word">pre-commit</span> <span class="word">logic</span> <span class="word">as</span> <span class="word">in</span> <span class="word">our</span> <span class="word">script</span><span class="operator">...</span><br /><br /> <span class="keyword">return</span> <span class="symbol">$syntax_ok</span><span class="structure">;</span><br /><span class="structure">};</span><br /><br /><span class="word">PREPARE_COMMIT_MSG</span> <span class="structure">{</span><br /><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$git</span><span class="operator">,</span> <span class="symbol">$commit_msg_fn</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$commit_msg</span> <span class="operator">=</span> <span class="word">read_file</span><span class="structure">(</span> <span class="symbol">$commit_msg_fn</span> <span class="structure">);</span><br /> <span class="word">die</span> <span class="word">if</span> <span class="operator">!</span><span class="symbol">$commit_msg</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@lines_changed</span> <span class="operator">=</span> <span class="structure">(</span><span class="word">diff_output</span><span class="structure">(</span> <span class="symbol">$git</span> <span class="structure">)</span> <span class="operator">=~</span> <span class="match">/-\d+,(\d+) \+\d*,?(\d+)/gs</span><span class="structure">);</span><br /><br /> <span class="operator">...</span><span class="word">the</span> <span class="word">same</span> <span class="word">prepare-commit-msg</span> <span class="word">logic</span> <span class="word">as</span> <span class="word">in</span> <span class="word">our</span> <span class="word">other</span> <span class="word">script</span><span class="operator">...</span><br /><span class="structure">};</span><br /><br /><span class="word">run_hook</span><span class="structure">(</span><span class="magic">$0</span><span class="operator">,</span> <span class="symbol">@ARGV</span><span class="structure">);</span></code></pre>
<p>This is just part of the program; the rest is in the <a href="https://github.com/JJ/git-hooks">git-hooks repository</a>. <a href="https://metacpan.org/module/Git::Hooks">Git::Hooks</a> has a centralized approach to hooks: it encourages you to create a single file, that is symlinked to the different hook names; in this case, <code>prepare-commit-msg</code> and <code>pre-commit-msg</code>. The magic is done by the last line containing <code>run_hook</code> which takes the command under which it is being invoked <code>$0</code> along with the command-line arguments.</p>
<p>The hooks are declared LABEL-like (although they are actually function calls.) These calls receive always <code>$git</code> as the first argument, the git command itself. And its invocation is actually the same as we did before, only we have used a slightly different syntax via the <code>qw</code> quote operator. And we do that in the <code>diff_output</code> function, which is actually the only part where we are winning something by factoring out part of the code, this common code that includes the differences between the repository in the previous state and this one. A single file is easier to maintain and copy around; in fact, you can <a href="https://coderwall.com/p/jp7d5q/create-a-global-git-commit-hook">create a git directory template</a> which includes this single file and its corresponding symlinks so that your development team plays by the hook. Pun intended.</p>
<h3 id="Enforcing-commit-refers-to-issue-policies">Enforcing commit-refers-to-issue policies</h3>
<p>An idea that I try to instill in my students is that you always work in a project towards an objective, and that objective must be engraved in a milestone (get it? Engraved... in a mile-<i>stone</i>), which must be divided in tasks that are assigned issues. This is such a dyed-in-the-wool thing that <a href="https://metacpan.org/module/https:#metacpan.org-pod-Git::Hooks-Using-Plugins">Git::Hooks has a plugin that checks that every commit message includes a valid JIRA issue</a>. In fact, there are many off-the-shelf plug-ins that can be activated via the combination of git config variables and ready-to-go plug-ins. Other <a href="https://bigbrassband.com/api-doc.html">frameworks</a> also including little-to-no-programming hooks for integration in development ecosystems.</p>
<p>However, we use plain old GitHub issues, and that is what we want to use: let's accept a commit message if and only if it addresses at least one valid issue. We will do this with that chunk of code, that is integrated along with the other hooks. Please go to <a href="https://github.com/JJ/git-hooks/blob/master/commit-hooks.pl">the repo</a> for the whole file, just the part including the new code has been included here.</p>
<pre><code class="code-listing"><span class="word">COMMIT_MSG</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$git</span><span class="operator">,</span> <span class="symbol">$commit_msg_file</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$git_repo</span> <span class="operator">=</span> <span class="word">Git::More</span><span class="operator">-></span><span class="word">repository</span><span class="structure">();</span><br /> <span class="keyword">my</span> <span class="symbol">$api_key</span> <span class="operator">=</span> <span class="symbol">$git_repo</span><span class="operator">-></span><span class="word">get_config</span><span class="structure">(</span> <span class="single">'github'</span><span class="operator">,</span><span class="single">'apikey'</span> <span class="structure">);</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$gh</span> <span class="operator">=</span> <span class="word">Net::GitHub</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">version</span> <span class="operator">=></span> <span class="number">3</span><span class="operator">,</span><br /> <span class="word">access_token</span> <span class="operator">=></span> <span class="symbol">$api_key</span><br /> <span class="structure">);</span><br /> <span class="keyword">my</span> <span class="symbol">$origin</span> <span class="operator">=</span> <span class="symbol">$git_repo</span><span class="operator">-></span><span class="word">get_config</span><span class="structure">(</span> <span class="single">'remote.origin'</span><span class="operator">,</span><span class="single">'url'</span> <span class="structure">);</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$user</span><span class="operator">,</span> <span class="symbol">$repo</span> <span class="structure">)</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$origin</span> <span class="operator">=~</span> <span class="match">m{:(.+?)/(.+)\.git}</span><span class="structure">);</span><br /> <span class="keyword">my</span> <span class="symbol">$issue</span> <span class="operator">=</span> <span class="symbol">$gh</span><span class="operator">-></span><span class="word">issue</span><span class="structure">();</span><br /> <span class="keyword">my</span> <span class="symbol">@these_issues</span> <span class="operator">=</span> <span class="symbol">$issue</span><span class="operator">-></span><span class="word">repos_issues</span><span class="structure">(</span> <span class="symbol">$user</span><span class="operator">,</span> <span class="symbol">$repo</span><span class="operator">,</span> <span class="structure">{</span> <span class="word">state</span> <span class="operator">=></span> <span class="single">'open'</span><span class="structure">}</span> <span class="structure">);</span><br /><br /> <span class="keyword">my</span> <span class="symbol">%issues_map</span><span class="structure">;</span><br /> <span class="keyword">for</span> <span class="keyword">my</span> <span class="symbol">$i</span> <span class="structure">(</span> <span class="symbol">@these_issues</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="symbol">$issues_map</span><span class="structure">{</span><span class="symbol">$i</span><span class="operator">-></span><span class="structure">{</span><span class="single">'number'</span><span class="structure">}}</span> <span class="operator">=</span> <span class="symbol">$i</span><span class="operator">-></span><span class="structure">{</span><span class="single">'title'</span><span class="structure">};</span><br /> <span class="structure">}</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$commit_msg</span> <span class="operator">=</span> <span class="word">read_file</span><span class="structure">(</span> <span class="symbol">$commit_msg_file</span> <span class="structure">);</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@issues</span> <span class="operator">=</span> <span class="structure">(</span><span class="symbol">$commit_msg</span> <span class="operator">=~</span> <span class="match">/\#(\d+)/g</span><span class="structure">);</span><br /><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="operator">!</span><span class="symbol">@issues</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">say</span> <span class="double">"This commit should address at least one issue"</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="number">0</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$addresses_issue</span> <span class="operator">=</span> <span class="number">1</span><span class="structure">;</span><br /> <span class="keyword">for</span> <span class="keyword">my</span> <span class="symbol">$i</span> <span class="structure">(</span> <span class="symbol">@issues</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">if</span> <span class="structure">(</span> <span class="symbol">$issues_map</span><span class="structure">{</span><span class="symbol">$i</span><span class="structure">}</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">say</span> <span class="word">pass</span><span class="structure">(</span><span class="double">"Addresses issue $i: $issues_map{$i}"</span><span class="structure">);</span><br /> <span class="symbol">$addresses_issue</span> <span class="operator">&&=</span> <span class="number">1</span><span class="structure">;</span><br /> <span class="structure">}</span> <span class="keyword">else</span> <span class="structure">{</span><br /> <span class="word">say</span> <span class="word">fail</span><span class="structure">(</span><span class="double">"There is no issue $i"</span><span class="structure">);</span><br /> <span class="symbol">$addresses_issue</span> <span class="operator">&&=</span> <span class="number">0</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="word">say</span> <span class="double">"\N{INCREMENT}\N{NABLA}"</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="symbol">$addresses_issue</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">};</span></code></pre>
<p>In order to use this, we will have first to create and copy an API key from GitHub. This key is going to be used to download the issues. After getting the API token from your profile, write</p>
<pre><code> git config --add github.apikey abcdeandmanylettersmoreandnumbers</code></pre>
<p>Since we cannot pass through environment variables to the hooks, and it is really not a good idea to save the API key in a file, because you might accidentally add it to the repository, let's just handle it <i>outside</i> the repository by placing it in an environment that is available to the hooks: the git configuration file. We are going to be using <a href="https://metacpan.org/module/Git::More">Git::More</a>, which is part of the <a href="https://metacpan.org/module/Git::Hooks">Git::Hooks</a> repository, to retrieve that key, which we do in</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$api_key</span> <span class="operator">=</span> <span class="symbol">$git_repo</span><span class="operator">-></span><span class="word">get_config</span><span class="structure">(</span> <span class="single">'github'</span><span class="operator">,</span><span class="single">'apikey'</span> <span class="structure">);</span></code></pre>
<p>We then use this key to open the API using yet another module, <a href="https://metacpan.org/module/Net::GitHub">Net::GitHub</a>. We will need this one to access the issues, which is something we do in the next few lines using the $issue->repos_issues function.</p>
<p>After that, the program creates a map with the issues (which makes them easier to check) and lets the commit pass if - and only if - there are issues and they actually exist and are open.</p>
<p>After <i>hooking</i> this hook by symlinking it to the script, the result will be something like this:</p>
<img src="issues.png" alt="Screenshot of test pass">
<h2 id="The-tip-of-the-hook">The tip of the hook</h2>
<p>By itself, Perl is a great tool for writing git hooks. Together with the git and GitHub related modules, writing a set of hooks for your repositories is fast and straightforward. You might even not need to write anything, by just using <a href="https://metacpan.org/module/Git::Hooks">Git::Hooks</a> plugins.</p>
<p>Most difficult thing here is to debug the tools, but in many cases you will simply have to create a file with the right name and invoke the script with the right command line too, so that <code>$0</code> and <code>@ARGV</code> are correct. Always bear in mind the environment in which the scripts are running, and before you know, you'll be hooked.</p>
<h2 id="SEE-ALSO">SEE ALSO</h2>
<p>These are the modules that have been used here. There are hook frameworks in most other languages, some of them exclusively devoted to a particular hook. You might use them too if the language is already installed in your system.</p>
<p>* <a href="https://metacpan.org/module/Git::More">Git::More</a></p>
<p>* <a href="https://metacpan.org/module/Git::Repository">Git::Repository</a></p>
<p>* <a href="https://metacpan.org/module/Net::GitHub">Net::GitHub</a></p>
<p>* <a href="https://www.npmjs.com/package/pre-commit">pre-commit in node.js</a>, which can be configured via a JSON file.</p>
<p>* <a href="https://github.com/pre-commit/pre-commit-hooks">pre-commit hooks in Python</a></p>
<h2 id="POD-ERRORS">POD ERRORS</h2>
<p>Hey! <b>The above document had some coding errors, which are explained below:</b></p>
<dl>
<dt>Around line 10:</dt>
<dd>
<p>Cannot have multiple =encoding directives</p>
<p>Invalid =encoding syntax: utf8</p>
</dd>
</dl>
</div>2016-12-07T00:00:00ZJMERELOHelp Santa Klaus Reward Only Nice Childrenhttp://perladvent.org/2016/2016-12-06.html<div class='pod'><h3 id="Once-Upon-A-Time">Once Upon A Time</h3>
<p>It's 2016, and near the North Pole in Scotland, Santa Klaus is having his annual financial review. And it doesn't look great.</p>
<p>Santa realises he needs to reduce his costs from now on, and so decides to only reward children who have been nice throughout the year, as opposed to lavishly rewarding even the naughty ones as he's done for decades past.</p>
<p>After consulting with the senior elves, Santa decides to implement a child niceness assessment process over the year:</p>
<dl>
<dt>In February, all parents receive a letter from Santa including a report form for them to fill in.</dt>
<dd>
</dd>
<dt>The due date for the report form is the 1st of November.</dt>
<dd>
</dd>
<dt>If no form has been received, the child is considered naughty.</dt>
<dd>
</dd>
<dt>Each completed form is assigned to an elf for assessment (the assessment process is another story).</dt>
<dd>
</dd>
</dl>
<p>Santa's elves are not very enthusiastic about implementing this in Perl, with all the code required to track if the parents have filled in the forms or not and building the logic to trigger the right part of their existing code base at the right time. How do they design these processes so to minimize the mess? How will they unit test that so things don't randomly fail in the future?</p>
<p>'Stop panicking!' says Rudolf, head of Santa's programming sweatshop. 'I've found <a href="https://metacpan.org/module/Schedule::LongSteps">Schedule::LongSteps</a> on the CPAN!'</p>
<p><a href="https://metacpan.org/module/Schedule::LongSteps">Schedule::LongSteps</a> is a small framework that enables you to program the future with confidence.</p>
<h2 id="Lets-Help-Santa-Implement">Let's Help Santa Implement!</h2>
<p>For this example implementation, we'll assume we have an object <code>$santaHQ</code> that represents Santa's business and all the wonderful things it can do. We're going to concentrate on creating example module that manages the process spelled out above.</p>
<p>In <a href="https://metacpan.org/module/Schedule::LongSteps">Schedule::LongSteps</a>, the process we are modeling is simply modeled as a class that extends Schedule::LongSteps:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">My::Process::NicenessFeedback</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose</span><span class="structure">;</span> <span class="comment"># There are a lot of them in Santa's land.</span><br /><span class="word">extends</span> <span class="words">qw/Schedule::LongSteps/</span><span class="structure">;</span><br /><br /><span class="comment"># The object that represents all the business logic that this process<br /># model operates on<br /></span><span class="word">has</span> <span class="single">'santaHQ'</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'My::SantaHQ'</span><span class="operator">,</span> <span class="word">required</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><br /><span class="word">__PACKAGE__</span><span class="operator">-></span><span class="word">meta</span><span class="operator">-></span><span class="word">make_immutable</span><span class="structure">();</span><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>Each actual process model is an instance of this that maintains a state. The state of the process serves both as the place to give the process parameters and the place that stores the current work in progress. It must only contain 'pure Perl' data, so no objects.</p>
<p>Schedule::LongSteps stores this state between executions of the code in what is known as a Storage. This pluggable storage system allows Schedule::LongSteps to store data for you in various ways, typically in a SQL database.</p>
<p>Before we flesh out the My::Process::NicenessFeedback module, let's look at how we might instantiate it with the Schedule::LongSteps module:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$sl</span> <span class="operator">=</span> <span class="word">Schedule::LongSteps</span><span class="operator">-></span><span class="word">new</span><span class="structure">();</span><br /><br /><span class="symbol">$sl</span><span class="operator">-></span><span class="word">instantiate_process</span><span class="structure">(</span><br /> <span class="single">'My::Process::NicenessFeedback'</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">santaHQ</span> <span class="operator">=></span> <span class="symbol">$santaHQ</span> <span class="structure">}</span><span class="operator">,</span> <span class="comment"># The construction parameters</span><br /> <span class="structure">{</span> <span class="word">child_id</span> <span class="operator">=></span> <span class="number">1234</span> <span class="structure">}</span> <span class="comment"># Initial state</span><br /><span class="structure">);</span></code></pre>
<p>We've passed in two hash refs. The first is the parameters that need to be passed to My::Process::NicenessFeedback when Schedule::LongSteps instantiates it - in this case the <code>$santaHQ</code> that holds all of our underlying business logic. The second hashref contains the initial state we want to assign to the object.</p>
<p>This state will be accessible inside our object using the <code>state</code> method, and while the state has to be a pure Perl data structure we can use builder methods to populate attributes with more complex objects:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">My::Process::NicenessFeedback</span><span class="structure">;</span><br /><span class="operator">...</span><br /><span class="word">has</span> <span class="single">'child'</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">lazy_build</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><span class="keyword">sub</span> <span class="word">_build_child</span><span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">santaHQ</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">big_book_of_children</span><br /> <span class="operator">-></span><span class="word">find</span><span class="structure">(</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">state</span><span class="structure">()</span><span class="operator">-></span><span class="structure">{</span><span class="word">child_id</span><span class="structure">}</span> <span class="structure">);</span><br /><span class="structure">}</span></code></pre>
<p>The process now has a child!</p>
<h3 id="Adding-Steps">Adding Steps!</h3>
<p>We can now define what it should do first. For this, we first implement the (mandatory) method 'build_first_step' method:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">build_first_step</span><span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><span class="comment"> # At the beginning of February next year<br /> # send the parents a form.<br /></span> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">new_step</span><span class="structure">({</span><br /> <span class="word">what</span> <span class="operator">=></span> <span class="single">'do_send_form_to_parents'</span><span class="operator">,</span><br /> <span class="word">run_at</span> <span class="operator">=></span> <span class="word">DateTime</span><span class="operator">-></span><span class="word">now</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">add</span><span class="structure">(</span> <span class="word">year</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">)</span><br /> <span class="operator">-></span><span class="word">set_month</span><span class="structure">(</span><span class="number">2</span><span class="structure">)</span><br /> <span class="operator">-></span><span class="word">truncate_to</span><span class="structure">(</span> <span class="word">to</span> <span class="operator">=></span> <span class="single">'month'</span> <span class="structure">)</span><br /> <span class="structure">});</span><br /><span class="structure">}</span></code></pre>
<p>This will cause Schedule::LongSteps to schedule the <code>do_send_form_to_parents</code> step as the first step to be run for this instance. We also set the time when the step will become available to be executed with the <code>run_at</code> parameter. We'd better implement the code for the steps in our class next...</p>
<p>Steps in the process are implemented as methods. The convention is to name them with the prefix 'do_', to avoid any possible conflict with future methods.</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">do_send_form_to_parents</span><span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">$form</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">santaHQ</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">santas_pa</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">create_niceness_form</span><span class="structure">(</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">child</span><span class="structure">()</span> <span class="structure">);</span><br /><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">santaHQ</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">outbox</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">send_letter</span><span class="structure">(</span> <span class="symbol">$form</span><span class="operator">,</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">child</span><span class="structure">()</span><span class="operator">-></span><span class="word">parental_address</span><span class="structure">()</span> <span class="structure">);</span><br /><br /><span class="comment"> # The following November, we want to check stuff.<br /></span> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">new_step</span><span class="structure">({</span><br /> <span class="word">what</span> <span class="operator">=></span> <span class="single">'do_check_form_reception'</span><span class="operator">,</span><br /> <span class="word">run_at</span> <span class="operator">=></span> <span class="word">DateTime</span><span class="operator">-></span><span class="word">now</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">set_month</span><span class="structure">(</span><span class="number">11</span><span class="structure">)</span><br /> <span class="operator">-></span><span class="word">truncate_to</span><span class="structure">(</span> <span class="word">to</span> <span class="operator">=></span> <span class="single">'month'</span> <span class="structure">)</span><span class="operator">,</span><br /> <span class="word">state</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$self</span><span class="operator">-></span><span class="word">state</span><span class="structure">()}</span><span class="operator">,</span><br /> <span class="word">form_id</span> <span class="operator">=></span> <span class="symbol">$form</span><span class="operator">-></span><span class="word">id</span><span class="structure">()</span><span class="operator">,</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">});</span><br /><span class="structure">}</span><br /><br /><span class="word">has</span> <span class="single">'form'</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'My::Form'</span><span class="operator">,</span> <span class="word">lazy_build</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><span class="keyword">sub</span> <span class="word">_build_form</span><span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">santaHQ</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">forms_register</span><span class="structure">()</span><br /> <span class="operator">-></span><span class="word">find</span><span class="structure">(</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">state</span><span class="structure">()</span><span class="operator">-></span><span class="structure">{</span><span class="word">form_id</span><span class="structure">}</span> <span class="structure">);</span><br /><span class="structure">}</span><br /><br /><span class="keyword">sub</span> <span class="word">do_check_form_reception</span><span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span><span class="symbol">$self</span><span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /> <span class="keyword">unless</span><span class="structure">(</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">form</span><span class="structure">()</span><span class="operator">-></span><span class="word">is_back</span><span class="structure">()</span> <span class="structure">){</span><br /><span class="comment"> # Oh no! The form was not received, and so the child is<br /> # considered naughty. This is the end of the process.<br /></span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">child</span><span class="operator">-></span><span class="word">has_been_naughty_in</span><span class="structure">(</span> <span class="word">DateTime</span><span class="operator">-></span><span class="word">now</span><span class="structure">()</span><span class="operator">-></span><span class="word">year</span><span class="structure">()</span> <span class="structure">);</span><br /> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">final_step</span><span class="structure">({</span><br /> <span class="word">state</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$self</span><span class="operator">-></span><span class="word">state</span><span class="structure">()}</span><span class="operator">,</span><br /> <span class="word">reason</span> <span class="operator">=></span> <span class="single">'Missed deadline'</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">});</span><br /> <span class="structure">}</span><br /><br /><span class="comment"> # The form has been returned - time to pick an elf and start<br /> # the review process.<br /></span> <span class="keyword">my</span> <span class="symbol">$reviewer_elf</span> <span class="operator">=</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">santaHQ</span><span class="structure">()</span><span class="operator">-></span><span class="word">most_available_elf</span><span class="structure">();</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">longsteps</span><span class="structure">()</span><span class="operator">-></span><span class="word">instantiate_process</span><span class="structure">(</span><br /> <span class="single">'My::Process::NicenessAssessment'</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">santaHQ</span> <span class="operator">=></span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">santaHQ</span><span class="structure">()</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="word">child_id</span> <span class="operator">=></span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">child</span><span class="structure">()</span><span class="operator">-></span><span class="word">id</span><span class="structure">()</span><span class="operator">,</span><br /> <span class="word">elf_id</span> <span class="operator">=></span> <span class="symbol">$reviewer_elf</span><span class="operator">-></span><span class="word">id</span><span class="structure">()</span><span class="operator">,</span><br /> <span class="word">form_id</span> <span class="operator">=></span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">form</span><span class="structure">()</span><span class="operator">-></span><span class="word">id</span><span class="structure">()</span><br /> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><br /> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">final_step</span><span class="structure">({</span><br /> <span class="word">state</span> <span class="operator">=></span> <span class="structure">{</span><br /> <span class="cast">%</span><span class="structure">{</span><span class="symbol">$self</span><span class="operator">-></span><span class="word">state</span><span class="structure">()}</span><span class="operator">,</span><br /> <span class="word">reason</span> <span class="operator">=></span> <span class="single">'Form received'</span><br /> <span class="structure">}</span><br /> <span class="structure">});</span><br /><span class="structure">}</span></code></pre>
<p>Note that each step either schedules the next step or calls <code>final_step</code> to indicate we're done. Along the way we modify the state we store as we pick up new details of things we need to keep track of - like the form_id - that we need to populate our attributes again.</p>
<h3 id="Testing-Time">Testing Time!</h3>
<p>'Wait, wait, wait,' says one of the elves. 'This is supposed to happen in the future. How do we make sure today that this process will work?'</p>
<p>Well, this is where unit testing comes into play. Here's how to write a test:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Test::MockDateTime</span><span class="structure">;</span><br /><span class="keyword">my</span> <span class="symbol">$longsteps</span> <span class="operator">=</span> <span class="word">Schedule::LongSteps</span><span class="operator">-></span><span class="word">new</span><span class="structure">();</span><br /><br /><span class="keyword">my</span> <span class="symbol">$january</span> <span class="operator">=</span> <span class="word">DateTime</span><span class="operator">-></span><span class="word">now</span><span class="operator">-></span><span class="word">set_month</span><span class="structure">(</span><span class="number">1</span><span class="structure">)</span><span class="operator">-></span><span class="word">truncate</span><span class="structure">(</span> <span class="word">to</span> <span class="operator">=></span> <span class="single">'month'</span> <span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">$february</span> <span class="operator">=</span> <span class="word">DateTime</span><span class="operator">-></span><span class="word">now</span><span class="operator">-></span><span class="word">set_month</span><span class="structure">(</span><span class="number">2</span><span class="structure">)</span><span class="operator">-></span><span class="word">truncate</span><span class="structure">(</span> <span class="word">to</span> <span class="operator">=></span> <span class="single">'month'</span> <span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">$november</span> <span class="operator">=</span> <span class="word">DateTime</span><span class="operator">-></span><span class="word">now</span><span class="operator">-></span><span class="word">set_month</span><span class="structure">(</span><span class="number">11</span><span class="structure">)</span><span class="operator">-></span><span class="word">truncate</span><span class="structure">(</span> <span class="word">to</span> <span class="operator">=></span> <span class="single">'month'</span> <span class="structure">);</span><br /><br /><span class="keyword">my</span> <span class="symbol">$p</span><span class="structure">;</span><br /><span class="word">on</span> <span class="symbol">$january</span> <span class="operator">=></span> <span class="keyword">sub</span><span class="structure">{</span><br /> <span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$longsteps</span><span class="operator">-></span><span class="word">instantiate_process</span><span class="structure">(</span><br /> <span class="single">'My::Process::NicenessFeedback'</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">santaHQ</span> <span class="operator">=></span> <span class="symbol">$santaHQ</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span> <span class="word">child_id</span> <span class="operator">=></span> <span class="number">1234</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><span class="structure">};</span><br /><br /><span class="keyword">my</span> <span class="symbol">$form</span><span class="structure">;</span><br /><span class="word">on</span> <span class="symbol">$february</span> <span class="operator">=></span> <span class="keyword">sub</span><span class="structure">{</span><br /> <span class="symbol">$longsteps</span><span class="operator">-></span><span class="word">run_due_processes</span><span class="structure">({</span> <span class="word">santaHQ</span> <span class="operator">=></span> <span class="symbol">$santaHQ</span> <span class="structure">});</span><br /> <span class="symbol">$p</span> <span class="operator">=</span> <span class="symbol">$longsteps</span><span class="operator">-></span><span class="word">find_process</span><span class="structure">(</span> <span class="symbol">$p</span> <span class="structure">);</span> <span class="comment"># Reload process.</span><br /> <span class="symbol">$form</span> <span class="operator">=</span> <span class="symbol">$santaHQ</span><span class="operator">-></span><span class="word">forms_register</span><span class="structure">()</span><span class="operator">-></span><span class="word">find</span><span class="structure">(</span> <span class="symbol">$p</span><span class="operator">-></span><span class="word">state</span><span class="structure">()</span><span class="operator">-></span><span class="structure">{</span><span class="word">form_id</span><span class="structure">}</span> <span class="structure">);</span><br /> <span class="word">ok</span><span class="structure">(</span><br /> <span class="symbol">$form</span><span class="operator">-></span><span class="word">is_sent</span><span class="structure">()</span><span class="operator">,</span><br /> <span class="double">"Process should now have a form_id, and the form should be sent"</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><span class="structure">};</span><br /><br /><span class="comment"># For this test, we assume the form was not sent back by November.<br /></span><span class="word">on</span> <span class="symbol">$november</span> <span class="operator">=></span> <span class="keyword">sub</span><span class="structure">{</span><br /> <span class="symbol">$longsteps</span><span class="operator">-></span><span class="word">run_due_processes</span><span class="structure">({</span> <span class="word">santaHQ</span> <span class="operator">=></span> <span class="symbol">$santaHQ</span> <span class="structure">});</span><br /> <span class="keyword">my</span> <span class="symbol">$child</span> <span class="operator">=</span> <span class="symbol">$santaHQ</span><span class="operator">-></span><span class="word">big_book_of_children</span><br /> <span class="operator">-></span><span class="word">find</span><span class="structure">(</span> <span class="symbol">$p</span><span class="operator">-></span><span class="word">state</span><span class="structure">()</span><span class="operator">-></span><span class="structure">{</span><span class="word">child_id</span><span class="structure">}</span> <span class="structure">);</span><br /> <span class="word">ok</span><span class="structure">(</span><br /> <span class="symbol">$child</span><span class="operator">-></span><span class="word">has_been_naughty</span><span class="structure">()</span><span class="operator">,</span><br /> <span class="double">"Child is marked as naughty"</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><span class="structure">};</span></code></pre>
<p>'By Odin, Rudolf, this is jolly good,' Santa says enthusiastically. 'So I just need to run <code>$longsteps->run_due_process()</code> from time to time?'</p>
<p>'That's right,' Rudolph replies. 'We'll put it in a cron job or an Event loop watcher, and we're good to go. We don't even need to worry about concurrency, so we can have as many machines as we want doing it.'</p>
<p>Santa looks thoughtful. 'And next year, maybe we can also use this to automate the gift-returning procedure...'</p>
<h3 id="Disclaimer">Disclaimer</h3>
<p>Of course there is no Santa Klaus, and this code has not been tested anywhere other than Dreamland. But what is the future, if not our dreams come true?</p>
<h3 id="See-Also">See Also</h3>
<p><a href="https://metacpan.org/module/BPM::Engine">BPM::Engine</a> A business process engine based on XPDL</p>
</div>2016-12-06T00:00:00ZJerome EteveGathering all the Presentshttp://perladvent.org/2016/2016-12-05.html<div class='pod'><p>Happy the Elf wasn't. Normally Happy by name and happy by nature, the elf was uncharacteristically grumpy. And the reason for this? An email from Santa.</p>
<pre><code> From: Santa <bigred@workshop.org.northpole>
To: All Staff <all@workshop.org.northpole>
Subject: Bonus Scheme
Greetings all!
Starting this year every elf working on project Christmas Eve will be
receiving a wonderful Christmas gift of their own!
Ho ho ho!
Santa.</code></pre>
<p>A well intentioned goodwill gesture for sure - however as usual no one had considered the effect this would have on the poor programmer-elf who would have to implement the code! Case in point: the simple code Happy had written many moons ago in order to pick what presents were needed to be produced this year:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">present_list</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span><br /> <span class="word">map</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">presents_asked_for</span> <span class="structure">}</span><br /> <span class="word">grep</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">naughty_or_nice</span> <span class="operator">eq</span> <span class="single">'nice'</span> <span class="structure">}</span><br /> <span class="symbol">$self</span><span class="operator">-></span><span class="word">children</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>It's simple functional code to get all presents asked for by all the nice children.</p>
<p>Now, with these new requirements, Happy would probably have to introduce a temporary variable:</p>
<pre><code class="code-listing"><span class="keyword">sub</span> <span class="word">present_list</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="symbol">$shift</span><span class="structure">;</span><br /><br /> <span class="keyword">my</span> <span class="symbol">@entities</span><span class="structure">;</span><br /> <span class="word">push</span> <span class="symbol">@entities</span><span class="operator">,</span> <span class="word">grep</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">naughty_or_nice</span> <span class="operator">eq</span> <span class="single">'nice'</span> <span class="structure">}</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">children</span><span class="structure">;</span><br /> <span class="word">push</span> <span class="symbol">@entities</span><span class="operator">,</span> <span class="word">grep</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">worked_on_xmas_eve</span> <span class="structure">}</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">elves</span><br /> <span class="word">if</span> <span class="symbol">$config</span><span class="operator">-></span><span class="word">santa_extra_gift_enabled</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="word">map</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="operator">-></span><span class="word">presents_asked_for</span><br /> <span class="structure">}</span> <span class="symbol">@entities</span><span class="structure">;</span><br /><span class="structure">}</span></code></pre>
<p>Feeling there must be a better solution, Happy scoured the CPAN until he found the <a href="https://metacpan.org/module/List::Gather">List::Gather</a> module, which uses Perl's pluggable keyword facilities to provide new <code>gather</code> and <code>take</code> syntax.</p>
<p>The <code>gather</code> keyword introduces a block of code that returns the list of whatever is taken within that block by making calls to `take` within.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">List::Gather</span> <span class="words">qw( gather take )</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">presents</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="word">map</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="operator">-></span><span class="word">presents_asked_for</span><br /> <span class="structure">}</span> <span class="word">gather</span> <span class="structure">{</span><br /> <span class="word">take</span> <span class="word">grep</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">naughty_or_nice</span> <span class="operator">eq</span> <span class="single">'nice'</span> <span class="structure">}</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">children</span><span class="structure">;</span><br /> <span class="word">take</span> <span class="word">grep</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">worked_on_xmas_eve</span> <span class="structure">}</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">elves</span><br /> <span class="word">if</span> <span class="symbol">$config</span><span class="operator">-></span><span class="word">santa_extra_gift_enabled</span><span class="structure">;</span><br /> <span class="structure">};</span><br /><span class="structure">}</span></code></pre>
<p>Now this did make Happy happy. What made him more happy was that when he got the next email from Santa with even more requirements he knew just what to do:</p>
<pre><code> From: Santa <bigred@workshop.org.northpole>
To: All Staff <all@workshop.org.northpole>
Subject: Bonus Scheme EXTENDED!
Greetings all!
I'm feeling extra jolly because of all your hard work! So jolly I've decided
to extend our bonus scheme to the wives and husbands of those people working
on Project Christmas Eve!
Ho ho ho!
Santa.</code></pre>
<p>Because <code>gather { ... }</code> is a block containing arbitrary statements he was easily able to convert the <code>push ... grep</code> into a more complex <code>for</code> loop, right in the middle of the <code>gather</code> statement:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">List::Gather</span> <span class="words">qw( gather take )</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">presents</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="keyword">return</span> <span class="word">map</span> <span class="structure">{</span><br /> <span class="magic">$_</span><span class="operator">-></span><span class="word">presents_asked_for</span><br /> <span class="structure">}</span> <span class="word">gather</span> <span class="structure">{</span><br /> <span class="word">take</span> <span class="word">grep</span> <span class="structure">{</span> <span class="magic">$_</span><span class="operator">-></span><span class="word">naughty_or_nice</span> <span class="operator">eq</span> <span class="single">'nice'</span> <span class="structure">}</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">children</span><span class="structure">;</span><br /><br /> <span class="keyword">if</span> <span class="structure">(</span><span class="symbol">$config</span><span class="operator">-></span><span class="word">santa_extra_gift_enabled</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="keyword">for</span> <span class="keyword">my</span> <span class="symbol">$elf</span> <span class="structure">(</span><span class="symbol">$self</span><span class="operator">-></span><span class="word">elves</span><span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">next</span> <span class="word">unless</span> <span class="symbol">$elf</span><span class="operator">-></span><span class="word">worked_on_xmas_eve</span><span class="structure">;</span><br /> <span class="word">take</span> <span class="symbol">$elf</span><span class="structure">;</span><br /> <span class="keyword">my</span> <span class="symbol">$spouse</span> <span class="operator">=</span> <span class="symbol">$elf</span><span class="operator">-></span><span class="word">spouse</span><span class="structure">;</span><br /> <span class="word">next</span> <span class="word">unless</span> <span class="symbol">$spouse</span><span class="structure">;</span><br /> <span class="word">next</span> <span class="word">if</span> <span class="symbol">$spouse</span><span class="operator">-></span><span class="word">worked_on_xmas_eve</span><span class="structure">;</span><br /> <span class="word">take</span> <span class="symbol">$spouse</span><span class="structure">;</span><br /> <span class="structure">}</span><br /> <span class="structure">}</span><br /> <span class="structure">};</span><br /><span class="structure">}</span></code></pre>
<h3 id="Constructor">Constructor</h3>
<p>Now that Happy had <a href="https://metacpan.org/module/List::Gather">List::Gather</a> in his toolset he started to find all kinds of places that he could use it.</p>
<p>The present wrapping code was a perfect example. Here's the code Happy needed to modify:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$wrapping</span> <span class="operator">=</span> <span class="word">Present::Wrapping</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">color</span> <span class="operator">=></span> <span class="single">'green'</span><span class="operator">,</span><br /><span class="structure">);</span></code></pre>
<p>Happy needed to make changes so that if the gift was large, the Present::Wrapping instance would be set up to use an extra large sheet, passing the <code>extra_large_sheet</code> option if and only if <code>$gift->large</code> was true. His first attempt was to pull out the arguments into an array that he built up in advance:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">@args</span> <span class="operator">=</span> <span class="structure">(</span><br /> <span class="word">color</span> <span class="operator">=></span> <span class="single">'green'</span><span class="structure">;</span><br /><span class="structure">);</span><br /><br /><span class="word">push</span> <span class="symbol">@args</span><span class="operator">,</span> <span class="word">extra_large_sheet</span> <span class="operator">=></span> <span class="number">1</span><br /> <span class="word">if</span> <span class="symbol">$gift</span><span class="operator">-></span><span class="word">large</span><span class="structure">;</span><br /><br /><span class="keyword">my</span> <span class="symbol">$wrapping</span> <span class="operator">=</span> <span class="word">Present::Wrapping</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span> <span class="symbol">@args</span> <span class="structure">);</span></code></pre>
<p>This code, besides taking many more lines than before, is also somewhat less readable. Another programmer-elf looking at the code for the first time has no idea what the arguments being created are for until they read the very last line; Only at that point can they go back and make sense of why <code>@args</code> is set up the way it is.</p>
<p>The traditional way to write this inline is to abuse the ternary operation, producing the particularly unreadable:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$wrapping</span> <span class="operator">=</span> <span class="word">Present::Wrapping</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><br /> <span class="word">color</span> <span class="operator">=></span> <span class="single">'green'</span><span class="operator">,</span><br /> <span class="structure">(</span> <span class="structure">(</span><span class="symbol">$gift</span><span class="operator">-></span><span class="word">large</span><span class="structure">)</span> <span class="operator">?</span> <span class="structure">(</span> <span class="word">extra_large_sheet</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">)</span> <span class="operator">:</span> <span class="structure">())</span><span class="operator">,</span><br /><span class="structure">);</span></code></pre>
<p>Happy wasn't happy about this code. All those brackets weren't only hard to type but forced Happy to think too hard whenever he was debugging code that used this construct. Had he missed a brace? Did he need to wrap the <code>$gift->large</code> in brackets or not?</p>
<p>Now that he had <a href="https://metacpan.org/module/List::Gather">List::Gather</a> in his toolset, Happy was able to do something much more readable:</p>
<pre><code class="code-listing"><span class="keyword">my</span> <span class="symbol">$wrapping</span> <span class="operator">=</span> <span class="word">Present::Wrapping</span><span class="operator">-></span><span class="word">new</span><span class="structure">(</span><span class="word">gather</span> <span class="structure">{</span><br /> <span class="word">take</span> <span class="word">color</span> <span class="operator">=></span> <span class="single">'green'</span><span class="structure">;</span><br /> <span class="word">take</span> <span class="word">extra_large_sheet</span> <span class="operator">=></span> <span class="number">1</span> <span class="word">if</span> <span class="symbol">$gift</span><span class="operator">-></span><span class="word">large</span><span class="structure">;</span><br /><span class="structure">});</span></code></pre>
</div>2016-12-05T00:00:00ZMark FowlerYuletide Logginghttp://perladvent.org/2016/2016-12-04.html<div class='pod'><blockquote><i>
'Twas a night before Christmas and on the ops floor<br>
All the servers were humming behind the closed door<br>
The app was deployed to the servers with care<br>
In hopes that the customers soon would be there<br>
When from out of the phone there arose such a clatter<br>
I sprang out of my chair to see what was the matter<br>
"The website is down!" said the boss with a shout<br>
"We need to make money, so figure it out!"<br>
I logged in to the server and looked all around<br>
But the app had no logging; no reason was found<br>
With no other choice, I called the developer<br>
Who said "just restart it, I'm sure that'll fix 'er"<br>
I ran the right service, up the app came<br>
Only to come down again and again<br>
If there but was a way to know what was wrong<br>
I could fix it for sure, but no logging was found
</i></blockquote>
<p>Good logging is crucial for applications in production. In an emergency, you will want it to be as easy as possible to track down problems when they happen. With good logs you can ensure that minor bugs don't cause major downtime and data loss problems. Good logs can help track down security issues and can provide an auditable trail of changes to track down who did what and when.</p>
<p><a href="https://metacpan.org/module/Log::Any">Log::Any</a> is a lightweight, generic API built for interoperable logging for <a href="http://cpan.org">CPAN</a> modules. Much like <a href="http://dbi.perl.org">DBI</a> allows interoperable database interfaces, <a href="http://metacpan.org/pod/CHI">CHI</a> allows interoperable caching interfaces, and <a href="http://plackperl.org">PSGI</a> allows interoperable web applications, Log::Any allows a CPAN module to produce logs that fit into your environment whether you just want to see logs on your terminal, you're using <a href="http://mschilli.github.io/log4perl/">Log4perl</a> to directly send e-mail alerts to your operations team, or you're using a local <a href="http://www.rsyslog.com">rsyslog</a> daemon to transmit logs to an <a href="https://www.elastic.co/products/elasticsearch">ElasticSearch</a> instance via <a href="https://www.elastic.co/products/logstash">Logstash</a>.</p>
<p>To achieve this interoperability, Log::Any is split up into two parts: Producers produce logs using a Log::Any object, and consumers consume those logs using a Log::Any::Adapter object. First we'll cover how to produce logs, then we'll cover how to consume them to display logs on your terminal.</p>
<h3 id="Setting-our-Application-up-to-Log:-Using-a-Producer">Setting our Application up to Log: Using a Producer</h3>
<p>To get started using Log::Any to produce logs, we just need to use and create a Log::Any object. The simplest way is by creating a single log object for your program when importing Log::Any:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Log::Any</span> <span class="single">'$LOG'</span><span class="structure">;</span></code></pre>
<p>If you've got an object-oriented module, you can load your log object lazily using the `get_logger` method and store it in your object:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Moo</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Log::Any</span><span class="structure">;</span><br /><span class="word">has</span> <span class="word">log</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'lazy'</span><span class="operator">,</span> <span class="word">default</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">Log::Any</span><span class="operator">-></span><span class="word">get_logger</span> <span class="structure">}</span> <span class="structure">);</span></code></pre>
<p>Now that we have a log object, we can start producing logs. By default, they won't go anywhere, and we'll set up a consumer later. For now, let's just write some logs to tell our operations staff what's going on in our application.</p>
<p>Log::Any has methods to produce logs at various named severity levels, including the standard <a href="https://en.wikipedia.org/wiki/Log4j#Log4j_Log_Levels">Log4j-ish levels</a> of <code>fatal</code>, <code>error</code>, <code>warning</code>, <code>info</code>, <code>debug</code>, and <code>trace</code>, and the <a href="https://en.wikipedia.org/wiki/Syslog#Severity_level">Syslog severity levels</a> (which include "critical", "alert", and "emergency"). To emit a log message, simply call one of these methods with the message as an argument:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">DBI</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Log::Any</span> <span class="single">'$LOG'</span><span class="structure">;</span><br /><br /><span class="symbol">$LOG</span><span class="operator">-></span><span class="word">info</span><span class="structure">(</span> <span class="double">"Connecting to database"</span> <span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">$dbh</span> <span class="operator">=</span> <span class="word">DBI</span><span class="operator">-></span><span class="word">connect</span><span class="structure">(</span> <span class="single">'dbi:SQLite:data.db'</span> <span class="structure">);</span></code></pre>
<p>Log::Any also has a set of formatter methods similar to <a href="http://perldoc.perl.org/functions/sprintf.html">sprintf</a> to make formatting log messages easier. These methods are the same name as the severity level, but with an "f" at the end (like <code>errorf()</code>, <code>warningf()</code>, <code>infof()</code>, etc...). These methods take a format string as the first argument, and format the remaining arguments using the format string (exactly like <a href="http://perldoc.perl.org/functions/sprintf.html">sprintf</a>). Any objects given to these methods will be printed with <a href="http://perldoc.perl.org/Data/Dumper.html">Data::Dumper</a> for quick debugging.</p>
<p>The log message is returned by the log method and can be used further, for example, to throw an exception with <a href="http://perldoc.perl.org/functions/die.html">die</a> after writing a log message with <code>errorf()</code>, or to use <a href="http://perldoc.perl.org/functions/warn.html">warn</a> to ensure the log message is seen on STDERR even if you're logging to a file.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">DBI</span><span class="structure">;</span><br /><span class="keyword">use</span> <span class="word">Log::Any</span> <span class="single">'$LOG'</span><span class="structure">;</span><br /><br /><span class="symbol">$LOG</span><span class="operator">-></span><span class="word">info</span><span class="structure">(</span> <span class="double">"Connecting to database"</span> <span class="structure">);</span><br /><span class="keyword">my</span> <span class="symbol">$dbh</span> <span class="operator">=</span> <span class="word">DBI</span><span class="operator">-></span><span class="word">connect</span><span class="structure">(</span> <span class="single">'dbi:SQLite:data.db'</span> <span class="structure">);</span><br /><span class="keyword">if</span> <span class="structure">(</span> <span class="operator">!</span><span class="symbol">$dbh</span> <span class="structure">)</span> <span class="structure">{</span><br /> <span class="word">die</span> <span class="symbol">$LOG</span><span class="operator">-></span><span class="word">errorf</span><span class="structure">(</span> <span class="single">'Could not connect to database: %s'</span><span class="operator">,</span> <span class="symbol">$DBI::errstr</span> <span class="structure">);</span><br /><span class="structure">}</span><br /><span class="symbol">$LOG</span><span class="operator">-></span><span class="word">info</span><span class="structure">(</span> <span class="double">"Database connected"</span> <span class="structure">);</span></code></pre>
<h3 id="Storing-the-Logs-Somewhere:-Wiring-up-a-Consumer">Storing the Logs Somewhere: Wiring up a Consumer</h3>
<blockquote>The major difference between a thing that might go wrong and a thing that
cannot possibly go wrong is that when a thing that cannot possibly go wrong
goes wrong it usually turns out to be impossible to get at or repair.
<cite>Douglas Adams</cite></blockquote>
<p>Now that we have some log lines being written, we need to give them somewhere to go. Log::Any has a set of "adapters" (in the <a href="https://metacpan.org/module/Log::Any::Adapter">Log::Any::Adapter</a> namespace) that allow logs written using Log::Any to be written to various places.</p>
<p>For example, if you want to throw logs to <code>STDERR</code> on your terminal, you can set up the "Stderr" adapter:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Log::Any::Adapter</span> <span class="single">'Stderr'</span><span class="structure">;</span></code></pre>
<p>Now when any log line is written, it will go to <code>STDERR</code>.</p>
<p>There are adapters to make Log::Any log to syslog, files, and even other logging systems like <a href="https://metacpan.org/module/Log::Dispatch">Log::Dispatch</a> and <a href="https://metacpan.org/module/Log::Log4perl">Log::Log4perl</a>. These adapters make Log::Any a perfect choice for logging in CPAN modules: If the user wants to see logs, they get to see them in the same way as all other logs in their application, otherwise, the logging is there when they need it.</p>
<p>The adapter is also where we decide what level of logs we want to see. Some adapters handle this with their own configuration, like Log::Dispatch and Log::Log4perl. For our simple example, we need to handle it ourselves. Let's allow our operations staff to set the <code>LOG_LEVEL</code> environment variable, and have it default to <code>warn</code>.</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Log::Any::Adapter</span> <span class="single">'Stderr'</span><span class="operator">,</span> <span class="word">log_level</span> <span class="operator">=></span> <span class="symbol">$ENV</span><span class="structure">{</span><span class="word">LOG_LEVEL</span><span class="structure">}</span> <span class="operator">||</span> <span class="double">"warn"</span><span class="structure">;</span></code></pre>
<p>That's all there is to getting started using Log::Any. For those concerned about bloating their dependency tree, Log::Any has no non-core dependencies. For those who value backwards-compatibility, Log::Any is supported back to very early versions of Perl 5.8 (and if it is broken for versions before that, <a href="http://github.com/preaction/Log-Any">patches are welcome</a>).</p>
<blockquote><i>
Now that the logging is hung in our program with care<br>
I searched for the log file I knew would be there<br>
Inside I would find all the things I could know<br>
About problems and issues and something to go<br>
Now that I know what the problem's about<br>
I can fix it for sure so the app just stays up
</i></blockquote>
</div>2016-12-04T00:00:00ZDoug BellBenchmarking with Bencherhttp://perladvent.org/2016/2016-12-03.html<div class='pod'><p>Santa had a problem. Due to increased use of machine transcription, his list of children's names contained far more misspellings than previously, and really had to be checked twice. Santa ordered one of the elves to write a Perl script for this task, but there was a catch - with more than a billion names on the list, the script needed to be fast.</p>
<p>The elf started off by evaluating several modules on CPAN that calculate <a href="https://en.wikipedia.org/wiki/Levenshtein_distance">Levenshtein edit distance</a> (among others: <a href="https://metacpan.org/module/Text::Levenshtein">Text::Levenshtein</a>, <a href="https://metacpan.org/module/Text::Levenshtein::XS">Text::Levenshtein::XS</a>, <a href="https://metacpan.org/module/Text::Levenshtein::Flexible">Text::Levenshtein::Flexible</a>, <a href="https://metacpan.org/module/Text::LevenshteinXS">Text::LevenshteinXS</a>) and trying to pick one to use for his script, preferably the fastest one.</p>
<p>"I'll simply write a benchmark script to find out which one is the fastest," he thought to himself. Normally, that script would have used the built-in <a href="https://metacpan.org/module/Benchmark">Benchmark</a> module, like this:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Benchmark</span> <span class="single">'cmpthese'</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Text::Levenshtein</span> <span class="structure">();</span><br /><span class="keyword">use</span> <span class="word">Text::Levenshtein::XS</span> <span class="structure">();</span><br /><span class="keyword">use</span> <span class="word">Text::Levenshtein::Flexible</span> <span class="structure">();</span><br /><span class="keyword">use</span> <span class="word">Text::LevenshteinXS</span> <span class="structure">();</span><br /><br /><span class="word">cmpthese</span><span class="structure">(</span><br /> <span class="number">100_000</span><span class="operator">,</span><br /> <span class="structure">{</span><br /> <span class="single">'Text::Levenshtein'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">Text::Levenshtein::fastdistance</span><span class="structure">(</span><span class="double">"foo"</span><span class="operator">,</span> <span class="double">"bar"</span><span class="structure">)</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'Text::Levenshtein::XS'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">Text::Levenshtein::XS::distance</span><span class="structure">(</span><span class="double">"foo"</span><span class="operator">,</span> <span class="double">"bar"</span><span class="structure">)</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'Text::Levenshtein::Flexible'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">Text::Levenshtein::Flexible::levenshtein</span><span class="structure">(</span><span class="double">"foo"</span><span class="operator">,</span> <span class="double">"bar"</span><span class="structure">)</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="single">'Text::LevenshteinXS'</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span> <span class="word">Text::LevenshteinXS::distance</span><span class="structure">(</span><span class="double">"foo"</span><span class="operator">,</span> <span class="double">"bar"</span><span class="structure">)</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">}</span><br /><span class="structure">);</span></code></pre>
<p>but I tricked him into, er, suggested, trying the <a href="https://metacpan.org/module/Bencher">Bencher</a> framework for a change. So here's what he wrote instead:</p>
<pre><code class="code-listing"><span class="comment"># lib/Bencher/Scenario/Levenshtein.pm<br /></span><span class="keyword">package</span> <span class="word">Bencher::Scenario::Levenshtein</span><span class="structure">;</span><br /><span class="keyword">our</span> <span class="symbol">$scenario</span> <span class="operator">=</span> <span class="structure">{</span><br /> <span class="word">summary</span> <span class="operator">=></span> <span class="single">'Benchmark modules that calculate Levenshtein edit distance'</span><span class="operator">,</span><br /> <span class="word">participants</span> <span class="operator">=></span> <span class="structure">[</span><br /> <span class="structure">{</span><span class="word">fcall_template</span> <span class="operator">=></span> <span class="double">"Text::Levenshtein::fastdistance(<word1>, <word2>)"</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><span class="word">fcall_template</span> <span class="operator">=></span> <span class="double">"Text::Levenshtein::XS::distance(<word1>, <word2>)"</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><span class="word">fcall_template</span> <span class="operator">=></span> <span class="double">"Text::Levenshtein::Flexible::levenshtein(<word1>, <word2>)"</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">{</span><span class="word">fcall_template</span> <span class="operator">=></span> <span class="double">"Text::LevenshteinXS::distance(<word1>, <word2>)"</span><span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">]</span><span class="operator">,</span><br /> <span class="word">datasets</span> <span class="operator">=></span> <span class="structure">[</span><br /> <span class="structure">{</span> <span class="word">name</span> <span class="operator">=></span> <span class="double">"foo"</span><span class="operator">,</span> <span class="word">args</span> <span class="operator">=></span> <span class="structure">{</span><span class="word">word1</span><span class="operator">=></span><span class="double">"foo"</span><span class="operator">,</span> <span class="word">word2</span><span class="operator">=></span><span class="double">"bar"</span><span class="structure">}</span><span class="operator">,</span> <span class="word">result</span> <span class="operator">=></span> <span class="number">3</span> <span class="structure">}</span><span class="operator">,</span><br /> <span class="structure">]</span><span class="operator">,</span><br /><span class="structure">};</span></code></pre>
<p>What's different between the two? First of all, the script is turned into a module containing a data structure called <i>scenario</i>. The code snippets, called <i>participants</i>, are turned into code templates with variables written inside angle brackets like this: <code><name></code>. The variable values are put in the <i>datasets</i> key.</p>
<p>How do we run this scenario module? Install the <a href="https://metacpan.org/module/bencher-tiny">bencher-tiny</a> script from the <a href="https://metacpan.org/release/Bencher-Tiny">Bencher-Tiny</a> distribution:</p>
<pre><code> % cpanm -n Bencher::Tiny</code></pre>
<p>then run:</p>
<pre><code> % PERL5OPT=-Ilib bencher-tiny -c 100000 Levenshtein</code></pre>
<p>The output will be identical to the output of the first script we saw, because <code>bencher-tiny</code> also uses <a href="https://metacpan.org/module/Benchmark">Benchmark</a> to benchmark the code:</p>
<pre><code> (warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
(warning: too few iterations for a reliable count)
Rate Text::Levenshtein::fastdistance Text::Levenshtein::XS::distance Text::LevenshteinXS::distance Text::Levenshtein::Flexible::levenshtein
Text::Levenshtein::fastdistance 52083/s -- -92% -99% -99%
Text::Levenshtein::XS::distance 666667/s 1180% -- -87% -87%
Text::LevenshteinXS::distance 5000000/s 9500% 650% -- -0%
Text::Levenshtein::Flexible::levenshtein 5000000/s 9500% 650% 0% --</code></pre>
<p>However, turning our benchmark script into a scenario module means we can do a lot more things with it. First of all, let's use the full-featured CLI <a href="https://metacpan.org/module/bencher">bencher</a> (from the <a href="https://metacpan.org/release/Bencher">Bencher</a> distribution) instead of <code>bencher-tiny</code>. Install it from CPAN (this might take a while, as it has quite a lot of dependencies):</p>
<pre><code> % cpanm -n Bencher</code></pre>
<p>then run:</p>
<pre><code> % bencher -Ilib -m Levenshtein
+------------------------------------------+-----------+-----------+------------+---------+---------+
| participant | rate (/s) | time (us) | vs_slowest | errors | samples |
+------------------------------------------+-----------+-----------+------------+---------+---------+
| Text::Levenshtein::fastdistance | 51000 | 20 | 1 | 3.3e-08 | 20 |
| Text::Levenshtein::XS::distance | 757000 | 1.32 | 14.8 | 1.8e-10 | 20 |
| Text::LevenshteinXS::distance | 8500000 | 0.12 | 170 | 2.4e-10 | 20 |
| Text::Levenshtein::Flexible::levenshtein | 8850000 | 0.113 | 173 | 1.1e-10 | 20 |
+------------------------------------------+-----------+-----------+------------+---------+---------+</code></pre>
<p>You'll notice several things are different. Instead of <a href="https://metacpan.org/module/Benchmark">Benchmark</a>, the <code>bencher</code> CLI by default uses <a href="https://metacpan.org/module/Dumbbench">Dumbbench</a> to benchmark the code. It then presents the results as a table.</p>
<p>You'll also notice that the script returns <i>much</i> more quickly, and the result is more accurate for the faster participants. Recall that <a href="https://metacpan.org/module/Benchmark">Benchmark</a> complained above that we didn't use enough iterations for a reliable count. To avoid this warning, we would need to set count to something like 3_000_000 - but imagine how long it would take for the benchmark to run in this case (~1 minute, because <a href="https://metacpan.org/module/Text::Levenshtein">Text::Levenshtein</a> can only perform ~50k calculations per second). By contrast, if you look at the <code>samples</code> result field, you'll see that <a href="https://metacpan.org/module/Dumbbench">Dumbbench</a> only needs about 20 runs for each participant. You don't actually even need to set the <code>count</code> parameter, because it will figure out the minimum sufficient number of runs.</p>
<p>Aside from this difference in output, there are quite a number of other things we can do.</p>
<h3 id="Adding-more-datasets">Adding more datasets</h3>
<p>Remember how we split the code and data when we constructed the scenario? The benefit of doing this is that we can easily add more data. Let's say we want to measure performance for some longer word. We'll just add this to our <code>datasets</code>:</p>
<pre><code class="code-listing"><span class="structure">{</span> <span class="word">name</span> <span class="operator">=></span> <span class="double">"program"</span><span class="operator">,</span> <span class="word">args</span> <span class="operator">=></span> <span class="structure">{</span><span class="word">word1</span><span class="operator">=></span><span class="double">"program"</span><span class="operator">,</span> <span class="word">word2</span><span class="operator">=></span><span class="double">"porgram"</span><span class="structure">}</span><span class="operator">,</span> <span class="word">result</span> <span class="operator">=></span> <span class="number">2</span> <span class="structure">}</span><span class="operator">,</span></code></pre>
<p>then run:</p>
<pre><code> % bencher -Ilib -m Levenshtein
+------------------------------------------+---------+-----------+-----------+------------+---------+---------+
| participant | dataset | rate (/s) | time (us) | vs_slowest | errors | samples |
+------------------------------------------+---------+-----------+-----------+------------+---------+---------+
| Text::Levenshtein::fastdistance | program | 11000 | 89 | 1 | 1.1e-07 | 20 |
| Text::Levenshtein::fastdistance | foo | 52000 | 19 | 4.7 | 3.3e-08 | 20 |
| Text::Levenshtein::XS::distance | program | 480000 | 2.1 | 43 | 3.3e-09 | 20 |
| Text::Levenshtein::XS::distance | foo | 738000 | 1.36 | 65.7 | 4.2e-10 | 20 |
| Text::LevenshteinXS::distance | program | 3180000 | 0.314 | 284 | 9.7e-11 | 28 |
| Text::Levenshtein::Flexible::levenshtein | program | 4170000 | 0.24 | 371 | 1.7e-10 | 25 |
| Text::LevenshteinXS::distance | foo | 7300000 | 0.137 | 650 | 4.5e-11 | 20 |
| Text::Levenshtein::Flexible::levenshtein | foo | 7660000 | 0.131 | 682 | 4.6e-11 | 20 |
+------------------------------------------+---------+-----------+-----------+------------+---------+---------+</code></pre>
<p>There's now a <code>dataset</code> result field, since we are running with multiple datasets.</p>
<h3 id="Filtering-datasets-participants-modules-etc">Filtering datasets, participants, modules, etc</h3>
<p>To use only one specific dataset:</p>
<pre><code> % bencher -Ilib -m Levenshtein --include-dataset program</code></pre>
<p>There are similar options to include only certain participants: <code>--include-participant</code>, <code>--exclude-participant</code>, <code>--include-participant-pattern</code>, and so on. We can also include/exclude certain modules. For example, let's just exclude all the pure-Perl modules because they have no hope of competing with XS:</p>
<pre><code> % bencher -Ilib -m Levenshtein --include-dataset program --nopp
+------------------------------------------+-----------+-----------+------------+---------+---------+
| participant | rate (/s) | time (us) | vs_slowest | errors | samples |
+------------------------------------------+-----------+-----------+------------+---------+---------+
| Text::Levenshtein::XS::distance | 410000 | 2.5 | 1 | 3.3e-09 | 20 |
| Text::LevenshteinXS::distance | 2800000 | 0.357 | 6.9 | 3.3e-10 | 20 |
| Text::Levenshtein::Flexible::levenshtein | 3600000 | 0.28 | 8.9 | 4.3e-10 | 24 |
+------------------------------------------+-----------+-----------+------------+---------+---------+</code></pre>
<p>There are other kinds of filtering available, for example by tags, sequence, and so on.</p>
<p>Instead of running the benchmark, you can also verify or inspect the participants (<code>--list-participants</code>) and datasets (<code>--list-datasets</code>), or just run the code once and display the result (<code>--show-items-results</code>):</p>
<pre><code> % bencher -Ilib -m Levenshtein --nopp --show-items-results
#0 (dataset=foo participant=Text::Levenshtein::XS::distance):
3
#1 (dataset=program participant=Text::Levenshtein::XS::distance):
2
#2 (dataset=foo participant=Text::Levenshtein::Flexible::levenshtein):
3
#3 (dataset=program participant=Text::Levenshtein::Flexible::levenshtein):
2
#4 (dataset=foo participant=Text::LevenshteinXS::distance):
3
#5 (dataset=program participant=Text::LevenshteinXS::distance):
2</code></pre>
<h3 id="Checking-the-results-first">Checking the results first</h3>
<p>Notice that in each dataset, we added this:</p>
<pre><code class="code-listing"><span class="word">result</span> <span class="operator">=></span> <span class="number">2</span></code></pre>
<p>or:</p>
<pre><code class="code-listing"><span class="word">result</span> <span class="operator">=></span> <span class="number">3</span></code></pre>
<p>This parameter is optional, but if we specify a value here then <code>bencher</code> will first compare it to the results of running the code, to make sure that the code we are benchmarking returns the correct result. Fast but wrong code is useless, after all.</p>
<h2 id="More-features">More features</h2>
<p><a href="https://metacpan.org/module/Bencher">Bencher</a> can do plenty of other things, for example:</p>
<ul>
<li><p>Benchmark module startup overhead</p>
</li>
<li><p>Benchmark against multiple perls</p>
</li>
<li><p>Benchmark against multiple module versions</p>
</li>
<li><p>Show data structure size and memory usage</p>
</li>
<li><p>Present the results as a chart or graph</p>
</li>
<li><p>Include CPU/other system information</p>
</li>
<li><p>Return raw structured data (<code>--json</code>) for easy manipulation or transport to servers</p>
</li>
</ul>
<p>I've also written plugins for <a href="https://metacpan.org/module/Dist::Zilla">Dist::Zilla</a> and other CLI tools related to Bencher. For one-off benchmarking this might not mean much, but if you regularly use benchmarking when developing (for example to watch out for performance regression), <a href="https://metacpan.org/module/Bencher">Bencher</a> can be a useful addition to your toolbox.</p>
<h2 id="SEE-ALSO">SEE ALSO</h2>
<ul>
<li><p><a href="https://metacpan.org/module/Bencher">Bencher</a></p>
</li>
<li><p><a href="https://metacpan.org/module/Bencher::Scenario::LevenshteinModules">Bencher::Scenario::LevenshteinModules</a></p>
</li>
</ul>
</div>2016-12-03T00:00:00ZperlancarJingle Refs, Jingle Refs, Jingle all the wayhttp://perladvent.org/2016/2016-12-02.html<div class='pod'><p>Santa's organization has two essential problems with their code; It has to work 100% of the time (you don't get a second chance at Christmas) and it has to be really efficient and fast (since there's a <i>lot</i> of children in the world and hence the code has to crunch a heck of a lot of data).</p>
<p>Often these goals can be at odds with each other. Abstractions, while inherently safer than writing the same small snippet of code over and over (each with a chance to make a small mistake) can lead to slower code, introducing extra subroutine calls or data processing that simply wasn't there before.</p>
<p>But not always. Tonight we'll tell you of the wondrous day that a Christmas miracle occurred in the most unlikely of places...the annual code review.</p>
<h3 id="A-Worrying-Pattern">A Worrying Pattern</h3>
<p>During the annual code-review, the Wise Old Elf had noticed a common pattern we all maintain: Checking the validity of references using the <code>ref</code> function:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Carp</span> <span class="words">qw<croak></span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">create_sled</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="structure">(</span> <span class="symbol">$self</span><span class="operator">,</span> <span class="symbol">$args</span> <span class="structure">)</span> <span class="operator">=</span> <span class="magic">@_</span><span class="structure">;</span><br /><br /> <span class="word">ref</span> <span class="symbol">$args</span> <span class="operator">eq</span> <span class="single">'HASH'</span><br /> <span class="operator">or</span> <span class="word">croak</span><span class="structure">(</span><span class="single">'Sled expects arguments as a hashref'</span><span class="structure">);</span><br /><br /> <span class="operator">...</span><br /><span class="structure">}</span></code></pre>
<p>There were two problems with this ref checking pattern. Sometimes some elf would get it subtly wrong:</p>
<pre><code class="code-listing"><span class="word">ref</span> <span class="symbol">$plate_contents</span> <span class="operator">eq</span> <span class="single">'HSAH'</span><br /> <span class="operator">or</span> <span class="word">croak</span><span class="structure">(</span><span class="single">'Plate contents not passed as a hashref'</span><span class="structure">);</span></code></pre>
<p>So subtle was that error that it often slipped through into production! Perl isn't smart enough to know that <code>HSAH</code> probably is a typo for <code>HASH</code> and not a class name the reference had been blessed into. And since error checking code is notoriously hard to test, the automated test suites that Santa insisted on frequently didn't always catch this problem.</p>
<p>If the elves were worried only about correctness alone the code could be replaced by something like this:</p>
<pre><code class="code-listing"><span class="comment"># In a module, and tested<br /></span><span class="keyword">sub</span> <span class="word">is_plain_hash</span> <span class="structure">{</span><br /> <span class="keyword">return</span> <span class="word">ref</span> <span class="magic">$_</span><span class="structure">[</span><span class="number">0</span><span class="structure">]</span> <span class="operator">eq</span> <span class="single">'HASH'</span><span class="structure">;</span><br /><span class="structure">}</span><br /><br /> <span class="word">is_plain_hash</span><span class="structure">(</span><span class="symbol">$plate_contents</span><span class="structure">)</span><br /> <span class="operator">or</span> <span class="word">croak</span><span class="structure">(</span><span class="single">'Plate contents not passed as a hashref'</span><span class="structure">);</span></code></pre>
<p>Since under strict a typo in <code>is_plain_hash</code> would cause a compile time error any fat-fingered elf problems would be caught. However, that's just introduced another subroutine call, and since these ref checks are used all over Santa's code base, the cost of doing that adds up!</p>
<p>The second problem with the pattern was the fact that it was strictly doing more work that it needed to. Consider what the following was actually doing:</p>
<pre><code class="code-listing"><span class="word">ref</span> <span class="symbol">$args</span> <span class="operator">eq</span> <span class="single">'HASH'</span></code></pre>
<p>Under the hood perl is inspecting bits on the reference to see if it's a <code>HV</code> (a <code>hash value</code>), then building a standard SV string containing the four characters <code>HASH</code>, then in a separate operation comparing that string character by character with the <code>HASH</code> string which was passed in in the code. That, of course, all happens really quickly, but it's more work (and more Perl virtual ops) than need to happen just to tell if this was a hashref or not!</p>
<p>So the code review had found two problems: One, an abstraction was needed to make it more reliable. Two, the code needed to involve less stuff to make it faster.</p>
<h3 id="The-Voice-of-an-Angel">The Voice of an Angel</h3>
<p>Thank goodness the Wise Old Elf read a <a href="http://cpan-weekly.org/">CPAN Weekly</a> email about this very pattern! Because of this he decided to try out a new module, <a href="https://metacpan.org/module/Ref::Util">Ref::Util</a>.</p>
<p><a href="https://metacpan.org/module/Ref::Util">Ref::Util</a> provides a set of helpful functions to determine what kind of reference a variable is. It abstracts several awkward ref checking patterns into new function. So, as our above example:</p>
<pre><code class="code-listing"><span class="word">ref</span> <span class="symbol">$args</span> <span class="operator">eq</span> <span class="single">'HASH'</span><br /> <span class="operator">or</span> <span class="word">croak</span><span class="structure">(</span><span class="single">'Sled expects arguments as a hashref'</span><span class="structure">);</span></code></pre>
<p>Can be written as:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Ref::Util</span> <span class="words">qw<is_plain_hash></span><span class="structure">;</span><br /><br /><span class="word">is_plain_hash</span><span class="structure">(</span><span class="symbol">$args</span><span class="structure">)</span><br /> <span class="operator">or</span> <span class="word">croak</span><span class="structure">(</span><span class="single">'Sled expects arguments as a hashref'</span><span class="structure">);</span></code></pre>
<p>Or maybe we want to check for a blessed array reference, making sure it isn't accidentally blessed? The following:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Scalar::Util</span> <span class="words">qw<blessed reftype></span><span class="structure">;</span><br /><br /><span class="word">blessed</span> <span class="symbol">$args</span> <span class="operator">&&</span> <span class="word">ref</span> <span class="symbol">$args</span> <span class="operator">&&</span> <span class="word">reftype</span><span class="structure">(</span><span class="symbol">$args</span><span class="structure">)</span> <span class="operator">eq</span> <span class="single">'ARRAY'</span><br /> <span class="operator">or</span> <span class="word">Carp::croak</span><span class="structure">(</span><span class="single">'Uh oh, we require a blessed reference'</span><span class="structure">);</span></code></pre>
<p>Can much more succinctly be written with the new <code>is_blessed_arrayref</code> function:</p>
<pre><code class="code-listing"><span class="keyword">use</span> <span class="word">Ref::Util</span> <span class="words">qw<is_blessed_arrayref></span><span class="structure">;</span><br /><br /><span class="word">is_blessed_arrayref</span><span class="structure">(</span><span class="symbol">$args</span><span class="structure">)</span><br /> <span class="operator">or</span> <span class="word">Carp::croak</span><span class="structure">(</span><span class="single">'Uh oh, we require a blessed reference'</span><span class="structure">);</span></code></pre>
<p>Not writing a lot of code but still being accurate is something Santa appreciates, but that's not the best part....the best part is the speed increase.</p>
<p>Remember when the Elves were worried about the overhead of introducing a new function call? Well, Ref::Util doesn't do that. It introduces a set of new custom ops to do the hard work instead which is much much faster than any kind of function call.</p>
<p>To understand what that means requires an understanding of the internals of perl. When Perl code is run the code is first compiled into a series of "ops" - operations like "add these values", "push to list", or even as complicated as "run this regular expression". These ops are somewhat like the machine code that runs on your actual processor, but much higher level and more complicated. When perl actually runs your code it runs a sort of virtual machine that basically looks at each of these operations in turn and does whatever operation they tell it to do. At a low level what can make perl slow is the number of ops it has to process (since running each op has an overhead) and the complexity of the ops it runs (for example function calls and regular expressions are expensive compared to other more simple ops like add and subtract).</p>
<p>When you call <code>ref $foo eq 'HASH'</code> there are actually several ops that come into play:</p>
<pre><code> ...
6 <2> seq vK/2 ->7
4 <1> ref[t2] sK/1 ->5
3 <#> gvsv[*foo] s ->4
5 <$> const[PV "HASH"] s ->6</code></pre>
<p>Each of these is an operation that perl has to do (create a list, get the value, call ref, etc.). However, when you use Ref::Util, when <code>is_hashref</code> gets encountered, it is then replaced with the following custom op:</p>
<pre><code> 4 <1> is_hashref vK/1 ->5
3 <#> gvsv[*foo] s ->4</code></pre>
<p>While Santa is not exactly sure what all these crazy looking "op-trees" mean, an elf helpfully explained that "it makes the sled go faster". And we don't even need to put racing stripes on the sled!</p>
<p>Less error prone <i>and</i> faster too. Just what the man in red ordered.</p>
</div>2016-12-02T00:00:00ZxsawyerxGraphing Moose Classes Automaticallyhttp://perladvent.org/2016/2016-12-01.html<div class='pod'><p>A picture really is worth a thousand words. Or at least it's often a lot easier to understand than a thousand lines of code...</p>
<center><img src="plantuml.png"></center>
<h3 id="Tale-of-Woe">Tale of Woe</h3>
<p>There was no escaping it. The latest version of the code that ran Santa's Workshop was much more complicated than it used to be, now that it needed to model so much more of what was going on in the ever-increasing North Pole operation.</p>
<p>The Elves had turned to Moose to increase their code re-use and it had been a great success. By using roles and parameterized roles they now had the power to easily bestow complex abilities on different classes with a carefully positioned <code>with</code> statement. Suddenly a million abstract baseclasses were eliminated from their codebase, and complex code gymnastics were no longer necessary.</p>
<p>Even with this ability to consume roles - and have roles that further consumed roles - life was sometime harder than they'd like to admit when it came to debugging the code. They could, for example, see that the costing code was calling the <code>uses_default_paper</code> method on the object, but they'd be darned if they could track down the sub that defined it....until I showed them how they could simply and automatically generate UML diagrams directly from their code.</p>
<h4 id="Looking-At-Example-Code">Looking At Example Code</h4>
<p>In the new Elf system there's a class for every type of gift, with configurable attributes that define who it's going to, size of the gift, etc. Here's the <code>GiftBike</code> class:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">GiftBike</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose</span><span class="structure">;</span><br /><br /><span class="word">extends</span> <span class="single">'Bike'</span><span class="structure">;</span><br /><br /><span class="word">with</span> <span class="structure">(</span><br /> <span class="single">'Gift'</span><span class="operator">,</span><br /> <span class="single">'Wrapped'</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">default_paper_type</span> <span class="operator">=></span> <span class="single">'EXTRASTRONG052'</span> <span class="structure">}</span><span class="operator">,</span><br /><span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">put_tinsel_in_spokes</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="word">__PACKAGE__</span><span class="operator">-></span><span class="word">meta</span><span class="operator">-></span><span class="word">make_immutable</span><span class="structure">;</span><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>The <code>Bike</code> baseclass that <code>GiftBike</code> extends is actually a lot more complicated than other gift classes in the codebase, because the elves use this same code in their internal transport management application.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Bike</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose</span><span class="structure">;</span><br /><br /><span class="word">with</span> <span class="single">'Transportation'</span><span class="structure">;</span><br /><br /><span class="word">has</span> <span class="word">wheels</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Int'</span><span class="operator">,</span> <span class="word">default</span> <span class="operator">=></span> <span class="number">2</span> <span class="structure">);</span><br /><span class="word">has</span> <span class="word">frame_size</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Int'</span><span class="operator">,</span> <span class="word">required</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><span class="word">has</span> <span class="word">color</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Str'</span><span class="operator">,</span> <span class="word">default</span> <span class="operator">=></span> <span class="single">'White'</span> <span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">put_together</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">pedal</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">sit_upon</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">get_off</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">apply_brakes</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><span class="keyword">sub</span> <span class="word">change_gears</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="word">__PACKAGE__</span><span class="operator">-></span><span class="word">meta</span><span class="operator">-></span><span class="word">make_immutable</span><span class="structure">;</span><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>And the <code>Transportation</code> that <code>Bike</code> consumes is also consumed by the <code>Sled</code>, <code>Snowmobile</code>, etc classes in other parts of the codebase. This gives the <code>Bike</code> (and the <code>GiftBike</code> subclass) the <code>travel_with</code> method.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Transportation</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose::Role</span><span class="structure">;</span><br /><br /><span class="keyword">sub</span> <span class="word">travel_with</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>Since the <code>GiftBike</code> subclass represents a gift, it consumes the <code>Gift</code> role, which, in addition to giving it the <code>associated_letter</code> attribute and <code>put_on_sleigh</code> method, gives it all the attributes and methods from the <code>DeliveryLocation</code> and <code>Recipient</code> roles.</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Gift</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose::Role</span><span class="structure">;</span><br /><br /><span class="word">with</span> <span class="structure">(</span><br /> <span class="single">'DeliveryLocation'</span><span class="operator">,</span><br /> <span class="single">'Recipient'</span><span class="operator">,</span><br /><span class="structure">);</span><br /><br /><span class="word">has</span> <span class="word">associated_letter</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Letter'</span> <span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">put_on_sleigh</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>The <code>DeliveryLocation</code> role is straightforward:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">DeliveryLocation</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose::Role</span><span class="structure">;</span><br /><br /><span class="word">has</span> <span class="word">address</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Str'</span><span class="operator">,</span> <span class="word">required</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">print_delivery_label</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>As is the <code>Recipient</code> role:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Recipient</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">Moose::Role</span><span class="structure">;</span><br /><br /><span class="word">has</span> <span class="word">recipient_name</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Str'</span><span class="operator">,</span> <span class="word">required</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><span class="word">has</span> <span class="word">recipient_dob</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'DateTime'</span><span class="operator">,</span> <span class="word">required</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">recipient_age_on_xmas_day</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>The <code>GiftBike</code> is also wrapped (not everything consuming the <code>Gift</code> role is wrapped -- it's hard to wrap a new puppy, after all) and consumes the <code>Wrapped</code> parameterized role. This parameterized role creates a new anonymous role when it is consumed, with methods and attributes that are dynamically created based on the parameters passed in when the role was consumed.</p>
<p>The <code>Wrapped</code> class looks like this:</p>
<pre><code class="code-listing"><span class="keyword">package</span> <span class="word">Wrapped</span><span class="structure">;</span><br /><br /><span class="keyword">use</span> <span class="word">MooseX::Role::Parameterized</span><span class="structure">;</span><br /><br /><span class="word">parameter</span> <span class="word">default_paper_type</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Str'</span><span class="operator">,</span> <span class="word">required</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><br /><span class="word">role</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$p</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /><br /> <span class="word">has</span> <span class="word">paper_type</span> <span class="operator">=></span> <span class="structure">(</span><br /> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span><br /> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Str'</span><span class="operator">,</span><br /> <span class="word">default</span> <span class="operator">=></span> <span class="symbol">$p</span><span class="operator">-></span><span class="word">default_paper_type</span><span class="operator">,</span><br /> <span class="structure">);</span><br /><br /> <span class="word">method</span> <span class="word">uses_default_paper_type</span> <span class="operator">=></span> <span class="keyword">sub</span> <span class="structure">{</span><br /> <span class="keyword">my</span> <span class="symbol">$self</span> <span class="operator">=</span> <span class="core">shift</span><span class="structure">;</span><br /> <span class="keyword">return</span> <span class="symbol">$self</span><span class="operator">-></span><span class="word">paper_type</span> <span class="operator">eq</span> <span class="symbol">$p</span><span class="operator">-></span><span class="word">default_paper_type</span><span class="structure">;</span><br /> <span class="structure">}</span><br /><span class="structure">};</span><br /><br /><span class="word">has</span> <span class="word">number_of_bows</span> <span class="operator">=></span> <span class="structure">(</span> <span class="word">is</span> <span class="operator">=></span> <span class="single">'ro'</span><span class="operator">,</span> <span class="word">isa</span> <span class="operator">=></span> <span class="single">'Int'</span><span class="operator">,</span> <span class="word">default</span> <span class="operator">=></span> <span class="number">1</span> <span class="structure">);</span><br /><br /><span class="keyword">sub</span> <span class="word">apply_wrapping</span> <span class="structure">{</span> <span class="operator">...</span> <span class="structure">}</span><br /><br /><span class="number">1</span><span class="structure">;</span></code></pre>
<p>When this is consumed in the <code>GiftBike</code> class:</p>
<pre><code class="code-listing"><span class="word">with</span> <span class="structure">(</span><br /> <span class="operator">...</span><br /> <span class="single">'Wrapped'</span> <span class="operator">=></span> <span class="structure">{</span> <span class="word">default_paper_type</span> <span class="operator">=></span> <span class="single">'EXTRASTRONG052'</span> <span class="structure">}</span><span class="operator">,</span><br /><span class="structure">);</span></code></pre>
<p>There's an anonymous role created with a new <code>paper_type</code> attribute and the new <code>uses_default_paper_type</code> method in it.</p>
<h3 id="Turning-this-into-Pictures">Turning this into Pictures</h3>
<p>The Meta::Grapher::Moose module is able to load in Moose classes and produce some pretty graphs from them.</p>
<p>For example, the default renderer uses GraphViz:</p>
<pre><code> graph-meta.pl --package GiftBike --output=diagram.png</code></pre>
<p>You can have GraphViz output in any of the formats it supports:</p>
<pre><code> graph-meta.pl --package GiftBike --output=diagram.pdf</code></pre>
<p>Either way, you get a simple diagram like this:</p>
<center><img src="graphviz.png"></center>
<p>The thicke-bordered rectangles represent classes, while the thinner-bordered rectanges are roles. The various dashed-line-bordered rectangles represent the parameterized role and the anonymous role it creates.</p>
<p>To get more detail in our diagram, we need to switch renderers. The PlantUML project is a Java graphing library that can produce UML class diagrams. By passing the right options to <code>graph-meta.pl</code> we can have it produce the PlantUML-compatible source code for the Moose classes and execute Java and the PlantUML code to produce a graph for us.</p>
<pre><code> graph-meta.pl \
--package GiftBike
--renderer=plantuml
--plantuml=/opt/jar/plantuml.jar
--output=diagram.png</code></pre>
<p>This produces the much more detailed diagram we've already seen above.</p>
<center><img src="plantuml.png"></center>
<p>We can even have PlantUML output the diagram in SVG if we want:</p>
<pre><code> graph-meta.pl \
--package GiftBike
--renderer=plantuml
--plantuml=/opt/jar/plantuml.jar
--output=diagram.svg</code></pre>
</div>2016-12-01T00:00:00ZMark Fowler