Perl Advent Calendar 2010-12-20++

Four Naughty Tweeting Birds

by William 'n1vux' Ricker

These days, Glug the elf can mostly fill the Naughty by trolling FaceBook and MySpace. However, some sneaky naughties hide their naughty words elsewhere on-line. To catch naughty hipsters in Northampton swearing and the like on their Twitter accounts, Glug logs into his Twitter identity @AngrySantaElf with his command-line twitter backup-archival tool, teewt.pl. (He thinks spelling Tweet backwards for a backup script is hilarious.)

Glug downloaded this useful little backup script off some website site and hacked it a little to download other peoples tweets instead of one's own precious drivel. It used Net::Twitter::Lite1which provides both OAuth authentication for Twitter and Basic (insecure passwords) for Identi.ca, Laconi.ca, etc. So the first time he runs it as a test, it requires he log into the web Twitter to get the authentication Pin for this app, which it stores in teewt.dat.

$ perl teewt.pl last=angrysantaelf
 Authorize this application at: http://twitter.com/oauth/authorize?oauth_token=LKzNrr...
Then, enter the PIN# provided to continue: 
123456

$VAR1 = [
          {
            'source' => 'web',
            'retweet_count' => 0,
            'created_at' => 'Mon Dec 20 22:56:04 +0000 2010',
            'text' => 'Turns out our Slinky Dogs are rabid, so open those Christmas gifts with great care and distance.',
            'in_reply_to_user_id' => undef,
            'user' => {
                        'id' => 214624407,
                        'screen_name' => 'angrysantaelf',
...

After that, Glug grabs the last 500 tweets for each NoHo hipster

$ perl teewt.pl PAGES=5 user=hanneloreEC user=tai_fighter user=martenreed  user=marigoldfarmer user=fayewhitaker user=dorabianchi > noho.csv
fetching page 1 ...
... got 100 
fetching page 2 ...
... got 100

And then counts the officially sanctioned dirty words2 and average swears per tweet per person.3

$ perl -I. -MRE_BadWords -F,  -lane '$c{$F[1]}++; $n{$F[1]}++ while $F[4] =~ m/$BadWords/g;' \
>	-e 'END{printf "%d\t%d\t%.3f\t%s\n",$n{$_} , $c{$_}, $n{$_} / $c{$_}, $_  
>		for sort keys %n }' noho.csv 
6	210	0.029	dorabianchi
16	250	0.064	fayewhitaker
1	377	0.003	hanneloreEC
7	118	0.059	marigoldfarmer
10	200	0.050	martenreed
10	136	0.074	tai_fighter

Glug sees four naughty tweeting birds over the 5% badword line, and gives them each a Naughty demerit. He also gives Hannelore a Nice check for being much much cleaner than her bad influence friends. (Dora will get a demerit too, when Glug finds out what she's done but that's not on-line.)

teewt.pl

   1 #!perl -l
   2 #
   3 # teewt - backing tweets to CSV or, singly, as Dumper object
   4 # 
   5 # derived from Net::Twitter::Lite - OAuth desktop app example
   6 # by William Ricker for Perl Advent Calendar 2010
   7 # Copyright 2010 William Ricker
   8 # License Same As Perl
   9 
  10 use warnings;
  11 use strict;
  12 
  13 use Net::Twitter::Lite;
  14 use File::Spec;
  15 use Storable;
  16 use Data::Dumper;
  17 use Text::CSV;
  18 use IO::Wrap; # wraphandle STDOUT
  19 use feature ":5.10";
  20 
  21 # Setup
  22 
  23 my $nt = authorize();
  24 
  25 # Supported Twitter API calls (methods) and columns to export in CSV
  26 my %Fetches = (
  27 	'sent_direct_messages' =>
  28                 [ qw[ id_str created_at sender_screen_name recipient_screen_name text ]],
  29         'direct_messages' =>
  30                 [ qw[ id_str created_at sender_screen_name recipient_screen_name text ]],
  31        'user_timeline' =>
  32                 [qw[ created_at user/screen_name user/name user/id_str 
  33 			text geo coordinates place id_str source
  34                       in_reply_to_status_id_str in_reply_to_screen_name retweet_count ]],
  35 	'friends_timeline' =>
  36                 [qw[ created_at user/screen_name user/name user/id_str 
  37 		text geo coordinates place id_str
  38 		in_reply_to_status_id_str in_reply_to_screen_name retweet_count ]],
  39 );
  40 
  41 my %Short = (
  42 	sdms => {cmd=>'sent_direct_messages', all=>1},
  43 	dms => {cmd=>'direct_messages', all=>1},
  44 	mine => {cmd=>'user_timeline',all=>1},
  45 	status =>  {cmd=>'user_timeline',one=>1},
  46 	friends => {cmd=>'friends_timeline',all=>1},
  47 	'last' => {cmd=>'friends_timeline',one=>1},
  48 	sd1 => {cmd=>'sent_direct_messages', one=>1},
  49 	dm1 => {cmd=>'direct_messages', one=>1},
  50 );
  51 
  52 # Work - look for commands on @ARGV
  53 
  54 my $pages = $ENV{PAGES} || 50;
  55 
  56 while ( my $cmd = shift ) {
  57     given ($cmd) {
  58 
  59 	when (/PAGES=(\d+)/) {$pages = $1;}
  60 
  61 	when (%Fetches){ all($cmd)} 
  62 
  63 	when (%Short){
  64 		my ($cmdLong,$one,$all) = @{$Short{$cmd}}{qw[cmd one all]};
  65 		# hash slice per http://www.stonehenge.com/merlyn/UnixReview/col68.html 
  66  		once($cmdLong) if $one;
  67 		all($cmdLong) if $all;
  68 	}
  69 
  70 	# last_$cmd is just one
  71 	when ( / ^ last_ (\w+) (?(?{$Fetches{$1}})(?-:)|(*FAIL)) $ /xi ) {
  72 	   once($1);
  73 	}
  74 
  75 	# alias - drop a final s and the last_ is optional
  76 	when ( / ^ (?: last_ )? (\w+) (?(?{$Fetches{$1.q(s)}})(?-:)|(*FAIL)) $ /xi ) {
  77 	   once($1.q(s));
  78 	}
  79 
  80 	when ( / ^ last = (?: (\d+) | (\w+) ) /xi ) {
  81 	   once('user_timeline', {screen_name => $2} ) if $2;
  82 	   once('user_timeline', {id => $1} ) if $1;
  83 
  84 	}
  85 
  86 	when ( / ^ user = (?: (\d+) | (\w+) ) /xi ) {
  87 	   all('user_timeline', {screen_name => $2} ) if $2;
  88 	   all('user_timeline', {id => $1} ) if $1;
  89 
  90 	}
  91 
  92 	default {
  93 		warn "Unrecognized command $cmd \n supported choices:\n";
  94 		warn "last_$_ $_ \n" for sort keys %Fetches ;
  95 		warn "abbrevs: ".join(q( ),sort keys %Short )."\n";
  96 		last;
  97 	}
  98     }    # end given cmd
  99 }    # end cmds
 100 
 101 # ------------------ subs -------------------------
 102 
 103 sub once {
 104     my ($meth, $args) = @_ ;
 105    my %args = $args ? %{$args} : () ; #=(screen_name => 'safety' | id => 123456 );
 106  
 107 	#TBD should probably be in eval catch too ?
 108     my $status = $nt->$meth( { count => 1, %args } );
 109     print Dumper $status;
 110 }
 111 
 112 sub all {
 113     my ($meth, $args) = @_ ;
 114 	# %Fetches is implicit arg
 115     my @Fields =  @{$Fetches{$meth}};
 116 
 117     my  $csv = Text::CSV->new ();
 118     my $io = wraphandle("STDOUT");
 119 
 120     # setup
 121     $csv->print ($io, \@Fields);
 122     @Fields = map { m((\w+)/(\w+)) ? [$1, $2] : $_ } @Fields;
 123 	# expand 'user/screen_name' to qw[user screen_name] 
 124 	# (avoids splitting in inner loop)
 125 
 126     my %args = $args ? %{$args} : () ; #=(screen_name => 'safety' | id => 123456 );
 127 
 128     for my $i ( 1 .. $pages )    # current 3200 max total fetch, so 50 pages enough
 129     {
 130         warn "fetching page $i ...\n"; # heartbeat
 131 
 132  	my $statuses; 
 133         eval {
 134            $statuses = $nt->$meth(
 135                 {
 136 
 137                     # id => $friendId, # for others, on user_timeline
 138                     # since_id => $high_water, to filter already seen, on most
 139                     count => 100,    # typically 70..80 returned
 140                     page  => $i,     # from 1..
 141 			%args,
 142                 }
 143             );
 144         };    # end eval
 145 
 146         if ($@) {
 147             my $Err = $@;
 148             warn "$Err \n";
 149             given ($Err) {
 150                 when (/502|503/) { sleep 1; redo };    # the intertubes were clogged
 151                 default { die "Unexpected error $Err \n" };
 152             }
 153         }    # end if err
 154         my $ni = scalar @$statuses;
 155         warn "... got $ni \n"; # heartbeat
 156         last unless $ni;
 157         for my $status (@$statuses) {
 158               $csv->print ($io, 
 159 			[ map {( ref $_ 
 160 				? $status->{$_->[0]}->{$_->[1]} // "" #/
 161 				:   $status->{$_} // "" #/
 162 				) }  @Fields
 163 			] ) ;
 164         } #end for each
 165     }    # end page i
 166 }    # end sub all
 167 
 168 sub authorize {
 169 
 170     # straight from Net-Twitter-Lite's examples/oauth_desktop.pl
 171     # just put into sub
 172     
 173     # You can replace the consumer tokens with your own;
 174     # these tokens are for the Net::Twitter example app.
 175     my %consumer_tokens = (
 176         consumer_key    => 'v8t3JILkStylbgnxGLOQ',
 177         consumer_secret => '5r31rSMc0NPtBpHcK8MvnCLg2oAyFLx5eGOMkXM',
 178     );
 179 
 180     # $datafile = oauth_desktop.dat
 181     my ( undef, undef, $datafile ) = File::Spec->splitpath($0);
 182     $datafile =~ s/\..*/.dat/;
 183 
 184     my $nt = Net::Twitter::Lite->new(%consumer_tokens);
 185     my $access_tokens = eval { retrieve($datafile) } || [];
 186 
 187     if (@$access_tokens) {
 188         $nt->access_token( $access_tokens->[0] );
 189         $nt->access_token_secret( $access_tokens->[1] );
 190     }
 191     else {
 192         my $auth_url = $nt->get_authorization_url;
 193         print " Authorize this application at: $auth_url\n"
 194 		. "Then, enter the PIN# provided to continue: ";
 195 
 196         my $pin = <STDIN>;    # wait for input
 197         chomp $pin;
 198 
 199         # request_access_token stores the tokens in $nt AND returns them
 200         my @access_tokens = $nt->request_access_token( verifier => $pin );
 201 
 202         # save the access tokens
 203         store \@access_tokens, $datafile;
 204     }
 205 
 206     return $nt;
 207 }
 208 

1. Net::Twitter has more HTML and Error handling bells-and-whistles for those with the Moose in the house, worthwhile for Web-App use, but not needed for command-line/desktop use.

2. George Carlin's Seven Filthy Words aka The Seven Words You Can't Say on the Radio as heard by Supreme Court of the US

3. In the interests of decency, we are ignoring the two worst potty-mouth twitters of the NoHo QC storyline. Glug the Elf since doesn't believe in talking birds so skips Yelling Bird, and the subjects' pintsize robot friend, a known hentai smut linker, is not a natural fictional person so is ineligible. Neither even gets coal, so Glug doesn't search them. Well, maybe he does, but not for $DayJob.

View Source (POD)