Perl-Uwe.com: December 2010

Friday, December 24, 2010

Day 24: working with integer spans - Set::IntSpan::Fast

At my previous job I had to work with time ranges. Systems would go down ("red" event) and would come up again ("green" event). And later I had to decide if a new time span was adding new outage time or was already completely contained.

I thought about using sorted start and end times (as epoch time) and comparing them via binary search. But the code would have been complex and nobody would have understood it. I remembered the integer span modules on CPAN (Set::IntSpan). I could represent all previous time ranges as integer spans and then simply check for intersection with the new time span. If the intersection had the same number of elements as my new time range, it was already completely contained. Otherwise it added some new outage time.

I chose Set::IntSpan::Fast, which also exists as an XS version (Set::IntSpan::Fast::XS):

#!/usr/bin/perl
use strict;
use warnings;
use Set::IntSpan::Fast;
my @outage = (
[1293000000, 1293000060],
[1293000000, 1293000030],
[1293000030, 1293000080],
);
my $prev = Set::IntSpan::Fast->new;
$prev->add_range(@{shift @outage});
foreach (@outage) {
    my $new = Set::IntSpan::Fast->new($_->[0].'-'.$_->[1]);
    my $diff = $new->diff($prev);
    next if $diff->is_empty;
    printf("Added %d seconds outage time.\n", $diff->cardinality);
    $prev->merge($diff);
}
printf("Total outage time: %s seconds.\n", $prev->cardinality);

Instead of intersection I used diff, which is empty if no new outage time is added.
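To make the diff idea concrete without the module, here is a minimal core-Perl sketch of the same calculation. It is not how Set::IntSpan::Fast works internally; the helper name added_seconds is made up for illustration, and it assumes inclusive integer ranges and a list of known spans that is sorted and non-overlapping.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the integers in [$start, $end] that are not
# covered by any span in @$known. Assumes @$known is sorted by start and
# its spans do not overlap; all ranges are inclusive.
sub added_seconds {
    my ($known, $start, $end) = @_;
    my $added = 0;
    my $pos   = $start;                # first second not yet accounted for
    foreach my $span (@$known) {
        my ($from, $to) = @$span;
        next if $to < $pos;            # span lies entirely before $pos
        last if $from > $end;          # span lies entirely after the new range
        $added += $from - $pos if $from > $pos;   # gap before this span
        $pos = $to + 1;                # skip the covered part
        last if $pos > $end;
    }
    $added += $end - $pos + 1 if $pos <= $end;    # tail after the last span
    return $added;
}

my @known = ([1293000000, 1293000060]);
print added_seconds(\@known, 1293000000, 1293000030), "\n";   # 0: fully contained
print added_seconds(\@known, 1293000030, 1293000080), "\n";   # 20: partly new
```

A result of 0 corresponds to an empty diff in the script above; anything else is the cardinality of the diff.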

If you misspell the complement method like this:

Set::IntSpan::Fast->new->compliment

you get a nice error message. Try it. :)

This is the end of my Perl Advent calendar. Thanks for all your comments. I learned some new things, which I will try in the coming weeks. I can't promise to post again this year, because I'm quite exhausted. But I have some ideas for new blog entries (update on the UUID benchmark and my reason for switching from RDBO to DBIC).

I wish you a merry Christmas and a happy New Year.

Thursday, December 23, 2010

Day 23: IO::All can do it all :)

Look at the impressive synopsis:

use IO::All;
# Some of the many ways to read a whole file into a scalar
io('file.txt') > $contents;         # Overloaded "arrow"
$contents < io 'file.txt';          # Flipped but same operation
$io = io 'file.txt';                # Create a new IO::All object
$contents = $$io;                   # Overloaded scalar dereference
$contents = $io->all;               # A method to read everything
$contents = $io->slurp;             # Another method for that
$contents = join '', $io->getlines; # Join the separate lines
$contents = join '', map "$_\n", @$io; # Same. Overloaded array deref
$io->tie;                           # Tie the object as a handle
$contents = join '', <$io>;         # And use it in builtins
# Other file operations:
@lines = io('file.txt')->slurp;         # List context slurp
$content > io('file.txt');              # Print to a file
io('file.txt')->print($content, $more); # (ditto)
$content >> io('file.txt');             # Append to a file
io('file.txt')->append($content);       # (ditto)
$content << $io;                        # Append to a string
io('copy.txt') < io('file.txt');        # Copy a file
io('file.txt') > io('copy.txt');        # Invokes File::Copy
io('more.txt') >> io('all.txt');        # Add on to a file
...
# Miscellaneous:
@lines = io('file.txt')->chomp->slurp;  # Chomp as you slurp
$binary = io('file.bin')->binary->all;  # Read a binary file
io('a-symlink')->readlink->slurp;       # Readlink returns an object
print io('foo')->absolute->pathname;    # Print absolute path of foo
# IO::All External Plugin Methods
io("myfile") > io->("ftp://store.org"); # Upload a file using ftp
$html < io->http("www.google.com");     # Grab a web page
io('mailto:worst@enemy.net')->print($spam); # Email a "friend"

IO::All combines a lot of Perl's IO modules into one package. It is very expressive and terse at the same time. This is handy in one-liners or small "throw away" scripts. As with Scalar::Andand (day 10), I would not use it in production code.

PS: Originally I wanted to write about Chart::Clicker, but installation problems delayed it. Now I'm glad - I found this extensive article today: Charting Weather with Chart::Clicker.

Wednesday, December 22, 2010

Day 22: exporting data as .xls file

Spreadsheet::ParseExcel and Spreadsheet::WriteExcel are two mature modules for reading and writing Microsoft Excel files.

Sometimes you just want to export some columns. Spreadsheet::WriteExcel::Simple is a perfect fit for that:

#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::WriteExcel::Simple;

my $xls = Spreadsheet::WriteExcel::Simple->new;
$xls->write_bold_row([qw/Date Time Thing/]);
$xls->write_row(['12/22/10', '10:00', 'buy presents']);
$xls->write_row(['12/22/10', '16:00', 'wrap presents']);
$xls->write_row(['12/24/10', '18:00', 'give presents']);
$xls->save('todo.xls');

Besides saving the file, this module has two main methods: write_bold_row and write_row. I use the first one for headings and the latter for every row of data.

Spreadsheet::WriteExcel::Simple is just a wrapper around Spreadsheet::WriteExcel. The methods book and sheet give you access to the underlying objects, so you can adjust the settings:

$xls->sheet->keep_leading_zeros

will allow you to write data like '01234'. (For example some German zip codes have a leading zero.)

The counterpart for reading is (you probably guessed it already) Spreadsheet::ParseExcel::Simple:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dump qw/pp/;
use Spreadsheet::ParseExcel::Simple;

my $xls = Spreadsheet::ParseExcel::Simple->read('todo.xls');
my $sheet = ($xls->sheets)[0];

my @output;
my @headlines = $sheet->next_row;
while ($sheet->has_data) {
    my @data = $sheet->next_row;
    my %item;
    foreach (@headlines) {
        $item{$_} = shift @data;
    }
    push @output, \%item;
}
print pp(\@output);

# Output:
[
  { Date => "12/22/10", Thing => "buy presents", Time => "10:00" },
  { Date => "12/22/10", Thing => "wrap presents", Time => "16:00" },
  { Date => "12/24/10", Thing => "give presents", Time => "18:00" },
]
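The inner foreach loop that pairs headings with cell values can also be written as a hash slice. A minimal sketch, with hard-coded rows standing in for the values returned by $sheet->next_row (the data is only an illustration):

```perl
use strict;
use warnings;

my @headlines = qw/Date Time Thing/;
my @rows = (
    ['12/22/10', '10:00', 'buy presents'],
    ['12/24/10', '18:00', 'give presents'],
);

my @output;
foreach my $row (@rows) {
    my %item;
    @item{@headlines} = @$row;   # hash slice: assign all columns at once
    push @output, \%item;
}

print "$output[1]{Time} $output[1]{Thing}\n";
```

The slice assigns values by position, so it relies on every row having its cells in the same order as the heading row.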

If you want to do more complex stuff, have a look at Spreadsheet::ParseExcel and Spreadsheet::WriteExcel.

Tuesday, December 21, 2010

Day 21: parallel HTTP requests with Mojo::Client

Two years ago I followed the development of Mojo very closely. Plack/PSGI did not exist yet, and Mojo's very few dependencies appealed to me. I even wrote a mod_perl2 handler for it. My interest in Mojo is gone (Plack filled that need much better), but Sebastian has some very nice modules in his distribution. One of them is Mojo::Client - an asynchronous HTTP 1.1 client:

#!/usr/bin/perl
# filename: mojo-client.pl
use strict;
use warnings;
use Mojo::Client;
my $mojo = Mojo::Client->new;
my $url  = 'http://localhost:5000/?sleep=';
my @tx   = ();
foreach (2 .. 4) {
    push @tx, scalar $mojo->build_tx(GET => $url.$_);
}
$mojo->queue(@tx);
$mojo->start;
my $slept = 0;
foreach my $tx (@tx) {
    $slept += $tx->res->json->{slept};
}
print "Slept $slept seconds.\n";

To test this, I wrote a small app.psgi:

#!/usr/bin/perl
# filename: app.psgi
use strict;
use warnings;
use Plack::Request;
my $app = sub {
    my $req = Plack::Request->new(shift);
    my $sleep = sleep($req->parameters->{sleep} || 5);
    return [
        200,
        ['Content-Type', 'application/json'],
        ['{"slept":'.$sleep.'}'],
    ];
};

Fire up starman (or any other PSGI web server capable of parallel requests) and try it:

$ time perl mojo-client.pl
Slept 9 seconds.
real 0m4.170s
user 0m0.130s
sys 0m0.030s

The numbers are the same as with the Gearman example. We "wait" 9 seconds, but it only takes a little bit over 4 seconds.

As a counterexample, here are the results with plackup (a single-request server):

$ plackup &
HTTP::Server::PSGI: Accepting connections at http://0:5000/
$ time perl mojo-client.pl
Slept 9 seconds.
real 0m9.178s
user 0m0.180s
sys 0m0.000s

Mojo::Client has much more to offer than parallel processing. It also supports websocket requests.

Did you notice the $tx->res->json call? Mojo::Client has built-in JSON support. Truly Web 2.0 :)

Monday, December 20, 2010

Day 20: a praise for Plack

Plack is a wonderful thing. If you don't know it, have a look at Miyagawa's slides. I gave an introduction at the 2010 German Perl Workshop, but it wasn't so good. I assumed too much prior knowledge, so it was confusing. I won't try again in this short post, either. :)

Let me just say a few sentences: PSGI specifies - like CGI - a standard interface between web servers and Perl web applications (or web frameworks). And Plack is a toolkit for PSGI. With plackup you can start PSGI apps from the command line.

Save this short script as app.psgi and run plackup in the shell:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
$Data::Dumper::Sortkeys = 1;
my $app = sub {
    my $env = shift;
    return [
        200,
        ['Content-Type' => 'text/html'],
        ['<html><body><pre>'.Dumper($env).'</pre></body></html>'],
    ];
};

Now, point your browser to http://localhost:5000/ and you will see the PSGI environment variables.
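Because a PSGI application is just a code reference that takes an environment hash and returns an array reference, you can also call it directly, without any server. The following sketch does that with a hand-built and deliberately incomplete environment; the plain-text app and the /hello path are assumptions for illustration only.

```perl
use strict;
use warnings;

# Same shape as the app above, reduced to a plain-text response.
my $app = sub {
    my $env = shift;
    return [
        200,
        ['Content-Type' => 'text/plain'],
        ["You requested $env->{PATH_INFO}\n"],
    ];
};

# A hand-built environment with only two keys; a real server fills in
# many more (see the PSGI specification).
my %env = (
    REQUEST_METHOD => 'GET',
    PATH_INFO      => '/hello',
);

# The response is [status, headers, body], exactly what plackup would send.
my ($status, $headers, $body) = @{ $app->(\%env) };
print "$status: @$body";
```

This is handy for quick checks of small apps, but it is no substitute for running them under a real PSGI server.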

With plackup -r you get a restarting server (very useful during development). And -e allows you to wrap middlewares around your web app:

plackup -e 'enable "Debug"' app.psgi

will give you a nice debug panel. (You have to install Plack::Middleware::Debug.)

Normally you do not use Plack directly; your web application framework does it for you. But for really small web apps, I use Plack::Request and Plack::Response (together with Template Toolkit) directly.

Sunday, December 19, 2010

Day 19: memory usage with Devel::Size

Devel::Size helps you when you want to know how much memory a particular variable uses. It has two functions: size and total_size. For scalars size is enough, for array or hash references, total_size calculates the memory usage of the complete structure.

On my system (Linux 64-bit, Perl 5.12), an integer takes at least 24 bytes:

perl -MDevel::Size=size -e 'print size(1)'

24

And a string starts at 56 bytes:

perl -MDevel::Size=size -e 'print size("1")'

56

Here is a comparison of a string and two arrays:

use Devel::Size qw/total_size/;
my $scalar = '123456789';
my $flat   = ['123', '456', '789'];
my $full   = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
printf "scalar: %5d\n", total_size $scalar;
printf "flat:   %5d\n", total_size $flat;
printf "full:   %5d\n", total_size $full;

# Output:
scalar:    64
flat:     368
full:     896

My (old) German Perl blog has a better example: Array-Overhead mit Devel::Size bestimmen.

But I want to make another point here. Sometimes you need something else:

use Devel::Size qw/total_size/;
use XML::LibXML;

my $html = <<HTML;
<html><head><title>test</title></head>
<body><h1>headline</h1>
<b><i>just</i> <u>testing</u></b>
</body></html>
HTML
my $xml = XML::LibXML->new;
my $doc = $xml->parse_html_string($html);
printf "HTML length: %5d\n", length $html;
printf "total_size:  %5d\n", total_size $doc;

# Output:
HTML length:   112
total_size:     72

Oh, the parsed document (which is a DOM tree), is smaller than the HTML string? Data::Dumper shows the problem:

$VAR1 = bless( do{\(my $o = 42302096)}, 'XML::LibXML::Document' );

XML::LibXML is an XS module; the DOM tree is not stored in Perl objects, but in C. So we need to ask the system how much memory our process is consuming. I could not find a good way in Perl. The best I found was ps:

ps -o rss --no-heading -p PID

Let's try this (I leave the first part out):

my $start = get_rss();
my $doc2  = $xml->parse_html_string($html);
my $final = get_rss();
printf "real size: %d KB\n", $final - $start;

sub get_rss {
    my $rss = qx/ps -o rss --no-heading -p $$/;
    $rss =~ s/\D//g;
    return $rss;
}

This reveals a size of 8 KB. I tried this with bigger documents and it grows. :)
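On Linux the kernel reports the same figure in /proc/self/status, so one possible way to avoid spawning ps is to read it from there. This is only a sketch under that assumption: the function name get_rss_proc is made up, and it returns undef on systems without /proc.

```perl
use strict;
use warnings;

# Hypothetical alternative to get_rss(): returns the resident set size of
# the current process in KB, or undef if /proc is not available.
sub get_rss_proc {
    open my $fh, '<', '/proc/self/status' or return;
    while (my $line = <$fh>) {
        # The line of interest looks like "VmRSS:     1234 kB".
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return;
}

my $rss = get_rss_proc();
print defined $rss ? "RSS: $rss KB\n" : "RSS not available here\n";
```

It measures the same thing as the ps call, so it inherits the same open question about RSS.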

But is measuring RSS (resident set size) really the right figure? Please comment, if you know it. Thanks.

Saturday, December 18, 2010

Day 18: doing things in parallel (Gearman)

Parallel processing is a complex topic (or at least it can be complex). There are a lot of choices: POE, Coro, threads, processes and the list goes on ...

For a client I had to do three expensive calculations in parallel. I tried threads and processes, but communicating the result back needed some thought. POE would be a good candidate, but I do not have practical experience with it (only a few talks at various conferences). In the end I settled on Gearman. This has the additional charm of being able to spread parallel processing between different hosts.

If you do not know Gearman, have a look at its Wikipedia page (which is quite brief).

Gearman was originally written in Perl, but was later rewritten in C. CPAN modules for both exist. My examples are for the Perl version. Gearman::Client contains the client and worker code; Gearman::Server contains the gearmand server script.

Without further ado, here comes the worker script (this contains our "real work" to be executed):

#!/usr/bin/perl
# filename: worker.pl
use strict;
use warnings;
use Gearman::Worker;

my $worker = Gearman::Worker->new(job_servers => ['127.0.0.1']);
$worker->register_function(sleep => \&_sleep);
while (1) {
    $worker->work;
}

sub _sleep {
    my $job     = shift;
    my $seconds = $job->arg;
    print "sleeping\n";
    sleep($seconds);
    return $seconds;
}

And now our client, where we will do three things in parallel:

#!/usr/bin/perl
# filename: client.pl
use strict;
use warnings;
use Gearman::Client;

my $client  = Gearman::Client->new(job_servers => ['127.0.0.1']);
my $waited  = 0;
my $taskset = $client->new_task_set;
for (2..4) {
    $taskset->add_task(sleep => $_, {
        on_complete => sub {
            my $ret = shift;
            $waited += $$ret;
            print "Done.\n";
        },
    });
}
$taskset->wait;
print "Waited $waited seconds.\n";

To test this, start gearmand and at least three workers:

gearmand &
perl worker.pl &
perl worker.pl &
perl worker.pl &
time perl client.pl

It gives the following output:

time perl client.pl
sleeping
sleeping
sleeping
Done.
Done.
Done.
Waited 9 seconds.

real 0m4.062s
user 0m0.030s
sys 0m0.030s

The three sleeping lines appear simultaneously. The jobs run 2, 3 and 4 seconds each. The total run time of the script is just a little bit over 4 seconds (instead of 9 if we were doing the jobs serially).
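For comparison, this is roughly what the plain-process approach involves when the results have to travel back to the parent. It is a minimal sketch using only core Perl (fork and pipe), not something from the original project; the helper name sleep_in_parallel is made up, and the sleep times are scaled down so the example finishes quickly.

```perl
use strict;
use warnings;
use POSIX ();
use Time::HiRes qw/sleep time/;

# Hypothetical helper: runs one child process per entry in @seconds and
# returns the sum of the values the children report back.
sub sleep_in_parallel {
    my @seconds = @_;
    my @children;
    foreach my $sec (@seconds) {
        pipe(my $reader, my $writer) or die "pipe failed: $!";
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                  # child: do the "work" and report
            close $reader;
            sleep $sec;
            print {$writer} "$sec\n";
            close $writer;                # flushes the result to the parent
            POSIX::_exit(0);              # leave without running END blocks
        }
        close $writer;                    # parent keeps only the read end
        push @children, [$pid, $reader];
    }
    my $waited = 0;
    foreach my $child (@children) {
        my ($pid, $reader) = @$child;
        my $result = <$reader>;           # blocks until this child is done
        chomp $result;
        $waited += $result;
        close $reader;
        waitpid $pid, 0;
    }
    return $waited;
}

my $start  = time;
my $waited = sleep_in_parallel(0.2, 0.3, 0.4);
printf "Waited %.1f seconds in %.1f seconds.\n", $waited, time - $start;
```

This works on one host only, and every result has to be serialized by hand; that bookkeeping is what Gearman takes care of.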

Two of my teammates have written modules for Gearman: Dennis Schoen has written Gearman::XS, which uses the Gearman C library. Johannes Plunien has written Gearman::Driver, which is very useful if you have a lot of worker processes.

At work we use all of them in production (and the C Gearman server).

Friday, December 17, 2010

Day 17: nice schema definition for DBIx::Class

This post is for DBIx::Class users only. :)

So, if you are using another ORM, you have to wait until tomorrow ...

As a side note: I was using Rose::DB::Object (RDBO) for years. But I'm slowly changing to DBIx::Class (DBIC). I really liked RDBO and it is a solid ORM. But the community strongly prefers DBIC and I wanted to gain experience with it.

Usually a schema definition looks like this:

package MyDB::Schema::Result::Artist;
use base qw/DBIx::Class::Core/;

use strict;
use warnings;

__PACKAGE__->table('artist');
__PACKAGE__->add_columns(qw/id name/);
__PACKAGE__->set_primary_key('id');
__PACKAGE__->has_many(cds => 'MyDB::Schema::Result::CD');

1;

With DBIx::Class::Candy it changes to:

package MyDB::Schema::Result::Artist;
use DBIx::Class::Candy;

table 'artist';
column id   => {data_type => 'int', is_auto_increment => 1};
column name => {data_type => 'varchar', size => 25};
primary_key 'id';

has_many cds => 'MyDB::Schema::Result::CD', 'id';

1;

I did not like all these __PACKAGE__ calls. This "Moose inspired" syntax is much nicer in my opinion.

Thursday, December 16, 2010

Day 16: pretty printing XML with XML::Twig

Today's entry is about XML::Twig. It can do several things, all related to parsing (and processing) XML documents.

XML::Twig can read an XML document all at once and build a tree structure. It can process it chunk by chunk or act like a filter. I want to show its usage as an XML pretty printer:

#!/usr/bin/perl

use warnings;
use strict;

use XML::Twig;

# XMLRPC-Request
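my $xml = <<'END';
<?xml version="1.0"?><methodCall><methodName>Test</methodName>
<params><param><value><string>test</string></value></param></params></methodCall>
END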

my $twig = XML::Twig->new(pretty_print => 'indented');
$twig->parse($xml);

$twig->print;

Which will produce the following output:

<?xml version="1.0"?>
<methodCall>
  <methodName>Test</methodName>
  <params>
    <param>
      <value>
        <string>test</string>
      </value>
    </param>
  </params>
</methodCall>

XML::Twig comes with a few command line utilities. One of them (xml_pp) is an XML pretty printer.

For more 'traditional' usage (parsing XML and extracting information) look at the documentation or brian d foy's blog entry.

Wednesday, December 15, 2010

Day 15: CPAN on the road

If the mountain does not come to the prophet, the prophet has to come to the mountain.

You travel and don't have Internet access - but want to install some CPAN modules?

Back in my early days, when I started with Perl (around 1996), I did not have an Internet connection. But I was allowed to use it at my local university. If I wanted to install some CPAN modules, I had to download them and bring them home - on floppy disk! Too bad, if I forgot a dependency ... :)

Later I wrote a script which would fetch all "current" modules and burn them on CD. (USB sticks didn't exist then - or at least I didn't have one.) That worked quite well and missing dependencies were no longer a problem.

But even today (with Internet all around us), you sometimes are "offline". CPAN::Mini can help! Its command line client minicpan offers a comfortable way to sync (the current subset of) CPAN locally:

minicpan -l ~/minicpan -r ftp://ftp.fu-berlin.de/unix/languages/perl

The local destination is specified with -l, the remote CPAN mirror with -r (see http://www.cpan.org/SITES.html for a list).

Sometimes, when there were network problems during a download (usually a 500 error), I run it again with the additional option -f. This forces the redownload of all missing (or broken) files.

You can also put all your options into a config file (~/.minicpanrc or C:\Users\Uwe\.minicpanrc under Windows):

local: D:\minicpan
remote: http://cpan.strawberryperl.com

(This is my original config file from my Windows laptop.) So I just run minicpan or sometimes minicpan -f.

To use this with the CPAN shell, you have to change your urllist setting. In my case I changed it to file:///D:/minicpan/.

o conf urllist file:///D:/minicpan/
o conf commit

It is a really handy solution and extensions like CPAN::Mini::Inject and CPAN::Mini::Webserver make it even more useful.

Tuesday, December 14, 2010

Day 14: an alternative CPAN installer

To install a CPAN module, you unpack it and run

perl Makefile.PL
make
make test
make install

or

perl Build.PL
./Build
./Build test
./Build install

depending on whether ExtUtils::MakeMaker or Module::Build is used.

But I'm sure you know about the CPAN shell, which can be invoked by perl -MCPAN -e shell or cpan. But the options do not end here...

Some years ago, the development of CPAN.pm was stuck and a new client (CPANPLUS) was on the horizon. It had some really nice features (search for a module and then install some of the search results, without much typing). But I never switched to it, because it did not have an easy 'force' option. (Maybe it has now, I don't know.)

The development of CPAN.pm continued: it got colorful and less verbose, and distroprefs was added.

So, what is the alternative I want to talk about? It's cpanminus.

You can download it like perlbrew without installing any prerequisites:

curl -L http://cpanmin.us | perl - --sudo App::cpanminus

This line installs App::cpanminus itself.

The usage is very simple: cpanm module

See the documentation for the supported options.

Monday, December 13, 2010

Day 13: brew your own beer ... eh ... perl

Yesterday I showed you how to perform local installs of CPAN modules; today we expand that to Perl itself.

What are the benefits of having a local perl installation?

you can install exactly the version you want
you can install more than one version

So, you can test your code with different Perl versions (5.8, 5.10, 5.12 or even bleadperl).

Compiling (and installing) Perl is not difficult. On Linux/Unix you:

download and unpack the desired version
./Configure -des -Dprefix=~/localperl
make test
make install

Now you can run ~/localperl/bin/perl -v. To use it instead of your system perl you can add ~/localperl/bin to your PATH or set an alias. For all these steps (including download) there is a handy utility: perlbrew.

Let's see it in action ...

This is my system perl:

uwe@quad:~$ perl -v
This is perl, v5.10.1 (*) built for x86_64-linux-gnu-thread-multi
...

Init the perlbrew environment (you have to do this once):

uwe@quad:~$ perlbrew init
Perlbrew environment initiated, required directories are created under

    /home/uwe/perl5/perlbrew

Well-done! Congratulations! Please add the following line to the end
of your ~/.bashrc

    source /home/uwe/perl5/perlbrew/etc/bashrc

After that, exit this shell, start a new one, and install some fresh
perls:

    perlbrew install perl-5.12.1
    perlbrew install perl-5.10.1

For further instructions, simply run:

    perlbrew

The default help messages will popup and tell you what to do!

Enjoy perlbrew at $HOME!!

Add this line to ~/.bashrc:

uwe@quad:~$ vi ~/.bashrc

Now, install Perl 5.12.2:

uwe@quad:~$ perlbrew install perl-5.12.2
Attempting to load conf from /home/uwe/perl5/perlbrew/Conf.pm
Fetching perl-5.12.2 as /home/uwe/perl5/perlbrew/dists/perl-5.12.2.tar.gz
Installing perl-5.12.2 into /home/uwe/perl5/perlbrew/perls/perl-5.12.2
This could take a while. You can run the following command on another shell to track the status:

  tail -f /home/uwe/perl5/perlbrew/build.log

(cd /home/uwe/perl5/perlbrew/build; tar xzf /home/uwe/perl5/perlbrew/dists/perl-5.12.2.tar.gz;cd /home/uwe/perl5/perlbrew/build/perl-5.12.2;rm -f config.sh Policy.sh;sh Configure -de '-Dprefix=/home/uwe/perl5/perlbrew/perls/perl-5.12.2';make;make test && make install) >> '/home/uwe/perl5/perlbrew/build.log' 2>&1 
Installed perl-5.12.2 as perl-5.12.2 successfully. Run the following command to switch to it.

  perlbrew switch perl-5.12.2

Switch to 5.12.2:

uwe@quad:~$ perlbrew switch perl-5.12.2

Version check :)

uwe@quad:~$ perl -v
This is perl 5, version 12, subversion 2 (v5.12.2) built for x86_64-linux
...

You don't even have to install App::perlbrew, just execute:

curl -L http://xrl.us/perlbrewinstall | bash

Sunday, December 12, 2010

Day 12: installing CPAN modules locally

Today's entry (and those of the next three days) is about installing CPAN modules. If you have sufficient rights ('root' or 'Administrator'), you can install directly into the system directories of your Perl installation. I prefer to avoid that, so that a system upgrade does not interfere with my installed modules.

There are a few ways to tell your system Perl where your locally installed modules are:

environment variable PERL5LIB
'use lib ...' in your script
'perl -I...' in the command line or shebang line
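All three options end up doing the same thing: they put a directory in front of Perl's module search path @INC. A minimal sketch of the 'use lib' variant; the directory name is a made-up placeholder, so replace it with your own path.

```perl
use strict;
use warnings;

# Hypothetical module directory; replace it with your own path.
use lib '/tmp/my-perl-modules/lib/perl5';

# "use lib" puts the directory at the front of @INC, so modules found
# there win over system-wide copies of the same name.
# Equivalent from the outside:
#   PERL5LIB=/tmp/my-perl-modules/lib/perl5 perl script.pl
#   perl -I/tmp/my-perl-modules/lib/perl5 script.pl
print "First search path: $INC[0]\n";
```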

To install a CPAN module in a custom directory, you can use INSTALL_BASE for ExtUtils::MakeMaker (Makefile.PL) or install_base for Module::Build (Build.PL). I don't want to go into detail here, because I want to show you a much simpler solution: local::lib. It configures your CPAN.pm config and helps you set the right environment variables for your shell.

You need to follow the bootstrapping technique:

download and unpack the tarball from CPAN
perl Makefile.PL --bootstrap
make test && make install
perl -I$HOME/perl5/lib/perl5 -Mlocal::lib
copy the above output into ~/.bashrc (depending on your shell)
open a new shell, so that the new environment variables are present

Now you can use cpan and perldoc as usual, but all installs go into ~/perl5. If you want to run Perl scripts outside your shell environment (e.g. in Apache or crontab), set PERL5LIB to the value from your ~/.bashrc.

Saturday, December 11, 2010

Day 11: less boilerplate in your test files with Test::Most

If you are still using Test::More for your test files, then have a look at Test::Most. For me, it took only a look at the synopsis to be convinced. :)

This:

use strict;
use warnings;
use Test::Exception;
use Test::Differences;
use Test::Deep;
use Test::Warn;
use Test::More tests => ...;

shrinks down to

use Test::Most tests => ...;

It can't become any shorter ... :)

But I want to show you two other features:

use Test::Most die => tests => 25;

Notice the die? The execution of your tests will stop after the first failure. This is often sensible, because one failure often leads to several others, and what is still visible on the screen is then not the real problem. You can also induce this behaviour via the environment variable DIE_ON_FAIL.

The next feature is explain. Test::More has something similar ('note explain ...'), but I did not use it until I saw it in Test::Most. It prints a diagnostic message, with all references going through Data::Dumper:

my $res = $ua->get(...);
explain 'Response: ', $res;
...

If you are using prove (which I recommend), you have to use the verbose switch (-v) to see it.
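Since Test::More ships with Perl, the 'note explain ...' variant can be tried without installing anything. A minimal sketch; the hash is a made-up stand-in for a real response object.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Stand-in for a real response object, made up for this example.
my $res = { status => 200, headers => { 'Content-Type' => 'text/html' } };

# explain() turns references into Data::Dumper-style text; note() prints
# it as a TAP comment, which prove hides unless it is run with -v.
note explain 'Response: ', $res;

is $res->{status}, 200, 'status is OK';
```

Using diag instead of note would send the same text to STDERR, where it shows up even without -v.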

Friday, December 10, 2010

Day 10: Can't call method "..." on an undefined value

Are you tired of this error message?

It occurs when you chain methods and one of them returns 'undef' - or when you just call methods on an undefined value.

perl -e 'undef->x'

If "$a->b->c" throws this message you can rewrite it as "$a->b && $a->b->c". But how boring is this?

Scalar::Andand to the rescue! Now you can write "$a->b->andand->c".

It's important to quote the documentation:

Note that this module was intended as a proof of concept. The author has never used it in production code, nor is he planning to do so.
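Given that caveat, a plain-Perl alternative for production code is a temporary variable. The sketch below uses two tiny classes and a helper, all made up for illustration. Unlike "$a->b && $a->b->c", it calls b only once, which matters if b is expensive or has side effects.

```perl
use strict;
use warnings;

# Tiny stand-in classes, made up for this example.
{
    package My::Inner;
    sub new { return bless {}, shift }
    sub c   { return 'value of c' }
}
{
    package My::Outer;
    sub new { my ($class, %args) = @_; return bless { %args }, $class }
    sub b   { return $_[0]{inner} }    # may return undef
}

# Hypothetical helper: the equivalent of $obj->b->c without the error.
sub c_of_b {
    my $obj   = shift;
    my $inner = $obj->b;               # call b() only once
    return defined $inner ? $inner->c : undef;
}

foreach my $obj (My::Outer->new(inner => My::Inner->new), My::Outer->new) {
    my $result = c_of_b($obj);
    print defined $result ? "$result\n" : "(undef)\n";
}
```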

I continue tomorrow with "serious" stuff. :)

Thursday, December 9, 2010

Day 9: another debugging story

Nobody wants to do it, but sometimes there is no way around it: overwriting a subroutine of a (CPAN) module. In our case we had to fiddle around in URI::Escape, because we wanted to port our legacy web framework to Plack, and our "non-standard" UTF-8 handling made some changes necessary. :)

So, everything worked and we went along with other things. A few weeks later, it suddenly stopped working. We used "git bisect" to identify the guilty commit - code that had nothing to do with our web framework. But the code was using URI::http and somehow ended up using URI::_query too, which imports uri_unescape from URI::Escape. (I do not remember the exact details, so if this is confusing, I may have missed/mixed some facts.)

But how did we find this out? I used Devel::Loading to print out the order of loading each module (CPAN and our own code). It was quite a long list. And one difference between the two commits was the relative order of URI::_query versus our module that was overriding uri_unescape.

How does Devel::Loading work? Does it overwrite 'use' and 'require'?

No. You can put a code reference into @INC. See 'perldoc -f require' for the gory details. But here is a small example:

#!/usr/bin/perl

use strict;
use warnings;

BEGIN {
    unshift @INC, sub {
        my ($own_code, $module) = @_;
        print "Tried to use $module.\n";
        return undef;
    };
}

use Data::Dumper;

It has the following output:

Tried to use Data/Dumper.pm.
Tried to use Carp.pm.
Tried to use Exporter.pm.
Tried to use XSLoader.pm.
Tried to use bytes.pm.
Tried to use overload.pm.
Tried to use warnings/register.pm.

When the coderef returns 'undef', the next entry in @INC is queried.
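The same mechanism is enough to record the load order, which is what helped in the story above. This is only a sketch of the idea using the hook described here; it makes no claim about how Devel::Loading is implemented. Text::ParseWords is an arbitrary core module chosen for the demonstration.

```perl
use strict;
use warnings;

my @load_order;

# The hook is called with itself and the file name being looked up.
# Returning an empty list tells perl to carry on with the next @INC entry.
unshift @INC, sub {
    my ($self, $file) = @_;
    push @load_order, $file;
    return;
};

require Text::ParseWords;   # any module that is not loaded yet will do

printf "%2d. %s\n", $_ + 1, $load_order[$_] for 0 .. $#load_order;
```

Modules that are already in %INC never reach the hook, so it has to be installed as early as possible (for example in a BEGIN block, as in the example above) to see everything.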

Feedback:
How did you like today's entry? Shall I tell more stories or stick to the more neutral, example based approach? I'm looking forward to your comments.

PS: "We" in this entry means our Perl architecture team at work. Our employer has some open positions. Contact me, if you are interested or have some questions.

Wednesday, December 8, 2010

Day 8: two handy debugging modules

Your script throws an error (or a warning) and you can't make any sense out of the line number. If you only had a stacktrace ...

... Carp::Always to the rescue:

$ perl -MCarp::Always script.pl

Now every 'warn' and 'die' behaves like Carp's 'cluck' and 'confess'. (And 'carp' and 'croak' become noisy too.)
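If you cannot install the module, the core of the idea can be approximated with Carp alone by routing the warn and die hooks through cluck and confess. This is a rough sketch of the effect, not a copy of what Carp::Always does internally; the subroutine names are made up.

```perl
use strict;
use warnings;
use Carp ();

# Route every warn through Carp::cluck and every die through Carp::confess.
local $SIG{__WARN__} = \&Carp::cluck;
local $SIG{__DIE__}  = \&Carp::confess;

sub inner { die "something broke\n" }
sub outer { inner() }

# The error now carries the call chain that led to the die.
eval { outer() };
my $error = $@;
print $error;
```

Note that a __DIE__ handler also fires for exceptions that are caught by eval, so this is better suited to debugging sessions than to permanent use.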

But what if your script silently dies (because of a segmentation fault)? It won't output anything, so what can you do?

I once used Devel::Trace to find out what happens in a CGI script that suddenly died. Devel::Trace prints every line of Perl it executes to STDERR (Apache error log in the case of my CGI script). The module is quite old, almost as old as rlib. But it works well.

There is just one downside: the output is really large. :)

In case of a segfault, just work your way through it from the bottom. In my case it was a faulty database driver that caused the segfault.

Stay tuned for tomorrow, when I tell another debugging story and introduce another Devel::* module.

Tuesday, December 7, 2010

Day 7: Alternatives to Data::Dumper

Data::Dumper is a handy module. But I prefer Data::Dump: its output is more compact and the hash keys are already sorted. Data::Dumper also has a setting to sort the keys ($Data::Dumper::Sortkeys), but with Data::Dump "it just works". :)

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use Data::Dump;

my $small = {a => 1, b => 2, c => 3, d => 4, e => 5};
my $large = {
    aaa => 'data set 1',
    bbb => 'data set 2',
    ccc => 'data set 3',
    ddd => 'data set 4',
    xxx => $small,
};

print "Data::Dumper:\n";
print Dumper($small);
print Dumper($large);

print "\nData::Dump:\n";
dd($small);
dd($large);

The following output shows the default settings of each module:

Data::Dumper:
$VAR1 = {
          'e' => 5,
          'c' => 3,
          'a' => 1,
          'b' => 2,
          'd' => 4
        };
$VAR1 = {
          'bbb' => 'data set 2',
          'xxx' => {
                     'e' => 5,
                     'c' => 3,
                     'a' => 1,
                     'b' => 2,
                     'd' => 4
                   },
          'aaa' => 'data set 1',
          'ccc' => 'data set 3',
          'ddd' => 'data set 4'
        };

Data::Dump:
{ a => 1, b => 2, c => 3, d => 4, e => 5 }
{
  aaa => "data set 1",
  bbb => "data set 2",
  ccc => "data set 3",
  ddd => "data set 4",
  xxx => { a => 1, b => 2, c => 3, d => 4, e => 5 },
}

As I said, Data::Dump is more compact. :)
If a structure fits into one line, it will do so.
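If you have to stay with Data::Dumper, a few of its configuration variables bring the output somewhat closer to this. A minimal sketch; the particular combination of settings is just one possible choice, and it still does not put small structures on one line.

```perl
use strict;
use warnings;
use Data::Dumper;

local $Data::Dumper::Sortkeys = 1;   # sort hash keys
local $Data::Dumper::Terse    = 1;   # no "$VAR1 = " prefix
local $Data::Dumper::Indent   = 1;   # fixed, shallower indentation

my $small = {a => 1, b => 2, c => 3};
print Dumper($small);
```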

Another neat module is Data::Dumper::Concise. It is an "optimal" configuration of Data::Dumper. That part does not matter much to me, because I use Data::Dump for that. But I like its Devel::Dwarn module. It exports Dwarn, which allows you this:

sub ... {
    my $ua = ...
    ...
    return Dwarn $ua->get(...);
}

Normally you would have to rewrite your code and assign the result of the method call to a temporary variable and dump it. With Dwarn the changes are minimal. Also exported: DwarnS for scalar context and DwarnL for list context.
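The pass-through idea is easy to picture with a small helper built on plain Data::Dumper. The function my_dwarn below is a hypothetical stand-in written for this post, not the real Dwarn, and lookup is a made-up example subroutine.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical stand-in for Dwarn: dumps its arguments to STDERR and then
# returns them unchanged, respecting the caller's context.
sub my_dwarn {
    local $Data::Dumper::Terse = 1;
    warn Dumper(@_);
    return wantarray ? @_ : $_[0];
}

sub lookup {
    my %data = (answer => 42);
    return my_dwarn($data{answer});   # only the return line was touched
}

my $value = lookup();
print "Got $value\n";
```

The caller still receives the original value, so the debugging output can be added and removed without restructuring the code.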

Have a look at Data::Dump::Streamer for another alternative to Data::Dumper.

Monday, December 6, 2010

Day 6: Who has the longest ...

... subroutine? :)

Today I want to give you an example of PPI. PPI stands for "Parse Perl Isolated" and is a mammoth project by Adam Kennedy. PPI builds an abstract syntax tree out of your Perl code. By analogy with the DOM, this tree is called the PDOM (Perl Document Object Model). Have a look at the documentation for an overview.

Let's use PPI to count the number of lines in a subroutine:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use File::Find;
use PPI;

my @controller = ();
File::Find::find(
    {wanted => \&wanted, no_chdir => 1},
    qw(apps lib), ### change me ###
);

sub wanted {
    ### change me ###
    if (m!/Controller(/.+)?\.pm$!) {
        push @controller, $_;
    }
}

my @subs = ();
foreach my $file (@controller) {
    # Skip files that PPI cannot load or parse.
    my $doc = PPI::Document->new($file) or next;

    my $subs = $doc->find('Statement::Sub');
    next unless $subs;

    foreach my $sub (@$subs) {
        next unless $sub->name;
        ### change me ###
        next unless $sub->name =~/^op_/;

        my @lines = split /\n/, $sub->content;
        push @subs, {
            controller => $file,
            op_name    => $sub->name,
            lines      => scalar @lines,
        };
    }
}

@subs = sort { $b->{lines} <=> $a->{lines} } @subs;

# Print the top 10 (or fewer, if fewer subs were found).
my $last = $#subs < 9 ? $#subs : 9;
print Dumper [@subs[0..$last]];

The first part of the script uses File::Find to fill the @controller array with the file names of the wanted Perl modules. (I used this example at work to measure the length of subroutines in our web framework.)

The second part uses PPI to build the PDOM tree ($doc) and query it for subroutine nodes ('Statement::Sub'). The name of the sub has to start with 'op_' - you may want to change this. The number of lines is stored (together with the module and subroutine name) and finally the top 10 are printed.
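The ranking step can be tried without PPI. The following is a minimal sketch with made-up sample records (the file and sub names are hypothetical) in the same shape the script collects; it also shows one way to take "up to 10" entries when fewer subs were found.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical sample records, shaped like the hashes pushed onto @subs.
my @subs = (
    { controller => 'lib/A.pm', op_name => 'op_list', lines => 12 },
    { controller => 'lib/A.pm', op_name => 'op_edit', lines => 48 },
    { controller => 'lib/B.pm', op_name => 'op_show', lines => 7  },
);

# Longest subs first.
my @sorted = sort { $b->{lines} <=> $a->{lines} } @subs;

# Take at most 10 entries; an empty list stays empty.
my $last = min($#sorted, 9);
my @top  = @sorted[0 .. $last];

printf "%4d  %s  %s\n", $_->{lines}, $_->{controller}, $_->{op_name} for @top;
```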

So, Ovid: Show us some numbers of your fairly long methods ... :)


Sunday, December 5, 2010

Day 5: Check your Google PageRank with WWW::Google::PageRank

Google PageRank is one of the factors that determine where a page appears in Google's search results. With WWW::Google::PageRank you can query the PageRank of any URL:

use WWW::Google::PageRank;
my $pr = WWW::Google::PageRank->new;
print scalar($pr->get('http://www.perl-uwe.de')), "\n";

Using the tip from day 3 you can write it in one line:

perl -Maliased=WWW::Google::PageRank,PR -e 'print PR->new->get("http://is.gd")."\n"'

OK, it does not quite fit into one line on this blog. :)
But I want to show you some options to '-M' here. Let's say you want to use List::Util's 'max' function in your one-liner. You name the import after the '=' separator:

perl -MList::Util=max ...

All further imports have to be separated by ',':

perl -MList::Util=max,min ...

In the above example of aliased I use the two parameter form to give the alias an even shorter name ('PR'). The default would have been 'PageRank'.
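In a script, the '-M' form with '=' corresponds to an ordinary use line with an import list. A small sketch (the numbers are made-up sample data):

```perl
use strict;
use warnings;

# Script equivalent of: perl -MList::Util=max,min -e '...'
use List::Util qw(max min);

my @numbers = (3, 9, 4);
my $summary = sprintf "max=%d min=%d", max(@numbers), min(@numbers);
print "$summary\n";
```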


Saturday, December 4, 2010

Day 4: inheritable DATA sections with Data::Section

Damn, RJBS beat me on Sub::Exporter / Sub::Import! I wanted to write about them too. But he was faster. Maybe this time, I am faster. :)

Today I want to show you Data::Section, which is (like Sub::Exporter and Sub::Import) from Ricardo Signes. It has two use cases (and I will show an example for both):

inheritable DATA sections
multiple 'files' (hunks) in the DATA section

Consider the following modules:

# file: Parent.pm
package Parent;
use Data::Section -setup => {default_name => 'default'};
1;
__DATA__
__[file1]__
Master version of file 1.
__[file2]__
Master version of file 2.

# file: Child.pm
package Child;
use base 'Parent';
1;
__DATA__
Default name content.
__[file1]__
Custom version of file 1.

Now you can use Data::Section's methods for retrieval on these modules:

use Child;
use Data::Dump;
print ${Child->section_data('file1')};
dd(Child->merged_section_data);

Each data section is returned as a scalar reference. The output is the following:

Custom version of file 1.
{
  default => \"Default name content.\n",
  file1   => \"Custom version of file 1.\n",
  file2   => \"Master version of file 2.\n",
}
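To get a feeling for the hunk markers, here is a minimal sketch that splits such a DATA block by hand. It is not how Data::Section is implemented and it knows nothing about inheritance; the header pattern is my assumption, based only on the __[name]__ lines shown above.

```perl
use strict;
use warnings;

my $data = <<'END';
Default name content.
__[file1]__
Custom version of file 1.
__[file2]__
Master version of file 2.
END

my %section;
my $name = 'default';    # hunk name used before the first header line

# split /^/m keeps the trailing newline of every line.
for my $line (split /^/m, $data) {
    if ($line =~ /^__\[\s*(\S+?)\s*\]__$/) {
        $name = $1;      # a header line starts a new hunk
        next;
    }
    $section{$name} .= $line;
}

print "$_: $section{$_}" for sort keys %section;
```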

I used Data::Section in my experimental web framework UVWeb for inheritable templates in CRUD actions. I stopped working on UVWeb and switched all my projects to Catalyst. But this was a feature I especially liked.

Have a look at Mojo::Command (and Mojolicious::Command::Generate::App) for another approach and an example use case.


Friday, December 3, 2010

Day 3: shorter package names with aliased

Day 1 saved us some characters (for FindBin), but this was just once per script. With aliased you save typing every time you use a long class name:

use aliased 'Rose::HTML::Form::Field::DateTime' => 'DateField';
my $dt = DateField->new(...);

When you leave out the second argument, your alias is 'DateTime' (which would be a bad idea, because this clashes with the CPAN module DateTime).
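What the alias amounts to can be sketched by hand: a sub with an empty prototype that returns the long class name. This is only an illustration of the idea, not the code of aliased, and My::Very::Long::Class::Name and Short are hypothetical names.

```perl
use strict;
use warnings;

package My::Very::Long::Class::Name;    # hypothetical class

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

# Hand-written stand-in for:
#   use aliased 'My::Very::Long::Class::Name' => 'Short';
sub Short () { 'My::Very::Long::Class::Name' }

# Perl calls Short() first and then invokes new() on the returned class name.
my $obj = Short->new(day => 3);
print ref($obj), "\n";
```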

If you have a complex class structure under a common name space, have a look at aliased::factory.


Thursday, December 2, 2010

Day 2: a better grep - ack

If you are a Unix/Linux user you probably know grep. It lets you search the contents of files.

Back when I was using SVN as version control system, there was one annoying thing: SVN stored all versioned files twice. There always was a copy in the .svn directory. So, when I was using

grep -r 'use strict' .

I also got these copies as results. By typing

grep -r 'use strict' . | grep -v .svn

I only got the desired results.

It was at this time that I came across ack - and was immediately sold. Ack also ignores other version control directories and common backup files (e.g. those ending with '~').
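The filtering idea itself is easy to sketch with core modules. The following is not how ack works internally; it only demonstrates skipping .svn directories and '~' backup files with File::Find, on a throwaway directory tree created just for the demonstration.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Build a tiny throwaway tree so the sketch is self-contained.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/.svn" or die "mkdir: $!";
for my $path ("$root/script.pl", "$root/script.pl~", "$root/.svn/script.pl") {
    open my $out, '>', $path or die "$path: $!";
    print {$out} "use strict;\n";
    close $out;
}

my @hits;
find({
    no_chdir => 1,
    wanted   => sub {
        # Do not descend into version control directories.
        if (-d $_ && m{/\.svn\z}) {
            $File::Find::prune = 1;
            return;
        }
        # Only plain files, and no editor backup files.
        return if !-f $_ || /~\z/;

        open my $in, '<', $_ or return;
        while (my $line = <$in>) {
            if ($line =~ /use strict/) {
                push @hits, $_;
                last;
            }
        }
        close $in;
    },
}, $root);

print "$_\n" for @hits;
```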

Also, it lets you specify the file types:

ack --perl 'use strict'

There are more options; have a look at the documentation.

Ack is available through CPAN - just install App::Ack. But there is also a standalone version which bundles every module into one big script:

curl http://betterthangrep.com/ack-standalone > ~/bin/ack && chmod 0755 !#:3

This standalone version is also used in the ack-in-project and AckMate TextMate bundles. (TextMate is a popular text editor for Mac OS X.) I use TextMate at work - the AckMate bundle works very well.

Ack also runs on Windows (it's pure Perl). Under Debian and Ubuntu the package is called ack-grep. (The package named ack is a Kanji code converter there.)


Wednesday, December 1, 2010

Day 1: add relative paths to @INC with rlib

Welcome to day 1 of the Perl-Uwe Advent calendar. We start with a very small CPAN module, which is one of my favorites. It's quite old (its last release was in 1998), but it still does its job. :)

When my Perl projects are larger than a single script file, I usually create folders like this:

bin/
lib/
t/

You know what goes into what folder. In the scripts that use your own modules, you will want to include something like this:

use lib '../lib';

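This works fine when you call the script from inside the bin/ directory, but nowhere else, because the path is resolved relative to the current working directory and not to the script's location. :(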

rlib to the rescue!

use rlib '../lib';

This is equivalent to:

use FindBin;
use lib "$FindBin::Bin/../lib";

As with "use lib", you can add more than one directory to @INC. If you specify no path at all, rlib uses "../lib" and "lib" as defaults. So, in our example we could have written "use rlib;" alone.
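Assuming those defaults, a bare "use rlib;" should roughly correspond to the following core-only lines. This is a sketch for checking what ends up in @INC, not a replacement for the module.

```perl
use strict;
use warnings;
use FindBin;

# Assumed equivalent of a bare "use rlib;", based on the defaults named above.
use lib "$FindBin::Bin/../lib", "$FindBin::Bin/lib";

# Show the entries that were derived from the script's own directory.
my @relative = grep { index($_, $FindBin::Bin) == 0 } @INC;
print "$_\n" for @relative;
```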
