Monday, December 6, 2010

Day 6: Who has the longest ...

... subroutine? :)

Today I want to give you an example of PPI. PPI stands for "Parse Perl Isolated" and is a mammoth project by Adam Kennedy. PPI parses your Perl source as a document, without executing it, and builds a syntax tree from it. By analogy with the DOM, this tree is called the PDOM (Perl Document Object Model). Have a look at the documentation for an overview.
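To get a feel for what a PDOM tree looks like, here is a minimal sketch that parses a one-line snippet from a string and dumps the resulting tree with PPI::Dumper. The snippet and the variable names are made up for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use PPI;
use PPI::Dumper;

# Hypothetical one-line snippet; passing a scalar reference makes PPI
# parse the string itself instead of reading a file.
my $source = 'sub hello { return "world" }';
my $doc    = PPI::Document->new(\$source)
    or die "Could not parse snippet: " . PPI::Document->errstr . "\n";

# PPI::Dumper renders the tree as indented text, one element per line.
my $tree = PPI::Dumper->new($doc)->string;
print $tree;
```

The dump lists each element by its class name (PPI::Statement::Sub, PPI::Token::Word and so on), which are the same names you pass to find() when querying the tree.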

Let's use PPI to count the number of lines in a subroutine:
#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use File::Find;
use PPI;

my @controller = ();
File::Find::find(
    {wanted => \&wanted, no_chdir => 1},
    qw(apps lib), ### change me ###
);

sub wanted {
    ### change me ###
    if (m!/Controller(/.+)?\.pm$!) {
        push @controller, $_;
    }
}

my @subs = ();
foreach my $file (@controller) {
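    my $doc = PPI::Document->new($file);
    unless ($doc) {
        # new() returns undef if the file cannot be read or parsed
        warn "Could not parse $file: " . PPI::Document->errstr . "\n";
        next;
    }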

    my $subs = $doc->find('Statement::Sub');
    next unless $subs;

    foreach my $sub (@$subs) {
        next unless $sub->name;
        ### change me ###
        next unless $sub->name =~ /^op_/;

        my @lines = split /\n/, $sub->content;
        push @subs, {
            controller => $file,
            op_name    => $sub->name,
            lines      => scalar @lines,
        };
    }
}

@subs = sort { $b->{lines} <=> $a->{lines} } @subs;

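# Print the top 10, or fewer if less than 10 subs were found
my $last = $#subs < 9 ? $#subs : 9;
print Dumper [@subs[0..$last]];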

The first part of the script uses File::Find to fill the @controller array with the file names of the matching Perl modules. (I used this example at work to count the length of subroutines in our web framework.)
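If you want to check the file pattern before pointing the script at a real code base, a small sketch like the following may help. It builds a throwaway directory tree and runs the same regular expression over it. The module names (MyApp, User, Model) are hypothetical.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a throwaway tree with hypothetical module names.
my $root = tempdir(CLEANUP => 1);
make_path("$root/lib/MyApp/Controller");
foreach my $rel (qw(
    lib/MyApp/Controller.pm
    lib/MyApp/Controller/User.pm
    lib/MyApp/Model.pm
)) {
    open my $fh, '>', "$root/$rel" or die "Cannot create $rel: $!";
    close $fh;
}

# With no_chdir => 1, $_ holds the full path (same as $File::Find::name),
# so the pattern can match on directory components.
my @controller;
find(
    {
        wanted   => sub { push @controller, $_ if m!/Controller(/.+)?\.pm$! },
        no_chdir => 1,
    },
    "$root/lib",
);

print "$_\n" for sort @controller;
```

Only Controller.pm and Controller/User.pm are collected; Model.pm is skipped. Without no_chdir, $_ would be just the base name and the leading slash in the pattern would never match.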

The second part uses PPI to build the PDOM tree ($doc) and query it for subroutine nodes ('Statement::Sub'). The name of the sub has to start with 'op_' - you may want to change this. The number of lines is stored (together with the module and subroutine name) and finally the top 10 are printed.
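The counting step can be tried in isolation on a string instead of a file. The following is a minimal sketch, assuming a made-up source snippet with two 'op_' subs and a BEGIN block. It also shows one thing to watch out for: BEGIN and END blocks are PPI::Statement::Scheduled nodes, a subclass of PPI::Statement::Sub, so a query for subs returns them too.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use PPI;

# Hypothetical source with two named subs and a BEGIN block.
my $source = <<'END_SOURCE';
BEGIN { $| = 1 }

sub op_short { return 1 }

sub op_long {
    my ($x) = @_;
    $x++;
    return $x;
}
END_SOURCE

my $doc = PPI::Document->new(\$source)
    or die "Could not parse source: " . PPI::Document->errstr . "\n";

my %lines_for;
foreach my $sub (@{ $doc->find('PPI::Statement::Sub') || [] }) {
    # Skip BEGIN/END blocks, which find() returns as well.
    next if $sub->isa('PPI::Statement::Scheduled');

    # Same counting rule as the script: split the sub's source on newlines.
    my @lines = split /\n/, $sub->content;
    $lines_for{ $sub->name } = scalar @lines;
}

# Longest sub first.
printf "%-10s %d\n", $_, $lines_for{$_}
    for sort { $lines_for{$b} <=> $lines_for{$a} } keys %lines_for;
```

In the script above the 'op_' name filter already drops BEGIN blocks, so the extra isa() check only matters if you remove that filter. Note that this rule counts every line between 'sub' and the closing brace, including blank lines and comments.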

So, Ovid: Show us some numbers of your fairly long methods ... :)

2 comments:

  1. This summer I actually used this technique to write Code::Statistics, a tool that collects sub length, indentation depth, complexity and such for an entire codebase and for various types of objects in that codebase. :)

    It is on CPAN: http://search.cpan.org/~mithaldu/Code-Statistics-1.103260/lib/Code/Statistics.pm
  2. Nice, I'd be too ashamed to place some of our 2000+ line counts from our legacy Perl app on here :-). The "if" count in one is well into the hundreds!