What is the sound of Perl? Is it not the sound of a wall that people
have stopped banging their heads against?
-- Larry Wall
Overview
This month, we're going to cover some general Perl issues, look at a
way to use it in Real Life(tm), and take a quick look at a mechanism that
lets you leverage the power of O.P.C. - Other People's Code :). Modules
let you plug chunks of pre-written code into your own scripts, saving you
hours - or even days - of programming. That will wrap up this introductory
series, hopefully leaving you with enough of an idea of what Perl is to
write some basic scripts, and perhaps a desire to explore further.
A Quick Correction
One of our readers, A. N. Onymous (he didn't want to give me his name;
I guess he decided that fame wasn't for him...), wrote in regarding a statement
that I made in last month's article - that "close" without any parameters
closes all filehandles. After cranking out a bit of sample code
and reading the docs a bit more closely, I found that he was right: it
only closes the currently selected filehandle (STDOUT by default).
Thanks very much - well spotted!
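If you'd like to check this behavior yourself, here is a minimal sketch. It assumes a scratch file created with the core File::Temp module (my choice for the sketch, not something from last month's article) and uses "select" to change the currently selected filehandle before calling "close" with no argument.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; using File::Temp is an assumption made for this sketch.
my ($fh, $name) = tempfile( UNLINK => 1 );

my $old = select $fh;                      # make $fh the currently selected filehandle
print "written via the selected handle\n";
close() or die "Can't close $name: $!\n";  # no argument: closes the selected handle only
select $old;                               # restore the previous default (normally STDOUT)

# fileno() returns undef for a closed handle, so it doubles as an "is it open?" test.
my $scratch_open = defined fileno($fh)     ? 1 : 0;
my $stdout_open  = defined fileno(STDOUT)  ? 1 : 0;
print "scratch open: $scratch_open, STDOUT open: $stdout_open\n";
```

Only the scratch handle is affected; STDOUT, which was never selected at the time of the call, stays open.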
Exercises
In the last article, I suggested a couple of script ideas that would give you some practice in using what you'd learned previously. One of the people who sent in a script was Tjabo Kloppenburg; a brave man. :) No worries, Tjabo; either you did a good job, or you get to learn a few things... it's a win-win situation.
The idea was to write a script that read "/etc/services", counted UDP
and TCP ports, and wrote them out to separate files. Here was Tjabo's solution
(my comments are preceded with a '###'):
$udp = $tcp = 0;
### Unnecessary: Perl does not require variable declaration.
# open target files:
open (TCP, ">>tcp.txt") or die "Arghh #1 !";
open (UDP, ">>udp.txt") or die "Arghh #2 !";
### My fault here: in a previous article, I showed a quick hack in which
### I used similar wording for the "die" string. Here is the proper way
### to use it:
###
### open TCP, ">tcp.txt" or die "Can't open tcp.txt: $!\n";
###
### The '$!' variable gives the error returned by the system, and should
### definitely be used; the "\n" at the end of the "die" string prevents
### the error line number from being printed. Also, the ">>" (append)
### modifier is inappropriate: every run of the script after the first
### will append to (rather than overwrite) the contents of those files.
# open data source:
open (SERV, "</etc/services") or die "Arghh #3 !";
while( <SERV> ) {
    if (/^ *([^# ]+) +(\d+)\/([tcpud]+)/) {
        ### The above regex has several problems, some of them minor
        ### (unnecessary elements) and one of them critical: it actually
        ### misses out on most of the lines in "/etc/services". The killer
        ### is the ' +' that follows the first capture: "/etc/services" uses
        ### a mix of spaces and *tabs* to separate its elements.
        $name = $1;
        $port = $2;
        $tcpudp = $3;
        $tmp = "$name ($port)\n";
        ### The above assignments are unnecessary; $1, $2, etc. will keep
        ### their values until the next successful match. Removing all of
        ### the above and rewriting the "if" statement below as
        ###
        ### if ( $3 eq "udp" ) { print UDP "$1 ($2)\n"; $udp++; }
        ###
        ### would work just fine.
        if ($tcpudp eq "udp") {
            print UDP $tmp;
            $udp++;
        }
        if ($tcpudp eq "tcp") {
            print TCP $tmp;
            $tcp++;
        }
    }
}
# just learned :-) :
for ( qw/SERV TCP UDP/ ) { close $_ or die "can't close $_: $!\n"; }
print "TCP: $tcp, UDP: $udp\n";
The above script counted 14 TCPs and 11 UDPs in my "/etc/services" (which
actually contains 185 of one and 134 of the other). Let's see if we can
improve it a bit:
open SRV, "</etc/services" or die "Can't read /etc/services: $!\n";
open TCP, ">tcp.txt" or die "Can't write tcp.txt: $!\n";
open UDP, ">udp.txt" or die "Can't write udp.txt: $!\n";
for ( <SRV> ) {
    if ( s=^([^# ]+)(\s+\d+)/tcp.*$=$1$2= ) { print TCP; $tcp++; }
    if ( s=^([^# ]+)(\s+\d+)/udp.*$=$1$2= ) { print UDP; $udp++; }
}
close $_ or die "Failed to close $_: $!\n" for qw/SRV TCP UDP/;
print "TCP: $tcp\t\tUDP: $udp\n";
Starting at the beginning of the line, (begin capture into $1) match
any character that is not a '#' or a space and occurs one or more times
(end capture). (Begin capture into $2) Match any whitespace character that
occurs one or more times that is followed by one or more digits (end capture),
a forward slash, and the string 'tcp' followed by any
number of any character to the end of the line. Replace the matched
string (i.e., the entire line) with $1$2 (which contain the name of the
service, whitespace, and the port number.) Write the result to the TCP
filehandle, and increment the "$tcp" variable.
Repeat for 'udp'.
Note that I used the '=' symbol for the delimiter in the 's///' function.
'=' has no particular magic about it; it's just that I was trying to avoid
conflict with the '/' and the '#' characters which appear as part of the
regex (those being two commonly used delimiters), and there was a sale
on '=' at the neighborhood market. :) Any other character or symbol would
have done as well.
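To watch the substitution at work without touching the real "/etc/services", here is a small sketch that runs the same regex over a few made-up lines. The sample lines and the "%count" and "@kept" names are mine, and I've swapped the '=' delimiter for a pair of braces to show that the choice really is arbitrary.

```perl
use strict;
use warnings;

# Made-up sample lines in the style of /etc/services (not real data).
my @lines = (
    "ftp\t\t21/tcp\n",
    "domain          53/udp\t\tnameserver\n",
    "# a comment that mentions 1/tcp\n",
);

my %count = ( tcp => 0, udp => 0 );
my @kept;

for my $line (@lines) {
    for my $proto (qw(tcp udp)) {
        # Same regex as in the script, with braces as the delimiter
        # and the protocol name interpolated from $proto.
        if ( $line =~ s{^([^# ]+)(\s+\d+)/$proto.*$}{$1$2} ) {
            push @kept, $line;      # $line now holds "name<whitespace>port\n"
            $count{$proto}++;
        }
    }
}

print @kept;
print "TCP: $count{tcp}\t\tUDP: $count{udp}\n";
```

Both the tab-separated and the space-separated lines are caught, since '\s+' matches either kind of whitespace, and the comment line is skipped because it starts with a '#'.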
Here are a couple of simple solutions for the other two problems:
1. Open two files and exchange their contents.
for ( A, B ) { open $_, "<\l$_" or die "Can't open \l$_: $!\n"; }
@a = <A>; @b = <B>;
for ( A, B ) { open $_, ">\l$_" or die "Can't open \l$_: $!\n"; }
print A @b; print B @a;
I'm sure that a number of folks figured out that renaming the files
would produce the same result. That wasn't the point of the exercise...
but here's a fun way to do that:
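One possible take on the renaming approach, offered as a sketch rather than a definitive listing: the "swap_files" function and the ".swap" temporary suffix are names invented here, and the file names "a" and "b" are simply the ones used in the solution above.

```perl
use strict;
use warnings;

# Swap two files by renaming them through a temporary name.
# The ".swap" suffix is a made-up placeholder; it must not already exist.
sub swap_files {
    my ($x, $y) = @_;
    my $tmp = "$x.swap";
    die "Temporary name $tmp already exists\n" if -e $tmp;
    rename $x,   $tmp or die "Can't rename $x to $tmp: $!\n";
    rename $y,   $x   or die "Can't rename $y to $x: $!\n";
    rename $tmp, $y   or die "Can't rename $tmp to $y: $!\n";
    return 1;
}

# Usage (assuming files "a" and "b" exist in the current directory):
# swap_files( "a", "b" );
```

No file contents are read at all; only the names change hands.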
2. Read "/var/log/messages" and print out any line that contains the words "fail", "terminated/terminating", or " no " in it. Make it case-insensitive.
This one is an easy one-liner:
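A possible version, given as a sketch under my own assumptions about the exact pattern. The command-line form is shown in a comment; the "interesting" function and the sample log lines are invented so that the match can be tried without a real log file.

```perl
use strict;
use warnings;

# Command-line form (a guess at the shape, not a definitive listing):
#   perl -wne 'print if /fail|terminat(?:ed|ing)| no /i' /var/log/messages

# The same test as a function, so it can be tried on made-up sample lines.
sub interesting {
    my ($line) = @_;
    return $line =~ /fail|terminat(?:ed|ing)| no /i ? 1 : 0;
}

my @sample = (
    "kernel: Login FAILED for user bob\n",
    "pppd[42]: Connection terminated.\n",
    "named: there is no forwarder set\n",
    "cron: job completed normally\n",
);

# Prints the first three lines; the last one matches none of the alternatives.
print grep { interesting($_) } @sample;
```

The '/i' modifier takes care of case-insensitivity, and the spaces around "no" keep words like "normally" from matching.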
Building Quick Tools
A few days ago, I needed to convert a text file into its equivalent
in phonetic alphabet - a somewhat odd requirement. There may or may not
have been a program to do this, but I figured I could write my own in
less time than it would take me to find one:
1) I grabbed a copy of the phonetic alphabet from the Web and saved it to a file. I called the file "phon", and it looked like this:
Alpha
Bravo
Charlie
Delta
Echo
Foxtrot
Golf
...
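2) Then, I issued the following command:
perl -i -wple's/^(.)(.*)$/\t"\l$1" => "$1$2",/' phon
which turned the contents of "phon" into this: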
"a" => "Alpha",
"b" => "Bravo",
"c" => "Charlie",
"d" => "Delta",
"e" => "Echo",
"f" => "Foxtrot",
"g" => "Golf",
...
3) A few seconds later, I had the tool that I needed - a script with
exactly one function and one data structure in it:
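#!/usr/bin/perl -wp
# "-p" wraps the script in a loop that reads each input line, runs the
# substitution on it, and prints the result
s/([a-zA-Z])/$ph{"\l$1"} /g;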
BEGIN {
%ph = (
"a" => "Alpha",
"b" => "Bravo",
"c" => "Charlie",
"d" => "Delta",
"e" => "Echo",
"f" => "Foxtrot",
"g" => "Golf",
"h" => "Hotel",
"i" => "India",
"j" => "Juliet",
"k" => "Kilo",
"l" => "Lima",
"m" => "Mike",
"n" => "November",
"o" => "Oscar",
"p" => "Papa",
"q" => "Quebec",
"r" => "Romeo",
"s" => "Sierra",
"t" => "Tango",
"u" => "Uniform",
"v" => "Victor",
"w" => "Whisky",
"x" => "X-ray",
"y" => "Yankee",
"z" => "Zulu",
);
}
This is one of the most common ways I use Perl - building quick tools
that I need to do a specific job. Other people may have other uses for
it - after all, TMTOWTDI ("There's More Than One Way To Do It") - but for me, a computer
without Perl is only half-useable. To drive the point even further home,
a group of Perl Wizards have rewritten most of the system utilities in
Perl - take a look at <http://language.perl.com/ppt/> - and have fixed
a number of annoying quirks in the process. As I understand it, they were
motivated by the three chief virtues of the programmer: Laziness, Impatience,
and Hubris (if that confuses you, see the Camel Book ["Programming Perl,
Third Edition"] for the explanation). If you want to see well-written
Perl code, there are very few better places. Do note that the project is
not yet complete, but a number of Unices are already catching on: Solaris
8 has a large number of Perl scripts as part of the system
executables, and doing a
file /sbin/* /usr/bin/* /usr/sbin/*|grep -c perl
shows at least the Debian "potato" distro as having 82 Perl scripts
in the above directories.
OK, now for the explanation of the two s///'s above. First, the "magic" converter:
perl -i -wple's/^(.)(.*)$/\t"\l$1" => "$1$2",/' phon
The "-i", "-w", "-p", and "-e" switches were described in the second part of this series; as a quick overview, this will edit the contents of the file in place by looping through it and acting on each line. The Perl "warn" mechanism is enabled, and the script to be executed is given on the command line. The "-l" enables automatic line-end processing: it strips the newline from each line as it is read and adds one back when the line is printed, so a last line that lacks a newline gains one. The substitution regex goes like this:
Starting at the beginning of the line, (begin capture into $1) match one character (end capture, begin capture into $2). Capture any number of any character (end capture) to the end of the line.
The replacement string goes like this:
Print a tab, followed by the contents of $1 in lowercase* and surrounded by double quotes. Print a space, the '=>' digraph, another space, $1$2 surrounded by double quotes and followed by a comma.
* This is done by the "\l" 'lowercase next character' operator (see
'Quote and Quote-like Operators' in the "perlop" page.)
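For those who would rather see the switches spelled out, here is a rough sketch of what "-p" and "-l" arrange behind the scenes, written as an explicit loop. It runs on a few in-memory sample lines instead of the "phon" file; the "@input" and "@output" names are mine.

```perl
use strict;
use warnings;

# Stand-ins for the first few lines of the "phon" file.
my @input = ( "Alpha\n", "Bravo\n", "Charlie\n" );
my @output;

for my $line (@input) {
    chomp $line;                                  # "-l": strip the line ending on input
    $line =~ s/^(.)(.*)$/\t"\l$1" => "$1$2",/;    # the "-e" script itself
    push @output, "$line\n";                      # "-p" with "-l": print, newline restored
}

print @output;
```

Each line comes out as a tab, the lowercased first letter in quotes, the '=>' digraph, and the original word in quotes followed by a comma, ready to be pasted into a hash definition.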
The second one is also worth studying, since it points up an interesting feature - that of using a hash value (including modifying the key "on the fly") in a substitution, a very useful method:
s/([a-zA-Z])/$ph{"\l$1"} /g;
First, the regex:
(Begin capture into $1) Match any character in the 'a-zA-Z' range (end capture).
Second, the replacement string:
Return a value from the "%ph" hash by using the lowercase version
of the contents of $1 as the key, followed by a space.
The BEGIN { ... } block makes populating the hash a one-time event, despite the fact that the script may loop thousands of times. The mechanism here is the same as in Awk, and was mentioned in the previous article. So, all we do is use every character as a key in the "%ph" hash, and print out the value associated with that key.
Hashes are very useful structures in Perl, and are well worth studying
and understanding.
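As a small variation on the same idea, the sketch below uses the '/e' modifier, which evaluates the replacement as Perl code. That makes it possible to leave a character alone when it has no entry in the hash. The three-entry table and the "spell" function are made up for the illustration; this is not how the script above does it.

```perl
use strict;
use warnings;

# A deliberately tiny table; the script earlier in this section has all 26 letters.
my %ph = ( a => "Alpha", b => "Bravo", c => "Charlie" );

sub spell {
    my ($text) = @_;
    # With /e the replacement is code: look up the lowercased letter and
    # fall back to the letter itself when the table has no entry for it.
    $text =~ s/([a-zA-Z])/ my $k = lc $1; exists $ph{$k} ? "$ph{$k} " : $1 /ge;
    return $text;
}

print spell("Cab 2"), "\n";
```

The key is still modified "on the fly" (lowercased) before the lookup, just as with "\l" in the original substitution.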
Modular Construction
One of the wonderful things about Perl - really, the thing that makes it a living, growing language - is the community that has grown up around it. A number of these folks have contributed useful chunks of code that are made to be re-used; that, in fact, make Perl one of the most powerful languages on the planet.
Imagine a program that goes out on the Web, connects to a server, retrieves the weather data - either the current or the forecast - for your city, and prints the result to your screen. Now, imagine this entire Perl script taking just one line.
perl -MGeo::WeatherNOAA -we 'print print_forecast( "Denver", "CO" )'
That's it. The whole thing. How is it possible?
(Note that this will not work unless you have the 'Geo::WeatherNOAA' module installed on your system.)
The CPAN (Comprehensive Perl Archive Network) is your friend. :) If you go to <http://cpan.org/> and explore, you'll find lots and lots (and LOTS) of modules designed to do almost every programming task you could imagine. Do you want your Perl script converted to Klingon (or Morse code)? Sure. Would you like to pull up your stock's performance from Deutsche Bank Gruppe funds? Easy as pie. Care to send some SMS text messages? No problem! With modules, these are short, easy tasks that can be coded in literally seconds.
The standard Perl distribution comes with a number of useful modules (for short descriptions of what they do, see "Standard Modules" in 'perldoc perlmodlib'); one of them is the CPAN module, which automates the module downloading, unpacking, building, and installation process. To use it, simply type
perl -MCPAN -eshell
and follow the prompts. The manual process, which you should know about just in case there's some complication, is described on the "How to install" page at CPAN, <http://cpan.org/modules/INSTALL.html>. I highly recommend reading it. The difference between the two processes, by the way, is exactly like that of using "apt" (Debian) or "rpm" (RedHat) and trying to install a tarball by hand: 'CPAN' will get all the prerequisite modules to support the one you've requested, and do all the tests and the installation, while doing it manually can be rather painful. For specifics of using the CPAN module - although the above syntax is the way you'll use it 99.9% of the time - just type
perldoc CPAN
The complete information for any module installed on your system can be accessed the same way.
As you've probably guessed by now, the "-M" command line switch tells
Perl to use the specified module. If we want to have that module in a script,
here's the syntax:
use Finance::Quote;
$q = Finance::Quote->new;
my %stocks = $q->fetch("nyse","LNUX");
print "$k: $v\n" while ($k, $v) = each %stocks;
The above is an example of the object-oriented style of module, the type that's becoming very common. After telling Perl to use the module, we create a new instance of an object from the "Finance::Quote" class and assign it to $q. We then call the "fetch" method (the methods are listed in the module's documentation) with the "nyse" and "LNUX" arguments, and print the results stored in the returned hash.
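If the "new" and "->" notation looks like magic, the following sketch shows roughly what sits on the other side of it. "Tally" is a hypothetical class invented for this illustration (it has nothing to do with Finance::Quote's internals); a real module would live in its own "Tally.pm" file and be loaded with "use Tally;".

```perl
use strict;
use warnings;

# "Tally" is a made-up class, defined inline to keep the sketch self-contained.
package Tally;

sub new {                        # constructor: returns a blessed hash reference
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;
}

sub add {                        # method: the object arrives as the first argument
    my ($self, $n) = @_;
    $self->{count} += $n;
    return $self->{count};
}

package main;

my $t = Tally->new( start => 10 );    # same calling shape as Finance::Quote->new
$t->add(5);
print "Total: ", $t->add(1), "\n";
```

The object is just a reference that remembers which package its methods live in; the arrow looks the method up there and passes the object along.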
A lot of modules are of the so-called exporting style; these
simply provide additional functions when "plugged in" to your program.
use LWP::Simple;
$code = mirror( "http://slashdot.org", "slashdot.html" );
print "Slashdot returned a code of $code.\n";
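To show what "exporting" means in practice, here is a minimal sketch of such a module built on the standard Exporter module. "My::Shout" and its "shout" function are invented for the example and defined inline so the code can be run as-is; normally the package would sit in a file called "My/Shout.pm".

```perl
use strict;
use warnings;

# "My::Shout" is a made-up module, defined inline for this sketch.
BEGIN {
    package My::Shout;
    use Exporter 'import';           # borrow Exporter's import() method
    our @EXPORT = qw(shout);         # names pushed into the caller by default
    sub shout { return uc(shift) . "!" }
    $INC{'My/Shout.pm'} = __FILE__;  # tell "use" the module is already loaded
}

use My::Shout;                       # shout() is now callable as a plain function

print shout("hello"), "\n";
```

After the "use" line, "shout" behaves as if it had been written in the script itself, which is exactly what happens with "mirror" above.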
Wrapping It Up
Well, that was a quick tour through a few interesting parts of Perl. Hopefully, this has whetted a few folks' tastebuds for more, and has shown some of its capabilities. If you're interested in extending your Perl knowledge, here are some recommendations for reading material:
Learning Perl, 3rd Edition (coming out in July)
Randal Schwartz and Tom Phoenix
Programming Perl, 3rd Edition
Larry Wall, Tom Christiansen & Jon Orwant
Perl Cookbook
By Tom Christiansen & Nathan Torkington
Data Munging with Perl
By David Cross
Mastering Algorithms with Perl
By Jon Orwant, Jarkko Hietaniemi & John Macdonald
Mastering Regular Expressions
By Jeffrey E. F. Friedl
Elements of Programming with Perl
By Andrew Johnson
Good luck with your Perl programming - and happy Linuxing!
Ben Okopnik
perl -we'print reverse split//,"rekcah lreP rehtona tsuJ"'
References:
Relevant Perl man pages (available on any pro-Perl-y configured
system):
perl - overview
perlfaq - Perl FAQ
perltoc - doc TOC
perldata - data structures
perlsyn - syntax
perlop - operators/precedence
perlrun - execution
perlfunc - builtin functions
perltrap - traps for the unwary
perlstyle - style guide
"perldoc", "perldoc -q" and "perldoc -f"
Ben Okopnik
A cyberjack-of-all-trades, Ben wanders the world in his 38' sailboat, building
networks and hacking on hardware and software whenever he runs out of cruising
money. He's been playing and working with computers since the Elder Days
(anybody remember the Elf II?), and isn't about to stop any time soon.
Copyright © 2001, Ben Okopnik.
Copying license http://www.linuxgazette.net/copying.html
Published in Issue 69 of Linux Gazette, August 2001