
Jeff Powell's Simplistic Perl Class

Warning!

This Perl class was developed many years ago -- back in about 1998, I think. I have made no effort to keep it up to date, and even the links here may be bad now. This document is more historical in nature than anything else at this point. Once Perl 6 finally releases for real, even some of the syntax may no longer be correct.

That said, you're welcome to look it over, and if you're planning on simple Perl use, with some flavor of Perl 5, it should still help. Just know that it is far from complete and that it grows less complete every day.

Introduction

  This document is intended to provide an overview of the major features of the Perl language to those who haven't seen it before. What is important? Well, that inevitably is a matter of opinion. Here you get what I think is important, and not what someone else thinks. I base my opinions on several years of writing scripts in the language, but... TMTOWTDI...

There's More Than One Way To Do It

That's pronounced "Tim-Toady", and it means that while I can show you how I use the language, other people will use it very differently. You will see Perl code written by others that looks nothing like what you'll see discussed here. It's not wrong, it's just different.

I am always interested in comments on this material, and feedback on how it helped or hindered you in your efforts. You can contact me here.

  

Table Of Contents

  All of the links below are internal to this single document. You can use this document on-line as a reference, and you may print it in one fell swoop.

Things covered in the class...

An incomplete list of things NOT covered in the class... at least not yet...
  • Complex data structures
  • Function prototypes
  • Socket programming
  • Formatted output
  • OO Perl
  • Many built in functions -- there are too many for the scope of this class. We'll cover the basics; read the book and/or man pages for the rest.
  • Most library functions -- there are jillions of them in the standard distribution, and many more in the CPAN. Read the book or search the CPAN before reinventing the wheel.
  

What is Perl?

  Perl is a free programming language -- largely platform independent by virtue of its interpreted nature -- that was developed by Larry Wall originally, and is now developed by a cluster of people "out there" on the 'net (including Larry).

Perl is interpreted in a manner somewhat similar to the way Java is interpreted. Both languages have a compiler, though Java's runs separately from the interpreter while Perl's is built into the interpreter. (This gives Perl the ability to interpret new code at runtime, but does so at the expense of compiling the application each time it is started.) Both interpreters operate on a byte-code, rather than on the original source of the program, so they are much faster than shell scripts or batch files. Unlike Java, there is no "Perl Virtual Machine".
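
Because the compiler is built into the interpreter, a running script can hand a string of Perl code to the built-in eval function and have it compiled and run on the spot. Here is a minimal sketch of that idea. The expression and the variable names are made up for illustration, and the "use strict", "use warnings", and "my" lines are my additions for tidiness rather than anything this class has covered yet.

```perl
use strict;
use warnings;

# A string holding Perl source code, available only at runtime.
my $code = '2 + 3 * 4';

# eval compiles and runs the string, returning the value of its last expression.
my $result = eval $code;

# If the string fails to compile (or dies), eval returns undef and sets $@.
die "eval failed: $@" if $@;

print "$code = $result\n";
```

The price is the one mentioned above: the string has to be compiled every time the eval runs.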

Perl's language constructs are built for true general purpose use, but (in my opinion) its real strengths are in string manipulation and system interface work. Perl has the ability to handle single strings many megabytes in size, lists of data items, hashes (highly similar to associative memory), and several other data types. It has interfaces to most every UNIX system call there is, and can easily do manipulations on files, file names, and file contents.

In addition, Perl combines internally many of the most useful UNIX utilities and puts them in the hands of the script writer. You can do pattern matching searches across string data (like grep), split strings into fields (like awk/nawk), perform text substitutions (like sed) and many related things. It's a very powerful package. If you are an OO lover, Perl 5 even provides an OO programming scheme as well, though it's much less restrictive in its approach to OO than Java or C++.
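
To give a rough feel for the grep, awk, and sed comparison, here is a small sketch that does all three jobs in one script. The sample data is invented (it only imitates the layout of a UNIX password file), and the built-ins used (grep, map, split, and the s/// substitution) get a proper introduction elsewhere, so treat this as a preview and not a reference.

```perl
use strict;
use warnings;

# Invented sample data in the style of /etc/passwd.
my @lines = (
    "root:x:0:0:Super User:/root:/bin/sh",
    "jeff:x:501:20:Jeff:/home/jeff:/bin/csh",
    "guest:x:502:20:Guest:/home/guest:/bin/sh",
);

# Like grep: keep only the lines that match a pattern.
my @sh_users = grep { m{/bin/sh$} } @lines;

# Like awk -F: split each line into fields and keep the first field.
my @names = map { my @fields = split /:/; $fields[0] } @sh_users;

# Like sed: substitute one piece of text for another.
my $changed = $lines[1];
$changed =~ s{/bin/csh}{/bin/sh};

print "sh users: @names\n";
print "$changed\n";
```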

In the "Real World" Perl is the language of choice for implementing CGI scripts on WWW server systems. Its string handling facilities make pulling data out of forms extremely easy and it is powerful enough to do most anything that needs to get done on the WWW server side of things. Perl is also used heavily by system administrators to automate various tasks. It is much more efficient than shell scripts or batch files, and much more powerful too, making complex system maintenance tasks much easier to automate.

  

Perl History

  Perl has an interesting history, and Larry documents much of it in Programming Perl. Perl grew out of an effort to build a bi-coastal configuration management system, and has spawned all kinds of interesting oddities, like Perl Poetry (also documented in his book) and more than one obfuscated code contest. (http://www.tpj.com/tpj/contest)
  

Where to get Perl and Perl modules

  As a general rule, if you want Perl source, binaries, modules, docs, advice, or most anything else perlish, start at: http://www.perl.com/. You'll find most everything there, or a link to it.

General information on downloading Perl source code and/or compiled versions can be found at http://www.perl.com/pace/pub/perldocs/latest.html.

UNIX

Perl can be downloaded from http://www.perl.com/pace/pub/perldocs/latest.html in source form and built for most any UNIX platform. Binary versions of Perl for UNIX platforms can be found at the CPAN (Comprehensive Perl Archive Network) at http://www.perl.com/CPAN/README.html or at any of the various binary archive sites on the net for whatever platform you're looking for. Alta Vista can probably help you find such sites for various platforms.

If you install a prebuilt binary, the installation mechanism is up to whoever packaged the binary for you. Follow whatever directions you get with the binary you download. Installing the Perl source and building it yourself is an educational experience. You'll learn all about UNIX (in)compatibility and what kinds of hoops programmers have to jump through to make one C program compile on many system types. There is a README file in the Perl source distribution that provides instructions on building Perl for your system and installing it. Follow those instructions.

Windows-NT

If you want to run Perl on an NT platform, you need to download it from a different location: http://www.activestate.com/. ActiveState ports Perl to the Windows environment and keeps it up to date. Their NT distribution is nominally binary only, though source is available from ActiveState as well. The thing you want to download is called ActivePerl and it is at build 509 as of this writing. (That's Perl version 5.xxx, renamed to protect the innocent.)

Installation of the ActiveState Perl package is very simple. There is an InstallShield tool (setup.exe) that you run, answer 2 or 3 questions, and wait while it puts the stuff on your disk.

IMPORTANT NOTE: You will find some other ports of Perl to the Windows environment. There is one in the Microsoft Windows NT Resource Kit, and another in the MKS tool kit. DO NOT USE THESE PORTS! They are bug ridden and will cause you much grief! Immediately after installing either MKS or the NT Resource Kit delete their version of Perl (and all associated libraries) and download the latest version from ActiveState. If you leave the old ports and libraries around on your machine, you will wind up with systems that use incompatible libraries or fail at unpredictable times.

You have been warned.

What you'll install

When you install Perl on your system, you will install many files. This section does not provide a complete list, but you can always dig around in the /usr/local directory (on a UNIX system) or in the installation directory (on an NT system) to see what got installed out there for you.

There are 2 main directories of stuff that get installed:

  • bin -- contains the perl executable and some other supporting tools
  • lib -- contains the libraries and modules shipped in the base distribution
And one of the following:
  • man -- on UNIX only -- contains Perl man pages
  • html -- on NT only -- contains Perl documentation in HTML format
  

Where to get Help with Perl

  As mentioned above, if you want Perl source, binaries, modules, docs, advice, or most anything else perlish, start at: http://www.perl.com/. You'll find most everything there, or a link to it.

There are usenet news groups if you have access to them. comp.lang.perl might be useful.

There is at least one highly regarded magazine: The Perl Journal

Classes on Perl are taught by all of the University Extensions and many private training companies.

Finally, there are about 6.023 * 10^23 books on Perl available via Computer Literacy or via Amazon.com. The next section mentions the ones I know about.

  

Books About Perl

  There is only one comprehensive book on Perl that I can honestly recommend, and that's because it predates all of the others and so I haven't used anything else. That's OK, since the book in question is pretty darn good. It is:

Programming Perl by Larry Wall, Tom Christiansen, and Randal L. Schwartz. Published by O'Reilly & Associates. This is called "the camel book" because of the large camel on the cover. (That also explains all of the camels on and around the www.perl.com web site.) This is a very good book, funny at times, and it has most everything an average Perl programmer will need to know. The first edition of this book is only valuable now as a collector's item, so don't bother with it.

Advanced Perl Programming by Sriram Srinivasan (also published by O'Reilly & Associates) is a useful book in certain more technical areas, but it is not a general purpose reference. I have used a couple of sections in here -- particularly Chapter 6 on Modules -- to enhance what I learned from Programming Perl.

Those with less programming background might want to try Learning Perl by Randal L. Schwartz and Tom Christiansen (yet another fine product of O'Reilly & Associates). I have an old copy of the first edition, and for me it was useful for about two weeks, but then again, I'd been doing systems programming for 10 years when I got the book. As with Programming Perl, make sure you get the 2nd Edition.

Finally, here is a book to avoid: Learning Perl on Win32 Systems by Randal L. Schwartz, Erik Olson, and Tom Christiansen. Yes, this is another O'Reilly & Associates book, but it's not nearly up to their usual standards. It doesn't provide much useful information at all, so avoid it and use any of the above Perl books along with the HTML docs in the ActiveState Perl port.

Unpaid plug: if you need one of these books, and want to pay for it yourself, order it through Amazon.com. I have found Amazon to be a really pleasant place to do business, and their selection is great. Plus they give you discounted prices, no sales tax in CA, and shipping charges that are pretty reasonable.

  

Perl Philosophy

  Perl philosophy might sound like an odd topic in a language course, but it is actually important to cover, at least briefly. That's because Perl's behavior grows from its underlying design and implementation philosophy, so if you know a bit about it, it may help you to anticipate what Perl will do in some cases.

First of all, "Perl" stands for one of two things:

  • Practical Extraction and Report Language
  • Pathologically Eclectic Rubbish Lister
See the perl man page if you don't believe me! What this means is two-fold:
  1. Many of Perl's fundamental functions and data structures are designed to support report generation and related activities.
  2. The Perl language may be a bit different from what you're used to, particularly since it's become a kitchen sink of features and functions that you need to use in order to get to know them.
Perl grows at the whim and will of Larry Wall and his cadre of developers on the 'net. While that sounds like it might be disorganized, it turns out not to be in reality. With many people reviewing code and making comments and suggestions, bugs get fixed quickly and new features or facilities are implemented in ways that make sense from the perspective of the entire language. In fact, Perl grows much like the English language grows. When a good idea comes along in some other utility or language, Perl sucks it in and makes it accessible somehow, usually in a fairly clean way. Perl's OO system is a fairly recent addition along these lines, as are references, better support for modules, and many other features.

One notable thing about Perl is that there is almost always a reasonable default behavior for functions, operators, and everything else in the language. As you read through the list of functions supported by Perl you will learn that most of these defaults either do nothing or affect the magic variable $_. It takes some getting used to if you want to count on these default behaviors, but they can help you get things done quickly when you need to.
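
Here is a small sketch of those defaults at work. Nothing in the loop names $_ except the push, yet the loop, the pattern match, and the print all operate on it. The word list is invented, and the "my" declarations are my addition.

```perl
use strict;
use warnings;

my @words = ( "camel", "llama", "alpaca" );
my @with_m;

foreach (@words)            # no loop variable given, so each item lands in $_
{
    next unless /m/;        # a bare pattern match tests $_
    push @with_m, $_;       # here $_ has to be spelled out
    print;                  # print with no arguments prints $_
    print "\n";
}
```

This prints "camel" and "llama" on separate lines. Whether you find this style handy or cryptic is a matter of taste; you can always name a loop variable instead.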

Another important aspect of Perl is the huge number of ways available to perform many operations. Perl strives to provide many basic language services that you can hook together in lots of different ways. But that tactic leads to the ability to do the same job in many different ways. You'll see more of this as you grow more familiar with the language itself.

  

Hello World

  Enough introduction. Here is the traditional first program in most any language:

print "Hello, World!\n";

Not much there, eh? This program prints "Hello, World!" to your screen and exits. C and C++ programmers will note the "\n" (newline) in there which causes the cursor to drop down one row and to go back to the left column. The semicolon (;) ends the statement.

Here's a slightly different version of the same program:

print( "Hello, World!", "\n" );

This is basically the same, but now we have parentheses around the arguments to the print function, and we can see that print takes a variable number of arguments -- since there is no argument count and we're passing it two arguments in this example. This program does exactly the same thing the previous one did.

Now that you've seen the simplest Perl programs, you need to see how to get them to run on your system.

  

Starting a Perl Program on UNIX

  Starting a perl program can be done in many ways. You'll want /usr/local/bin (or its equivalent) on your PATH variable so that you have access to the perl executable program.

If your program is short, you can type it on the command line like this:

perl -e 'print "Hello, World!\n";'

The -e option takes a string argument that is perl code to execute. (Don't confuse it with -c, which only checks the syntax of a script without running it.) That works for really short programs, but anything of any size will quickly get difficult, and quoting complex scripts run this way is a significant challenge. Instead, you can put your perl script into a file and give the file name to the perl interpreter. (Many, but far from all perl script names end in ".pl". This is only a convention, not a requirement.) Running a script this way looks like this:

perl hello.pl

That is still not very nice, however. The user has to type "perl " and put the ".pl" on the end, when all they really wanted to do was run "hello". You can fix that by using the UNIX #! (pronounced "pound bang", "hash bang" or "she-bang") line. Modify your perl script file to look like this:

#!/usr/local/bin/perl
print "Hello, World!\n";

The "#!" line must be the very first line in the file, and the "#!" characters must be the first two characters on that line. Once you've modified the file contents, change its name to just "hello", and set the permissions to allow it to be executed. These UNIX commands will probably do what you need:

mv hello.pl hello
chmod ugo+x hello

After making those changes, you can run the perl script by typing just:

hello

This assumes one of the following:

  • the directory containing "hello" is on your PATH
  • the current directory (.) is on your PATH and your current working directory contains the file "hello"
If you are working on a really old UNIX system that does not support the #! interpreter invocation, there are more complex ways of getting a perl script running in a similar manner, but they are beyond the scope of this class.

The #! line is far and away the most common way to start perl scripts in my experience.

  

Starting a Perl Program on NT

  As with UNIX, starting a perl program on Windows NT can be done in many ways. You'll want the bin directory associated with your Perl installation on your PATH variable so that you have access to the perl executable program. If you used the defaults on the installation from ActiveState, this should already be the case. On my work system, the Perl executable is found at:

c:\perl\bin\perl.exe

As with UNIX, if your program is short, you can type it on the command line like this:

perl -e "print \"Hello, World!\n\";"

The -e option takes a string argument that is perl code to execute. Note that cmd.exe does not treat single quotes as quoting characters, so the code is wrapped in double quotes and the double quotes inside it are escaped with backslashes. That works for really short programs, but anything of any size will quickly get difficult, and quoting complex scripts run this way is almost impossible in the Windows NT command line (cmd.exe). Instead, you should put your perl script into a file and give the file name to the perl interpreter. (Many, but far from all perl script names end in ".pl". This is only a convention, not a requirement.) Running a script this way looks like this:

perl hello.pl

That is still not very nice, however. The user has to type "perl " and put the ".pl" on the end, when all they really wanted to do was run "hello". There are at least three separate ways to fix this on Windows systems:

  • associate the .pl file extension with the perl.exe interpreter

    This association is something that can be set up for you during ActiveState's installation process. If it is set up for you, then typing "foo" (or double clicking on the file in the explorer) will run "foo.pl" in perl.exe.

    I don't recommend this option personally. I do not (in general) want ".pl" stuck on my filenames, and even if I did, I would want an editor (like wordpad) associated with that extension, since I want to edit my perl scripts pretty often. Thus this option is not something I like.

  • turn the perl script into a batch file

    In the ActiveState distribution you will find a batch file named pl2bat.bat. This batch file converts a perl script file (with any name) to a batch file. So if you have a perl script named "hello" and you run:

    pl2bat hello

    You'll wind up with a file named "hello.bat" which can be run any time by just typing "hello" at the command line.

    This is the option I usually use.

  • turn the perl script into an EXE file

    There is a pl2exe command in the ActiveState distribution that works like the pl2bat tool, but which avoids certain problems with I/O redirection. I have not experimented with this tool.

  • There may be other ways to accomplish this task. If you have trouble getting something to work, talk to me and I'll help you resolve the issue.
Once you have one of these methods properly set up, you should be able to run perl scripts at the NT command line. If you want to set up a shortcut to run a perl script, the easiest method is probably to have the shortcut run perl.exe and to provide the filename as an argument.
  

Comments

  Perl comments begin with a pound sign (#). They may appear anywhere in the program that whitespace might appear.
# this is a comment
                 # another comment
$foo = "bar";                   # still another comment

# $foo = "baz";     # this code never runs
Note that Perl has no preprocessor and thus no equivalent to conditional compilation. That means that commenting out code at compile time requires putting pound signs in front of every line. However, if all you are trying to do is avoid running certain code, you can use a runtime trick to accomplish a similar thing:
... code to run ...
if ( 0 )				# this is never true
{
	... code to skip ...		# This code still must compile...
					# gibberish will cause compile errors.
}
... more code to run ...
  

Variables and Data Types

  Perl has 6 basic data types.

| Type Name | Also Known As | Sample Variable Name | Description |
| --- | --- | --- | --- |
| Scalar | n/a | $foo | Scalars contain a single value -- numeric or string. Note that '$' contains an 'S' (for scalar). |
| Array | List | @foo | Arrays contain several items -- numeric or string. Items are indexed by a subscript in [] brackets. Subscripts start at 0 and go to 1 less than the number of items in the array. Note that '@' contains an 'A' (for array). |
| Hash | Associative Memory | %foo | Hashes contain multiple items -- numeric or string -- but instead of being indexed by a number, they are indexed by a string (called a key). So, you store or retrieve a value in a hash by referencing a key within the hash. The examples below should help explain this concept. Note that '%' contains two small circles -- one for an index and one for a value. |
| File Handle | n/a | FOO | There is no leading funny character on a file handle. They are set up by calling open() or pipe() and must be a unique string. UPPERCASE names are used by convention so they stand out. |
| Function | n/a | &Foo | When you define a function in Perl, it gets a name, and it can be called from most anywhere. Usually you don't need the leading & character to call a function since the parentheses around the arguments or a prototype clearly make it a function call. However, there are cases (like creating a reference to a function) where you need the & character. More on this later. |
| Type globs | n/a | *foo | Type globs are used to refer to all variables with a given name at once. They had use in several cases that are now much easier and cleaner when done with references, but there are still a few odd places where they are used. In general, they are beyond the scope of this class, since their use is really rare now. |

Now that you've seen the data types, you can see some code that uses them to do some simple assignments and function calls.

	# Assigning numeric values to scalar variables...

        $foo = 123;		# assign an integer to $foo
        $foo = 123.45;		# assign a decimal number to $foo
        $foo = 6.023E23;	# assign a decimal number using
				#	scientific notation
	$foo = 0xFF734;		# assign hex integer
	$foo = 0765;		# assign octal integer (note leading 0)
	$foo = 1_234_567;	# use underscores for "legibility"

	# Assigning string values to scalar variables...

        $foo = "abc\n";		# a simple string
        $foo = 'abc';		# another simple string
One difference between "xxx" and 'xxx' has to do with what kind of special characters can be put into the string. When using "xxx" strings, there are many special escape sequences that can be embedded in a string. Things like:
  • \n -- newline
  • \r -- carriage return
  • \xFF -- hexadecimal for FF (or any other character)
  • \033 -- octal for escape (or any other character)
There are many more. See page 40 in Programming Perl for a full list.

Double quoted strings also allow for variable interpolation, in which the contents of a scalar or list variable are substituted into the string where the variable name appears inside the double quotes. Example:

	$foo = "bar";			# set scalar $foo
	print "foo = $foo\n";		# prints "foo = bar" on a line
Interpolation does not happen inside strings surrounded by single quotes.
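
A side-by-side sketch may make the difference between the two kinds of quotes easier to see. The variable names here are invented for the example, and the "my" declarations are my addition.

```perl
use strict;
use warnings;

my $name = "World";

my $double = "Hello, $name!\n";   # $name is interpolated, \n becomes a newline
my $single = 'Hello, $name!\n';   # every character is taken literally

print $double;                    # prints: Hello, World!
print $single, "\n";              # prints: Hello, $name!\n

# The lengths differ too: 14 characters versus 15.
print length($double), " ", length($single), "\n";
```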

Another note: if you want to put an @ in a string and do not want it interpolated, you need to escape it with a backslash.

	@bar = ( "but", "not", "here" );
	print "want \@ here @bar\n";
This prints "want @ here but not here". Without the backslash in front of the first @ character, you get a compile error.

Back to the sample variable assignments:

	# Assigning items into an array...

	@foo = ();		# empty the array
	$foo[0] = "abc";	# set the first element of the array to "abc"
	$foo[1] = 123;		# set the 2nd element of the array to 123
	@foo = ( "cde", 321 );	# set first element of @foo to "cde" and
				# the 2nd element to 321, overwriting
				# previous values
	$foo[100] = "xyz";	# set the 101st element of the array to "xyz".
				# unset elements 2 - 99 have the undefined
				# value (undef)

	# Assigning items into a hash...

	%heights = ();			# empty a hash
	%heights = ( 			# initialize the heights hash
		"Jim" => 78,		# Jim is 6' 6"
		"John" => 48, 		# John is 4'
	);
	%heights = ( 			# exact same initialization
		"Jim", 78, "John", 48
	);
	$heights{"Jim"} = 76;		# Jim got 2" shorter
	$heights{"John"} = 60;		# John got 1' taller

	# print everyone's current height
	# "foreach" and "keys" will be explained later

	foreach $who ( keys( %heights ))
	{
		print "$who is $heights{$who} inches tall\n"
	}
By now you're probably wondering why sometimes there are @ or % signs in front of array or hash variable names, and other times there are $ signs. The answer is that it all depends on what you are actually manipulating. When you are dealing with an entire array or hash, or with a subset (called a slice) containing more than one item from an array or hash, then you use the @ or % sign. If you are dealing with a single item, it's just as if it was a scalar variable, so you use the $ sign. That may take some getting used to, but it will make sense later, particularly when you see how references work.
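
Here is a short sketch of the rule in action. The data is invented ("Jane" is not part of the earlier example), and the "my" declarations are my addition. One detail worth noticing: a slice of a hash is a list of values, so it takes the @ sign even though the variable is a hash.

```perl
use strict;
use warnings;

my @days    = ( "Mon", "Tue", "Wed", "Thu", "Fri" );
my %heights = ( "Jim" => 76, "John" => 60, "Jane" => 65 );

my $first   = $days[0];                     # one array item: $ sign
my @midweek = @days[ 1 .. 3 ];              # array slice, several items: @ sign

my $jim     = $heights{"Jim"};              # one hash item: $ sign
my @pair    = @heights{ "Jim", "John" };    # hash slice, a list of values: @ sign

print "first = $first, midweek = @midweek\n";
print "jim = $jim, pair = @pair\n";
```

The square brackets or curly braces after the name tell Perl whether you mean the array or the hash; the leading character tells it how many items you expect back.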

Back to the variable assignments... file handles this time:

	# examples using file handles

	open( FOO, "<in.txt" );		# open file 'in.txt' for reading
					# use file handle FOO

	open( BAR, ">out.txt" );	# open file 'out.txt' for writing
					# use file handle BAR
					# overwrite previous contents

	open( BAR, ">>out2.txt" );	# open file 'out2.txt' for writing
					# use file handle BAR again, implicitly
					#   closing the previous file
					# append to previous contents

	$line = <FOO>;			# read a line from in.txt into $line

	print( BAR $line );		# print $line out to out2.txt
					# note the lack of a comma!
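
The open() calls above quietly assume the files can be opened. As a hedged sketch of a slightly more careful round trip, the script below writes a scratch file, reads it back, and checks each open() for failure. The file name is hypothetical, and the "or die" idiom, close(), unlink(), and the list form of <IN> are things I'm pulling in ahead of any discussion of them in this class.

```perl
use strict;
use warnings;

my $file = "perl_class_demo.txt";    # hypothetical scratch file name

# open() returns false on failure; $! holds the system's error message.
open( OUT, ">$file" ) or die "cannot write $file: $!";
print( OUT "first line\n" );         # no comma after the file handle
print( OUT "second line\n" );
close( OUT );

open( IN, "<$file" ) or die "cannot read $file: $!";
my $line = <IN>;                     # scalar context: read one line
my @rest = <IN>;                     # list context: read all remaining lines
close( IN );

unlink( $file );                     # remove the scratch file again

print "got: $line";
print "plus ", scalar(@rest), " more line(s)\n";
```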
  

Operators

  Perl has many operators. Here is a list of some of them. See Programming Perl pp 76 and 85 for a full list of supported operators.

| Operator | Use | Description |
| --- | --- | --- |
| ++, -- | $a++; --$b; | Increment & decrement, like the C equivalents |
| +, -, *, /, **, % | $a = $b + $c - $d * $e ** $f / $g; | Normal arithmetic operators, with normal (C-like) precedence. |
| ==, !=, <, >, <=, >=, <=> | if ( $a == $b ) ... | Numeric comparison operators; "<=>" is a special comparison operator used in sort routines. It returns a value less than 0, 0, or greater than 0, depending on whether the first term is less than, equal to, or greater than the second term. |
| | if ( $a < 123.456 ) ... | |
| eq, ne, lt, gt, le, ge, cmp | if ( $a eq $b ) ... | String comparison operators; "cmp" is the string equivalent of the <=> numeric operator. |
| | if ( $a le "foobar" ) ... | |
| &, \|, ^ | $a = $x & $y; | Bitwise and, or, and xor. |
| | $a = $x \| $y; | |
| \|\|, &&, ! | if (( $a && $b ) \|\| ( ! $c )) ... | Logical operators -- like C equivalents |
| and, or, xor, not | if (( $a and $b ) or ( not $c )) ... | Logical operators again, lower precedence than the C like ones |
| =, +=, -=, *=, /=, %=, etc. | $a = $b; | Assignment operators. "x=" forms put computed value back into variable on left side of assignment (e.g., $a += 3; and $a = $a + 3; are identical). |
| | $a += $b; | |
| <<, >>, <<=, >>= | $a = $b << 4; | Numeric shift and shift/assignment operators |
| | $c <<= 2; | |
| -r, -w, -x, -f, etc. | if ( -r $file ) ... | File test operators. These do tests of files in the file system and return information about the file. -r tests readability, -w tests writeability, -x tests executability, -f tests that it is a file (and not a directory, symlink, etc.) There are many of these operators. See Programming Perl pp 85 for a full list. |
| =~, !~ | $a =~ s/a/b/; | Binding operators, cause pattern matching, search and replace, and translation functions to operate on something other than $_ |

Perl's precedence rules are similar to C's, but there are many more operators than there are in C, so the precedence table is fairly complex. Again, see Programming Perl pp 76 for a full discussion of all of Perl's operators, including the precedence rules.
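
The split between numeric and string operators is the part that trips up most newcomers, so here is a hedged sketch that exercises both families along with the =~ binding operator. The data is invented, and sort (with its special $a and $b variables) is used here ahead of any real explanation of it.

```perl
use strict;
use warnings;

my @nums = ( 10, 9, 100, 1 );

my @numeric = sort { $a <=> $b } @nums;   # compare as numbers
my @string  = sort { $a cmp $b } @nums;   # compare as strings

print "numeric: @numeric\n";              # 1 9 10 100
print "string:  @string\n";               # 1 10 100 9

# == compares numerically; eq and ne compare character by character.
my $x = "1.0";
my $y = "1";
print "== says equal\n"     if $x == $y;
print "ne says different\n" if $x ne $y;

# =~ binds a substitution to a variable other than $_.
my $text = "banana";
$text =~ s/a/o/;                          # replaces only the first "a"
print "$text\n";                          # bonana
```

Picking the wrong family is not a compile error, which is exactly why it is worth getting into the habit early.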

  

Defining Truth

  Many of Perl's operators and language functions need to operate on or return a value that is either "true" or "false". In Perl, any value that can be put into a scalar can be interpreted as true or false by the following rules:

  • Any string is true except for "" and "0".
  • Any number is true except for 0.
  • Any reference is true.
  • Any undefined value is false.
Here are some examples:

| Use | Value | Comment |
| --- | --- | --- |
| "abc" | true | the string is defined, not empty, and does not contain just "0", so it's true |
| "0" | false | the string contains just "0" so it's false |
| "0.00" | true | the string does not contain just "0" so it's true |
| "1" | true | the string does not contain just "0" so it's true |
| 1 | true | the value is not 0, so it's true |
| 0 | false | the value is 0, so it's false |
| 0.00 | false | the value is 0, so it's false |

See Programming Perl pp 20 for a full discussion of the definition of truth.
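
If you'd rather let Perl answer the question itself, this sketch runs the values from the table (plus the empty string and undef, which I've added) through a truth test. The ?: operator and the defined() function are used here without introduction.

```perl
use strict;
use warnings;

my @tests = ( "abc", "0", "0.00", "1", 1, 0, 0.00, "", undef );
my @results;

foreach my $value (@tests)
{
    # undef cannot be printed sensibly, so show a label for it instead.
    my $shown = defined($value) ? "'$value'" : "undef";
    my $truth = $value ? "true" : "false";
    push @results, $truth;
    print "$shown is $truth\n";
}
```

Note that the numeric 0.00 prints as '0', since it is stored as a number and not as the string you typed.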

  

Control Flow Statements

  Perl has the usual flow control statements, along with some more unusual variants.

  • if elsif else, and unless
    	if ( $a < $b )
    	{
    	    do_something_with( $a );
    	}
    	elsif ( $a > $b )
    	{
    	    do_something_with( $b );
    	}
    	else    # $a must be equal to $b
    	{
    	    do_something_else_entirely();
    	}
    Multiple elsif statements are allowed, and the if and elsif clauses operate on the definition of truth mentioned above.

    There is also an unless statement, very similar to the if statement:

    	unless( $a < $b )
    	{
    	    do_something_with( $b );
    	}
    There is no "elsunless" statement, however. These statements can also be "turned around" which doesn't change what they do, just how they look. Sometimes this can help readability:
    	# these two statements are the same:
    
    	if ( $a > 0 ) { $b++; };
    	$b++ if ( $a > 0 );
    
    	# as are these two:
    
    	unless ( $a > 15 ) { $b = 10; }
    	$b = 10 unless ( $a > 15 );

  • while and until
    	while ( $a < $b )
    	{
    	    do_something_with( $a );
    	    $a++;
    	}
    There is also an "until" statement, which reverses the sense of the test:
    	until ( $a > $b )
    	{
    	    do_something_with( $a );
    	    $a++;
    	}
    Also note that you can use a do{...} block to move the conditional test to the bottom of the loop, and thus force at least one execution of the loop before the test is done:
    	do
    	{
    	    $a++;
    	    f( $a );
    	} while ( $a < 100 );
    
    	# or
    
    	do
    	{
    	    $a++;
    	    f( $a );
    	} until ( $a > 100 );
  • for and foreach

    Perl's for statement is basically identical to C's for statement:

    	for( $i = 1 ; $i <= 10 ; $i++ )
    	{
    	    print "\$i is $i\n";
    	}
    As in C, there are three items inside the parentheses, any or all of which can be missing:
    • An initialization expression
    • A test expression -- a false value exits the loop
    • A modification expression -- to change the loop variable
    Missing initialization and modification expressions are ignored; missing test expressions are treated as if they returned true, which is identical behavior to the C for statement. (See below about infinite loops.)

    Perl's foreach statement is used to iterate over the contents of a list. Consider:

    	@a = ( 1, 2, 3 );
    	foreach $i ( @a )
    	{
    	    $i +=1;
    	    print( "i = $i\n" );
    	}
    	foreach $j ( @a )
    	{
    	    print( "j = $j\n" );
    	}
    This prints the following output:
    	i = 2
    	i = 3
    	i = 4
    	j = 2
    	j = 3
    	j = 4
    The trick is that in foreach loops, the loop variable is not treated as a standard variable, but rather as an alias for the actual contents of the array. Thus modifying the contents of the loop variable (as we did with the += statement in the first foreach loop) modifies the array.

  • loop control: next, last and redo

    Any while, until, for, or foreach loop that does not begin with a do{...} statement can also have a label, as in:

    	LOOP: foreach $a ( @list )
    	{
    	    ...
    	}
    These labels are useful because you can use them in conjunction with the loop control statements (next, last, and redo) to skip the rest of the current iteration, exit the loop, or restart the current iteration. For example:
    	INPUT: while ( <INFILE> )
    	{
    	    next INPUT if /^\s*$/;            # skip empty lines
    	    next if /^\s*#/;                  # skip comments
    	    last INPUT if /^__END__$/;        # stop if we find magic token
    	    ...
    	    # here we'd process the contents of the line we read
    	    ...
    	}
    Loop control statements do not require a label -- if one is not present, the innermost enclosing loop is affected. (That's why the two next statements above both work.)

  • Infinite loops

    Sometimes you need to write an "infinite" loop -- one that won't exit until you tell it to. There are two common ways to do this in Perl:

    	while( 1 )
    	{
    	    ...
    	    last if ( your condition here );
    	    ...
    	}
    
    	for(;;)
    	{
    	    ...
    	    last if ( your condition here );
    	    ...
    	}
    The first works because 1 is always true, so the loop never exits. The second works by convention -- a missing test section always evaluates to a true value.

    There is no built in switch (or case) statement in Perl. There are many ways to write something similar using Perl's basic statements. See Programming Perl pp 104 for a discussion of how to write case statements.
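Here is a minimal sketch of one such approach: a labeled bare block holding a series of if tests. A bare block behaves like a loop that runs exactly once, so last can be used to jump out of it. The variable names and the command strings are made up for illustration.

```perl
use strict;
use warnings;

my $cmd = "stop";      # hypothetical input value
my $action;

SWITCH: {
    if ( $cmd eq "start" ) { $action = "starting"; last SWITCH; }
    if ( $cmd eq "stop" )  { $action = "stopping"; last SWITCH; }
    $action = "unknown command";     # the "default" case
}

print( "$action\n" );
```

Each test that succeeds does its work and leaves the block; anything that reaches the bottom falls into the default case. An if/elsif/else chain does the same job and is often easier to read.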

  

Regular Expressions and Pattern Matching

  Regular expressions are one of Perl's most powerful features, and they require the most extensive documentation. We'll cover only the basics here; see Programming Perl pp 57 - 76 for a full discussion of regular expressions in Perl.

Pattern matching is a technique for finding carefully defined substrings within a larger string, and optionally removing, replacing, or saving them for later use elsewhere. There are a huge number of ways that pattern matching can be used, but they all use the same basic facilities underneath.

Let's start with a simple example:

$a = "this is a test of the emergency broadcast system";

if ( $a =~ /test/ )
{
    print "test found!\n";
}
else
{
    print "test not found!\n";
}
Let's look that over in detail:

The =~ operator causes the matching operation to work against $a, instead of against $_. The search pattern here is /test/. This is a simplification of "m/test/" -- the leading 'm' is not needed because we're using slashes as our delimiters. If we wanted to use another delimiter, we'd need the m, as in "m,test," which is the same as '/test/' or 'm/test/'.

What is happening here: each "thing" inside the search pattern is being compared with the contents of the string, to see if it is found. In this case, the 't' from '/test/' is compared with the first character of the string, and a match is found. Then we compare the 'e' from '/test/' with the 'h' in the second character of the string, and it doesn't match, so we drop back to the 't' and keep looking. We fail to match the 't' 9 more times (once for each character in 'his is a ') and then match against the 't' from 'test'. Once there, each additional character in the search pattern '/test/' matches against its corresponding character in the string. When we run out of characters in the search pattern, success is declared, and the match operation returns a true value.

As a result, this code will print "test found!".
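To make the points about delimiters and $_ concrete, here is a small sketch that runs the same search four ways. The @found array and the labels pushed onto it exist only for this illustration.

```perl
use strict;
use warnings;

my $text = "this is a test of the emergency broadcast system";
my @found;

push( @found, "slashes" )   if ( $text =~ /test/ );     # the usual form
push( @found, "m-slashes" ) if ( $text =~ m/test/ );    # explicit m
push( @found, "commas" )    if ( $text =~ m,test, );    # another delimiter

# without =~ the match is applied to $_
$_ = $text;
push( @found, "default" ) if ( /test/ );

print( join( " ", @found ), "\n" );
```

All four forms find the pattern, so all four labels are printed.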

Comparing characters is pretty useful, but only the tip of the iceberg in pattern matching. Many "things" other than characters can appear in your search patterns. Here are some of the most common:

character meaning
. matches any character except newline
* matches zero or more of the thing preceding it
+ matches one or more of the thing preceding it; so "a+" is the same as "aa*"
? matches the thing preceding it zero or one times
^ matches the beginning of the string
$ matches the end of the string
[a-z] square brackets match any single character contained inside them; use the dash to indicate character ranges
[^a-zA-Z] the leading '^' inside square brackets makes them match any single character that is not contained inside them; again, you may use the dash to indicate character ranges
\d matches any digit; same as [0-9]
\D matches any non-digit; same as [^0-9]
\s matches any whitespace character; same as [ \t\n\r\f]
\S matches any non-whitespace character; same as [^ \t\n\r\f]
\w matches any word character; same as [a-zA-Z_0-9]
\W matches any non-word character; same as [^a-zA-Z_0-9]
(...) matches the regular expression inside the parentheses, and remembers what was matched for later use via \1 or $1 style variables
| creates an alternate match; "/joe|fred/" matches either "joe" or "fred" in the target string

There are many additional pattern matching items -- entire books are written about pattern matching, so don't expect a full summary here. However, those are usually the most useful.
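A quick way to get a feel for these items is to try each one against a short string. The sketch below does that; the test strings and the labels collected in @matched are made up for illustration.

```perl
use strict;
use warnings;

my @matched;

push( @matched, "dot" )      if ( "cat"     =~ /^c.t$/ );      # . is any one character
push( @matched, "star" )     if ( "ct"      =~ /^ca*t$/ );     # zero 'a's is fine for *
push( @matched, "plus" )     if ( "ct"      =~ /^ca+t$/ );     # + needs at least one 'a'
push( @matched, "question" ) if ( "color"   =~ /^colou?r$/ );  # the 'u' is optional
push( @matched, "class" )    if ( "x7"      =~ /^[a-z]\d$/ );  # a letter, then a digit
push( @matched, "negated" )  if ( "x7"      =~ /^[^a-z]/ );    # first char IS a letter
push( @matched, "alt" )      if ( "hi fred" =~ /joe|fred/ );   # either name will do

print( "@matched\n" );
```

The "plus" and "negated" tests are the two that fail, so their labels are missing from the output.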

More examples, using these new "things" in the search pattern:

$foo =~ /^A/              # true if $foo starts with 'A'
$foo =~ /^Z.*q+.*a$/      # true if $foo begins with 'Z',
                          # and ends in 'a', and has at least one
                          # 'q' in the middle somewhere
                          # "Zqa" and "Z---qqq-a" both match.

# here's a more complex example:

$foo =~ /(\d\d):(\d\d)\s*(AM|PM)/i;
$hour = $1 % 12;          # 12 becomes 0; PM adds 12 below
$minute = $2;
if ( $3 =~ /PM/i ) { $hour += 12; }
That last example needs some explaining.

First, remember that parentheses do two things: they cause Perl to remember whatever was matched for later use, and they delimit sub-parts of the regular expression. We're doing both in this case.

  • (\d\d) matches two digits, and stores them into $1
  • : matches a colon
  • (\d\d) matches two digits, and stores them into $2
  • \s* matches 0 or more whitespace characters
  • (AM|PM) matches one of the strings "AM" or "PM", and stores it into $3
"But," I hear you wonder, "what's that 'i' after the ending '/'?"

That is a modifier -- it tells Perl to ignore case when comparing letters in the pattern match. In this case, that means that (AM|PM) will match any of the following: "AM", "Am", "aM", "am", "PM", "Pm", "pM", or "pm". Using the 'i' modifier is a handy shortcut to avoid really complex patterns. There are other modifiers, some of which we will discuss shortly. If you cannot wait, see Programming Perl pp 69 for a full list.

So what is going on here? Easy. Assuming $foo contains a time, a 2 digit hour is stored into $hour, a 2 digit minute is stored into $minute, and the presence of PM is used to complete the storing of the time in 24 hour format. And what if $foo doesn't contain a time, or it contains a time that is formatted differently? Then this code is dangerous. When a match fails, Perl does not reset $1, $2, and $3: they are undefined only if no earlier match in the same scope succeeded, and otherwise they still hold whatever the last successful match captured. Either way you won't get a time, but you will get a mess. As a result this code would be better written:

if ( $foo =~ /(\d\d?):(\d\d)\s*(AM|PM)?/i )
{
    $hour = $1;
    $minute = $2;
    $ampm = $3;    # save it now; the next match would overwrite $3
    # check for 24 hour time format -- avoid
    # conversion if we are already in 24 hour time
    if ( defined( $ampm ) )
    {
        # we have a 12-hour time; convert to 24 hour
        $hour %= 12;              # 12 AM becomes hour 0
        if ( $ampm =~ /PM/i )
        {
            $hour += 12;
        }
    }
}
else
{
    print( "Time not found or format not ",
           "understood\n" );
}
This code is cleaner. There are still ways to improve it, but at least we announce when we find something we don't expect, and we handle 12 hour clock formats if we find them.

The next thing beyond a pattern match is a search and replace. Here is an example:

$foo = "original string, just to string you along";
$foo =~ s/string/bring/;
print( $foo, "\n" );
So what happens here? The 's///' code is the search and replace function supplied by perl. As with 'm//' you can use separators other than slashes (which is particularly useful when searching or replacing in UNIX path strings) so 's,xxx,yyy,' is legal and the commas would be the separators in this case.

The sample code is doing regular expression matching; the first pattern is the thing to match ('string' in this case) and the second pattern is the thing to replace it with ('bring'). So you might think that it would print:

original bring, just to bring you along
but that is not correct. Instead, it will print:
original bring, just to string you along
That may seem wrong, but it turns out to be very powerful. Perl's search and replace function grew out of the UNIX tools that did similar things -- particularly vi -- and those tools always start by assuming they are supposed to replace only the first occurrence of the pattern on a particular line. If you want to replace them all, you need to do a global search, which requires adding the 'g' modifier, as follows:
$foo =~ s/string/bring/g;
Now all the places that say "string" will be replaced with "bring".

Search patterns can use all of the features discussed in the pattern matching section. So:

$foo = "aa bb cc this is a test aa bb cc";
$foo =~ s/a+\s*b+\s*c+/abc/g;
$foo =~ s/^(.*)(t\S*s)(.*)(is)(.*)$/$1$4$3$2$5/;
print( $foo );
prints out the following:
abc is this a test abc
If you want to search for special characters in your string, you'd escape them with backslashes:
$foo = "/usr/local/bin/perl";
$foo =~ s/\//\\/g;
print( $foo );
prints:
 \usr\local\bin\perl
The first backslash precedes a slash -- so that it becomes a character you are searching for, and not the end of the search string. The second backslash precedes a third backslash -- so that a literal backslash is used in the replacement pattern. I know that's convoluted, but anyone trying to write code that will run on both Windows and UNIX will appreciate it. Use a different separator (like comma -- ',') to avoid all the extra backslashes in front of regular slashes, but you'll still need one in front of the backslash, since that's the escape character itself.
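To see the separator trick side by side, here is a short sketch that does the same replacement both ways. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $path1 = "/usr/local/bin/perl";
my $path2 = $path1;

$path1 =~ s/\//\\/g;     # slashes as separators: the / must be escaped
$path2 =~ s,/,\\,g;      # commas as separators: only the \ needs escaping

print( "$path1\n$path2\n" );
```

Both lines of output are identical; the second substitution is just easier on the eyes.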

We have barely scratched the surface of regular expressions and pattern matching. There are many more intricate details including additional special matching characters, repeat counts, controlling the "greediness" of the search, and more. Again, please read Programming Perl pp 57-76 for all the details. In addition, though I have never read it myself, O'Reilly & Associates publishes a book titled Mastering Regular Expressions by Jeffrey E. F. Friedl. I know nothing about it, except that Programming Perl recommends it. If anyone wants to give me a review or loan me their copy at some point, I'd appreciate it.
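As a small taste of two of those details, here is a sketch showing greedy versus non-greedy matching and a repeat count. The sample strings and variable names are made up, and this only hints at what the books cover.

```perl
use strict;
use warnings;

my $text = "<b>bold</b> and <i>italic</i>";
my( $greedy, $lazy ) = ( "", "" );

if ( $text =~ /<(.+)>/ )  { $greedy = $1; }   # .+ takes as much as it can
if ( $text =~ /<(.+?)>/ ) { $lazy   = $1; }   # .+? takes as little as it can

# {5} is a repeat count: exactly five digits
my $five_digits = ( "90210" =~ /^\d{5}$/ ) ? "yes" : "no";

print( "greedy: $greedy\n" );
print( "lazy:   $lazy\n" );
print( "five digits: $five_digits\n" );
```

The greedy version runs all the way to the last '>' in the string, while the non-greedy version stops at the first one.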

  

Quoting

  Perl supports several different kinds of quotes. These should look somewhat familiar to shell programmers.
  • "" -- double quotes
    These allow variable interpolation (i.e., substitution of the contents of a variable where the name appears in the string) and a wide variety of special characters and escape sequences.

  • '' -- single quotes
    These disallow variable interpolation and almost all escape sequences and special characters. (Only \' and \\ are recognized.)

  • `` -- back ticks
    These do command substitution. That is, the contents of the back ticks are treated as a command and run by the shell, then the output of the command is inserted into the string where the command was originally. (e.g.: $foo = `ls`;)

  • qXXX -- pick your own quotes
    Perl allows you to pick your own quote characters to make your life easier. For example:
    		$foo = "abc";
    		$bar = "a${foo}z";
    		$baz = qq/a${foo}z/;
    In this code, $bar and $baz both wind up containing the string "aabcz". The "qq//" notation is a generalized way of writing quotes that allow interpolation, and is exactly equivalent to using double quotes, but allows you to select the quote characters you want instead. It is possible to use things like: qq,xxx, where commas are the quote characters.

    A very common construct in perl is word quotes:

    		@list = qw( a set of words to be put into a list );
    This code creates a list with 10 items in it -- each word of the list. Word quotes of this nature do not allow interpolation, and whitespace delimits the items within them. See Programming Perl pp 41 for a detailed discussion of the various types of quotes available with the qXXX mechanism.

  • here documents
    These are patterned after the UNIX shell's ability to quote large blocks of text easily:
    		$foo = <<'End_of_foo';
    		This line gets put into $foo
    		This line too
    		# even this line -- it is NOT a comment
    		This is the last line going into $foo
    		End_of_foo
    When this code runs, $foo winds up containing 4 lines of text each with a newline on it. (Note: This is not an array, it is a scalar with a lot of text in it.) The single quotes around 'End_of_foo' tell Perl that variable interpolation is not allowed in the contents of the here document -- so in the example $foo did not expand, it was just put into the variable literally as $foo. If you need variable interpolation in the contents, use double quotes or leave the quotes off entirely:
    		$foo = "abc";
    		$bar = <<endoftext;
    		line 1
    		$foo
    		line 2
    		endoftext
    		print( $bar );
    Which prints:
    		line 1
    		abc
    		line 2
    Also note that if you want to include white space in your here document delimiter line, you must quote it after the << operator, and that the terminator must appear by itself, unquoted and with no surrounding whitespace, on a line to be recognized by Perl.
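The differences between the quoting styles are easiest to see side by side. This is a minimal sketch with made-up variable names; it skips back ticks so that it does not depend on any particular shell command.

```perl
use strict;
use warnings;

my $name = "world";

my $double = "hello $name";          # interpolated
my $single = 'hello $name';          # taken literally
my $picked = qq,hello "$name",;      # qq with commas; inner " needs no escape
my @words  = qw( hello $name );      # two words, no interpolation

print( "$double\n$single\n$picked\n" );
print( scalar( @words ), " words: $words[0] and $words[1]\n" );
```

Only the double-quoted and qq strings replace $name with its contents; the single quotes and qw leave the text exactly as typed.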
  

I/O

  All programs need to read and write data. Here are the basic things Perl provides to do those tasks.
  • standard input, output, and error

    Perl provides access to the three standard file handles set up for you by the operating system: STDIN, STDOUT, and STDERR.

    • STDIN (standard input) is usually connected to the keyboard, but may be redirected to come from a file or another process by the user.

    • STDOUT (standard output) is usually connected to the screen, but may also be redirected to a file or process by the user.

    • STDERR (standard error output) is also usually connected to the screen and is the location that many programs send their error messages, to keep them separate from their regular output. Like STDOUT, it may be redirected by the user.

  • print, printf -- common output functions

    Perl provides the two basic functions to print data to a file handle.

    • print takes an optional file handle and a list of variables and constants to print. If no file handle is specified, the default output file handle is used (which is usually STDOUT, but it may be changed with the select function). Each item in the list of things to print is sent to the file handle. Constants (aka literals) and scalars are printed without modification -- particularly note that newlines are NOT added to the output for you. Arrays are printed one element at a time without any separators, but, if an array is interpolated into a string, a separator (the contents of the $" special variable -- usually a single space) is added between the array elements. Example:
      	@foo = ( "a", "b", "c" );
      	print( @foo, "\n" );
      	print( "@foo\n" );
      produces the following output:
      	abc
      	a b c
      Hashes are handled somewhat differently. They print like arrays when printed on their own, but entire hashes may not be interpolated into strings (though individual elements may). Example:
      	$foo{a} = "b";
      	$foo{b} = "c";
      	print( %foo, "\n" );
      	print( "%foo\n" );
      produces this output (hash order is not guaranteed, so the pairs may come out in a different order):
      	abbc
      	%foo
      Be aware that the optional file handle given to print does NOT have a comma after it. So:
      	print( FH "this is a test\n" );       # correct
      	print( FH, "this is a test\n" );      # wrong -- compile error
      The Perl interpreter will tell you about this, but it's just easiest to remember it before hand.

    • printf is used to do formatted printing of data values. It takes an optional file handle (just like print), a formatting string, and a list of data values to print. The formatting string is very much like the one C's printf function expects, so if you are familiar with that, you're ready to go. If not, a formatting string consists of a series of characters, some of which are literal while others control the printing of the items in the list. Control characters are introduced by the percent sign (%). Here is a simple example:
      	$foo = 123.45;
      	$bar = "abc";
      	printf( STDOUT "foo is %12.5f and bar is \"%10s\"\n",
      	    $foo, $bar );
      which produces:
      	foo is    123.45000 and bar is "       abc"
      For more details on the various printf format specifiers, see the sprintf function reference in Programming Perl and the printf man page on a UNIX system or the on-line reference for your C or C++ compiler on a Windows system.

  • <> -- the angle (input) operator

    The angle operator (<>) is used to read data from a file handle. With no file handle between the angle brackets, it reads a line from the files named on the command line (in @ARGV), or from STDIN if no files were named. If a file handle appears between the angle brackets, it reads from that file.

    	$line = <>;            # read from command line files, or STDIN
    	$line2 = <STDIN>;      # read from STDIN
    	open( MYFH, "<input.txt" );  # open input.txt for reading
    	$line3 = <MYFH>;             # read from input.txt
    One thing to note about the angle operator: it is sensitive to the context it is operating in. The previous examples are all in scalar context (that is, the left side of the assignment is a scalar variable) but this one is different:
    	@lines = <INPUTFILE>;
    This reads ALL the lines from the INPUTFILE file handle and puts them into the array @lines. This is probably what you wanted if you wrote this code, but try this:
    	print OUTPUTFILE <INPUTFILE>;
    This might look like it will read one line from the input file and write it to the output file, but that is not the case. Since print takes a list of arguments, the angle operator knows that it should provide a list as its return value, so it returns the entire contents of the input file as an array, which all get printed to the output file.

  • open -- to open a file or pipe to a process

    The open function prepares a file handle for use, as we've seen above. Here are some sample uses of open to prepare files for use:

    	open( INFILE, "<input.txt" );    # open input.txt for reading
    	open( OUTFILE, ">output.txt" );  # open output.txt for writing...
    	                                 # will overwrite previous contents
    	open( OUTFILE2, ">>out" );       # open out for appending...
    	                                 # leave previous contents alone
    	open( IOFILE, "+>datafile" );    # open datafile for reading and
    	                                 # writing (erases old contents;
    	                                 # use "+<" to keep them)
    
    	# open returns undefined value if it fails, so:
    	open( INFILE, "<required.txt" ) || die( "cannot open file" );
    In addition to opening files, open can also open pipes to processes. This is an advanced topic, so see the books for full details, but here is an example:
    	$ls_one = `/bin/ls -l /tmp`;          # get ls output using backticks
    	open( FH, "/bin/ls -l /tmp |" ) ||    # start ls cmd and pipe output
    	    die( "cannot run /bin/ls" );      # complain if it fails
    	$ls_two = join( '', <FH> );           # read all ls output, join
    	                                      # lines with nothing, and save
    	close( FH );                          # close file handle
    When this code completes, $ls_one and $ls_two contain the exact same data -- the output of running the ls command against the /tmp directory.

  • close -- close a file handle

    When you are done with a file handle, close it using this function as shown in the previous example. If you don't, your program will not release resources back to the operating system until it exits, and if your program uses a lot of files, you can hit a limit on the number of files you may have open at any one time.

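  • read and write -- read and write data other than lines

    If you need to read binary data, or quantities of data other than lines, you'll need read, which operates much like its C counterpart. Be careful with write, though: in Perl it prints formatted records (see "Formatted output" in the list of things not covered), and it is not the opposite of read. To write binary data, use print (or syswrite), and use seek to move to an arbitrary location in a file. See Programming Perl for more details.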
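To tie the I/O pieces together, here is a sketch that writes a small file, reads it back, and shows the angle operator in both scalar and list context. The file name is a made-up scratch file in the current directory, and unlink (which deletes a file) is used at the end to clean up.

```perl
use strict;
use warnings;

my $file = "perlclass_demo.txt";   # hypothetical scratch file name

open( OUT, ">$file" ) || die( "cannot write $file: $!" );
print( OUT "line one\n", "line two\n", "line three\n" );
close( OUT );

open( IN, "<$file" ) || die( "cannot read $file: $!" );
my $first = <IN>;        # scalar context: one line
my @rest  = <IN>;        # list context: everything that is left
close( IN );

unlink( $file );         # remove the scratch file

print( "first: $first" );
print( "remaining lines: ", scalar( @rest ), "\n" );
```

Note that $first still has its newline attached, which is why the first print needs no "\n" of its own.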

  

Special Variables

  Perl has many special variables for controlling certain aspects of the runtime environment and various behaviors. See Programming Perl pp 127 for a full list. Here we touch on only a few that are useful at times.

variable meaning
$1, $2, $3, etc. The variables set by parentheses in pattern matching. They are local to the enclosing block.
$| Autoflush. One copy of this variable exists per open file handle. Set to a true value, each time a print function is called, the data is flushed out to the file or pipe by Perl, rather than buffering it until an entire line or block of data is ready to go. Used with select, as in:
	select( MYFH );   # select my file to be default output
	$| = 1;           # turn autoflush on
	select( STDOUT ); # return to STDOUT for output
This use is less common now, since there is an OO method of getting at the autoflush setting, but you will still see it used (by the author, if no one else).
$_ The default variable used by many functions in Perl.
$" List separator. The contents of this variable are inserted between the elements of an array when they are interpolated into a string. Default is a single space character.
$? The exit status of the last child created by system(), back ticks, or pipe close function. Be aware that different operating systems can do different things to this number, but 0 usually means success in UNIX and Windows programs.
$! The contents of errno, as set by the C library calls made by Perl. This value is unusual, in that it is a string and a number, depending on how it is referenced.
$$ Current process ID. The PID of the Perl interpreter running your perl script.
$0 The program name -- the name of your perl script in all cases I have seen so far.
$^O The OS name -- a magic string that identifies what OS you are running on. Be aware that the various Windows Perl ports use different strings, even when running on the same box! (The variable's real name is control-O, but in a script you normally type it as a caret followed by a capital O.)
@ARGV The array of command line arguments.
%ENV A hash containing the environment at the time Perl was started. Modifying the contents of this hash changes the environment for child processes you create.

Most of these variables have alternate (long, English-like) names that become available with the "use English;" pragma in Perl. However, in my experience, most scripts still use the short names.
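Here is a short sketch that pokes at a few of the variables in the table. The @list array and its contents are made up for illustration, and the last three lines of output will differ from machine to machine.

```perl
use strict;
use warnings;

my @list = ( "a", "b", "c" );

my $default_join = "@list";     # uses the default $" (one space)

$" = ", ";                      # change the list separator
my $comma_join = "@list";
$" = " ";                       # put the default back

print( "default: $default_join\n" );
print( "commas:  $comma_join\n" );
print( "script:  ", $0, "\n" );                 # program name
print( "pid:     ", $$, "\n" );                 # process ID
print( "args:    ", scalar( @ARGV ), "\n" );    # number of arguments
```

Restoring $" afterwards matters because it is a global setting: every later interpolation of an array in the script would otherwise use the comma.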

  

Writing Functions

  As with any programming language, you will want to create functions to perform specific bits of work. Here's a sample of how to do that.
$result = func( 'a', 'b');
print $result, "\n";

sub func
{
	my( $arg1, $arg2 ) = @_;
	# your function's perl code goes here
	$foo = "$arg1 xxx $arg2";
	return $foo;
}
which produces the following output when it runs:
a xxx b
As you can see, you declare a function with the sub keyword. (Actually, sub is a built in function in perl, but never mind.) Functions may return any of the basic data types, including lists, hashes, and scalars. This example returns one scalar. You call a function by using its name and a series of arguments inside parentheses. If there are no arguments, use empty parentheses. (We have seen unusual Perl bugs if functions are called without parentheses.)

Inside the definition of the perl function you can see that the arguments are present in the @_ array by default. (The elements of @_ are actually aliases for the caller's arguments, so assigning directly to something like $_[0] can change the caller's variable. Copying the arguments into my() variables first, as shown, avoids that.) The my() call copies the arguments from the @_ array into local variables. Your code does whatever it wants. (Note that in the example, $foo is not a local variable... it is global.) When the function completes, it may return with an explicit return statement (as shown) or it may just fall off the end of the function. The return value is either the value given to the return call or, if there is none, the result of the last statement executed.
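The two ways of returning a value, and the effect of copying arguments with my(), can be sketched as follows. The function names shout and whisper are made up for this example.

```perl
use strict;
use warnings;

sub shout            # explicit return
{
    my( $word ) = @_;
    $word = uc( $word );       # changes only the local copy
    return "$word!";
}

sub whisper          # no return: the last expression is the result
{
    my( $word ) = @_;
    lc( $word ) . "...";
}

my $original = "Hello";
my $loud  = shout( $original );
my $quiet = whisper( $original );

print( "$loud $quiet $original\n" );
```

After both calls $original is still "Hello", because each function worked on its own my() copy.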

Returning more than one value is possible with Perl functions, since you may return a list. Example:

( $val1, $val2 ) = func( $p1, $p2, $p3, "abc" );
sub func
{
	my( $arg1, $arg2, $arg3, $arg4 ) = @_;
	# some code building things from the arguments
	return( "foo bar", "blather" );
}
When func returns, $val1 will contain "foo bar" and $val2 will contain "blather". Note that the function could have been called as:
@results = func( ... );
in which case $results[0] would contain "foo bar" and $results[1] would contain "blather".

Finally, it is possible to ignore an item returned by a function:

( undef, $wanted ) = func( ... );
In this case, $wanted will contain "blather" but the first value returned will be discarded.

A few other points:

  • Functions may be defined anywhere in your script; before or after calls to them appear in the script.
  • Functions may call other functions to any depth.
  • Recursion is fully supported.
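A minimal sketch of the first and last points: the function below is called before its definition appears in the file, and it calls itself. The name factorial is just a conventional choice for the example.

```perl
use strict;
use warnings;

my $answer = factorial( 5 );      # called before the definition appears
print( "5! = $answer\n" );

sub factorial
{
    my( $n ) = @_;
    return 1 if ( $n <= 1 );               # base case stops the recursion
    return $n * factorial( $n - 1 );       # the function calls itself
}
```

Each level of the recursion gets its own $n because it is declared with my().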
  

Local Variables: my() and local()

  There are two ways of creating local variables with Perl: the old way, using the local() function, and the new way, using the my() function. In almost all cases you want to use my().
  • my() -- creates true local variables. They exist only until the enclosing block of code (inside curly braces { ... } ) ends, then they disappear. These variables are similar to stack variables in C. The example in the functions section shows the use of my() to create local variables and assign the contents of an array to them.

  • local() -- creates globally scoped variables with locally scoped values. That is, the variable is still global, but its current value will be restored to its previous value when the enclosing block is exited.

An example might help:

$v1 = 1;
$v2 = 1;
f1();
sub f1
{
    local( $v1 ) = 2;
    my( $v2 ) = 2;
    print( "func f1; v1: $v1\n" );
    print( "func f1; v2: $v2\n" );
    f2();
}
sub f2
{
    print( "func f2; v1: $v1\n" );
    print( "func f2; v2: $v2\n" );
}
The output of running this program is:
func f1; v1: 2
func f1; v2: 2
func f2; v1: 2
func f2; v2: 1
So, as you can see, the new value set into $v1 in function f1() was visible inside function f2(), rather than the global value of $v1 set before either function was called. But $v2 behaved as you'd expect a local variable to behave.

In summary, use my() to create local variables. If you think you need local(), think again, and again. There are certain really weird cases where it is useful now, but if you're hitting one of them, you're way beyond what this class can teach you.
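The rule that a my() variable lasts only until the end of its enclosing block applies to any pair of curly braces, not just function bodies. Here is a small sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $count = 1;                 # visible in the rest of the file
my @seen  = ( $count );

if ( $count == 1 )
{
    my $count = 99;            # a new variable; hides the outer one
    push( @seen, $count );
}                              # the inner $count disappears here

push( @seen, $count );         # back to the outer variable
print( "@seen\n" );
```

The inner $count never touches the outer one, so the last value recorded is the original 1.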

  

Context -- Scalar or Array

  Occasionally you may want to write a function that returns a scalar or an array, depending on what the caller is expecting. Here's a simple example of a function that reads an entire file of text, and returns either a scalar containing the entire file as one long line, or an array of lines, depending on what the caller wants:
sub ReadFile
{
    my( $fname ) = @_;         # local variable $fname
    open( FH, "<$fname" );     # open file for reading
    my( @lines ) = <FH>;       # read file into local array
    close( FH );               # close the file
    if ( wantarray )           # if caller wants array
    {
        return @lines;         # return an array
    }
    else
    {
        return( join( '', @lines ));  # else return one long line
    }
}
$data = ReadFile( 'datafile' );    # call in scalar context
@lines = ReadFile( 'datafile' );   # call in array context
After this code runs, $data contains all of the lines in 'datafile' strung together as one long text string. @lines contains all of the same lines, but they are separated into the elements of an array.
  

References

  References are data items that refer to other data items. In some ways they resemble pointers as implemented in C and other languages, but they are safer because they cannot point to any arbitrary memory address, cannot be used in arithmetic, and contain associated type information, so that what kind of thing they point to is known. Most of those benefits really only matter in OO systems, and since this class doesn't cover the OO features of Perl (at least not yet) they are not discussed here. However, there are a few things about references that make them useful outside of OO code.

References are stored in scalar variables, and created using the backslash operator.

$foo_ref = \@foo;
This code makes $foo_ref contain a reference to the array named @foo. The curious can run this code:
print "$foo_ref\n";
and you'll get something like this out:
ARRAY(0xca5d68)
From this you can tell that this is a reference to an array, and the address of some perl internal data structure holding (or pointing to) the array is 0xca5d68. (Note that the address is not useful to you... it's useful to Perl, however.)

Once you have a reference it can then be passed to functions as a scalar, but those functions may dereference the contents by placing the original type designation character in front of the dollar sign. So:

 @bar = @$foo_ref;
copies the contents of the @foo array into the @bar array. This could happen in a function that doesn't have lexical scope to see the @foo array, so long as it can see (or was passed) the reference to the array ($foo_ref). Why is this useful at all?

First, references allow call by reference instead of just call by value. This is very handy to avoid huge parameter passing overhead. So instead of doing this:

	...
	code to set @foo to contain 10,000 elements
	...
	bar( @foo );	# pass a copy of all 10,000
			# elements to function bar
	...
	sub bar
	{
		my( @baz ) = @_;	# copy all 10,000
					# elements into @baz
		...
		foreach $i ( @baz )
		{
			code to do something based on array contents
		}
	}
You can do this instead:
	...
	code to set @foo to contain 10,000 elements
	...
	bar( \@foo );	# pass a reference to the array to function bar
			# a reference is a single scalar item
	...
	sub bar
	{
		my( $arrayref ) = @_;	# copy reference into a
					# local variable
		...
		foreach $i ( @$arrayref )
		{
			...
			code to do something based on array contents
		}
	}
In the second case we're not copying an array of 10,000 things twice. That can be a big efficiency gain at times.
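For a version of the second form that you can actually run, here is a minimal sketch. The function name sum_by_ref and the choice of adding up the elements are made up for the example.

```perl
use strict;
use warnings;

my @foo = ( 1 .. 10000 );              # 10,000 elements

my $total = sum_by_ref( \@foo );       # one scalar is passed, not 10,000
print( "total: $total\n" );

sub sum_by_ref                         # hypothetical helper for this sketch
{
    my( $arrayref ) = @_;              # copy only the reference
    my $sum = 0;
    foreach my $i ( @$arrayref )       # walk the original array
    {
        $sum += $i;
    }
    return $sum;
}
```

The function never makes its own copy of the array; it reads the caller's array through the reference.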

Another thing references allow you to do is pass multiple lists to a single function. In perl 4 you couldn't do that, since the first list parameter in your my or local variable declaration would gobble up all of the arguments. Example:

	@gl1 = ( "a", "b" );
	@gl2 = ( "c", "d" );

	f( @gl1, @gl2 );	# call f() and pass two lists
				# won't work this way

	sub f
	{
		my( @ll1, @ll2 ) = @_;	# make local variables

		# note:
		# @ll1 now contains: ( "a", "b", "c", "d" )
		# @ll2 is empty.
		...
	}
Using references, you can do this instead:
	@gl1 = ( "a", "b" );
	@gl2 = ( "c", "d" );

	f( \@gl1, \@gl2 );	# call f() and pass two
				# references to lists

	sub f
	{
		my( $l1ref, $l2ref ) = @_;	# make local variables
		my( @a ) = @$l1ref;		# copy @gl1 into local
						# variable @a
		my( @b ) = @$l2ref;		# copy @gl2 into local
						# variable @b
		# note:
		# @a now contains: ( "a", "b" )
		# @b now contains: ( "c", "d" )
		...
	}
Now the local variables contain the same things as the original lists in the calling code. Of course, you don't have to copy the contents of the lists out of the references to use them. If you have a reference to a list, you can use a foreach loop on it like this:
	foreach $i ( @$list_ref )
	...
You can have references to scalars (which are particularly useful for gaining efficiency when your scalars contain multi-megabyte strings), arrays, hashes, and functions.
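Here is a small sketch of the other reference types mentioned above. All of the variable and function names are invented for the example, and the string is short only to keep the example readable:

```perl
use strict;
use warnings;

my $text  = "a very long string";
my %color = ( apple => "red", lime => "green" );
sub shout { return uc( $_[0] ) . "!"; }

my $sref = \$text;     # reference to a scalar
my $href = \%color;    # reference to a hash
my $fref = \&shout;    # reference to a function

my $len   = length( $$sref );      # dereference the scalar
my $apple = $$href{ "apple" };     # look up one entry through the reference
my @names = sort keys %$href;      # use the whole hash through the reference
my $loud  = &$fref( "hello" );     # call the function through the reference

print "$len $apple @names $loud\n";
```

The pattern is the same in each case: put the type character you want ($, @, %, or &) in front of the scalar that holds the reference.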

Those are only the simplest uses for references. When this course is expanded to cover the OO portion of Perl, much more detail on references will appear here.

  

Packages

  Other than the "use" statement, most of you are not likely to need to know about packages, but knowing a little bit may help explain some things you'll see from time to time.

A package is a separate namespace in Perl. Basically, when a perl script is compiled, it is put into the "main" namespace (or package). Thus, all of the variables and functions your script creates are in the main namespace too. When you use a "use" statement, however, things change. The functions and variables declared in the file that you refer to with the use statement are (usually) put into a separate namespace -- one determined by the "package" statement present in the package source file. For example, when you say:

	use File::Find;
several functions and variables are created and put into the "File::Find" namespace. To make them easily callable or usable to your script, the Perl interpreter does some odd work to force certain selected names into the "main" namespace for you. When that work is done, you can call find() in one of two ways:
	find( ... );               # called via the main namespace
	File::Find::find( ... );   # qualified name in File::Find package
The first is cleaner to read, but the second actually tells you where the function find comes from. Use the first -- no one usually cares where the function comes from unless they are writing packages themselves. Sometimes the package documentation will tell you about variables that are not exported into the main namespace. In this sample case, $File::Find::dont_use_nlink is such a variable.

Note the way namespace qualifiers are used: the leading type designator character (if there is one) comes first, followed by the package name (and subnames, separated by '::'), followed by '::' and then the variable or function name. This looks really ugly, but (as with most of Perl) the idea was to implement a powerful tool in a way that would make it usable when needed.
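To make the qualified-name syntax concrete, here is a minimal sketch that declares a tiny package in the same file as the script. The package name Counter and everything in it are invented for illustration; real packages normally live in their own files and are pulled in with "use".

```perl
use strict;
use warnings;

package Counter;          # hypothetical package, for illustration only

our $count = 0;           # package variable; its full name is $Counter::count

sub bump
{
    $count++;
    return $count;
}

package main;             # switch back to the default namespace

Counter::bump();          # qualified function name
Counter::bump();
print "count is $Counter::count\n";    # type character, package, '::', name
```

Nothing has been exported here, so the script has to use the fully qualified names to reach the function and the variable.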

Building perl packages is a subject for an entire class all on its own, and it's discussed in some detail in chapter 5 of Programming Perl and in a chapter in Advanced Perl Programming as well. Between them you can figure most of it out, but be prepared to spend some time at it.

  

Built In Functions

  There are many many built in functions in Perl. These functions are discussed in chapter 3 of Programming Perl. Below is a list of the functions I have found most useful, along with a one or two line description of what the function does. See the books for full details on how things work in depth.
  • binmode -- used to change a file handle into binary read mode, rather than text mode. Useless on UNIX systems, but required on Windows systems to avoid translating \r\n into just \n.
  • chdir -- change the current working directory.
  • chmod -- change permissions on a file. This does not map to Windows NT permissions, but in the win32 port there is an entirely new interface created for handling those.
  • chomp & chop -- remove trailing characters from a string
    • chomp removes newlines (only!) from the end of a scalar string.
    • chop is the earlier version that removes the last character from a scalar string, regardless of what it is.
    Use chomp unless you know what you're doing.
  • defined -- tests variables for whether or not they are defined. Note that a variable may exist and still contain the undefined value, in which case this function returns false.
  • delete -- remove an item from a hash.
  • die -- exit the perl program abnormally, usually with a diagnostic message.
  • each, keys, & values -- hash manipulation functions.
    • each is used in while loops to iterate over each key/value pair in a hash
    • keys returns a list containing all of the keys from a hash
    • values returns a list containing all of the values from a hash
  • eval -- process a string as perl code and execute it.
  • exists -- tests to see if a particular key exists within a hash.
  • exit -- exit the Perl program; takes an exit value to return to the OS.
  • fork & exec -- UNIX style process execution routines. For Windows, see the Win32::Process module.
  • localtime -- returns all of the components of a time (hour, minute, second, weekday, year, etc.) as a series of values.
  • grep -- iterates over an array, executing a block of code or an expression against each element. Returns a list containing all elements for which the expression or block evaluated to true. That description is hard to follow, so here is an example:
    	@foo = ( "a", "aba", "c", "bd" );
    	@bar = grep( /b/, @foo );
    	foreach $i ( @bar ) { print $i, "\n"; }
    will print:
    	aba
    	bd
  • join -- concatenate the elements of a list together, inserting a provided string between them. An efficient way to build a string from a list.
  • lc, uc, length -- string manipulation functions
    • lc -- return a lowercase version of a string
    • uc -- return an uppercase version of a string
    • length -- return the length of a string
  • m// -- pattern match operator. (See above for more details.)
  • map -- similar to grep; iterates over an array, executing a block of code or an expression against each element. Returns a list containing the results of each evaluation. Truthfully, thinking about map() makes my head spin, so I don't use it.
  • mkdir -- create a new directory
  • my, local -- declare local variables. (See above for more details.)
  • open, close -- open files for reading or writing, close them when you're done.
  • opendir, readdir, rewinddir, seekdir, telldir, & closedir -- directory manipulation functions. These functions are used to read directory contents. Using them you can open a directory as if it were a file, read the entries from the directory, and close the directory.
  • push, pop, shift, unshift, splice -- list manipulation functions.
    • push -- add one or more items to the end of a list
    • pop -- remove an item from the end of a list -- return it
    • shift -- remove an item from the front of a list -- return it
    • unshift -- add one or more items to the front of a list
    • splice -- general list manipulator; can add or delete items at any position within a list
    Using push and pop, you can easily implement a stack. Using push and shift, you can implement a FIFO.
  • print, printf -- print data to a file handle
  • read, write, seek, tell -- binary file manipulation functions.
    • read -- read data from a file
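    • write -- write a formatted record to a file handle, using a format. Despite the name it is not the opposite of read; use print (or syswrite) to write plain data to a file.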
    • seek -- seek to a particular location in a file
    • tell -- find out where you currently are within a file
  • rename -- rename a file; on most systems this will not move a file between file systems or disk drives.
  • return -- exit a function immediately, return a value if you want to.
  • rmdir -- delete a directory
  • s/// -- search and replace function. (See above for more details.)
  • scalar -- force evaluation of an expression into scalar context. Particularly useful when you want to know how many elements an array contains: $len = scalar( @foo );
  • select -- change the default file handle for output by print and printf.
  • sleep -- stop working for some number of seconds.
  • sort -- sort a list; takes a user provided function to do the comparison.
  • split -- split a scalar into a list; uses a pattern to determine where to split the string. VERY POWERFUL.
  • sprintf -- uses the printf like formatting controls to format variable contents into a string, which it returns.
  • stat -- get information about a file or other file system entry so that it may be examined by the script.
  • system -- start a program and wait for it to finish.
  • time -- get the current time. The value returned is usually the number of seconds since Jan. 1, 1970, GMT. If you are using Perl on a Mac, your answer might be different though.
  • tr/// or y/// -- translate characters from those in one set to those in another.
    Example: $foo =~ tr/A-Z/a-z/;
  • umask -- change the way permissions are set on files that are created by your Perl script.
  • unlink -- delete a file.
  • use, require -- import a perl library into your script so you can call the functions it contains.
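  • wait -- wait for a child process to finish; returns the PID of the process that finished. To wait for one particular process (identified by its PID), use waitpid instead.
  • wantarray -- determine if the caller of a function wants a scalar or a list returned to them.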
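A few of the list and string functions above are easier to understand in action. This is only a sketch with made-up data, showing a stack built from push and pop, a FIFO built from push and shift, and one call each to split, join, map, and sort:

```perl
use strict;
use warnings;

# A stack: push adds to the end, pop removes from the end.
my @stack = ();
push( @stack, "a", "b", "c" );
my $top = pop( @stack );            # "c", the last item pushed

# A FIFO: push adds to the end, shift removes from the front.
my @queue = ();
push( @queue, "a", "b", "c" );
my $first = shift( @queue );        # "a", the first item pushed

# split breaks a string into a list; join glues a list back together.
my @fields = split( /:/, "root:x:0:0" );
my $joined = join( "-", @fields );  # "root-x-0-0"

# map returns the result of the expression for every element.
my @upper = map( uc( $_ ), @queue );            # ( "B", "C" )

# sort with a comparison block; <=> compares numerically.
my @sorted = sort { $a <=> $b } ( 10, 9, 100 ); # ( 9, 10, 100 )

print "$top $first $joined @upper @sorted\n";
```

Without the comparison block, sort compares its arguments as strings, which would put 10 and 100 ahead of 9.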
  

Library Functions

  Perl has many functions that are not built into the interpreter, but instead are written into the standard library that ships with the Perl source code. These functions are written in perl and accessed via use or require directives. Chapter 7 of Programming Perl lists the standard library functions that come with a normal installation of Perl. There are a lot of them, so be prepared to wade through a fair bit of text to find what you want. These have proven most useful to me.
  • English -- causes many of the special variables in Perl to have alternate (and rememberable) names. Accessed via "use English;" I actually don't use this all that often, and I think that $OSNAME (aka $^O) is broken on Win32 Perl, but sometimes this stuff is useful.

  • Getopt -- imports a set of routines to handle option processing, so your script can be invoked with options like:
        foo -xyz -d bar
    There are several flavors, but my personal favorite is accessed via "use Getopt::Std;" and the function I call is getopts(). See Programming Perl p. 452 for more details.

  • Cwd -- imports routines for figuring out the current working directory. (Why this isn't a built in function when mkdir and chdir are, I don't know.) Accessed via "use Cwd;". Based on my reading of the contents of the Cwd module, you should probably always get the current working directory with a call like this:
        $curdir = cwd();
    There are other routines provided in the Cwd module, but they aren't worth the risk.

  • Find -- imports a tool that traverses directory trees and allows you to search for files or build lists of files that you might want to process. Accessed via "use File::Find;". While this module has been very useful to me, there are some caveats about this code that you must know. First, you should always set the variable $File::Find::dont_use_nlink to a non-zero value if you are using find. This allows it to work on PCs, and on CD-ROMs, where it otherwise fails. Second, despite the text in the book, $File::Find::name and $File::Find::dir do not contain the described contents when finddepth() is called. (In fact, I'm not sure what they do contain when finddepth() is called.) Third, this code was written to support find2pl -- an external tool to convert usage of the UNIX find(1) command into Perl scripts. While it does that, the interface presented in find() and finddepth() could be better. Use with caution, but do use it when needed.
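As a sketch of the option processing described in the Getopt entry above, here is one way getopts() might be used for a command line like "foo -xyz -d bar". The option letters and the usage message are invented for the example, and @ARGV is overwritten only so the sketch runs without real command line arguments:

```perl
use strict;
use warnings;
use Getopt::Std;

# Pretend the script was invoked as:  foo -xyz -d bar file.txt
@ARGV = ( "-xyz", "-d", "bar", "file.txt" );

my %opts;
# x, y, and z are simple flags; the colon after d means -d takes a value.
getopts( "xyzd:", \%opts ) or die "usage: foo [-xyz] [-d dir] files...\n";

print "x is set\n" if $opts{x};
print "d is $opts{d}\n" if defined $opts{d};
print "remaining arguments: @ARGV\n";
```

getopts() removes the options it recognizes from @ARGV, so whatever is left over (file names, typically) is ready to be processed in the usual way.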

Again, there are many items in the standard Perl library. Review the first few pages of chapter 7 in Programming Perl for an overview of them all. Be aware, however, that chapter 7 is the worst organized part of Programming Perl and as a result it can be hard to find things in there. It can be done, but it takes time.
  

CGI programming

  CGI programming is not directly Perl related, but so much CGI programming is done in Perl that it pays to have a brief overview of it here. This discussion is not intended to be a full introduction to CGI programming. For that, I suggest the following URL: http://hoohoo.ncsa.uiuc.edu/cgi/ (at least until they finally tear it down). An infinitely large number of books on HTML will teach you about CGI too. I own and at least somewhat like HTML : The Complete Reference by Thomas A. Powell (no relation of mine) published by McGraw Hill.

A CGI script is invoked on a web server computer in response to a user's action on a WWW page -- something like a mouse click on the OK button in an HTML form. The HTTP protocol specifies the way in which the data from the form is encoded (we'll discuss that when we review the Perl script doing the decoding) and the method by which the data is transmitted to the CGI script itself. There are two transmission methods, controlled by the "METHOD" attribute of the "FORM" HTML statement. They are:

  • METHOD="GET" -- this appends the data on the end of the submitting URL. This is ugly and limited to small amounts of data.
  • METHOD="POST" -- sends the data separately, in the body of the request; the CGI script reads it from standard input as if it were a file. Nicer to look at and no real limit on data size.
The Perl CGI script inherits some environment variables that tell it how the data was transmitted. Then it splits the data into name/value pairs and puts them all into a hash. Once the data is in the hash, the CGI script checks the input for errors and does the proper thing with the data. It also generates an HTML page in response to tell the user that something actually happened.
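The decoding step described above can be sketched roughly like this. It is not the script from the class materials; parse_form() is a made-up helper name, and the sketch assumes the usual form encoding (pairs separated by "&", name and value separated by "=", spaces sent as "+", and other special characters sent as "%" followed by two hex digits):

```perl
use strict;
use warnings;

# parse_form() is a hypothetical helper, not part of any library.
# It takes the raw encoded form data and returns a hash of name/value pairs.
sub parse_form
{
    my( $data ) = @_;
    my %form;
    foreach my $pair ( split( /&/, $data ) )
    {
        next if $pair eq "";
        my( $name, $value ) = split( /=/, $pair, 2 );
        $value = "" unless defined $value;
        foreach my $s ( $name, $value )
        {
            $s =~ tr/+/ /;                                  # "+" means a space
            $s =~ s/%([0-9A-Fa-f]{2})/chr( hex( $1 ) )/ge;  # %XX is one character
        }
        $form{ $name } = $value;
    }
    return %form;
}

# With METHOD="GET" the data arrives in the QUERY_STRING environment variable.
# With METHOD="POST" it arrives on standard input, CONTENT_LENGTH bytes long.
my $raw    = "";
my $method = $ENV{REQUEST_METHOD} || "";
if( $method eq "POST" )
{
    read( STDIN, $raw, $ENV{CONTENT_LENGTH} || 0 );
}
else
{
    $raw = $ENV{QUERY_STRING} || "";
}
my %form = parse_form( $raw );
```

If a form sends the same name more than once, this sketch keeps only the last value. A real script would also check the input for errors before doing anything with it.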

The class materials include an HTML page that implements a simple form and a Perl CGI script that processes the contents. No links are provided here because I (honestly) don't have time to do that yet. Paper copies will be available in class and we'll discuss them there.