Introduction
This document is intended to provide an overview of the major features of the
Perl language to those who haven't seen it before. What is important? Well,
that inevitably is a matter of opinion. Here you get what I think is
important, and not what someone else thinks. I base my opinions on
several years of writing scripts in the language, but... TMTOWTDI...
There's More Than One Way To Do It. That's pronounced "Tim-Toady", and it means that while I can show you how I use the language, other people will use it very differently. You will see Perl code written by others that looks nothing like what you'll see discussed here. It's not wrong, it's just different.
I am always interested in comments on this material, and feedback on how it helped or hindered you in your efforts. You can contact me here.
Table Of Contents
All of the links below are internal to this single document. You can
use this document on-line as a reference, and you may print it in one
fell swoop.
Things covered in the class...
What is Perl?
Perl is a free programming language -- largely platform independent by
virtue of its interpreted nature -- that was developed by Larry Wall
originally, and is now developed by a cluster of people "out there" on
the 'net (including Larry).
Perl is interpreted in a manner somewhat similar to the way Java is interpreted. Both languages have a compiler, though Java's runs separately from the interpreter while Perl's is built into the interpreter. (This gives Perl the ability to interpret new code at runtime, but does so at the expense of compiling the application each time it is started.) Both interpreters operate on a byte-code, rather than on the original source of the program, so they are much faster than shell scripts or batch files. There is no "Perl Virtual Machine", which is a difference from Java.
Perl's language constructs are built for true general purpose use, but (in my opinion) its real strengths are in string manipulation and system interface work. Perl has the ability to handle single strings many megabytes in size, lists of data items, hashes (highly similar to associative memory), and several other data types. It has interfaces to most every UNIX system call there is, and can easily do manipulations on files, file names, and file contents.
In addition, Perl combines internally many of the most useful UNIX utilities and puts them in the hands of the script writer. You can do pattern matching searches across string data (like grep), split strings into fields (like awk/nawk), perform text substitutions (like sed) and many related things. It's a very powerful package.
If you are an OO lover, Perl 5 even provides an OO programming scheme as well, though it's much less restrictive in its approach to OO than Java or C++.
In the "Real World" Perl is the language of choice for implementing CGI scripts on WWW server systems. Its string handling facilities make pulling data out of forms extremely easy and it is powerful enough to do most anything that needs to get done on the WWW server side of things. Perl is also used heavily by system administrators to automate various tasks. It is much more efficient than shell scripts or batch files, and much more powerful too, making complex system maintenance tasks much easier to automate.
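To make the grep/awk/sed comparison concrete, here is a minimal sketch of the three operations done inside one Perl script. The sample records and the /export/home path are made up for illustration; only grep, split, and s/// are Perl built-ins.

```perl
use strict;
use warnings;

# Hypothetical sample data: colon-separated records, passwd style.
my @records = (
    "root:0:/root",
    "alice:1001:/home/alice",
    "bob:1002:/home/bob",
);

# Like grep: keep only the records whose text matches a pattern.
my @home_users = grep { m{:/home/} } @records;

# Like awk: split a record into fields on a separator.
my ( $name, $uid, $dir ) = split( /:/, $home_users[0] );

# Like sed: substitute one piece of text for another (on a copy of $dir).
( my $moved = $dir ) =~ s{^/home}{/export/home};

print "$name ($uid) moves from $dir to $moved\n";
```

Each step would take a separate process and a pipe in a shell script; here they are ordinary expressions working on ordinary variables.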
Perl History
Perl has an interesting history, and Larry documents much of it in Programming Perl. Perl grew out of an effort to build a bi-coastal configuration management system, and has spawned all kinds of interesting oddities, like Perl Poetry (also documented in his book) and more than one obfuscated code contest. (http://www.tpj.com/tpj/contest)
Where to get Perl and Perl modules
As a general rule, if you want Perl source, binaries, modules, docs, advice,
or most anything else perlish, start at:
http://www.perl.com/. You'll find
most everything there, or a link to it.
General information on downloading Perl source code and/or compiled versions can be found at http://www.perl.com/pace/pub/perldocs/latest.html.
UNIX
Perl can be downloaded from http://www.perl.com/pace/pub/perldocs/latest.html in source form and built for most any UNIX platform. Binary versions of Perl for UNIX platforms can be found at the CPAN (Comprehensive Perl Archive Network) at http://www.perl.com/CPAN/README.html or at any of the various binary archive sites on the net for whatever platform you're looking for. Alta Vista can probably help you find such sites for various platforms.
If you install a prebuilt binary, the installation mechanism is up to whoever packaged the binary for you. Follow whatever directions you get with the binary you download.
Installing the Perl source and building it yourself is an educational experience. You'll learn all about UNIX (in)compatibility and what kinds of hoops programmers have to jump through to make one C program compile on many system types. There is a README file in the Perl source distribution that provides instructions on building Perl for your system and installing it. Follow those instructions.
Windows-NT
If you want to run Perl on an NT platform, you need to download it from a different location: http://www.activestate.com/. ActiveState ports Perl to the Windows environment and keeps it up to date. Their NT distribution is nominally binary only, though source is available from ActiveState as well. The thing you want to download is called ActivePerl and it is at build 509 as of this writing. (That's Perl version 5.xxx, renamed to protect the innocent.)
Installation of the ActiveState Perl package is very simple. There is an InstallShield tool (setup.exe) that you run, answer 2 or 3 questions, and wait while it puts the stuff on your disk.
IMPORTANT NOTE: You will find some other ports of Perl to the Windows environment. There is one in the Microsoft Windows NT Resource Kit, and another in the MKS tool kit. DO NOT USE THESE PORTS! They are bug ridden and will cause you much grief! Immediately after installing either MKS or the NT Resource Kit delete their version of Perl (and all associated libraries) and download the latest version from ActiveState. If you leave the old ports and libraries around on your machine, you will wind up with systems that use incompatible libraries or fail at unpredictable times. You have been warned.
What you'll install
When you install Perl on your system, you will install many files. This section does not provide a complete list, but you can always dig around in the /usr/local directory (on a UNIX system) or in the installation directory (on an NT system) to see what got installed out there for you.
There are 2 main directories of stuff that get installed:
Where to get Help with Perl
As mentioned above, if you want Perl source, binaries, modules, docs, advice,
or most anything else perlish, start at:
http://www.perl.com/. You'll find
most everything there, or a link to it.
There are usenet news groups if you have access to them. comp.lang.perl might be useful.
There is at least one highly regarded magazine: The Perl Journal.
Classes on Perl are taught by all of the University Extensions and many private training companies.
Finally, there are about 6.023 * 10^23 books on Perl available via Computer Literacy or via Amazon.com. The next section mentions the ones I know about.
Books About Perl
There is only one comprehensive book on Perl that I can honestly
recommend, and that's because it predates all of the others and so I
haven't used anything else. That's OK, since the book in question is
pretty darn good. It is:
Programming Perl by Larry Wall, Tom Christiansen, and Randal L. Schwartz. Published by O'Reilly & Associates. This is called "the camel book" because of the large camel on the cover. (That also explains all of the camels on and around the www.perl.com web site.) This is a very good book, funny at times, and it has most everything an average Perl programmer will need to know. The first edition of this book is only valuable now as a collector's item, so don't bother with it.
Advanced Perl Programming by Sriram Srinivasan (also published by O'Reilly & Associates) is a useful book in certain more technical areas, but it is not a general purpose reference. I have used a couple of sections in here -- particularly Chapter 6 on Modules -- to enhance what I learned from Programming Perl.
Those with less programming background might want to try Learning Perl by Randal L. Schwartz and Tom Christiansen (yet another fine product of O'Reilly & Associates). I have an old copy of the first edition, and for me it was useful for about two weeks, but then again, I'd been doing systems programming for 10 years when I got the book. As with Programming Perl, make sure you get the 2nd Edition.
Finally, here is a book to avoid: Learning Perl on Win32 Systems by Randal L. Schwartz, Erik Olson, and Tom Christiansen. Yes, this is another O'Reilly & Associates book, but it's not nearly up to their usual standards. It doesn't provide much useful information at all, so avoid it and use any of the above Perl books along with the HTML docs in the ActiveState Perl port.
Unpaid plug: if you need one of these books, and want to pay for it yourself, order it through Amazon.com. I have found Amazon to be a really pleasant place to do business, and their selection is great. Plus they give you discounted prices, no sales tax in CA, and shipping charges that are pretty reasonable.
Perl Philosophy
Perl philosophy might sound like an odd topic in a language course, but
it is actually important to cover, at least briefly. That's because Perl's
behavior grows from its underlying design and implementation philosophy,
so if you know a bit about it, it may help you to anticipate what Perl will
do in some cases.
First of all, "Perl" stands for one of two things:
One notable thing about Perl is that there is almost always a reasonable default behavior for functions, operators, and everything else in the language. As you read through the list of functions supported by Perl you will learn that most of these defaults either do nothing or affect the magic variable $_. It takes some getting used to if you want to count on these default behaviors, but they can help you get things done quickly when you need to.
Another important aspect of Perl is the huge number of ways available to perform many operations. Perl strives to provide many basic language services that you can hook together in lots of different ways. But that tactic leads to the ability to do the same job in many different ways. You'll see more of this as you grow more familiar with the language itself.
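As a small illustration of those defaults, the sketch below never names $_ and yet uses it three times. The word list is made up; foreach, the bare pattern match, and uc are standard Perl.

```perl
use strict;
use warnings;

my @words = ( "camel", "llama", "clam" );
my @hits;

foreach ( @words )          # no loop variable named, so each item lands in $_
{
    next unless /^c/;       # a pattern match with no =~ tests $_
    my $shout = uc;         # uc with no argument upper-cases $_
    push( @hits, $shout );
}

print "@hits\n";            # prints "CAMEL CLAM"
```

Whether to lean on $_ like this is a matter of taste. Naming the loop variable explicitly is longer, but it is also harder to misread.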
Hello World
Enough introduction. Here is the traditional first program in most any
language:
Not much there, eh? This program prints "Hello, World!" to your screen and exits. C and C++ programmers will note the "\n" (newline) in there which causes the cursor to drop down one row and to go back to the left column. The semicolon (;) ends the statement. Here's a slightly different version of the same program:
This is basically the same, but now we have parentheses around the arguments to the print function, and we can see that print takes a variable number of arguments -- since there is no argument count and we're passing it two arguments in this example. This program does exactly the same thing the previous one did.
Now that you've seen the simplest Perl programs, you need to see how to get them to run on your system.
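For reference, the two forms described in this section can be sketched as follows. This is a reconstruction from the description, so how the text is divided between the two arguments in the second form is an assumption.

```perl
# Form 1: no parentheses, one string argument.
print "Hello, World!\n";

# Form 2: parentheses around the argument list, and two arguments.
# print writes its arguments one after another, with nothing between them.
print( "Hello, ", "World!\n" );
```

Run as written, this prints the same line twice.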
Starting a Perl Program on UNIX
Starting a perl program can be done in many ways. You'll want /usr/local/bin
(or its equivalent) on your PATH variable so that you have access to
the perl executable program.
If your program is short, you can type it on the command line like this:
The -e option takes a string argument that is perl code to execute. (The -c option, by contrast, only checks a script's syntax without running it.) That works for really short programs, but anything of any size will quickly get difficult, and quoting complex scripts so that they survive the shell is a significant challenge. Instead, you can put your perl script into a file and give the file name to the perl interpreter. (Many, but far from all perl script names end in ".pl". This is only a convention, not a requirement.) Running a script this way looks like this:
That is still not very nice, however. The user has to type "perl " and put the ".pl" on the end, when all they really wanted to do was run "hello". You can fix that by using the UNIX #! (pronounced "pound bang", "hash bang" or "she-bang") line. Modify your perl script file to look like this:
The "#!" line must be the very first line in the file, and the "#!" characters must be the first two characters on that line. Once you've modified the file contents, change its name to just "hello", and set the permissions to allow it to be executed. These UNIX commands will probably do what you need:
After making those changes, you can run the perl script by typing just:
This assumes one of the following:
The #! line is far and away the most common way to start perl scripts in my experience.
Starting a Perl Program on NT
As with UNIX, starting a perl program on Windows NT can be done in many
ways. You'll want the bin directory associated with your Perl installation
on your PATH variable so that you have access to the perl executable
program. If you used the defaults on the installation from ActiveState,
this should already be the case. On my work system, the Perl executable is
found at:
As with UNIX, if your program is short, you can type it on the command line like this:
The -e option takes a string argument that is perl code to execute. That works for really short programs, but anything of any size will quickly get difficult, and quoting complex scripts run this way is almost impossible in the Windows NT command line (cmd.exe). Instead, you should put your perl script into a file and give the file name to the perl interpreter. (Many, but far from all perl script names end in ".pl". This is only a convention, not a requirement.) Running a script this way looks like this:
That is still not very nice, however. The user has to type "perl " and put the ".pl" on the end, when all they really wanted to do was run "hello". There are at least three separate ways to fix this on Windows systems:
Comments
Perl comments begin with a pound sign (#). They may appear anywhere in
the program that whitespace might appear.
# this is a comment
# another comment
$foo = "bar"; # still another comment
# $foo = "baz"; # this code never runs
Note that Perl has no preprocessor and thus no equivalent to conditional
compilation. That means that commenting out code at compile time requires
putting pound signs in front of every line. However, if all you are
trying to do is avoid running certain code, you can use a runtime trick
to accomplish a similar thing:
... code to run ...
if ( 0 ) # this is never true
{
... code to skip ... # This code still must compile...
# gibberish will cause compile errors.
}
... more code to run ...
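A runnable version of the same trick might look like this. The @steps array exists only so that the effect can be seen; it is not part of the technique.

```perl
use strict;
use warnings;

my @steps;

push( @steps, "first" );

if ( 0 )                        # never true, so the block never runs
{
    push( @steps, "skipped" );  # skipped at runtime, but must still be valid Perl
}

push( @steps, "last" );

print "@steps\n";               # prints "first last"
```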
Variables and Data Types
Perl has 6 basic data types.
Now that you've seen the data types, you can see some code that uses them to do some simple assignments and function calls.
# Assigning numeric values to scalar variables...
$foo = 123; # assign an integer to $foo
$foo = 123.45; # assign a decimal number to $foo
$foo = 6.023E23; # assign a decimal number using
# scientific notation
$foo = 0xFF734; # assign hex integer
$foo = 0765; # assign octal integer (note leading 0)
$foo = 1_234_567; # use underscores for "legibility"
# Assigning string values to scalar variables...
$foo = "abc\n"; # a simple string
$foo = 'abc'; # another simple string
One difference between "xxx" and 'xxx' has to do with what kind of
special characters can be put into the string. When using "xxx" strings,
there are many special escape sequences that can be embedded in a string.
Things like:
Double quoted strings also allow for variable interpolation, in which the contents of a scalar or list variable are substituted into the string where the variable name appears inside the double quotes. Example:
$foo = "bar"; # set scalar $foo
print "foo = $foo\n"; # prints "foo = bar" on a line
Interpolation does not happen inside strings surrounded by single quotes.
Another note: if you want to put an @ in a string and do not want it interpolated, you need to escape it with a backslash.
@bar = ( "but", "not", "here" );
print "want \@ here @bar\n";
This prints "want @ here but not here". The backslash matters most when the @ is followed by something that looks like a variable name, as in an e-mail address. Without it, Perl tries to interpolate an array of that name, and depending on the Perl version and settings you get either a compile error or silently missing text.
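The quoting rules above can be checked with a short sketch. The variable names and the example.com address are placeholders chosen for illustration.

```perl
use strict;
use warnings;

my $who  = "World";
my @list = ( "a", "b", "c" );

my $double = "Hello, $who: @list";    # both variables are interpolated
my $single = 'Hello, $who: @list';    # taken literally, character for character
my $email  = "user\@example.com";     # \@ keeps the @ as a plain character

print "$double\n$single\n$email\n";
```

An array interpolated into a double quoted string has its elements joined with single spaces, which is why @list shows up as "a b c".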
Back to the sample variable assignments:
# Assigning items into an array...
@foo = (); # empty the array
$foo[0] = "abc"; # set the first element of the array to "abc"
$foo[1] = 123; # set the 2nd element of the array to 123
@foo = ( "cde", 321 ); # set first element of @foo to "cde" and
# the 2nd element to 321, overwriting
# previous values
$foo[100] = "xyz"; # set the 101st element of the array to "xyz".
# unset elements 2 - 99 have the undefined
# value (undef)
# Assigning items into a hash...
%heights = (); # empty a hash
%heights = ( # initialize the heights hash
"Jim" => 78, # Jim is 6' 6"
"John" => 48, # John is 4'
);
%heights = ( # exact same initialization
"Jim", 78, "John", 48
);
$heights{"Jim"} = 76; # Jim got 2" shorter
$heights{"John"} = 60; # John got 1' taller
# print everyone's current height
# "foreach" and "keys" will be explained later
foreach $who ( keys( %heights ))
{
print "$who is $heights{$who} inches tall\n";
}
By now you're probably wondering why sometimes there are @ or % signs in
front of array or hash variable names, and other times there are $ signs.
The answer is that it all depends on what you are actually manipulating.
When you are dealing with an entire array or hash, or with a subset
(called a slice) containing more than one item from an array or hash, then
you use the @ or % sign. If you are dealing with a single item, it's
just as if it was a scalar variable, so you use the $ sign. That may
take some getting used to, but it will make sense later, particularly
when you see how references work.
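A short sketch of that rule, using made-up data: the sigil says what you are getting back (one thing or several), and the brackets or braces say what kind of variable it comes from.

```perl
use strict;
use warnings;

my @days    = ( "Mon", "Tue", "Wed", "Thu" );
my %heights = ( "Jim" => 76, "John" => 60 );

my $first = $days[0];                    # one array element: $
my @mid   = @days[ 1, 2 ];               # array slice, several elements: @
my $jim   = $heights{"Jim"};             # one hash value: $
my @both  = @heights{ "Jim", "John" };   # hash slice: also @, but with braces
my $count = scalar( @days );             # the whole array in scalar context: 4

print "$first | @mid | $jim | @both | $count\n";
```

Note that a slice of a hash is written with @, not %, because what comes back is a list of values.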
Back to the variable assignments... file handles this time:
# examples using file handles
open( FOO, "<in.txt" ); # open file 'in.txt' for reading
# use file handle FOO
open( BAR, ">out.txt" ); # open file 'out.txt' for writing
# use file handle BAR
# overwrite previous contents
open( BAR, ">>out2.txt" ); # open file 'out2.txt' for writing
# use file handle BAR again, implicitly
# closing the previous file
# append to previous contents
$line = <FOO>; # read a line from in.txt into $line
print( BAR $line ); # print $line out to out2.txt
# note the lack of a comma!
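The file handle examples leave out error checking and closing, so here is a self-contained sketch that writes, appends to, and reads back a scratch file. The file name sketch_out.txt and the handle names OUT and IN are arbitrary choices made for this example.

```perl
use strict;
use warnings;

my $file = "sketch_out.txt";    # hypothetical scratch file name

open( OUT, ">$file" ) or die "cannot write $file: $!";
print( OUT "first line\n" );    # no comma after the file handle
close( OUT );

open( OUT, ">>$file" ) or die "cannot append to $file: $!";
print( OUT "second line\n" );
close( OUT );

open( IN, "<$file" ) or die "cannot read $file: $!";
my $line = <IN>;                # scalar context: reads one line
my @rest = <IN>;                # list context: reads all remaining lines
close( IN );

unlink( $file );                # remove the scratch file again

print "first: $line", "rest: @rest";
```

Checking the result of open is worth the extra typing: if the open fails and goes unchecked, the reads and writes that follow fail quietly.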
Operators
Perl has many operators. Here is a list of some of them. See
Programming Perl pp 76 and 85 for a full list of
supported operators.
Perl's precedence rules are similar to C's, but there are many more operators than there are in C, so the precedence table is fairly complex. Again, see Programming Perl pp 76 for a full discussion of all of Perl's operators, including the precedence rules.
Defining Truth
Many of Perl's operators and language functions need to operate on or return
a value that is either "true" or "false". In Perl, any value that can be put into a scalar can be interpreted as true or false by the following rules:
See Programming Perl pp 20 for a full discussion of the definition of truth.
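One way to get a feel for the rules is to test a handful of values directly. The sketch below follows standard Perl behavior as I understand it: the number 0, the strings "0" and "", and the undefined value are false, and everything else is true. The labels are only there to make the output readable.

```perl
use strict;
use warnings;

# label => value to test
my %cases = (
    'number 0'       => 0,
    'string "0"'     => "0",
    'empty string'   => "",
    'undef'          => undef,
    'number 1'       => 1,
    'string "0.0"'   => "0.0",
    'string "00"'    => "00",
    'single space'   => " ",
    'string "false"' => "false",
);

my %truth;
foreach my $label ( sort keys %cases )
{
    $truth{$label} = $cases{$label} ? "true" : "false";
    print "$label is $truth{$label}\n";
}
```

The surprises are the strings "0.0" and "00", which are true even though they are numerically zero, and the string "false", which is true like any other non-empty string.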
Control Flow Statements
Perl has the usual flow control statements, along with some
more unusual variants.
Regular Expressions and Pattern Matching
Regular expressions are one of Perl's most powerful features,
and they require the most extensive documentation. We'll cover
only the basics here; see Programming Perl pp 57 - 76
for a full discussion of regular expressions in Perl.
Pattern matching is a technique for finding carefully defined substrings within a larger string, and optionally removing, replacing, or saving them for later use elsewhere. There are a huge number of ways that pattern matching can be used, but they all use the same basic facilities underneath. Let's start with a simple example:
$a = "this is a test of the emergency broadcast system";
if ( $a =~ /test/ )
{
print "test found!\n";
}
else
{
print "test not found!\n";
}
Let's look that over in detail:
The =~ operator causes the matching operation to work against $a, instead of against $_. The search pattern here is /test/. This is a simplification of "m/test/" -- the leading 'm' is not needed because we're using slashes as our delimiters. If we wanted to use another delimiter, we'd need the m, as in "m,test," which is the same as '/test/' or 'm/test/'.
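The equivalent spellings can be compared side by side in a short sketch. The result variables are only there to capture the outcome of each match.

```perl
use strict;
use warnings;

my $text = "this is a test of the emergency broadcast system";

my $slashes = ( $text =~ /test/  ) ? "yes" : "no";   # slashes, no m needed
my $with_m  = ( $text =~ m/test/ ) ? "yes" : "no";   # the same, with the m spelled out
my $commas  = ( $text =~ m,test, ) ? "yes" : "no";   # other delimiters require the m
my $missing = ( $text =~ /quiz/  ) ? "yes" : "no";   # a pattern that is not there

# With no =~ at all, the match is made against $_.
my $default;
foreach ( $text )
{
    $default = /test/ ? "yes" : "no";
}

print "$slashes $with_m $commas $missing $default\n";   # prints "yes yes yes no yes"
```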
What is happening here: each "thing" inside the search pattern
is being compared with the contents of the string, to see if it
is found. In this case, the 't' from '/test/' is compared
with the first character of the string, and a match is found.
Then we compare the 'e' from '/test/' with the 'h' from the second
character of the string, and it doesn't match, so we drop back to
the 't' and keep looking. We fail to match the 't' 9 more times (once for each
character in 'his is a ') before it matches the 't' that begins 'test', and the 'e', 's', and 't' that follow match as well.
As a result, this code will print "test found!".
Comparing characters is pretty useful, but only the tip of the iceberg in pattern matching. Many "things" other than characters can appear in your search patterns. Here are some of the most common:
There are many additional pattern matching items -- entire books are written about pattern matching, so don't expect a full summary here. However, those are usually the most useful. More examples, using these new "things" in the search pattern:
$foo =~ /^A/ # true if $foo starts with 'A'
$foo =~ /^Z.*q+.*a$/ # true if $foo begins with 'Z',
# and ends in 'a', and has at least one
# 'q' in the middle somewhere
# "Zqa" and "Z---qqq-a" both match.
# here's a more complex example:
$foo =~ /(\d\d):(\d\d)\s*(AM|PM)/i;
$hour = $1 % 12; # 12 o'clock becomes hour 0
$minute = $2;
if ( $3 =~ /PM/i ) { $hour += 12; }
That last example needs some explaining.
First, remember that parentheses do two things: they cause Perl to remember whatever was matched for later use, and they delimit sub-parts of the regular expression. We're doing both in this case.
The 'i' after the closing slash is a modifier -- it tells Perl that all comparisons in the pattern match are to ignore case. In this case, that means that (AM|PM) will match any of the following: "AM", "Am", "aM", "am", "PM", "Pm", "pM", or "pm". Using the 'i' modifier is a handy shortcut to avoid really complex patterns. There are other modifiers, some of which we will discuss shortly. If you cannot wait, see Programming Perl pp 69 for a full list.
So what is going on here? Easy. Assuming $foo contains a time, a 2 digit hour is stored into $hour, a 2 digit minute is stored into $minute, and the presence of PM is used to complete the storing of the time in 24 hour format.
And what if $foo doesn't contain a time, or it contains a time that is formatted differently? Then this code is dangerous. A failed match does not reset $1, $2, and $3: they are either undefined or still hold whatever an earlier successful match left in them, so you won't get a time, but you will get a mess. As a result this code would be better written:
if ( $foo =~ /(\d\d?):(\d\d)\s*(AM|PM)?/i )
{
    $hour   = $1;
    $minute = $2;
    $ampm   = $3;   # save it now: the next successful match resets $1, $2, $3
    # check for 24 hour time format -- avoid
    # conversion if we are already in 24 hour time
    if ( defined( $ampm ))
    {
        # we have a 12-hour time; convert to 24 hour
        $hour %= 12;            # 12 o'clock becomes hour 0
        if ( $ampm =~ /PM/i )
        {
            $hour += 12;        # 1 PM becomes 13, 12 PM becomes 12
        }
    }
}
else
{
    print( "Time not found or format not ",
           "understood\n" );
}
This code is cleaner. There are still ways to improve it,
but at least we announce when we find something we don't
expect, and we handle 12 hour clock formats if we find them.
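One of those further improvements is to wrap the logic in a function, so that it can be tried against several inputs. The following is a sketch only: to_24_hour is a made-up name, and the decision to return an empty list on failure is an assumption about what a caller would want.

```perl
use strict;
use warnings;

# to_24_hour is a hypothetical helper, not part of Perl.
# Returns ( hour, minute ) in 24 hour form, or an empty list
# when no time is found in the text.
sub to_24_hour
{
    my( $text ) = @_;

    return () unless $text =~ /(\d\d?):(\d\d)\s*(AM|PM)?/i;

    # Copy the captures right away, before any other match can reset them.
    my( $hour, $minute, $ampm ) = ( $1, $2, $3 );

    if ( defined( $ampm ))
    {
        $hour %= 12;                       # 12 AM becomes 0
        $hour += 12 if $ampm =~ /PM/i;     # 1 PM becomes 13, 12 PM becomes 12
    }

    return ( $hour + 0, $minute + 0 );     # "+ 0" turns "05" into 5
}

foreach my $sample ( "12:05 AM", "1:30 pm", "12:00 PM", "17:45", "no time" )
{
    my @time = to_24_hour( $sample );
    print "$sample -> ", ( @time ? sprintf( "%02d:%02d", @time ) : "not found" ), "\n";
}
```

The cases worth testing are the ones around 12 o'clock, since that is where 12 hour to 24 hour conversions usually go wrong.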
The next thing beyond a pattern match is a search and replace. Here is an example:
$foo = "original string, just to string you along";
$foo =~ s/string/bring/;
print( $foo, "\n" );
So what happens here? The 's///' code is the search and replace function supplied by perl. As with 'm//' you can use separators other than slashes (which is particularly useful when searching or replacing in UNIX path strings) so 's,xxx,yyy,' is legal and the commas would be the separators in this case. The sample code is doing regular expression matching; the first pattern is the thing to match ('string' in this case) and the second pattern is the thing to replace it with ('bring'). So you might think that it would print:
original bring, just to bring you along
but that is not correct. Instead, it will print:
original bring, just to string you along
That may seem wrong, but it turns out to be very powerful. Perl's search and replace function grew out of the UNIX tools that did similar things -- particularly vi -- and those tools always start by assuming they are supposed to replace only the first occurrence of the pattern on a particular line. If you want to replace them all, you need to do a global search, which requires adding the 'g' modifier, as follows:
$foo =~ s/string/bring/g;
Now all the places that say "string" will be replaced with "bring".
Search patterns can use all of the features discussed in the pattern matching section. So:
$foo = "aa bb cc this is a test aa bb cc";
$foo =~ s/a+\s*b+\s*c+/abc/g;
$foo =~ s/^(.*)(t\S*s)(.*)(is)(.*)$/$1$4$3$2$5/;
print( $foo );
prints out the following:
abc is this a test abc
If you want to search for special characters in your string, you'd escape them with backslashes:
$foo = "/usr/local/bin/perl";
$foo =~ s/\//\\/g;
print( $foo );
prints:
\usr\local\bin\perl
The first backslash precedes a slash -- so that it becomes a character you are searching for, and not the end of the search string. The second backslash precedes a third backslash -- so that a literal backslash is used in the replacement pattern. I know that's convoluted, but anyone trying to write code that will run on both Windows and UNIX will appreciate it. Use a different separator (like comma -- ',') to avoid all the extra backslashes in front of regular slashes, but you'll still need one in front of the backslash, since that's the escape character itself.
We have barely scratched the surface of regular expressions and pattern matching. There are many more intricate details including additional special matching characters, repeat counts, controlling the "greediness" of the search, and more. Again, please read Programming Perl pp 57 - 76 for all the details.
In addition, though I have never read it myself, O'Reilly & Associates publishes a book titled Mastering Regular Expressions by Jeffrey E. F. Friedl. I know nothing about it, except that Programming Perl recommends it. If anyone wants to give me a review or loan me their copy at some point, I'd appreciate it.
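The substitution behaviors described in this section can be confirmed with a short sketch. The variable names are made up, and the "( my $copy = $text ) =~ s/.../.../" form is simply a common way to run a substitution on a copy while leaving the original alone.

```perl
use strict;
use warnings;

my $text = "original string, just to string you along";

( my $first_only = $text ) =~ s/string/bring/;     # first occurrence only
( my $all        = $text ) =~ s/string/bring/g;    # every occurrence

# s/// returns the number of substitutions it made.
my $copy  = $text;
my $count = ( $copy =~ s/string/bring/g );

# Commas as separators: the slash no longer needs escaping,
# but the backslash in the replacement still does.
( my $dos_path = "/usr/local/bin/perl" ) =~ s,/,\\,g;

print "$first_only\n$all\n$count\n$dos_path\n";
```

The return value is handy in conditions: a substitution that changes nothing returns a false value, so "if ( $foo =~ s/x/y/ )" tells you whether anything was replaced.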
Quoting
Perl supports several different kinds of quotes. These should look somewhat
familiar to shell programmers.
I/O
All programs need to read and write data. Here are the basic things
Perl provides to do those tasks.
Special Variables
Perl has many special variables for controlling certain aspects of the
runtime environment and various behaviors. See Programming
Perl pp 127 for a full list. Here we touch on only a few
that are useful at times.
Most of these variables have alternate (long, English-like) names that become available with the "use English;" pragma in Perl. However, in my experience, most scripts still use the short names.
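A brief sketch of the long names follows. The three variables shown ($_, $0, and $$) are my own picks for illustration, and the -no_match_vars import is a common precaution on older Perls, where loading the regex-related long names could slow down every pattern match in the program.

```perl
use strict;
use warnings;
use English qw( -no_match_vars );   # long names, minus the regex-related ones

my @seen;
foreach ( "a", "b" )
{
    push( @seen, $ARG );            # $ARG is the long name for $_
}

my $same_name = ( $PROGRAM_NAME eq $0 ) ? "yes" : "no";   # $0: the script's name
my $same_pid  = ( $PROCESS_ID   == $$ ) ? "yes" : "no";   # $$: the process id

print "@seen $same_name $same_pid\n";   # prints "a b yes yes"
```

The long and short names refer to the same variables, so a script can mix them freely, though picking one style keeps it readable.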
Writing Functions
As with any programming language, you will want to create functions to
perform specific bits of work. Here's a sample of how to do that.
$result = func( 'a', 'b');
print $result, "\n";
sub func
{
my( $arg1, $arg2 ) = @_;
# your function's perl code goes here
$foo = "$arg1 xxx $arg2";
return $foo;
}
which produces the following output when it runs:
a xxx b
As you can see, you declare a function with the sub keyword. (Actually, sub is a built in function in perl, but never mind.) Functions may return any of the basic data types, including lists, hashes, and scalars. This example returns one scalar.
You call a function by using its name and a series of arguments inside parentheses. If there are no arguments, use empty parentheses. (We have seen unusual Perl bugs if functions are called without parentheses.)
Inside the definition of the perl function you can see that the arguments are present in the @_ array by default. (The elements of @_ are aliases for the caller's arguments, so assigning to $_[0] directly would change the caller's variable. Copying them into my() variables first, as this example does, avoids that.) The my() call copies the arguments from the @_ array into local variables. Your code does whatever it wants. (Note that in the example, $foo is not a local variable... it is global.)
When the function completes, it may return with an explicit return statement (as shown) or it may just fall off the end of the function. The return value is either the result of the last statement executed, or the value given to the return call.
Returning more than one value is possible with Perl functions, since you may return a list. Example:
( $val1, $val2 ) = func( $p1, $p2, $p3, "abc" );
sub func
{
my( $arg1, $arg2, $arg3, $arg4 ) = @_;
# some code building things from the arguments
return( "foo bar", "blather" );
}
When func returns, $val1 will contain "foo bar" and $val2 will contain "blather".
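For a version that can be run as is, here is a sketch of a function that returns two values. The name min_max and its behavior are invented for this example.

```perl
use strict;
use warnings;

# min_max is a hypothetical example function.
# It returns two values as a list: the smallest and the largest argument.
sub min_max
{
    my( @numbers ) = @_;            # copy the arguments into a local array
    my( $min, $max ) = ( $numbers[0], $numbers[0] );

    foreach my $n ( @numbers )
    {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }

    return( $min, $max );
}

my( $low, $high ) = min_max( 7, 3, 9, 4 );

print "low: $low, high: $high\n";   # prints "low: 3, high: 9"
```

Because the arguments are copied out of @_ before anything else happens, nothing the function does can disturb the caller's values.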
Note that the function could have been called as:
@results = func( ... );
in which case $results[0] would contain "foo bar" and $results[1] would contain "blather".
Finally, it is possible to ignore an item returned by a function:
( undef, $wanted ) = func( ... );
In this case, $wanted will contain "blather" but the first value returned will be discarded.
A few other points:
Local Variables: my() and local()
There are two ways of creating local variables with Perl: the old way, using the
local() function, and the new way, using the my() function. In almost all
cases you want to use my().
An example might help:
$v1 = 1;
$v2 = 1;
f1();
sub f1
{
local( $v1 ) = 2;
my( $v2 ) = 2;
print( "func f1; v1: $v1\n" );
print( "func f1; v2: $v2\n" );
f2();
}
sub f2
{
print( "func f2; v1: $v1\n" );
print( "func f2; v2: $v2\n" );
}
The output of running this program is:
func f1; v1: 2
func f1; v2: 2
func f2; v1: 2
func f2; v2: 1
So, as you can see, the new value set into $v1 in function f1() was visible inside function f2(), rather than the global value of $v1 set before either function was called. But $v2 behaved as you'd expect a local variable to behave.
In summary, use my() to create local variables. If you think you need local(), think again, and again. There are certain really weird cases where it is useful now, but if you're hitting one of them, you're way beyond what this class can teach you.
Context -- Scalar or Array
Occasionally you may want to write a function that returns a scalar or an array,
depending on what the caller is expecting. Here's a simple example of a function
that reads an entire file of text, and returns either a scalar containing the
entire file as one long line, or an array of lines, depending on what the caller
wants:
sub ReadFile
{
    my( $fname ) = @_;                  # local variable $fname
    open( FH, "<$fname" )               # open file for reading
        || die( "can't open $fname: $!\n" );
    my( @lines ) = <FH>;                # read file into local array
    close( FH );                        # close the file
    if ( wantarray )                    # if caller wants array
    {
        return @lines;                  # return an array
    }
    else
    {
        return( join( '', @lines ));    # else return one long line
    }
}
$data = ReadFile( 'datafile' );         # call in scalar context
@lines = ReadFile( 'datafile' );        # call in array context
After this code runs, $data contains all of the lines in 'datafile' strung
together as one long text string. @lines contains all of the same lines, but
they are separated into the elements of an array.
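If you want to watch wantarray at work without needing a data file, a sketch along these lines should do. The function words() is a hypothetical name invented for this illustration: it returns a list of words when called in array context and a single joined string when called in scalar context.

```perl
use strict;
use warnings;

# words() is a hypothetical function used only to demonstrate wantarray.
sub words
{
    my( $text ) = @_;
    my @w = split( ' ', $text );        # break the string on whitespace
    if ( wantarray )                    # caller wants a list
    {
        return @w;
    }
    else                                # caller wants a scalar
    {
        return( join( '-', @w ) );
    }
}

my @list   = words( "one two three" );  # call in array context
my $joined = words( "one two three" );  # call in scalar context

print( scalar( @list ), " words; joined: $joined\n" );
```

This should print "3 words; joined: one-two-three". The same function body serves both callers; only the assignment on the left side of the call decides which branch runs.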
References
References are data items that refer to other data items. In some ways
they resemble pointers as implemented in C and other languages, but they
are safer because they cannot point to any arbitrary memory address,
cannot be used in arithmetic, and contain associated type information, so
that what kind of thing they point to is known. Most of those benefits
really only matter in OO systems, and since this class doesn't cover the
OO features of Perl (at least not yet) they are not discussed here.
However, there are a few things about references that make them useful
outside of OO code.
References are stored in scalar variables, and created using the backslash operator.
$foo_ref = \@foo;
This code makes $foo_ref contain a reference to the array named @foo. The curious can run this code:
print "$foo_ref\n";
and you'll get something like this out:
ARRAY(0xca5d68)
From this you can tell that this is a reference to an array, and the address of some perl internal data structure holding (or pointing to) the array is 0xca5d68. (Note that the address is not useful to you... it's useful to Perl, however.)
Once you have a reference it can then be passed to functions as a scalar, and those functions may dereference it by placing the original type designation character in front of the dollar sign. So:
@bar = @$foo_ref;
copies the contents of the @foo array into the @bar array. This could happen in a function that doesn't have lexical scope to see the @foo array, so long as it can see (or was passed) the reference to the array ($foo_ref).
Why is this useful at all? First, references allow call by reference instead of just call by value. This is very handy to avoid huge parameter passing overhead. So instead of doing this:
...
# code to set @foo to contain 10,000 elements
...
bar( @foo );                # pass a copy of all 10,000
                            # elements to function bar
...
sub bar
{
    my( @baz ) = @_;        # copy all 10,000
                            # elements into @baz
    ...
    foreach $i ( @baz )
    {
        # code to do something based on array contents
    }
}
You can do this instead:
...
# code to set @foo to contain 10,000 elements
...
bar( \@foo );               # pass a reference to the array to function bar
                            # a reference is a single scalar item
...
sub bar
{
    my( $arrayref ) = @_;   # copy reference into a
                            # local variable
    ...
    foreach $i ( @$arrayref )
    {
        ...
        # code to do something based on array contents
    }
}
In the second case we're not copying an array of 10,000 things twice.
That can be a big efficiency gain at times.
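Here is a runnable sketch of the same idea, for those who want to try it. The function names sum_list() and append_zero() are hypothetical, made up for this illustration. The first reads the array through the reference; the second changes the caller's array through it, which is what call by reference buys you.

```perl
use strict;
use warnings;

# sum_list() is a hypothetical function that takes one array reference.
sub sum_list
{
    my( $arrayref ) = @_;               # one scalar, however big the array is
    my $total = 0;
    foreach my $i ( @$arrayref )        # dereference to walk the elements
    {
        $total += $i;
    }
    return( $total );
}

# append_zero() is also hypothetical: it modifies the caller's array.
sub append_zero
{
    my( $arrayref ) = @_;
    push( @$arrayref, 0 );              # the push lands in the original array
}

my @foo     = ( 1 .. 10000 );
my $foo_ref = \@foo;

my $kind = ref( $foo_ref );             # ref() names the kind of reference
my $sum  = sum_list( $foo_ref );
append_zero( \@foo );
my $size = scalar( @foo );

print( "kind=$kind sum=$sum size=$size\n" );
```

This should print "kind=ARRAY sum=50005000 size=10001". Note that the array grew by one element even though append_zero() never saw the name @foo.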
Another thing references allow you to do is pass multiple lists to a single function. In perl 4 you couldn't do that, since the first list parameter in your my or local variable declaration would gobble up all of the arguments. Example:
@gl1 = ( "a", "b" );
@gl2 = ( "c", "d" );
f( @gl1, @gl2 );                # call f() and pass two lists
                                # won't work this way
sub f
{
    my( @ll1, @ll2 ) = @_;      # make local variables
    # note:
    # @ll1 now contains: ( "a", "b", "c", "d" )
    # @ll2 is empty.
    ...
}
Using references, you can do this instead:
@gl1 = ( "a", "b" );
@gl2 = ( "c", "d" );
f( \@gl1, \@gl2 );              # call f() and pass two
                                # references to lists
sub f
{
    my( $l1ref, $l2ref ) = @_;  # make local variables
    my( @a ) = @$l1ref;         # copy @gl1 into local
                                # variable @a
    my( @b ) = @$l2ref;         # copy @gl2 into local
                                # variable @b
    # note:
    # @a now contains: ( "a", "b" )
    # @b now contains: ( "c", "d" )
    ...
}
Now the local variables contain the same things as the original lists in
the calling code. Of course, you don't have to copy the contents of
the lists out of the references to use them. If you have a reference
to a list, you can use a foreach loop on it like this:
foreach $i ( @$list_ref ) ...
You can have references to scalars (which are particularly useful for gaining efficiency when your scalars contain multi-megabyte strings), arrays, hashes, and functions.
Those are only the simplest uses for references. When this course is expanded to cover the OO portion of Perl, much more detail on references will appear here.
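For the other kinds of references mentioned above, a short sketch may help. All of the variable names here are made up for illustration. In each case the reference is created with a backslash (or, for the function, with an anonymous sub) and dereferenced by putting the matching type character in front of the reference variable.

```perl
use strict;
use warnings;

my $text = "a very long string";
my %ages = ( "tom" => 30, "ann" => 25 );

my $text_ref = \$text;                          # reference to a scalar
my $ages_ref = \%ages;                          # reference to a hash
my $code_ref = sub { return( $_[0] * 2 ); };    # reference to an anonymous function

my $length  = length( $$text_ref );             # $$ gets at the scalar
my @names   = sort( keys( %$ages_ref ) );       # %$ gets at the whole hash
my $ann     = $$ages_ref{ "ann" };              # one element of that hash
my $doubled = &$code_ref( 21 );                 # &$ calls the function

print( "length=$length names=@names ann=$ann doubled=$doubled\n" );
```

This should print "length=18 names=ann tom ann=25 doubled=42".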
Packages
Other than the "use" statement, most of you are not likely to need to
know about packages, but knowing a little bit may help explain some things
you'll see from time to time.
A package is a separate namespace in Perl. Basically, when a perl script is compiled, it is put into the "main" namespace (or package). Thus, all of the variables and functions your script creates are in the main namespace too.
When you use a "use" statement, however, things change. The functions and variables declared in the file that you refer to with the use statement are (usually) put into a separate namespace -- one determined by the "package" statement present in the package source file. For example, when you say:
use File::Find;
several functions and variables are created and put into the "File::Find" namespace. To make them easily callable or usable to your script, the Perl interpreter does some odd work to force certain selected names into the "main" namespace for you. When that work is done, you can call find() in one of two ways:
find( ... );                # called via the main namespace
File::Find::find( ... );    # qualified name in File::Find package
The first is cleaner to read, but the second actually tells you where the function find comes from. Use the first -- no one usually cares where the function comes from unless they are writing packages themselves.
Sometimes the package documentation will tell you about variables that are not exported into the main namespace. In this sample case, $File::Find::dont_use_nlink is such a variable. Note the way namespace qualifiers are used: the leading type designator character (if there is one) comes first, followed by the package name (and subnames, separated by '::'), followed by '::' and then the variable or function name. This looks really ugly, but (as with most of Perl) the idea was to implement a powerful tool in a way that would make it usable when needed.
Building perl packages is a subject for an entire class all on its own, and it's discussed in some detail in chapter 5 of Programming Perl and in a chapter in Advanced Perl Programming as well. Between them you can figure most of it out, but be prepared to spend some time at it.
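To see two namespaces side by side without writing a separate package file, you can declare a package inside the script itself. This is only a sketch: the package name Counter and its bump() function are hypothetical, and a real package would normally live in its own file and be loaded with use.

```perl
use strict;
use warnings;

{
    package Counter;        # hypothetical package, declared in this same file

    our $count = 0;         # full name: $Counter::count

    sub bump                # full name: Counter::bump
    {
        $count++;
        return( $count );
    }
}

# Back in package main from here on.
our $count = 100;           # a different variable: $main::count

Counter::bump();            # qualified call into the Counter namespace
Counter::bump();

print( "Counter: $Counter::count  main: $main::count\n" );
```

This should print "Counter: 2  main: 100". The two variables share the short name $count but never collide, because each lives in its own namespace.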
Built In Functions
There are many, many built-in functions in Perl. These functions are
discussed in chapter 3 of Programming Perl. Below is a list
of the functions I have found most useful, along with a one or two line
description of what the function does. See the books for full details on
how things work in depth.
Library Functions
Perl has many functions that are not built into the interpreter, but instead
are written into the standard library that ships with the Perl source code.
These functions are written in perl and accessed via use or require directives.
Chapter 7 of Programming Perl lists the standard library functions
that come with a normal installation of Perl. There are a lot of them, so be
prepared to wade through a fair bit of text to find what you want. These have
proven most useful to me.
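As an illustration of how a library module is pulled in with use and then called, here is a sketch using File::Basename, which ships with the standard Perl distribution. The module is chosen here only as an example of the mechanism, and the path is made up.

```perl
use strict;
use warnings;
use File::Basename;         # exports basename(), dirname() and fileparse()

my $path = "/usr/local/lib/perl5/File/Find.pm";    # made-up path for illustration

my $file = basename( $path );                       # last component of the path
my $dir  = dirname( $path );                        # everything before it
my( $name, $where, $suffix ) = fileparse( $path, qr/\.pm/ );

print( "file=$file dir=$dir name=$name suffix=$suffix\n" );
```

On a UNIX system this should print "file=Find.pm dir=/usr/local/lib/perl5/File name=Find suffix=.pm".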
CGI programming
CGI programming is not directly Perl related, but so much CGI programming is
done in Perl that it pays to have a brief overview of it here. This discussion is
not intended to be a full introduction to CGI programming. For that, I
suggest the following URL:
http://hoohoo.ncsa.uiuc.edu/cgi/
(at least until they finally tear it down). An infinitely large number of
books on HTML will teach you about CGI too. I own and at least somewhat
like HTML: The Complete Reference by Thomas A. Powell (no relation
of mine) published by McGraw Hill.
A CGI script is invoked on a web server computer in response to a user's action on a WWW page -- something like a mouse click on the OK button in an HTML form. The HTTP protocol specifies the way in which the data from the form is encoded (we'll discuss that when we review the Perl script doing the decoding) and the method by which the data is transmitted to the CGI script itself. There are two transmission methods, controlled by the "METHOD" attribute of the "FORM" HTML statement. They are:
The class materials include an HTML page that implements a simple form and a Perl CGI script that processes the contents. No links are provided here because I (honestly) don't have time to do that yet. Paper copies will be available in class and we'll discuss them there.
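Until those materials are linked, here is a rough sketch of what the decoding step looks like. It is not the class script. The function parse_form() is a hypothetical name, and the sample string stands in for real form data, which a CGI script would typically read from the QUERY_STRING environment variable (for the GET method) or from standard input (for the POST method). Form data arrives as name=value pairs joined by '&', with spaces sent as '+' and other special characters sent as '%' followed by two hex digits.

```perl
use strict;
use warnings;

# parse_form() is a hypothetical helper that decodes URL-encoded form data
# into a hash of field names and values.
sub parse_form
{
    my( $raw ) = @_;
    my %form;
    foreach my $pair ( split( /&/, $raw ) )
    {
        next if $pair eq '';                        # skip empty pairs
        my( $key, $value ) = split( /=/, $pair, 2 );
        $value = '' unless defined( $value );       # field sent with no value
        foreach my $item ( $key, $value )
        {
            $item =~ tr/+/ /;                       # '+' stands for a space
            $item =~ s/%([0-9A-Fa-f]{2})/chr( hex( $1 ) )/ge;   # %XX is one character
        }
        $form{ $key } = $value;
    }
    return( %form );
}

# A sample string in place of real form data.
my %form = parse_form( "name=Tom+Jones&city=St.%20Louis&note=100%25" );

print( "name=$form{name} city=$form{city} note=$form{note}\n" );
```

This should print "name=Tom Jones city=St. Louis note=100%". In practice many scripts leave this job to a library module rather than decoding by hand, but seeing the steps spelled out makes the encoding much less mysterious.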