This work (Netizen "Intermediate Perl" training module notes) is licensed under the Open Publication License.
LICENSE
Terms and Conditions for Copying, Distributing, and Modifying
Items other than copying, distributing, and modifying the Content with which this license was distributed (such as using, etc.) are outside the scope of this license.
1. You may copy and distribute exact replicas of the OpenContent (OC) as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the OC a copy of this License along with the OC. You may at your option charge a fee for the media and/or handling involved in creating a unique copy of the OC for use offline, you may at your option offer instructional support for the OC in exchange for a fee, or you may at your option offer warranty in exchange for a fee. You may not charge a fee for the OC itself. You may not charge a fee for the sole service of providing access to and/or use of the OC via a network (e.g. the Internet), whether it be via the world wide web, FTP, or any other method.
2. You may modify your copy or copies of the OpenContent or any portion of it, thus forming works based on the Content, and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
a) You must cause the modified content to carry prominent notices stating that you changed it, the exact nature and content of the changes, and the date of any change.
b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the OC or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License, unless otherwise permitted under applicable Fair Use law.
These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the OC, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the OC, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Exceptions are made to this requirement to release modified works free of charge under this license only in compliance with Fair Use law where applicable.
3. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to copy, distribute or modify the OC. These actions are prohibited by law if you do not accept this License. Therefore, by distributing or translating the OC, or by deriving works herefrom, you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or translating the OC.
NO WARRANTY
4. BECAUSE THE OPENCONTENT (OC) IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE OC, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE OC "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK OF USE OF THE OC IS WITH YOU. SHOULD THE OC PROVE FAULTY, INACCURATE, OR OTHERWISE UNACCEPTABLE YOU ASSUME THE COST OF ALL NECESSARY REPAIR OR CORRECTION.
5. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MIRROR AND/OR REDISTRIBUTE THE OC AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE OC, EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Additionally:
6. If you offer training based upon this OpenContent, you must prominently display a notice stating whether or not you are a Netizen Certified Training Organisation on the Open Content itself and on any material advertising or publicising your training.
a) If you are a Netizen Certified Training Organisation, you must state that you are a Netizen Certified Training Organisation and display the Netizen Certified Training Organisation logo. You must also provide a URL for more information, namely http://netizen.com.au/services/training/ncto/
b) If you are not a Netizen Certified Training Organisation, you must state that you are not a Netizen Certified Training Organisation. You may not use the Netizen Certified Training Organisation logo. You must also provide a URL for more information, namely http://netizen.com.au/services/training/ncto/
- Table of Contents
- 1. Introduction
- Course outline
- Assumed knowledge
- Module objectives
- Platform and version details
- The course notes
- Other materials
- Logging into your account
- 2. File I/O
- In this chapter...
- Assumed knowledge
- Angle brackets - the line input and globbing operators
- Exercises
- open() and friends - the gory details
- Opening a file for reading, writing or appending
- Reading directories
- Opening files for simultaneous read/write
- Opening pipes
- Finding information about files
- Exercises
- Recursing down directories
- Exercises
- File locking
- Handling binary data
- Chapter summary
- 3. Advanced regular expressions
- In this section...
- Assumed knowledge
- Review exercises
- More metacharacters
- Working with multiline strings
- Exercises
- Regexp modifiers for multiline data
- Backreferences
- Special variables
- Exercises
- Advanced
- Section summary
- 4. More functions
- In this chapter...
- The grep() function
- Exercises
- The map() function
- Exercises
- Chapter summary
- 5. System interaction
- In this section...
- system() and exec()
- Exercises
- Using backticks
- Exercises
- Platform dependency issues
- Security considerations
- Exercises
- Section summary
- 6. References and complex data structures
- In this section...
- Assumed knowledge
- Introduction to references
- Uses for references
- Creating complex data structures
- Passing arrays and hashes to subroutines and functions
- Object oriented Perl
- Creating and dereferencing references
- Passing multiple arrays/hashes as arguments
- Complex data structures
- Anonymous data structures
- Exercises
- Section summary
- 7. Conclusion
- What you've learnt
- Where to now?
- Further reading
- Books
- Online
- A. Unix cheat sheet
- B. Editor cheat sheet
- vi
- Running
- Using
- Exiting
- Gotchas
- Help
- pico
- Running
- Using
- Exiting
- Gotchas
- Help
- joe
- Running
- Using
- Exiting
- Gotchas
- Help
- jed
- Running
- Using
- Exiting
- Gotchas
- Help
- C. ASCII Pronunciation Guide
Chapter 1. Introduction
Welcome to Netizen's Intermediate Perl training course. This is a one-day module in which we build on the material covered in Introduction to Perl and explore the topics of references, advanced regular expressions, and interacting with the operating system.
Course outline
- Revise introduction to Perl material
- File I/O
  - Line input and globbing operators
  - Opening files and directories
  - Opening pipes
  - Finding information about files
  - Recursing down directories
  - File locking
  - Handling binary data
- Advanced regular expressions
  - Review of basic regexps
  - Multiline strings
  - Backreferences
- More functions
- System interaction
  - system() and exec()
  - Backticks
  - Interacting with the file system
  - Dealing with users, groups and permissions
  - Interacting with processes
  - Security considerations
- References and complex data structures
Assumed knowledge
This training module assumes the following prior knowledge and skills:
- Basic Perl fluency, including a familiarity with Perl variable types, functions and operators, conditional constructs, and basic regular expressions
- Some Unix experience, including logging in, moving around directories, and editing files
Module objectives
- Be able to open files and directories to read and write data, using various techniques
- Perform tests on files and directories
- Open pipes to read or write data through another program
- Use regular expressions to handle multiline data
- Use backreferences to create complex regular expressions
- Use and understand more complex Perl functions such as grep() and map()
- Use Perl functions to call system commands
- Use Perl to interact with the file system, users, and processes
- Understand the security implications of running system commands from Perl, and how to increase security
- Understand and use Perl references to create complex data structures and anonymous data structures
Platform and version details
This module is taught using Unix or a Unix-like operating system. Most of what is learnt will work equally well on Windows NT or other operating systems; your instructor will inform you throughout the course of any areas which differ.
All Netizen's Perl training courses use Perl 5, the most recent major release of the Perl language. Perl 5 differs significantly from previous versions of Perl, so you will need a Perl 5 interpreter to use what you have learnt. However, older Perl programs should work fine under Perl 5.
The course notes
These course notes contain material which will guide you through the topics listed above, as well as appendices containing other useful information.
The following typographical conventions are used in these notes:
System commands appear in this typeface
Literal text which you should type in to the command line or editor appears as monospaced font.
Keystrokes which you should type appear like this: ENTER. Combinations of keys appear like this: CTRL-D
Program listings and other literal listings of what appears on the screen appear in a monospaced font like this.
Parts of commands or other literal text which should be replaced by your own specific values appear like this
Note: Notes and tips appear offset from the text like this.
Advanced: Notes which are marked "Advanced" are for those who are racing ahead or who already have some knowledge of the topic at hand. The information contained in these notes is not essential to your understanding of the topic, but may be of interest to those who want to extend their knowledge.
Readme: Notes marked with "Readme" are pointers to more information which can be found in your textbook or in online documentation such as manual pages or websites.
Other materials
In addition to these notes, you should have a copy of the required text book for this course: Programming Perl (2nd edition), more commonly referred to as "the Camel book". The Camel book will be used throughout the day, and will be a valuable reference to take home and keep next to your computer.
You will also have received a floppy disk containing these notes in HTML form (with working links to external resources etc) and all the example scripts and data used in this course.
Logging into your account
Your username and password will have been given to you with these course notes.
1. Open the telnet program
2. Connect to the training server using the hostname or IP number given by your instructor
3. Login using the username and password you were given
You will find yourself at a Unix shell prompt. Hopefully (if you met the pre-requisites of this course) you will now be able to see that your account has a subdirectory called exercises/ which contains the example scripts and exercises given in these course notes. If you're not quite up to speed with Unix, there's a cheat-sheet in Appendix A of these notes.
Chapter 2. File I/O
In this chapter...
In this chapter, we learn how to open and interact with files and directories in various ways.
Assumed knowledge
You should already have encountered the open() function and the <> line input operator in a previous Perl training session or in your previous Perl experience.
Angle brackets - the line input and globbing operators
You will have encountered the line input operator <> before, in situations such as these:

    # reading lines from STDIN
    while (<>) {
        ...
    }

    # reading a single line of user input from STDIN
    my $input = <STDIN>;

Readme: The line input operator is discussed in depth on page 53 of the Camel. Read it now.
- In scalar context, the line input operator yields the next line of the file referenced by the filehandle given.
- In list context, the line input operator yields all remaining lines of the file referenced by the filehandle.
- The default filehandle is STDIN, or any files listed on the command line of the Perl script (e.g. myscript.pl file1 file2 file3).
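The difference between the two contexts can be seen in a short sketch. It is not part of the course exercises: it assumes the standard File::Temp module is available and writes an invented three-line scratch file purely so that the example is self-contained.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Create a scratch file holding three lines (illustrative data only).
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "first\nsecond\nthird\n";
close $tmp;

open(INPUT, $filename) || die "Can't open $filename: $!";
my $first = <INPUT>;    # scalar context: just the next line
my @rest  = <INPUT>;    # list context: every remaining line
close INPUT;

print "First line: $first";
print "Lines left: ", scalar(@rest), "\n";
```

Because the first read consumed one line, the list-context read only sees the two lines that were left.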
The globbing operator is nearly, but not quite, identical to the line input operator. It looks the same, and it acts partly in a similar way, but it really is a separate operator.
Readme: The filename globbing operator is documented on page 55 of the Camel.
If the angle brackets contain anything other than a filehandle (or nothing at all), the operator works as a globbing operator and whatever is between the angle brackets is treated as a filename wildcard. For instance:

    my @files = <*.txt>;

The filename glob *.txt is matched against files in the current directory, and the matches are returned either as a list (in list context, as above) or one scalar at a time (in scalar context).
If you get a list of files this way, you can then open them in turn and read from them.

    while (<*.txt>) {
        open (FILEHANDLE, $_) || die ("Can't open $_: $!");
        ...
        close FILEHANDLE;
    }

The glob() function behaves in a very similar manner to the angle bracket globbing operator.

    my @files = glob("*.txt");

    foreach (glob "*.txt") {
        ...
    }

The glob() function is generally considered cleaner and easier to read than the angle-bracket globbing operator.
Exercises
1. Use the line input operator to accept input from the user then print it out
2. Modify your previous script to use a while loop to get user input repeatedly, until they type "Q" (or "q" - check out the lc() and uc() functions in chapter 3 of your Camel book) (Answer: exercises/answers/userinput.pl)
3. Use the file globbing function or operator to find all Perl scripts in your home directory and print out their names (assuming they are named in the form *.pl) (Answer: exercises/answers/findscripts.pl)
Advanced exercises
1. Use the above example of globbing to print out all the Perl scripts one after the other. You will need to use the open() function to read from each file in turn. (Answer: exercises/answers/printscripts.pl)
open() and friends - the gory details
Opening a file for reading, writing or appending
The open() function is used to open a file for reading or writing (or both, or as a pipe - more on that later).
Readme: The open() function is documented on pages 191-195 of the Camel book, and also in perldoc perlfunc. Read the documentation for open() before going any further.
In a typical situation, we might use open() to open and read from a file:

    open(LOGFILE, "/var/log/httpd/access.log")

Note that the < (less than) used to indicate reading is assumed; we could equally well have said "</var/log/httpd/access.log".
You should always check for failure of an open() statement:

    open(LOGFILE, "/var/log/httpd/access.log")
        || die "Can't open /var/log/httpd/access.log: $!";

Readme: $! is the special variable which contains the error message produced by the last system interaction. It is documented in chapter 2 of the Camel, on page 134.
Once a file is opened for reading or writing, we can use the filehandle we specified (in this case LOGFILE) for a variety of useful purposes:

    open(LOGFILE, "/var/log/httpd/access.log")
        || die "Can't open /var/log/httpd/access.log: $!";

    # use the filehandle in the <> line input operator...
    while (<LOGFILE>) {
        print if /netizen\.com\.au/;    # dots escaped so they match literally
    }
    close LOGFILE;

    # open a new logfile for appending
    open(SCRIPTLOG, ">>myscript.log") || die "Can't open myscript.log: $!";

    # print() takes an optional filehandle argument - defaults to STDOUT
    print SCRIPTLOG "Opened logfile successfully.\n";
    close SCRIPTLOG;

Note that you should always close a filehandle when you're finished with it (though admittedly any open filehandles will be automatically closed when your script exits).
Readme: You can also use sysopen() and friends to open a file in a C-like way. See page 229 of your Camel book for details or perldoc -f sysopen.
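The examples in these notes use the two-argument form of open() with bareword filehandles. Perl 5.6 and later also accept a three-argument form with a lexical filehandle, which keeps the mode separate from the filename. The sketch below shows that alternative style rather than the one used in the course exercises; it assumes the standard File::Temp module and writes its log into a scratch directory so that nothing real is touched.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempdir);

# Work in a scratch directory so nothing real is overwritten.
my $dir     = tempdir(CLEANUP => 1);
my $logfile = "$dir/myscript.log";

# Mode and filename are separate arguments; $log is a lexical filehandle.
open(my $log, '>>', $logfile) || die "Can't open $logfile: $!";
print $log "Opened logfile successfully.\n";
close $log;

# Read the file back to confirm what was written.
open(my $in, '<', $logfile) || die "Can't open $logfile: $!";
my @lines = <$in>;
close $in;

print "Log contains ", scalar(@lines), " line(s)\n";
```

Keeping the mode in its own argument means a filename that happens to begin with > or | cannot change how the file is opened.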
Exercises
1. Write a script which opens a file for reading. Use a while loop to print out each line of the file.
2. Use the above script to open a Perl script. Use a regular expression to print out only those lines not beginning with a hash character (i.e. non-comment lines). (Answer: exercises/answers/delcomments.pl)
3. Create a new script which opens a file for writing. Write out the numbers 1 to 100 into this file. (Answer: exercises/answers/100count.pl)
4. Create a new script which opens a logfile for appending. Create a while loop which accepts input from STDIN and appends each line of input to the logfile. (Answer: exercises/answers/logfile.pl)
5. Create a script which opens two files, reads input from the first, and writes it out to the second. (Answer: exercises/answers/readwrite.pl)
Reading directories
It is also possible to open directories (using opendir()) and read from them. However, it is not possible to read the contents of files in that directory simply by opening it and looping through it. Opening a directory simply makes the filenames in that directory accessible via functions such as readdir().
Readme: opendir() is documented on page 195 of the Camel. readdir() is on page 202. Don't forget that function help is also available by typing perldoc -f opendir or perldoc -f readdir

    opendir(HOMEDIR, $ENV{HOME}) || die "Can't open directory $ENV{HOME}: $!";
    my @files = readdir(HOMEDIR);
    closedir HOMEDIR;

    foreach (@files) {
        # readdir() returns bare names (including . and ..), so add the path
        my $path = "$ENV{HOME}/$_";
        next unless -f $path;    # skip directories and other non-files
        open(THISFILE, "<$path") || die "Can't open file $path: $!";
        ...
        close THISFILE;
    }

Exercises
1. Use opendir() and readdir() to obtain a list of files in a directory. What order are they in?
2. Use the sort() function to sort the list of files asciibetically (Answer: exercises/answers/dirlist.pl)
Opening files for simultaneous read/write
Files can be opened for simultaneous read/write by putting a + in front of the > or < sign. +< is almost always preferable, however, as +> would overwrite the file before you had a chance to read from it.
Read/write access to a file is not as useful as it sounds: you can't insert text into the middle of the file using this method. Anything you write either overwrites existing bytes in place or extends the file at the end. The main use for read/write access is to read the contents of a file and then append lines to the end of it.
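A minimal sketch of that read-then-append pattern follows. It assumes the standard File::Temp module and uses an invented two-line scratch file; the no-op seek() between the read and the write is the idiom described in perldoc -f seek for switching from reading to writing on the same filehandle.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Scratch file with two existing lines (illustrative data only).
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "line one\nline two\n";
close $tmp;

open(NOTES, "+<$filename") || die "Can't open $filename: $!";
my @existing = <NOTES>;        # reading everything leaves us at the end
seek(NOTES, 0, 1);             # no-op seek: some systems need it before writing
print NOTES "line three\n";    # lands after the existing lines
close NOTES;

# Reopen the file to see the result.
open(NOTES, $filename) || die "Can't reopen $filename: $!";
my @all = <NOTES>;
close NOTES;

print "Before: ", scalar(@existing), " lines, after: ", scalar(@all), " lines\n";
```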
A more flexible way to read and write a file is to import the file into an array, manipulate the array, then output each element again.

    # program to remove duplicate lines
    open(INFILE, "file.txt") || die "Can't open file.txt for input: $!";
    my @lines = <INFILE>;
    close INFILE;

    # dup-remover taken from The Perl Cookbook
    my %seen;    # lines we have already met
    my @unique = grep { ! $seen{$_}++ } @lines;

    open(OUTFILE, ">file.txt") || die "Can't open file.txt for output: $!";
    foreach (@unique) {
        print OUTFILE $_;
    }
    close OUTFILE;

Note: One thing to watch out for here is memory usage. If you have a ten megabyte file, it will use at least that much memory as a Perl data structure.
Exercises
1. Open a file, reverse its contents (line by line) and write it back to the same filename (Answer: exercises/answers/reversefile.pl)
Opening pipes
If the filename given to open() begins with a pipe symbol (|), the filename is interpreted as a command to which output is to be piped, and if the filename ends with a |, the filename is interpreted as a command whose output is piped to us as input.
This is often used when you want to take input from the system a line at a time. Here's an example which reads from the rot13 filter (a simple routine which rotates the letters of its input by 13 letters, providing a very simple cipher for encoding the answers to jokes, spoilers to movies, or other low-security information):

    #!/usr/bin/perl -w
    use strict;

    open (ROT13, "rot13 < /etc/motd |") || die "Can't open pipe: $!";
    while (<ROT13>) {
        print;
    }
    close ROT13;

Conversely, we can output something through rot13:

    #!/usr/bin/perl -w
    use strict;

    open (ROT13, "|rot13") || die "Can't open pipe: $!";
    print "This is some rot13'd text:\n";
    print ROT13 "This is some rot13'd text.\n";
    close ROT13;

Advanced: If you reverse the two print lines above, the output will nevertheless be in the same order as before. You'll need to set $| to flush the output pipe. It's on page 130 of your Camel, or in perldoc perlvar.
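Not every system has a rot13 command installed. As a hedged stand-in, the sketch below assumes a Unix-like system with the standard tr utility on the PATH and uses it to perform the same letter rotation; the $rot13 variable and the sample text are illustrative only.

```perl
#!/usr/bin/perl -w
use strict;

# Assumption: a Unix-like system where the standard tr utility is available.
my $rot13 = "tr 'A-Za-z' 'N-ZA-Mn-za-m'";

$| = 1;    # unbuffer our own STDOUT so the heading appears before the filter's output

open(ROT13, "| $rot13") || die "Can't open pipe: $!";
print "This is the rot13'd text:\n";
print ROT13 "Hello, world.\n";

# close() on a pipe waits for the command to finish and puts its status in $?
close(ROT13) || warn "Filter exited with status $?\n";
```

Checking the return value of close() is the only way to find out that a piped command failed, since open() succeeds as soon as the command has been started.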
Exercises
1. Modify the second example above (provided for you as exercises/rot13.pl in your exercises directory) to accept user input and print out the rot13'd version.
2. Change your script to accept input from a file using open() (Answer: exercises/answers/rot13.pl)
3. Change your script to pipe its input through the strings command, so that if you get a file that's not a text file, it will only look at the parts of the file which are strings. (Answer: exercises/answers/strings.pl)
Finding information about files
We can find out various information about files by using file test operators and functions such as stat().
Table 2-1. File test operators
Operator Meaning
-e File exists.
-r File is readable
-w File is writable
-x File is executable
-o File is owned by you
-z File has zero size.
-s File has nonzero size (returns size).
-f File is a plain file.
-d File is a directory.
-l File is a symbolic link.
-p File is a named pipe (FIFO), or Filehandle is a pipe.
-S File is a socket.
-b File is a block special file.
-c File is a character special file.
-t Filehandle is opened to a tty.
-u File has setuid bit set.
-g File has setgid bit set.
-k File has sticky bit set.
-T File is a text file.
-B File is a binary file (opposite of -T).
-M Age of file in days (since last modification) when script started.
-A Same for access time.
-C Same for inode change time.
Readme: The file test operators are documented fully in perldoc perlfunc.
Here's how the file test operators are usually used:

    #!/usr/bin/perl -w
    use strict;

    unless (-e "config.txt") {
        die "Config file doesn't exist";
    }

    # or equivalently...
    die "Config file doesn't exist" unless -e "config.txt";

The stat() function returns similar information for a single file, in list form. lstat() returns the same list, except that when the file is a symbolic link it describes the link itself rather than the file the link points to.
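As a rough illustration of the list that stat() returns, the sketch below creates a five-byte scratch file (using the standard File::Temp module, an assumption made only to keep the example self-contained) and compares the size reported by stat() with the -s file test operator. The element positions used are the ones listed in perldoc -f stat.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Scratch file containing exactly five bytes (illustrative data only).
my ($tmp, $filename) = tempfile(UNLINK => 1);
print $tmp "hello";
close $tmp;

# stat() returns a 13-element list; element 7 is the size in bytes
# and element 9 is the last-modified time in seconds since the epoch.
my @info = stat($filename);
die "Can't stat $filename: $!" unless @info;

print "Size from stat(): $info[7] bytes\n";
print "Size from -s:     ", (-s $filename), " bytes\n";
print "Last modified:    ", scalar(localtime($info[9])), "\n";
```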
Exercises
1. Write a script which asks a user for a file to open, takes their input from STDIN, checks that the file exists, then prints out the contents of that file. (Answer: exercises/answers/fileexists.pl)
2. Write a script to find zero-byte files in a directory. (Answer: exercises/answers/zerobyte.pl)
3. Write a script to find the largest file in a directory. (Answer: exercises/answers/largestfile.pl)
Recursing down directories
The built-in functions described above do not enable you to easily recurse through subdirectories. Luckily, the File::Find module is part of the standard library distributed with Perl 5.
Readme: The File::Find module is documented in chapter 7 of the Camel, on page 439, or in perldoc File::Find.
File::Find emulates Unix's find command. Its find() function takes as its arguments a reference to a subroutine to call for each file found, and a list of directories to search.

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    print "Enter the directory to search: ";
    chomp(my $dir = <STDIN>);
    find (\&wanted, $dir);

    sub wanted {
        print "$_\n";
    }

For each file found, certain variables are set. $_ holds the file's name within its directory, $File::Find::dir is set to the current directory name, and $File::Find::name contains the full name of the file, i.e. $File::Find::dir/$_.
Exercises
1. Modify the simple script above (in your scripts directory as exercises/find.pl) to print out the names of plain text files only (hint: use file test operators)
2. Now use it to print out the contents of each text file. You'll probably want to pipe your output through more so that you can see it all. (Answer: exercises/answers/find.pl)
File locking
File locking can be achieved using the flock() function. This can be used to guard against race conditions or other problems which occur when two (or more) users open the same file in read/write mode.
Readme: flock() is documented on page 166 of the Camel book, or use perldoc -f flock to read the online documentation.
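A minimal sketch of flock() in use is given below. It assumes the standard Fcntl and File::Temp modules; the counter.txt file name is hypothetical and lives in a scratch directory. Note that flock() locks are advisory: they only protect the file from other programs which also call flock().

```perl
#!/usr/bin/perl -w
use strict;
use Fcntl qw(:flock);          # imports LOCK_SH, LOCK_EX, LOCK_UN and LOCK_NB
use File::Temp qw(tempdir);

# A scratch file (hypothetical name) in a temporary directory.
my $dir     = tempdir(CLEANUP => 1);
my $counter = "$dir/counter.txt";

open(COUNTER, ">>$counter") || die "Can't open $counter: $!";

# Ask for an exclusive lock; this waits until no other process holds one.
flock(COUNTER, LOCK_EX) || die "Can't lock $counter: $!";
print COUNTER "one more visit\n";

# Closing the filehandle flushes the output and releases the lock.
close COUNTER;

print "Appended to $counter under an exclusive lock\n";
```

Use LOCK_SH instead of LOCK_EX when a script only needs to read the file, so that several readers can hold the lock at once.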
Handling binary data
If you are opening a file which contains binary data, you probably don't want to read it in a line at a time using while (<>) { }, as there's no guarantee that there will be any line breaks in the data.
Instead, we use read() to read a certain number of bytes from a file handle.
Readme: read() is documented on page 202 of the Camel book, or by using perldoc -f read.
read() takes the following arguments:
- The filehandle to read from
- The scalar to put the binary data into
- The number of bytes to read
- The byte offset within the scalar at which to store the data (defaults to 0)

    #!/usr/bin/perl -w
    use strict;

    my $image = "picture.gif";
    open (IMAGE, $image) or die "Can't open image file: $!";
    open (OUT, ">backup/$image") or die "Can't open backup file: $!";
    binmode IMAGE;    # read raw bytes, with no newline translation
    binmode OUT;      # and write them back out the same way

    my $buffer;
    while (read IMAGE, $buffer, 1024) {
        print OUT $buffer;
    }
    close IMAGE;
    close OUT;

Note: If you are using Windows, DOS, or some other types of systems, you may need to use binmode() to make sure that certain linefeed characters aren't translated when Perl reads or writes a file in binary mode. While this is not needed on Unix systems, it's a good idea to use it anyway to enhance portability.
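The fourth argument to read() is easy to misread, so here is a small sketch of it. It assumes the standard File::Temp module and an invented ten-byte scratch file; the DATA_IN filehandle name is arbitrary. Passing the buffer's current length as the offset makes each read() append to the buffer instead of replacing its contents.

```perl
#!/usr/bin/perl -w
use strict;
use File::Temp qw(tempfile);

# Scratch file holding ten known bytes (illustrative data only).
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print $tmp "ABCDEFGHIJ";
close $tmp;

open(DATA_IN, $filename) || die "Can't open $filename: $!";
binmode DATA_IN;

my $buffer = "";
# read() returns the number of bytes actually read, or 0 at end of file.
# The offset (length $buffer) places the new bytes after the existing ones.
while (my $got = read(DATA_IN, $buffer, 4, length $buffer)) {
    print "Read $got bytes, buffer is now '$buffer'\n";
}
close DATA_IN;
```

With a ten-byte file and a four-byte read size, the loop runs three times, reading 4, 4 and then 2 bytes.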
Chapter summary
- Angle brackets <> can be used for simple line input. In scalar context, they return the next line; in list context, all remaining lines; the default filehandle is STDIN or any files mentioned in the command line (i.e. @ARGV).
- Angle brackets can also be used as a globbing operator if anything other than a filehandle name appears between the angle brackets. In scalar context, returns the next file matching the glob pattern; in list context, returns all remaining matching files.
- The open() and close() functions can be used to open and close files. Files can be opened for reading, writing, appending, read/write, or as pipes.
- The opendir(), readdir() and closedir() functions can be used to open, read from, and close directories.
- The File::Find module can be used to recurse down through directories.
- File test operators or stat() can be used to find information about files.
- File locking can be achieved using flock().
- Binary data can be read using the read() function. The binmode() function should be used to ensure platform independence when reading binary data.
Chapter 3. Advanced regular expressions
In this section...
This section builds on the basic regular expressions taught in Netizen's Introduction to Perl course. We will learn how to handle data which consists of multiple lines of text, including how to input data as multiple lines and different ways of performing matches against that data.
Assumed knowledge
You should already be familiar with the following topics:
- Regular expression metacharacters
- Quantifiers
- "Greediness" in regular expressions, aka maximal and minimal matching
- Character classes and alternation
- The m// matching function
- The s/// substitution function
- Matching strings other than $_ with the =~ matching operator
- Assigning matched strings to lvalues
Readme: Patterns and regular expressions are dealt with in depth in chapter 2 of the Camel book, and further information is available in the online Perl documentation by typing perldoc perlre.
Review exercises
The following exercises are intended to refresh your memory of basic regular expressions:
1. Write a script to search a file for any of the names "Yasser Arafat", "Boris Yeltsin" or "Monica Lewinsky". Print out any lines which contain these names. (Answer: exercises/answers/namesre.pl)
2. What pattern could be used to match any of: Elvis Presley, Elvis Aron Presley, Elvis A. Presley, Elvis Aaron Presley. (Answer: exercises/answers/elvisre.pl)
3. What pattern could be used to match a blank line? (Answer: exercises/answers/blanklinere.pl)
4. What pattern could be used to match an IP address such as 203.20.104.241, where each part of the address is a number from 0 to 255? (Answer: exercises/answers/ipre.pl)
More metacharacters
Here are some more advanced metacharacters, which build on the ones already covered in the Introduction to Perl module:
Table 3-1. More metacharacters
Metacharacter  Meaning
\B             Match anything other than a word boundary
\cX            Control character, i.e. CTRL-X
\0nn           Octal character represented by nn
\xnn           Hexadecimal character represented by nn
\l             Lowercase next character
\u             Uppercase next character
\L             Lowercase until \E
\U             Uppercase until \E
\Q             Quote (disable) metacharacters until \E
\E             End of \L, \U or \Q

    # search for the C++ computer language:
    /C++/        # wrong! regexp engine complains about the plus signs
    /C\+\+/      # this works
    /C\Q++\E/    # this works too

    # search for "bell" control characters, eg CTRL-G
    /\cG/        # this is one way
    /\007/       # this is another -- CTRL-G is octal 07
    /\x07/       # here it is as a hex code
Working with multiline strings
Often, you will want to read a file several lines at a time. Consider, for example, a typical Unix fortune cookie file, which is used to generate quotes for the fortune command:

    %
    Let's call it an accidental feature.
            -- Larry Wall
    %
    Linux: the choice of a GNU generation
    %
    When you say "I wrote a program that crashed Windows", people just stare at
    you blankly and say "Hey, I got those with the system, *for free*".
            -- Linus Torvalds
    %
    I don't know why, but first C programs tend to look a lot worse than
    first programs in any other language (maybe except for fortran, but then
    I suspect all fortran programs look like `firsts')
            -- Olaf Kirch
    %
    All language designers are arrogant. Goes with the territory...
            -- Larry Wall
    %
    We all know Linux is great... it does infinite loops in 5 seconds.
            -- Linus Torvalds
    %
    Some people have told me they don't think a fat penguin really embodies the
    grace of Linux, which just tells me they have never seen a angry penguin
    charging at them in excess of 100mph. They'd be a lot more careful
    about what they say if they had.
            -- Linus Torvalds, announcing Linux v2.0
    %

The fortune cookies are separated by a line which contains nothing but a percent sign.
To read this file one item at a time, we would need to set the delimiter to something other than the usual \n - in this case, we'd need to set it to something like \n%\n.
To do this in Perl, we use the special variable $/.

    $/ = "\n%\n";

Conveniently enough, setting $/ to "" will cause input to occur in "paragraph mode", in which two or more consecutive newlines will be treated as the delimiter. Undefining $/ will cause the entire file to be slurped in.

    undef $/;
    $_ = <FH>;    # whole file now here

Readme: Special variables are covered in Chapter 2 of the Camel book, from page 127 onwards. We're going to be looking at more special variables soon, so mark the page now. The information can also be found in perldoc perlvar.
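To see the record separator at work without a real fortune file, the following sketch reads three invented records from a string. It relies on two things that go beyond the notes so far: opening a reference to a scalar as an in-memory file (available from Perl 5.8) and local, which restores the old value of $/ when the enclosing block ends.

```perl
#!/usr/bin/perl -w
use strict;

# Three short records in fortune-file layout (invented sample data).
my $cookies = "First cookie\n%\nSecond cookie,\non two lines\n%\nThird cookie\n";

# Opening a reference to a scalar reads from the string as if it were a file.
open(my $fh, '<', \$cookies) || die "Can't open in-memory file: $!";

my @records;
{
    local $/ = "\n%\n";       # change the separator only inside this block
    while (my $record = <$fh>) {
        chomp $record;        # chomp() removes the current value of $/
        push @records, $record;
    }
}
close $fh;

print scalar(@records), " records read\n";
print "Record 2 is:\n$records[1]\n";
```

Limiting the change with local matters because $/ is global: any other code that reads a file later on would otherwise also see the new separator.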
Since $/ isn't the easiest name to remember, we can use a longer name by using the English module:

    use English;
    $INPUT_RECORD_SEPARATOR = "\n%\n";    # long name for $/
    $RS = "\n%\n";                        # same thing, awk-like

Readme: The English module is documented on page 403 of the Camel or in perldoc English.
Exercises
1. In your directory is a file called exercises/linux.txt which is a set of Linux-related fortunes, formatted as in the above example. Use multiline regular expressions to find only those quotes which were uttered by Larry Wall. (Answer: exercises/answers/larry.pl)
Regexp modifiers for multiline data
The /s and /m modifiers can be used to treat the string you're matching against as either a single or multiple lines. In single line mode, ^ will match only at the start of the entire string, and $ will match only at the end of the entire string. In multiline mode, they will match at embedded newlines as well.

    my $string = qq(This is some text
    and some more text
    spanning several lines);

    if ($string =~ /^and some/m) {    # this will match
        print "Matched in multiline mode\n";
    }

    if ($string =~ /^and some/s) {    # this won't match
        print "Matched in single line mode\n";
    }

In single line mode, the dot metacharacter will match \n. In multiline mode, it won't.
The differences between default, single line, and multiline mode are setout very succinctly by Jeffrey Friedl in Mastering Regular Expressions(see the Bibliography at the back of these notes for details). Thefollowing table is paraphrased from the one on page 236 of that book.
His term "clean multiline mode" refers to a mode which is similar to multi-line mode, but in which the dot also matches the newline character. It is obtained by specifying both /m and /s.
Table 3-2. Effects of single and multiline options
Mode               Specified with       ^ matches...       $ matches...     Dot matches newline
default            neither /s nor /m    start of string    end of string    No
single-line        /s                   start of string    end of string    Yes
multi-line         /m                   start of line      end of line      No
clean multi-line   both /m and /s       start of line      end of line      Yes
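The rows of the table can be checked with a short script. This is a sketch under assumed inputs: the two-line sample string and the particular patterns are chosen purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical two-line sample string.
my $string = "first line\nsecond line\n";

# Each entry pairs a label with a pattern compiled using that row's modifiers.
my @tests = (
    [ 'default  ^second',         qr/^second/           ],
    [ '/m       ^second',         qr/^second/m          ],
    [ 'default  first.*second',   qr/first.*second/     ],
    [ '/s       first.*second',   qr/first.*second/s    ],
    [ '/m       first.*second',   qr/first.*second/m    ],
    [ '/ms      ^first.*^second', qr/^first.*^second/ms ],
);

for my $test (@tests) {
    my ($label, $pattern) = @$test;
    printf "%-28s %s\n", $label,
        ($string =~ $pattern ? 'matches' : 'does not match');
}
```

Only the /m, /s and /ms entries whose modifier covers what the pattern needs (a line anchor, a dot crossing a newline, or both) report a match.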
Backreferences
Special variables
There are several special variables related to regular expressions.
$& is the matched text
$` is the unmatched text to the left of the matched text
$' is the unmatched text to the right of the matched text
$1, $2, $3, etc. are the text matched by the 1st, 2nd, 3rd, etc. sets of parentheses
All these variables are modified when a match occurs, and can be used in any way that other scalar variables can be used.
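For instance, the three match variables can be copied into ordinary scalars straight after a successful match. This is a minimal sketch; the sample sentence and the variable names $before, $matched and $after are illustrative only.

```perl
use strict;
use warnings;

my $string = "It was a dark and stormy night.";

my ($before, $matched, $after);
if ($string =~ /dark/) {
    # Copy the values straight away; the next successful match overwrites them.
    ($before, $matched, $after) = ($`, $&, $');

    print "before: [$before]\n";    # text to the left of the match
    print "match:  [$matched]\n";   # the matched text itself
    print "after:  [$after]\n";     # text to the right of the match
}
```

Gluing the three pieces back together in that order reproduces the original string.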
    # this...
    my ($match) = m/^(\d+)/;
    print $match;

    # is equivalent to this:
    m/^\d+/;
    print $&;

    # match the first three words...
    m/^(\w+) (\w+) (\w+)/;
    print "$1 $2 $3\n";

You can also use $& and other special variables in substitutions:
    $string = "It was a dark and stormy night.";
    $string =~ s/dark|wet|cold/very $&/;

If you want to use parentheses simply for grouping, and don't want them to set a $1 style variable, you can use a special kind of non-capturing parentheses, which look like (?: ... )
    # this only sets $1 - the first two sets of parentheses are non-capturing
    m/^(?:\w+) (?:\w+) (\w+)/;

The special variables $1 and so on can be used in substitutions to include matched text in the replacement expression:
    # swap first and second words
    s/^(\w+) (\w+)/$2 $1/;

However, this is no use in a simple match pattern, because $1 and friends aren't set until after the match is complete. Something like:
    my $word = "this";
    print if m/($word) $1/;

... will not match "this this". Rather, it will match "this" followed by whatever $1 was set to by an earlier match.
In order to match "this this" we need to use the special regular expression metacharacters \1, \2, etc. These metacharacters refer to parenthesized parts of a match pattern, just as $1 does, but within the same match rather than referring back to the previous match.
    my $word = "this";
    print if m/($word) \1/;
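Another common use of \1 is to insist that a closing quote character is the same as the opening one. The following is a sketch with hypothetical input strings; the third deliberately mixes quote characters and so should not match.

```perl
use strict;
use warnings;

# Hypothetical inputs: the third has mismatched quote characters.
my @inputs = (
    q{say "hello" now},
    q{say 'goodbye' now},
    q{say "oops' now},
);

my @found;
for my $input (@inputs) {
    # (["']) captures the opening quote; \1 insists on the same character again
    if ($input =~ /(["'])(.*?)\1/) {
        push @found, $2;    # $2 is safe to use once the match has succeeded
        print "quoted text: $2\n";
    }
    else {
        print "no matching pair of quotes in: $input\n";
    }
}
```

Inside the pattern the backreference is written \1; only after the match has succeeded is the captured text read through $1 and $2.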
Exercises
Write a script which swaps the first and the last words on each line (Answer: exercises/answers/firstlast.pl)
Write a script which looks for doubled terms such as "bang bang" or "quack quack" and prints out all occurrences. This script could be used for finding typographic errors in text. (Answer: exercises/answers/double.pl)
Advanced
Modify the above script to work across line boundaries (Answer: exercises/answers/multiline_double.pl)
What about case sensitivity?
Section summary
Input data can be split into multiline strings using the special variable $/, also known as $INPUT_RECORD_SEPARATOR.
The /s and /m modifiers can be used to treat multiline data as if it were a single line or multiple lines, respectively. This affects the matching of ^ and $, as well as whether or not . will match a newline.
$1, $2, $3 etc are set after a successful match to the text matched by the first, second, third, etc sets of parentheses in the regular expression. These should only be used outside the regular expression itself, as they will not be set until the match has been successful.
Special non-capturing parentheses (?:...) can be used for grouping when you don't wish to set one of the numbered special variables.
Special metacharacters such as \1, \2 etc may be used within the regular expression itself, to refer to text previously matched.
Chapter 4. More functions
In this chapter...
In this chapter, we discuss some more advanced Perl functions.
The grep() function
The grep() function is used to search a list for elements which match a certain regexp pattern. It takes two arguments - a pattern and a list - and returns a list of the elements which match the pattern.
Readme: The grep() function is on page 178 of your Camel book.
    # trivially check for valid email addresses
    my @valid_email_addresses = grep /\@/, @email_addresses;

The grep() function temporarily assigns each element of the list to $_ then performs matches on it.
There are many more complicated uses for the grep function. For instance, instead of a pattern you can supply an entire block which is to be used to process the elements of the list.
    my @long_words = grep { length($_) > 8 } @words;

grep() doesn't require a comma between its arguments if you are using a block as the first argument, but does require one if you're just using an expression. Have a look at the documentation for this function to see how this is described.
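The two calling forms can be seen side by side in the sketch below. The word list is hypothetical sample data. The last call relies on grep() returning the number of matching elements when used in scalar context.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @words = qw(cat hippopotamus dog encyclopaedia);

my @with_o   = grep /o/, @words;                 # expression form: comma required
my @long     = grep { length($_) > 8 } @words;   # block form: no comma
my $how_many = grep { length($_) > 8 } @words;   # scalar context: a count

print "contain 'o': @with_o\n";
print "long words:  @long\n";
print "count:       $how_many\n";
```

In both forms each element is visible as $_ while it is being tested, which is why the pattern /o/ needs no explicit target.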
Exercises
Use grep() to return a list of elements which contain numbers (Answer: exercises/answers/grepnumber.pl)
Use grep() to return a list of elements which are
keys to a hash (Answer: exercises/answers/grepkeys.pl)
readable files (Answer: exercises/answers/grepfiles.pl)
The map() function
The map() function can be used to perform an action on each member of a list and return the results as a list.
    my @lowercase = map lc, @words;
    my @doubled = map { $_ * 2 } @numbers;

map() is often a quicker way to achieve what would otherwise be done by iterating through the list with foreach.
    foreach (@words) {
        push(@lowercase, lc($_));
    }

Like grep(), it doesn't require a comma between its arguments if you are using a block as the first argument, but does require one if you're just using an expression.
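To see that the two approaches really do give the same answer, the sketch below builds the lowercased list both ways and compares them. The word list and the variable names are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical sample data.
my @words = qw(Perl CAMEL Llama);

# Expression form (comma required) and block form (no comma)
my @lowercase = map lc, @words;
my @lengths   = map { length($_) } @words;

# The same result as the first map(), built up with foreach
my @lowercase_loop;
foreach my $word (@words) {
    push(@lowercase_loop, lc($word));
}

print "@lowercase\n";
print "@lengths\n";
print "map and foreach agree\n" if "@lowercase" eq "@lowercase_loop";
```

The map() version states the transformation in one expression and needs no separately declared accumulator array, which is most of its appeal.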
Exercises
Create an array of numbers. Use map() to find the square of each number. Print out the results.
Chapter summary
Chapter 5. System interaction
In this section...
In this section, we look at different ways to interact with the operating system. In particular, we examine the system() function, and the backtick command execution operator. We also look at security and platform-independence issues related to the use of these commands in Perl.
system() and exec()
The