Thursday, August 2, 2012

Perl Modules and Packages

This morning, someone posted a question on the Perl Beginners mailing list about functions. Their code uses two modules, and each module has its own collection of utility functions. For the first module, the code calls the functions directly, with just the function name. With the second module, the code puts the module name in front of the function names. The poster asked why there is a difference.

One of the Perl experts on the list asked if the functions lived in a module or a package. There is a difference, and it is an important one, so I thought it would be helpful to explore.

What's in a name?

Let's start with a discussion on namespaces. Consider the variable named $a. Its name is, well, a. So your code declares $a and assigns it a value:

    our $a = 6;
    ...
    $a += 3;
    print "$a\n";

Now imagine that we load a module named foo. foo has no package statement, and it also has a global variable named $a.

    # Main program
    our $a = 6;
    ...
    require foo;
    $a += 3;
    print "$a\n";

    #-------------
    # Module "foo" (file foo.pm, no package statement)
    our $a = 27;
    1;

The code above prints 30, not 9! What happened? The assignment to $a in module foo overwrote the value set by our main program, because both places referenced the same global variable: $a. (I used require here because use loads the module at compile time, before the main program's assignment runs.) Now imagine that every module on CPAN clobbered variables in your code! You would rename all of your variables every time you used a new module.

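Perl solves this problem by putting a namespace in front of every global (package) variable name. Internally, Perl treats a global $a in your main program as $main::a. Technically, you can put $main::a in your code and it works. But why type all of those extra characters when Perl does the work for you? (Variables declared with my are a separate case: they are lexical, private to the enclosing file or block, and belong to no namespace at all.)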
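Here is a minimal sketch of that equivalence. It assumes $a is declared as a package variable with our; a my variable would not have a namespace-qualified name.

```perl
use strict;
use warnings;

# A package (global) variable declared in the default package, main.
our $a = 6;

# The fully qualified name refers to the very same variable.
$main::a += 3;

# Both names see the updated value.
print "$a\n";
print "$main::a\n";
```

Both print statements show 9, because $a and $main::a are two spellings of one variable.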

I picture namespaces like boxes. All of the global variables from your main program go into a box named main::. All of the global variables from foo go into a box named foo::.

    +-----------+     +-----------+
    | main::    |     | foo::     |
    +-----------+     +-----------+
    | $a = 3    |     | $a = 27   |
    | $b = 'hi' |     | $h = 'oh' |
    | $c = 10   |     | $x = 30   |
    +-----------+     +-----------+

The big match: Module versus Package

Modules are files that you can include. A module with no package statement is compiled into the main:: namespace, the same one your main program uses by default. When we say use foo, Perl puts all of foo's global variables and functions in the main:: namespace.

Packages create namespaces. The package foo puts all of its global variables and functions in the namespace foo::. You create a package with the package statement.
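The two boxes can be sketched in a single file. This is only an illustration: it puts the package foo in a block inside the main program rather than in its own file, and it uses our to declare the globals.

```perl
use strict;
use warnings;

our $a = 6;            # lives in the main:: box as $main::a

{
    package foo;       # everything in this block is compiled into foo::
    our $a = 27;       # lives in the foo:: box as $foo::a
}

$a += 3;               # back in main::, so this changes $main::a only

print "main: $a\n";
print "foo:  $foo::a\n";
```

This prints "main: 9" and "foo:  27". The two variables share a short name but never touch each other.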

You use modules. The module declares a package.

So back to our original poster's question. He has two files (aka modules). That first module drops its functions into the main:: namespace. It is a module in the truest sense - a file that simply inserts itself into the code. The second module contains a package statement. Perl creates a namespace and puts the functions in there. To call those functions, his code specifies the namespace - the module name he saw in front.
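The same situation can be reproduced in one file. The function and package names below (trim, StringUtil, shout) are made up for illustration; the poster's actual code was not shown.

```perl
use strict;
use warnings;

# No package statement is in effect here, so this function lands in main::.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;    # strip leading and trailing whitespace
    return $text;
}

{
    package StringUtil;         # hypothetical package name

    # This function lands in StringUtil::.
    sub shout {
        my ($text) = @_;
        return uc($text) . '!';
    }
}

print trim('  hello  '), "\n";            # bare name works: we are in main::
print StringUtil::shout('hello'), "\n";   # needs the package name in front
```

Calling shout('hello') without the prefix would fail, because main:: has no function by that name.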

Best practice says that...

  1. Every module starts with a package statement.
  2. Any module contains only one package.
  3. The module and package have the same name.

As the original e-mail implies, Perl does not enforce these rules. You must specifically implement them. On the plus side, the rules come from very smart people who have learned hard lessons over the years. Save yourself the confusion and aggravation.
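A module that follows all three rules might look like the sketch below. The name Greeter and its greet function are hypothetical; the file would be saved as Greeter.pm so that the module and package names match.

```perl
# File: Greeter.pm (hypothetical module name)
package Greeter;    # rule 1: the package statement comes first
                    # rule 2: this is the only package in the file
                    # rule 3: the package name matches the file name

use strict;
use warnings;

sub greet {
    my ($name) = @_;
    return "Hello, $name!";
}

1;    # traditionally, a module file ends with a true value
```

A caller would then write use Greeter; and call the function as Greeter::greet('World').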
