This post is meant to be useful for people who have not previously used perl, as well as a reminder for more experienced folks.
Let’s suppose you were, say, trying to run an event and had asked people to submit proposals for workshops. Let’s say further you had not installed software to manage this, but had instead created a Google Form.
(Why would you do this? Unlike your ISP, they’re ‘free’ (in exchange for your data/ soul). They’re robust. They’re easy.)
At the end of your call, you have a spreadsheet full of entries. Yay! Ok, now what? That’s hard to read. Let’s output them as individual HTML files.
Ok, the next thing you’re going to want is a unique ID for each submission. The timestamp might work, but let’s say you decided to go through and add a column on the end called Unique ID and then hand typed sequential numbers through the whole spread sheet. (Please leave you better ways of handling this in the comments!)
The next step is to download your speadsheet in CSV format. This is a comma separated format that databases and spreadsheets use for information exchange. Let’s say you’ve saved your file as workshop.csv and your perl script is going to be saved to the save directory.
CPAN
You’re going to need to get some modules from CPAN. If, like me, you wandered away form perl for the last several years, this is a new but fantastic development. Google for more information. At the top of the file, when you see lines that look like “use Foo::Bar” or “require Baz::Bat”, Foo::Bar and Baz::Bat are the names of cpan modules you will need to install.
First of all, make sure you have CPAN installed.
Next, install a module that will make installing the next modules easier. Type: ‘sudo cpan App::cpanminus’ (without the quotes). Omit the ‘sudo’ is you don’t have root or administrative privileges on your computer (or you don’t know what those are) or if you’re on windows.
Then install the cpan modules that you’ll need:
sudo cpan Text::CSV::Encoded
sudo cpan HTML::Entities
sudo cpan URI::Find::Schemeless
Reading the csv file
The whole script is included below, but this is some of the part you’re going to need to change.
Let’s say this is what your spreadsheet headers look like:
Timestamp |
Your name |
Contact email |
Link to your website |
Experience level required to participate in workshop |
Proposed workshop duration |
Outline of the workshop aims and content |
Biography |
Required Resources. |
Unique ID |
Ok, so how the script is going to work is that we’re going to go submission by submission, reading the database and outputting an HTML file. Each of those table columns has a different piece of data for each submission. Let’s save them in variables. We can start by declaring them, and then by using them.
my ($time, $name, $email, $website, $experience, $duration, $outline, $biography
$resources, $id);
In Perl, variable names for regular variables (scalar variables) start with $. In that line, we’ve named all the variables we’re going to use to read database data.
Ok, we’re reading the csv file in a loop. The loop takes a submission from a file and puts it into an array. The array indexes start with 0. We’re going to read from the array into our variables. That will look like this:
$time = $fileds[0];
$name = $fields[1];
$email = fields [2];
$website = $fields[3];
...
If we carry on, we’ll get all of the database fields.
Formatting Our Data
But we don’t just want to read them in, we want them to format correctly for HTML. There are some subroutines declared at the top of the file. Those subroutines do a few things: One prepares special characters (like accents) to be encoded properly into HTML and they look for things that look like URLS and convert them into links. Another breaks up paragraphs properly. The third one helps out the other two.
Which one we use depends on how we’re going to present the data. For example, we might use the name as a header, not as a paragraph, so we would just use the encoding/link finding one for that. But the biography is likely to be a paragraph or two, so for that one, we want to break up the paragraphs properly. The encoding/link finding subroutine is called ‘prepare_str’ and the paragraph finding one is called ‘htmlify’. You will only ever need to use one of these for any variable.
Let’s change our above example so that in addition to reading from the csv file, we also prepare the data correctly for output:
$name = prepare_str($fields[1]);
...
$biography = htmlify($fields[7]);
...
Which of those routines you pick has to do with whether the data coming in is broken into paragraphs.
Outputting your HTML file
open FILE, ">$id.html" or next;
I’ve decided to just name the files after the unique id. Since this is different for every entry, I don’t need to worry about accidentally trying to create two files with the same name. (If I tried to use the submitter’s name to create the files, I’d need to first check if a file with that name already existed, in case the same person had made multiple submissions).
In perl, you can put your variables right inside a string and it will substitute appropriately. If the submission has the id 666, then the script will translate from “$id.html” to “666.html”.
Perl has a nifty feature that lets you output multiple lines of of text to a file from right in the program. You don’t need to worry about watching out for special characters like quotes, and you can put your variables directly into the string. However, this does mean that if you want to output a $ or a @ you’ll need to type in $ and @ for them.
print FILE <<END;
That starts the longer text section. It continues until it gets to a line that looks like:
END
Let’s say we just want to print out a header, the person’s name and their biography. We’ll use their name as a section heading. That would look like:
print FILE <<END;
<html>
<body>
<h1>$name</h1>
$biography
</body>
</html>
END
You could stick in that section anything else you might want to stick in your html file, including style sheet stuff, extra text, extra links, whatever you want.
The Whole Script
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV::Encoded;
use HTML::Entities;
require URI::Find::Schemeless;
my ($file, $csv, @fields, $io, $row, $output);
my ($id, $name, $email, $website, $experience, $duration, $outline, $bio, $tech);
$file = 'workshop.csv';
sub makelink
{
my $str;
($str) = @_;
chomp($str);
if ($str !~ /^http/i) {
$str = "<a href="http://$str">$str</a>";
} else {
$str = "<a href="$str">$str</a>";
}
return $str;
}
sub htmlify
{
my $str;
($str) = @_;
chomp($str);
$str = prepare_str($str);
$str = "<p>$str</p>";
$str =~ s/n/</p>n<p>/g;
return $str;
}
sub prepare_str
{
my ($str, $finder);
($str) = @_;
chomp($str);
$str = encode_entities($str);
$finder = URI::Find::Schemeless->new(&makelink);
$finder->find($str);
return $str;
}
$csv = Text::CSV::Encoded->new({encoding => undef, binary =>1, eol => $/ });
open $io, "