Using Ruby to Process m3u Playlists
Written By: Nathan Baker
- 06 Apr 2006 -
Description: Programming in Ruby is fun! In this tutorial, we will use Ruby to process the m3u playlists used by a number of media players.
- Introduction
- Part 1: Getting our feet wet
- Part 2: Slowly getting there
- Part 3: I/O, I/O, it's off to work we go
- Part 4: Cleaning up our mess
- Part 5: The extra mile
Part 3: I/O, I/O, it's off to work we go
So after all this talking about closures and enumeration and other things that nobody without a degree in computer science really cares about, you're probably ready to keel over with boredom. Is this class ever going to do anything really useful? Well, the wait is over. We will now do the most important thing: reading from the file itself. So without further ado, I present:
    class Playlist
      def read(file)
        plf = File.open(file, 'r')
        @playlist = parse(plf.read)
        plf.close
        @playlist
      end
    end
You're thinking, "Wow, all that build-up and then he does all the real work in a function he hasn't written yet." I know, pretty lame, right? Still, this code demonstrates two things. First, despite Ruby's sometimes strange-seeming syntax (try saying that five times fast), I/O is relatively canonical and straightforward: open the file, read it, close it. In fact, the only languages that really do bizarre things with I/O (that I know of, anyway) are C++ with iostreams and AWK. And guess what? If you like iostreams, Ruby can handle that too, since << and >> are valid Ruby operators (if you like AWK you're on your own, though AWK isn't really that bizarre once you understand what it's really doing).
Second (yes, way up there before I started rambling about AWK I did say there were two things) is the last line, where @playlist is just hanging out there by itself. Ruby supports return statements, but, as in LISP, a method by default returns the value of the last expression it evaluates. Since Ruby is more procedural than LISP and a method usually contains several statements, it is the value of the final one that becomes the value of the whole method.
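To make the implicit-return point concrete, here is a minimal sketch. The method names are made up for illustration and are not part of the Playlist class. It also shows the block form of File.open, which closes the file for you when the block ends.

```ruby
require 'tempfile'

# These two methods behave identically: the value of the last expression
# evaluated is the return value, with or without the word "return".
def explicit_sum(a, b)
  return a + b
end

def implicit_sum(a, b)
  a + b
end

# Hypothetical helper: File.open with a block closes the file automatically,
# and the block's value (the file contents) is what File.open returns.
# Since that is the last expression, it is also what the method returns.
def read_whole_file(path)
  File.open(path, 'r') { |f| f.read }
end

# Write a throwaway file and read it back.
def demo_read
  Tempfile.create('playlist') do |tmp|
    tmp.write("hello playlist\n")
    tmp.flush
    read_whole_file(tmp.path)
  end
end

puts implicit_sum(2, 3) == explicit_sum(2, 3)  # true
puts demo_read                                 # hello playlist
```

Whether to use the block form in read is a matter of taste; the explicit open/read/close version above does the same job.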
Enough screwing around; now we're really going to do the hard part. And I mean it this time.
If these expressions are regular, I would hate to see an irregular expression
If the non-word regex doesn't mean anything to you, you might struggle a bit in this section. A regex, or regular expression, is a pattern that can be applied to text. Regexps are essentially the last word in data processing once you understand how to use them. Many programming languages have regex facilities, but one thing Ruby got from Perl was the use of regexps as first-class values: you can assign them to variables and then match them against strings. In fact, the =~ operator (also borrowed from Perl) means "match this string (left-hand side) against this regex (right-hand side)".
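A quick sketch of what "first-class" means in practice (the constant and method names are invented for this example): the regex is stored like any other value, and =~ gives back the position of the first match, or nil if there isn't one.

```ruby
# A regex literal is an ordinary object; here it is stored in a constant.
DIGITS = /\d+/

# =~ returns the index where the first match starts, or nil on no match.
def first_digit_index(text)
  text =~ DIGITS
end

puts DIGITS.class                              # Regexp
puts first_digit_index('track 42')             # 6
puts first_digit_index('no numbers').inspect   # nil
```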
I bring up regexps because the following code makes use of them. I tried to keep the code simple so that those who had never seen any Ruby before reading this article won't give up and join a monastery:
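    class Playlist
      private

      # Turn the raw text of an m3u file into an array of SongEntry objects.
      def parse(str)
        re = /\#EXTINF:(\d+)\s*,\s*(.+)$/
        playlist = Array.new
        # String#each is Ruby 1.8; on Ruby 1.9 and later use str.each_line.
        str.each { |line|
          if re =~ line
            playlist << SongEntry.new($2, $1)
          end
        }
        playlist
      end
    end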
If this is totally confusing, don't worry: I'll explain it in my excruciatingly slow manner. First off, C++ fans will be glad to see the private keyword decorating the parse function. Private, protected, and public are written in essentially the same way (minus the colon) as they are in C++. Here it means that not just anyone can call parse, only other methods of the same object. (The rules are not identical to C++: a private Ruby method can only be called without an explicit receiver, and subclasses may call it too.) Also note that C++ people can read Array.new as new Array. It just calls the constructor.
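Here is a small sketch of private in action. Greeter and its methods are invented for the demonstration and have nothing to do with the Playlist class.

```ruby
class Greeter
  def greet
    secret_word  # fine: called from inside the object, no explicit receiver
  end

  private

  # Everything defined after "private" can only be called from inside.
  def secret_word
    'hello'
  end
end

# Calling a private method from outside raises NoMethodError.
def private_call_blocked?
  Greeter.new.secret_word
  false
rescue NoMethodError
  true
end

puts Greeter.new.greet       # hello
puts private_call_blocked?   # true
puts Array.new.inspect       # []
```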
So we define a regular expression named re. The // characters are used as delimiters for regexps. Inside, I'm looking for a line that matches this definition:
#EXTINF:[any number of digits],[any amount of text]
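Strictly speaking, I don't have to escape the # character here: it is Ruby's comment character, and inside a regex it starts an interpolation when followed by {, $, or @, but a # followed by E is harmless. The backslash doesn't hurt, though, and it makes it obvious that a literal # is meant. \d matches any decimal digit (and \d+ one or more of them), \s matches whitespace (and \s* any amount of it, including none), and . matches anything except a newline. $ matches the end of a line, and while it was not strictly necessary, I included it for clarity's sake.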
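If you want to poke at the pattern by itself, a sketch like the following might help. The constant, the method name, and the sample lines are all invented for this example. It uses Regexp#match, which hands back the captured groups as an object instead of going through =~.

```ruby
# The same pattern as in parse, stored in a constant for the demo.
EXTINF = /\#EXTINF:(\d+)\s*,\s*(.+)$/

# Returns [seconds, title] for an #EXTINF line, or nil for any other line.
def extinf_fields(line)
  m = EXTINF.match(line)
  m && [m[1], m[2]]
end

p extinf_fields('#EXTINF:215,Some Artist - Some Song')  # ["215", "Some Artist - Some Song"]
p extinf_fields('#EXTINF:215 , Padded Title')           # ["215", "Padded Title"]
p extinf_fields('music/some_song.mp3')                  # nil
```

Note that both captured pieces come back as strings, including the number of seconds.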
The String class also provides an each method. By default it splits the string up at the newlines, but you can pass a parameter to str.each to make it split on any separator you want (you can, for example, use a space to enumerate each word in a sentence). In Ruby 1.9 and later, String#each is gone and each_line does the same job. For each line, we check to see if the regex matches it. If so, we do this:
playlist << SongEntry.new($2,$1)
This is the line likely to give people the most problems. C++ iostream users will recognize the << operator as inserting something into something. In Ruby, << on an array appends an element to the end. We are appending a new SongEntry to the playlist array (what is a SongEntry, you ask? Well, I haven't defined it yet. One thing at a time, yo!). Those who were paying attention when I was discussing variables will remember that $2 and $1 are globals (they look like globals, anyway; Ruby actually keeps them local to the current method and thread). Ruby operations tend to set a huge number of globals, and these are set by the =~ operator. Specifically, they contain the actual text that was matched by the parenthesized expressions inside the regex. The first () is in $1, and the second is in $2. If I had included a third set of parentheses, who can guess which variable that string would be stored in?
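To watch the pieces work together on a current Ruby, here is a sketch, not the article's actual class. DemoEntry is a stand-in I made up for SongEntry, SAMPLE is invented playlist text, and each_line takes the place of the older str.each. The words helper just shows the separator argument mentioned earlier.

```ruby
# Stand-in for SongEntry; the name and fields are assumptions for this sketch.
DemoEntry = Struct.new(:title, :seconds)

# Invented playlist text, used only for the demonstration.
SAMPLE = <<~'M3U'
  #EXTM3U
  #EXTINF:215,Some Artist - Some Song
  music/some_song.mp3
M3U

def parse_demo(text)
  re = /\#EXTINF:(\d+)\s*,\s*(.+)$/
  entries = []
  text.each_line do |line|
    # After a successful =~, $1 and $2 hold the first and second groups.
    entries << DemoEntry.new($2, $1) if re =~ line
  end
  entries
end

# each_line with a separator argument: split a sentence at spaces.
def words(sentence)
  sentence.each_line(' ').map(&:strip)
end

parse_demo(SAMPLE).each { |e| puts "#{e.title} (#{e.seconds}s)" }
# prints: Some Artist - Some Song (215s)
p words('one thing at a time')
# prints: ["one", "thing", "at", "a", "time"]
```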