Archive | Perl RSS for this section

An Intro To Parsec

Taking a break from the Photo Booth, I’ve been working on a re-write of dmp_helper. If you’ve been following me for a while, you may remember that as a quick, dirty hack I did in perl one day to avoid manually writing out HTML snippets. While it mostly works, it’s objectively terrible. I guess if my goal is to never get a job doing Perl (probably not the worst goal to have) then it might serve me well.

But I digress. DmpHelper, the successor to dmp_helper, is going to be written in Haskell, and make extensive use of the Parsec parser library. The old dmp_helper makes extensive use of regular expressions to convert my custom markup to HTML, but I’d prefer to avoid them here. I find regular expressions to be hard to use and hard to read.

So I set off to work. And not 3 lines into main :: IO (), I came across a need to parse something! This is something that has been parsed many times before, but it’s good practice so it’s a wheel I’ll be re-inventing. I’m talking of course about parsing the ArgV!

Some Choices To Make

But before I get to that, I need to make some choices about how my arguments will look. I’ve decided to use the “double dash” method. Therefore, all flags will take the form of --X, where X is the flag. Any flag can take an optional argument, such as --i foo, all text following a flag is considered to belong the the preceeding flag until another flag is encountered. Some flags are optional, and some are mandatory. The order flags appear is not important.

So far, I have 3 flags implemented: --i [INPUT_FILE], the file to be converted, --o [OUTPUT_FILE], the location to save the resulting HTML, and --v, which toggles on verbose mode. Other flags may be added in as development progresses.

Data ArgV

Next, I’ll create a type for my ArgV.

data ArgV = ArgV {inputFile :: FilePath, outputFile :: FilePath, isVerbose :: Bool} deriving (Show)

As you can see, my type has three fields, which line up with my requirements above.

Introducing Parsec

Parsec is a fairly straight-forward library to use. It has a lot of operators and functions, whcih require some thought. However they all amount to basic functions that do a specific parsing task. Your job is to tell these functions what to do. So let’s get right down to it.

parseArgV :: [String] -> Either ParseError ArgV parseArgV i = parse pArgV "" $ foldArgV i foldArgV :: [String] -> String foldArgV i = foldl' (++) [] i

Here we have two functions to get us started. The function foldArgV collapses the list of strings into a single string, to be parsed. Because of the way that arguments are parsed by the operating system, the argument string --i foo --o bar --v will be collapsed to --ifoo--obar--v. This is good for us because this means we don’t have to worry about parsing white space.

The second function, parseArgV is our entry point into Parsec. All of Parsec’s combinator functions return a Parser monad. The function parse parses the data in the parser monad and returns either an error, or a whatever. In our case, it’s parsing to an ArgV.

Parse takes 3 arguments: a parser, a “user state”, and a string to parse. The first and third are pretty self explanatory, but the second is a mystery. Unfortunately I don’t have an answer for you as to it’s nature. I spent a good hour or two researching it yesterday, and the best I could come up with is “don’t worry about it”. None of the tutorials use it, and the documentation basically just says “this is state. Feel free to pass an empty string, it’ll be fine”. I’ve decided I won’t worry my little head about it for the time being.

pArgV :: CharParser st ArgV pArgV = do permute $ ArgV <$$> pInputFile <||> pOutputFile <|?> (False, pIsVerbose)

This function is the true heart of my parser. I’d like to draw your attention to the invocation of permute. This function implements the requirement that the order of arguments shouldn’t matter. Ordinarily, parsec will linearly go through some input and test it. But what do you do if you need to parse something that doesn’t have a set order? The library Text.Parsec.Perm solves this problem for us.

The function permute has a difficult type signature to follow, so a plain-english explanation is in order. Permute is used in a manner that resembles an applicative functor. Permute takes a function, and then using it’s operators, it takes functions that return a parser for the type of each field in the function in a pseudo-applicative style. These functions don’t need to come in the order that they’re parsed in, but in the order that the initial function expects them to be. Permute will magically parse them in the correct order and return a parser for your type. Let’s break this down line-by-line:

... permute $ ArgV <$$> pInputFile ...

Here we call permute on the constructor for ArgV. We feed it the first function, pInputFile using the <$$> operator. This operator is superficially, and logically similar to the applicative functor’s <$> operator. It creates a new permutation parser for the type on the left using the function on the right.

... <||> pOutputFile ...

Here we feed the second parser function, pOutputFile to the permutation parser using the <||> operator. This operator is superficially, and logically similar to applicative’s <*> operator.

... <|?> (False, pIsVerbose) ...

Finally, we feed it a parser for isVerbose. The operator <|?> works the same as the <||> operator, except that this operator is allowed to fail. If <||> fails, the whole parser fails. If <|?> fails, then the default value (False, in this case) is used. This allows us to have optional parameters.

Moving along from here, here are some extra parsing functions we need.

pInputFile :: CharParser st FilePath pInputFile = do try $ string "--i" manyTill anyChar pEndOfArg

This parser parses the --i parameter. The first line of the do expression parses the input to see if it is --i. Because it is wrapped in a call to try, it will only consume input if it succeeds. Since we don’t actually need this text, we don’t bind it to a name. If the parser had failed, it would exit the function and not evaluated the second line. On the second line, we take all the text until pEndOfArg succeeds. This text is what is returned from the function.

pEndOfArg :: CharParser st () pEndOfArg = nextArg <|> eof where nextArg = do lookAhead $ try $ string "--" return ()

This function detects if the parser is at the end of an argument. Notice the <|> operator. This is the choice operator. Basically, if the parser on the left fails, it runs the parser on the right. You can think of it like a boolean OR.

The parser eof is provided by parsec, and succeeds if there is no more input. The function lookAhead is the inverse of try. It only consumes input if the parser fails. By wrapping a try inside of a lookAhead, we create a parser that never consumes input.

The parser functions pOutputFile and pIsVerbose are almost identical to pInputFile, so I’m not going to bother typeing them out here.

Additional Reading

That’s about all there is to my ArgV parser. You can find some additional reading on the subject in Real World Haskell, whcih has an entire chapter dedicated to this subject. Also, check out the Parsec haddoc page on hackage.

Baby’s First Web Application Framework

Deep in the depths of the DMP Lunar Bunker, I have a file server running. I prefer to do my computing on various internet connected devices rather than be tied to one computer. To this end, I try to keep my documents, images, and things of that nature centralized on the server so that I can get to it anywhere. This is all fine and good on a computer, but my iPad and various smartphones don’t have native support for mounting a smb share.

To that end, a project that I’ve had planned for a long time is to develop a web app running on my server to serve that stuff up through a web browser. That said, this post isn’t announcing the beginning of that project. I have enough on my plate: DMP Photo Booth still needs polish, I want to learn Haskell, I have a wedding to plan, and I have to spend this month figuring out what “Linear Algebra” is. Not to mention that the file server is in need of a software refresh. All of those things are higher priority. That said, as an excercise in learning Haskell, I’ve created a toy HTML generator.

doopHtml

That would be the name of my little suite of functions. I feel its name reflects the gravity of the project. All that it really is is a few types, and customized show instances. First, let’s take a look at my types:

type TagName = String type Attributes = [(String, String)] data Child = NullChild | TagChild Tag | StringChild String data Tag = NullTag | Tag TagName Attributes [Child]

First of all, forgive my sloppy indentation style. I haven’t really settled on a code style for Haskell yet. With that out of the way, let’s get to business.

TagName

This is just a type alias for a String. The concept of a math function is one that I understand, but I’ve always hated the trend of giving things useless names like a, i, or Σ instead of names like numerator, index, or Summation over. Sure, it was good enough for Aristotle, but I live in an age with things like “find and replace” and “code completion”. Maybe thousands of years from now, scholars will remember Chris Tetreault, father of the descriptive variable name. (HA!)

But I digress. I’ve created this alias to document when a String in a function argument or type constructor is a tag name.

Attributes

This is a type alias for an array of String pairs. This type represents the attributes list of a Tag. Things like src="http://doingmyprogramming.com/" It’s pretty self-explanatory. An empty list is an allowed value, and represents a tag with no attributes.

Child

Now we’re getting a bit more complicated. If you picture a HTML document as a tree, then the child is the branches from the top level tag. There are three types of Child: NullChild is the absence of a tag, TagChild is a nested Tag, StringChild is just some free floating text.

Tag

And now the meat of it. Tag has two type constructors: NullTag which is the absence of a tag, or Tag.

Tag takes 3 arguments: a TagName, an Attributes, and a list of Child. This is all you really need to represent a basic HTML document. (no embedded CSS)

None of these types have deriving(Show), because they require a custom implementation. Let’s look at that next.

Show

First, let’s take a look at Tag‘s Show implementation:

instance Show Tag where show NullTag = "" show (Tag n [] []) = "<" ++ n ++ ">" show (Tag n a []) = "<" ++ n ++ flattenAttributes a ++ ">" show (Tag n [] c) = show (Tag n [] []) ++ foldl1 (++) (map show c) ++ "</" ++ n ++ ">" show (Tag n a c) = show (Tag n a []) ++ foldl1 (++) (map show c) ++ "</" ++ n ++ ">"

The show tag has 5 patterns:

  • The first pattern is a NullTag. It returns an empty string.
  • The second pattern is a Tag that only has a TagName. It places the TagName between angle brackets.
  • The third pattern is one that also has an Attributes. It flattens the Attributes, and places it and the TagName between angle brackets
  • The fourth pattern is one that has a TagName and a Child list. It :
    • Recursively calls show with the argument (Tag n [] []). This results in the opening tag being printed
    • maps show over the Child list, then folds the returned list of Strings up using foldl1 (++), concatenating all the returned Strings into 1 large String.
    • Closes the tag
  • The fifth pattern has all fields. it works the same as the fourth, but calls the show (Tag n a []) instead.

The Show implementation for Child is much simpler, and I don’t feel it requires explanation:

instance Show Child where show NullChild = "" show (TagChild x) = show x show (StringChild x) = x

But what about that flattenAttributes function earlier? It looks like this:

flattenAttributes :: Attributes -> String flattenAttributes [] = [] flattenAttributes x = foldl (\acc (k,v) -> acc ++ " " ++ k ++ "=" ++ v) "" x

If given an Empty List, it returns an Empty List. In Haskell, [] is the same thing as "", so this will function as an empty String. However, if not given an empty string the work is much more complicated.

First, foldl is called. foldl‘s function signature is: (a -> b -> a) -> a -> [b] -> a, which reads as: “foldl takes a function that takes an a, b, and returns an a. It also takes an a, and a list of b, and it returns an a”. In english, it needs a starting value, a list to fold up, and a function to be called on each list element. I call foldl on an empty string and x, which is the Attributes passed into the function.

The funky bit in the middle of all of that, (\acc (k,v) -> acc ++ " " ++ k ++ "=" ++ v), is called a Lambda. If you’ve ever written an anonymous function in Java, you’ve done this before. The \ at the beginning tells Haskell that what follows is a Lambda. After that, you have your arguments. Remember the signature of the function to be passed into foldl: (a -> b ->a). In this lambda, acc is a, and this is a String. (k,v) is b, a Tuple of two Strings. The part following the -> is the function body. In this lambda, we concatenate the two pairs together into the form of " k=v". That is then concatenated onto acc, which is the results of all previous calls. The final result looks something like " k1=v1 k2=v2 ... kn=vn", which is returned from the function.

Some Helpers

So, now that’s all done, and the Tag type is usable. It’s not user friendly though:

Tag "html" [] [TagChild $ Tag "head" [] [TagChild $ Tag "title" [] [StringChild $ "DMP"]], TagChild $ Tag "body" [] [TagChild $ Tag "a" [("href","doingmyprogramming.com")] [StringChild "Doing My Programming..."]]] <html><head><title>DMP</title></head><body><a href=doingmyprogramming.com>Doing My Programming...</a></body></html>

Barf, right? It’s like writing HTML, only harder! Clearly we need some helper functions to make working with this easier.

tagHtml :: [Child] -> Tag tagHtml c = Tag "html" [] c tagHead :: Attributes -> [Child] -> Tag tagHead a c = Tag "head" a c tagTitle :: String -> Tag tagTitle t = Tag "title" [] [StringChild t] tagBody :: Attributes -> [Child] -> Tag tagBody a c = Tag "body" a c pageSkeleton :: String -> [Child] -> Tag pageSkeleton t b = tagHtml [TagChild $ tagHead [] [TagChild $ tagTitle t], TagChild $ tagBody [] b]

Here is some basic helpers I whipped up while writing this. These four helpers will take just the fields needed to produce their respective tags. Now, we can make a basic page skeleton by calling the pageSkeleton function:

pageSkeleton "DMP" [TagChild $ Tag "a" [("href","http://doingmyprogramming.com")] [StringChild $ "Doing My Programming..."]] <html><head><title>DMP</title></head><body><a href=doingmyprogramming.com>Doing My Programming...</a></body></html>

While that could use some work, it’s enough to see where this is going. Much like Perl’s CGI module, functions can be created to automate much of the boilerplate.

Reinventing The Wheel

All this is fine and good, but surely this has been done already, right? Considering Haskell’s hackage server is implemented in Haskell, it seems that there is almost certainly a better web development framework for Haskell than doopHtml. I encourage you to not develop your next web application with the doopHtml framework, nor should you include your expertise in doopHtml on your resume.

It was good practice though!

Now For Something Completely Different

For as long as I’ve been trying (successfully or not) to program, I’ve been using C like languages. When I was a kid, I struggled in vain to learn C++. As an adult, I learned Java. After that, I used Java as a spring-board into the wonderful world of C Like Languages: C, C++, Perl, Lua. I wrote hello world in dozens of others as well. I found myself proudly proclaiming that “I’m confident I could pick up any C Like Language!”

Then one day I thought “what about the rest of them?” Sure, maybe I can speak Latin. Maybe I can pick up any Latin based language with relative ease. But what if I need to move to China? I speak C, but what if C falls out of favor for something else? I decided it was time to try something else.

But What?

C and it’s cousins broadly represent the Procedural and Object Oriented paradigms. We’ve all been there and done that. Procedures and Subroutines may or may not take arguments, do something, and may or may not return a result. The global or local state may or may not change. Loops happen. I don’t think it is a stretch to say that these are the two most mainstream paradigms. For the purposes of this blog post, I’m going to lump the Procedural and Imperative paradigms together. I understand that they are not the same thing, but roughly speaking, the Procedural paradigm is an evolution of the Imperative paradigm.

This leaves us with Functional Programming. Unlike the functions of an Object Oriented or Procedural language, the functions of a Functional language closely resemble those in math. In math, f(x+2), where x = 2, will always return 4. Similarly, a function in a Functional language will always return the same result given the same input. Where a function in a Procedural or Object Oriented language describes the steps to perform some task (usually, this involves some sort of loop construct), a function in a functional language just describes what the result of some function is. (usually involving recursion) f(x+f(x-1)) adds x to the result of f(x-1), which recursively adds x-1 to f(x'-1) and so on until the end of time.

So, what programming language to choose? Many languages support functional programming to an extent. Python, Lua, and even C# if you squint hard enough. However, these languages are multi-paradigm. As such, it will be easy to fall back into my C Like ways. What about Lisp?

Lisp is a family of languages: Common Lisp, Scheme, Clojure, Emacs Lisp. Sure, I could learn one, and theoretically be able to transition with ease, but this isn’t a level of fragmentation that I’m comfortable with. In addition, Lisps are multi-paradigm, so I’m more likely to not keep the faith. Which leaves me with…

Haskell

Haskell is a “pure” Functional programming language. While any useful program must have the side effect of reading from or writing to some external source, Haskell places that part of the program neatly in a corner. Let’s talk about some of the neat features of Haskell:

Lazy Evaluation

Expressions in Haskell are evaluated lazily. What this means is that a value isn’t computed until it’s needed. Let’s take a look at an example:

embiggen :: Int -> [Int] embiggen x = x:embiggen (x + 1)

This function takes an integer, and creates a list out of it. (Lists in Haskell behave much the same way as a normal linked list: O(1) insertion, O(n) traversal) The passed-in integer is pushed on to the front of the list resulting from embiggen (x + 1). You may have noticed that this function will go on forever. While maybe not ideal, this is ok in Haskell because of Lazy evaluation. The infinityeth element of this list will not be evaluated until it’s needed!

show (take 5 (embiggen 5)) [5,6,7,8,9] show (embiggen 5) !! 17 22 show (embiggen 5) [OMG INFINITE RECURSION!!!!!]

In the first example, we call the library function take, which returns a list containing the first n elements of the passed in list. In the second example, we call the library function !! (all operators are functions), which is the list indexing operator, which returns the nth element of the list. In a language with strict evaluation, the list would need to be completely evaluated before these things could happen. In Haskell, it doesn’t! Only in the third example, where we attempt to call show on the entire list, does infinite recursion occur.

Tail Call Optimization

This is one of those terms that gets thrown around a lot, but what does it actually mean? The short answer is that it prevents a recursive function call from consuming a new stack frame. In a language without this feature, if foo() calls itself, the new call will consume a new stack frame. This will cause a stack overflow if allowed to go on too long. In Haskell, this isn’t a problem because of Tail Call Optimization.

Type System

Haskell’s type system is quite different from the usual type systems. Sure there are Ints, Chars, Floats, Bools, and the like, but there’s more to it than that. Haskell is very strongly typed. There is no casting in Haskell, if a function takes an Int, there’s no getting around giving it an Int. However, the whole type system operates in a manner similar to generics in languages like C++ or Java. Take the following examples:

putInList :: a -> [a] putInList thing = [thing] addStuff :: (Num a) => a -> a -> a addStuff lhs rhs = lhs + rhs

The first function takes some arbitrary type, and returns a singleton list containing the passed-in argument. Much like generics, the argument shouldn’t depend on any type-specific behavior.

The second function takes two arguments of the same type that behaves like a number (Int, Float, Double, and friends) adds them, and returns the result. the addStuff function accomplishes this by specifying that arguments of type a should be members of the Num Typeclass. Despite the word “class”, Typeclasses aren’t the same as classes in Object Oriented languages. You CAN think of them as being the same as Java’s interfaces. When you create a type, you can make it a member of any number of Typeclasses. You must then implement the functions specified by the Typeclass, just like when a class in Java implements some interface, it must define the methods of that interface.

This is just the tip of the iceberg, but I’m sure you’re beginning to see how you can make very general functions in a very type-safe way.

Partial Function Application

A feature of functional programming is higher order functions. This means that functions can take functions as arguments, and functions can return functions. While nice, this isn’t exactly a new concept. Even C supports this to an extent with function pointers. What is new is partial application of functions. Recall the addStuff function above. It takes two arguments of type a and returns a result of type a. Now let’s look at an example:

doNumFunc :: (Num a) => (a -> a) -> a -> a doNumFunc f a = f a addThree :: (Num a) => a -> a addThree a = addStuff 3 a

The doNumFunc function takes a function that takes a type a and returns a type a (This is what (a -> a) means), and a second type a, and returns a type a. doNumFunc calls the passed in function with the second passed in argument. The addThree function takes a type a and returns a type a. addThree takes an argument, and calls the addStuff function we defined earlier with its argument and 3. How does this all pan out?

addThree 3 6 doNumFunc addThree 3 6

Seems pretty straightforward, right? Though, this isn’t very re-usable. What if I want to add 4? Do I need to define a function addFour? No, I can partially apply addStuff. If you call a function in Haskell with less arguments than it takes, it will return a function that takes the remaining arguments and returns a result! Observe:

doNumFunc (addStuff 3) 3 6

Now things are getting cool. By calling (addStuff 3), we’ve created a function that takes a type a, adds 3 to it, and returns the result! You can’t do that in C!

Getting Started

Excited yet? You know you are, don’t try to act like you’re not. But how does one get started? Like any language, you need two things to begin: a compiler/interpreter and some reading material.

Environment

First up, you should go download the Haskell Platform. This package contains your compiler/interpreter and all the standard libraries. Haskell can be compiled, or interpreted. Or, you could use ghci, the interactive interpreter, if you just want to doop around and try stuff.

If you’re running a Linux distro, haskell-platform is likely in the repositories. In Debian or Ubuntu, it’s a simple:

sudo apt-get install haskell-platform

… and you’re set! Unfortunately, there doesn’t seem to be a great IDE for Haskell. NetBeans definitely has nothing to offer in this regard. Luckily for us, Haskell is simple enough to not really need an IDE. GEdit, the default text editor that ships with Gnome, has built-in syntax highlighting for Haskell. Just enable the built-in terminal in GEdit to test stuff and you should be good to go. I like to run ghci in the embedded terminal to test functions as I write them. Plus, as you code, you can periodically attempt to load the script in ghci to make sure everything is formatted correctly and you haven’t messed up your syntax/types.

Literature

One of the biggest barriers to learning a new language is money. Nobody wants to put down cold hard cash on learning something new when what they have is working just fine. Luckily for us, you can learn you a Haskell for free! Learn You A Haskell For Great Good is a beginner’s guide to learning Haskell aimed at developers coming from C Like Languages. The best part is that the whole book is available to read online for free! You can check it out for the low-low price of zero dollars. If you like it, maybe you buy a copy for your bookshelf. Or maybe you just spread the word.

Whatever you do, you should have a good base of knowledge in Haskell. At that point, you can just consult Hoogle to learn more.

I Used a Regex, Now I Have Two Problems…

So, I’m 16 posts in on this blog now. Every post, I’m getting fancy with code blocks and such. You know,

like this.

To make one of those, it takes a div, span, code, and pre tag, along with some custom CSS. You know how irritating that gets typing it over and over again? Did you guess “very”? You’d be right.

Luckily, I’m a programmer. I know regular expressions! I decided I’d try to remember how to do Perl. I wrote a script to handle regex find and replacing these tags. It can be found here.

I’ll not go over each line of that script, but here are some of the highlights:

On line 19:

if ($current =~ /^#/) { #comment found }

Checks to see if a line begins with a #. If it does, that line is a comment.

On line 23:

elsif ($current =~ /=/) { my @working = split(/=/, $current, 2); chomp($working[0]); chomp($working[1]); $return_value{$working[0]} = $working[1]; }

Checks to see if the current line contains an equals sign. If it does, it splits the current line, using the first equals sign as the delimeter. Thereby allowing replacement text to contain an equals sign. The two tokens then have leading and trailing whitespace trimmed before being added to the return_value hash.

On line 48:

for my $pair (keys(%config)) { $current =~ s/$pair/$config{$pair}/; }

Iterates through the token/replacement pair, replaces tokens in the current line.

The script takes a text file as an argument, reads a dmp_helper.rc file located in the same directory as the script, and prints the expanded text to STDOUT.

Given an input file containing:

I'm %j% a %p% %b%, nobody loves me... He's %j% a %p% %b%, %f% a %p% family! spare him his life %f% %t% monstrosity!

…and a dmp_helper.rc containing:

%j%=just %p%=poor %f%=from %t%=this %b%=boy

…this output is produced:

I'm just a poor boy, nobody loves me... He's just a poor boy, from a poor family! Spare him his life from this monstrosity!
%d bloggers like this: