Archive | Generics RSS for this section

Everything Is An Integer

Today I found myself plumbing the murky depths of C++. My latest task: file IO.

C++ provides a type fstream. Being a proper object-oriented system, the class hierarchy is completely indecipherable. However, the actual use of this type is fairly simple. An instance of fstream has a “get pointer” and a “put pointer”. To write to the file, you must seek the put pointer to a location, then write. To read, you must seek the get pointer, then read. Fairly simple, right?

The seek methods: seekp and seekg are overloaded:

ostream& seek[pg] (streampos pos); ostream& seek[pg] (streamoff off, ios_base::seekdir way);

Should be no problem, right? Let’s de-typedef-ify that:

ostream& seek[pg] (int pos); ostream& seek[pg] (int off, int way);

Ok, still no big deal. This is C++ after all, enums are integers. So, I carry on my merry way. I implemented the following functions:

template <class to_write> static void write_out(fstream * fio, offset location, const to_write & value) { fio->seekp(location, ios_base::end); write_out<to_write>(fio, value); } template <class to_write> static void write_out(fstream * fio, const to_write & value, streampos position) { fio->seekp(position); write_out<to_write>(fio, value); } template <class to_read> static void read_in(fstream * fio, to_read & into_location) { ...stuff... }

The idea: wrap the seeking and writing into one operation, so my high level code need not concern itself with it. The behavior:

write_out<long>(foo, 8, ios_base::beg);

…and…

offset bar = 0; write_out<long>(foo, bar, 0);

…both result in the first function being called! Were this a language with a proper type system, all those types would be distinct and the function overloading would work as advertised. Unfortunately, this is C++, where there is only 1 type: byte.

Let this serve as a warning, so you don’t spend half your day troubleshooting such a stupid problem.

Now For Something Completely Different

For as long as I’ve been trying (successfully or not) to program, I’ve been using C like languages. When I was a kid, I struggled in vain to learn C++. As an adult, I learned Java. After that, I used Java as a spring-board into the wonderful world of C Like Languages: C, C++, Perl, Lua. I wrote hello world in dozens of others as well. I found myself proudly proclaiming that “I’m confident I could pick up any C Like Language!”

Then one day I thought “what about the rest of them?” Sure, maybe I can speak Latin. Maybe I can pick up any Latin based language with relative ease. But what if I need to move to China? I speak C, but what if C falls out of favor for something else? I decided it was time to try something else.

But What?

C and it’s cousins broadly represent the Procedural and Object Oriented paradigms. We’ve all been there and done that. Procedures and Subroutines may or may not take arguments, do something, and may or may not return a result. The global or local state may or may not change. Loops happen. I don’t think it is a stretch to say that these are the two most mainstream paradigms. For the purposes of this blog post, I’m going to lump the Procedural and Imperative paradigms together. I understand that they are not the same thing, but roughly speaking, the Procedural paradigm is an evolution of the Imperative paradigm.

This leaves us with Functional Programming. Unlike the functions of an Object Oriented or Procedural language, the functions of a Functional language closely resemble those in math. In math, f(x+2), where x = 2, will always return 4. Similarly, a function in a Functional language will always return the same result given the same input. Where a function in a Procedural or Object Oriented language describes the steps to perform some task (usually, this involves some sort of loop construct), a function in a functional language just describes what the result of some function is. (usually involving recursion) f(x+f(x-1)) adds x to the result of f(x-1), which recursively adds x-1 to f(x'-1) and so on until the end of time.

So, what programming language to choose? Many languages support functional programming to an extent. Python, Lua, and even C# if you squint hard enough. However, these languages are multi-paradigm. As such, it will be easy to fall back into my C Like ways. What about Lisp?

Lisp is a family of languages: Common Lisp, Scheme, Clojure, Emacs Lisp. Sure, I could learn one, and theoretically be able to transition with ease, but this isn’t a level of fragmentation that I’m comfortable with. In addition, Lisps are multi-paradigm, so I’m more likely to not keep the faith. Which leaves me with…

Haskell

Haskell is a “pure” Functional programming language. While any useful program must have the side effect of reading from or writing to some external source, Haskell places that part of the program neatly in a corner. Let’s talk about some of the neat features of Haskell:

Lazy Evaluation

Expressions in Haskell are evaluated lazily. What this means is that a value isn’t computed until it’s needed. Let’s take a look at an example:

embiggen :: Int -> [Int] embiggen x = x:embiggen (x + 1)

This function takes an integer, and creates a list out of it. (Lists in Haskell behave much the same way as a normal linked list: O(1) insertion, O(n) traversal) The passed-in integer is pushed on to the front of the list resulting from embiggen (x + 1). You may have noticed that this function will go on forever. While maybe not ideal, this is ok in Haskell because of Lazy evaluation. The infinityeth element of this list will not be evaluated until it’s needed!

show (take 5 (embiggen 5)) [5,6,7,8,9] show (embiggen 5) !! 17 22 show (embiggen 5) [OMG INFINITE RECURSION!!!!!]

In the first example, we call the library function take, which returns a list containing the first n elements of the passed in list. In the second example, we call the library function !! (all operators are functions), which is the list indexing operator, which returns the nth element of the list. In a language with strict evaluation, the list would need to be completely evaluated before these things could happen. In Haskell, it doesn’t! Only in the third example, where we attempt to call show on the entire list, does infinite recursion occur.

Tail Call Optimization

This is one of those terms that gets thrown around a lot, but what does it actually mean? The short answer is that it prevents a recursive function call from consuming a new stack frame. In a language without this feature, if foo() calls itself, the new call will consume a new stack frame. This will cause a stack overflow if allowed to go on too long. In Haskell, this isn’t a problem because of Tail Call Optimization.

Type System

Haskell’s type system is quite different from the usual type systems. Sure there are Ints, Chars, Floats, Bools, and the like, but there’s more to it than that. Haskell is very strongly typed. There is no casting in Haskell, if a function takes an Int, there’s no getting around giving it an Int. However, the whole type system operates in a manner similar to generics in languages like C++ or Java. Take the following examples:

putInList :: a -> [a] putInList thing = [thing] addStuff :: (Num a) => a -> a -> a addStuff lhs rhs = lhs + rhs

The first function takes some arbitrary type, and returns a singleton list containing the passed-in argument. Much like generics, the argument shouldn’t depend on any type-specific behavior.

The second function takes two arguments of the same type that behaves like a number (Int, Float, Double, and friends) adds them, and returns the result. the addStuff function accomplishes this by specifying that arguments of type a should be members of the Num Typeclass. Despite the word “class”, Typeclasses aren’t the same as classes in Object Oriented languages. You CAN think of them as being the same as Java’s interfaces. When you create a type, you can make it a member of any number of Typeclasses. You must then implement the functions specified by the Typeclass, just like when a class in Java implements some interface, it must define the methods of that interface.

This is just the tip of the iceberg, but I’m sure you’re beginning to see how you can make very general functions in a very type-safe way.

Partial Function Application

A feature of functional programming is higher order functions. This means that functions can take functions as arguments, and functions can return functions. While nice, this isn’t exactly a new concept. Even C supports this to an extent with function pointers. What is new is partial application of functions. Recall the addStuff function above. It takes two arguments of type a and returns a result of type a. Now let’s look at an example:

doNumFunc :: (Num a) => (a -> a) -> a -> a doNumFunc f a = f a addThree :: (Num a) => a -> a addThree a = addStuff 3 a

The doNumFunc function takes a function that takes a type a and returns a type a (This is what (a -> a) means), and a second type a, and returns a type a. doNumFunc calls the passed in function with the second passed in argument. The addThree function takes a type a and returns a type a. addThree takes an argument, and calls the addStuff function we defined earlier with its argument and 3. How does this all pan out?

addThree 3 6 doNumFunc addThree 3 6

Seems pretty straightforward, right? Though, this isn’t very re-usable. What if I want to add 4? Do I need to define a function addFour? No, I can partially apply addStuff. If you call a function in Haskell with less arguments than it takes, it will return a function that takes the remaining arguments and returns a result! Observe:

doNumFunc (addStuff 3) 3 6

Now things are getting cool. By calling (addStuff 3), we’ve created a function that takes a type a, adds 3 to it, and returns the result! You can’t do that in C!

Getting Started

Excited yet? You know you are, don’t try to act like you’re not. But how does one get started? Like any language, you need two things to begin: a compiler/interpreter and some reading material.

Environment

First up, you should go download the Haskell Platform. This package contains your compiler/interpreter and all the standard libraries. Haskell can be compiled, or interpreted. Or, you could use ghci, the interactive interpreter, if you just want to doop around and try stuff.

If you’re running a Linux distro, haskell-platform is likely in the repositories. In Debian or Ubuntu, it’s a simple:

sudo apt-get install haskell-platform

… and you’re set! Unfortunately, there doesn’t seem to be a great IDE for Haskell. NetBeans definitely has nothing to offer in this regard. Luckily for us, Haskell is simple enough to not really need an IDE. GEdit, the default text editor that ships with Gnome, has built-in syntax highlighting for Haskell. Just enable the built-in terminal in GEdit to test stuff and you should be good to go. I like to run ghci in the embedded terminal to test functions as I write them. Plus, as you code, you can periodically attempt to load the script in ghci to make sure everything is formatted correctly and you haven’t messed up your syntax/types.

Literature

One of the biggest barriers to learning a new language is money. Nobody wants to put down cold hard cash on learning something new when what they have is working just fine. Luckily for us, you can learn you a Haskell for free! Learn You A Haskell For Great Good is a beginner’s guide to learning Haskell aimed at developers coming from C Like Languages. The best part is that the whole book is available to read online for free! You can check it out for the low-low price of zero dollars. If you like it, maybe you buy a copy for your bookshelf. Or maybe you just spread the word.

Whatever you do, you should have a good base of knowledge in Haskell. At that point, you can just consult Hoogle to learn more.

Implementing a Vector in C++: Working Smarter, Not Harder

You may remember when I created a vector in C. Yep, just another day in the life… How about we re-invent this wheel again?

This time we’ll be creating a vector in C++. This is much easier to do in C++ thanks to generics and operator overloading. Since this is a template class, we’ll be implementing the class in our .h file, forgoing the .cpp file. First we declare our class. Ensure that you declare it a template class.

namespace clt {
namespace dmp {

template <typename T>
class vector {

..words..

};
}
}

Remember namespaces? Gotta have those namespaces. Aside from that, nothing particularly special about this block. Next I’ll go over the methods and members of this class.

First up: our destructor. Nothing special going on here, we’re just going to delete our array pointer to ensure proper Resource Acquisition is Initialization (RAII). We’re going to let it be virtual to allow for subclassing.

virtual ~vector()
{
delete [] backing_array;
}

Next I’ll discuss the private method: initialize(). This method is called by all constructors and does 3 things: sets backing_array_size, resize_increment, and news up our array:

void initialize(const unsigned size, const unsigned increment)
{
backing_array_size = size;
resize_increment = increment;
backing_array = new T[backing_array_size];
}

Next up is our default constructor. It sets size and increment to 10. I like 10, it’s a nice, noble number. The kind of number you’d bring home to Mom and Dad:

vector()
{
initialize(10, 10);
}

Next we’ll talk about a private method: array_copy(). This method does pretty much what it says: it iterates through the original array and copies its contents into the new array:

void array_copy(T * orig_array, T * new_array)
{
for (int count = 0; count < backing_array_size; count++)
{
new_array[count] = orig_array[count];
}
}

Next to talk about is our copy constructor. It initializes the new vector with the old vector’s size and increment, then calls array_copy:

vector(const vector& orig)
{
initialize(orig.backing_array_size, orig.resize_increment);
array_copy(orig.backing_array, backing_array);
}

Moving along to operator=, which basically functions the same as the copy constructor, aside from the obligatory return statement:

vector& operator=(const vector& orig)
{
initialize(orig.backing_array_size, orig.resize_increment);
array_copy(orig.backing_array, backing_array);
return * this;
}

Next up: resize(). This private method creates a new T array sized backing_array_size+resize_increment. Then, it calls array_copy to copy the contents of the old array to the new. Finally, it increments backing_array_size by resize_increment, calls delete [] on backing_array, and sets backing_array to the new array:

void resize()
{
T * new_b_a = new T[backing_array_size + resize_increment];
array_copy(backing_array, new_b_a);
backing_array_size = backing_array_size + resize_increment;
delete [] backing_array;
backing_array = new_b_a;
}

Which brings us to operator[]. This method indexes into backing_array. If to_get >= size, it calls resize() until to_get is a valid index. It should be noted that this method does not prevent you from attempting to read into a null pointer:

T& operator[](const unsigned to_get)
{
while (to_get >= backing_array_size)
{
resize();
}
return backing_array[to_get];
}

Finally, we have a setter for resize_increment and a getter for backing_array_size:

void set_resize_increment(const unsigned to_set)
{
resize_increment = to_set;
}

const unsigned size()
{
return backing_array_size;
}

Lastly, this class has 3 private fields: a pointer array to the contents of the vector, the size of the array, and the resize increment:

T * backing_array;
unsigned backing_array_size;
unsigned resize_increment;

…and that’s all there is to it. No casting void pointers, no manual resource allocation, and no unweildly method names. Or you could just use std::vector I guess, but what would be the fun in that?