published on September 9th, 2013

The Haskell wiki gives as one of the examples of the elegance of Haskell the following as a quicksort implementation in Haskell:

qsort :: Ord a => [a] -> [a]
qsort []    = []
qsort (h:t) = qsort (filter (<= h) t) ++ [h] ++ qsort (filter (> h) t)


In terms of elegance, this solution is indeed hard to beat. It is as close to Wikipedia's description of the essence of quicksort as code can get:^[Wikipedia article on quicksort.]

Quicksort first divides a large list into two smaller sub-lists: the low elements and the high elements. Quicksort can then recursively sort the sub-lists.

However, you can argue that this is not the real quicksort, because the beauty of quicksort is that it works in-place and does not require O(n) extra space as the version given above does. Therefore, the question is sometimes raised how the real quicksort would look in Haskell, given that it generally eschews mutuability and in-place update. There are of course various implementations floating around on the interwebs, but I wanted to see how an implementation using unboxed vectors looks like and how that compares in performance to a version in C++'s.

An efficient Quicksort implementation consists of two parts, the partition function, which rearranges the elements of an array so that the left part is less-or-equal to the pivot and the right part is greater and the main function which does the recursive calls on the sub-parts. Here is my Haskell version:

{-# LANGUAGE BangPatterns, ScopedTypeVariables #-}
module Main where
import           Control.Applicative ((<$>)) import qualified Data.Vector.Unboxed as V import qualified Data.Vector.Unboxed.Mutable as M import System.Environment (getArgs) import System.Clock import System.Exit (exitFailure, exitSuccess) import Control.DeepSeq (deepseq) import qualified Data.ByteString as BS import Data.ByteString.Char8 (readInt) partition :: (PrimMonad m, Ord a, M.Unbox a) => Int -> M.MVector (PrimState m) a -> m Int partition !pi !v = do pv <- M.unsafeRead v pi M.unsafeSwap v pi lastIdx pi' <- go pv 0 0 M.unsafeSwap v pi' lastIdx return pi' where !lastIdx = M.length v - 1 go !pv i !si | i < lastIdx = do iv <- M.unsafeRead v i if iv < pv then M.unsafeSwap v i si >> go pv (i+1) (si+1) else go pv (i+1) si go _ _ !si = return si qsort :: (PrimMonad m, Ord a, M.Unbox a) => M.MVector (PrimState m) a -> m () qsort v | M.length v < 2 = return () qsort v = do let !pi = M.length v div 2 pi' <- partition pi v qsort (M.unsafeSlice 0 pi' v) qsort (M.unsafeSlice (pi'+1) (M.length v - (pi'+1)) v) main :: IO () main = do args <- getArgs if length args < 2 then putStrLn "Usage: qsort RUNS FILENAME" >> exitFailure else return () let (nRuns::Int) = read (args !! 0) nums <- V.unfoldr (\s -> readInt$ BS.dropWhile isWs s) <$> BS.readFile (args !! 1) loop nRuns (do nums' <- V.thaw nums start <- getTime ProcessCPUTime qsort nums' time <- getTime ProcessCPUTime - start putStrLn$ show $fromIntegral (sec time) + ((fromIntegral$ nsec time) / 1000000000))
exitSuccess
where
loop 0 _ = return ()
loop n a = a >> loop (n-1) a

isWs !c = c == 10 || c == 20 || c == 9


All in all I'd say this is a pretty direct translation from the imperative version. For comparison, here is an implementation in C++:

#include <iostream>
#include <sstream>
#include <vector>
#include <cstdlib>
#include <fstream>

template<class T>
void swap(std::vector<T>& arr, size_t i1, size_t i2)
{
T buff = arr[i1];
arr[i1] = arr[i2];
arr[i2] = buff;
}

template<class T>
size_t partition(std::vector<T>& arr, size_t lo, size_t hi, size_t pi)
{
swap(arr, pi, hi);
T pv = arr[hi];
size_t si=lo;
for (size_t i=lo; i<hi; ++i) {
if (arr[i] < pv) {
swap(arr, i, si++);
}
}
swap(arr, si, hi);
return si;
}

template<class T>
void qsort(std::vector<T>& arr, size_t lo, size_t hi)
{
size_t n = hi-lo+1;
if (n < 2)
return;
size_t pi = partition(arr, lo, hi, lo + (hi-lo)/2);
qsort(arr, lo,   pi-1);
qsort(arr, pi+1, hi);
}

int main(int ac, char** av)
{
if (ac <= 2)
{
std::cerr << "Usage: qsort RUNS FILENAME" << std::endl;
exit(1);
}
int nRuns = atoi(av);

std::ifstream infile(av);
if (!infile.good())
{
std::cerr << "Can't open file: " << av << std::endl;
exit(1);
}

std::vector<int> input;
while (infile.good())
{
int i;
infile >> i;
input.push_back(i);
}

for (int n=0; n < nRuns; ++n)
{
std::vector<int> unsorted(input);
auto start = clock();
qsort(unsorted, 0, unsorted.size()-1);
auto end = clock();
printf("%11.9f\n", ((double) (end-start)) / CLOCKS_PER_SEC);
}
return 0;
}


Let's see how the two versions compare in terms of performance. For the comparison I generated 10.000.000 random integers, and measured the time it takes to sort them 50 times. The C++ version averages about 0.87 seconds while the Haskell version takes about 1.3 seconds. Not a bad result, but of course I would like the Haskell version to be just as fast. However, with my limited optimization skills I wasn't able to eek out any more performance of the Haskell version.