Non-recursive DFS

Depth-first search (DFS) is a powerful algorithm. The recursive DFS algorithm is probably the easiest to understand, and most of the time it is good enough.

In some problems, the recursion stack gets very large. This can easily blow the stack limit in Python. Fortunately, there is a simple way to make the algorithm iterative.

Recursive DFS implementations sometimes skip the pre- and post-order numbering of the vertexes, because the numbers aren't needed – the work is done in the recursive call. Those numbers are the key to the iterative algorithm.

The main idea of the recursive algorithm is to visit every vertex, and at each visit:

  • Set the pre-order number.

  • Visit all unvisited children.

  • Set the post-order number.

In a recursive algorithm, we can also do some work before visiting the child vertexes, while visiting them, or after.
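As a reference point, the recursive version might be sketched like this (a minimal sketch assuming an adjacency-list graph; dfs_recursive and the counts list are my names, not from the original code):

```python
def dfs_recursive(adj, u, pre, post, counts):
    """Recursive DFS over an adjacency-list graph.
    counts is a two-element list: [pre_count, post_count]."""
    pre[u] = counts[0]        # pre-order step: work done before the children
    counts[0] += 1
    for v in adj[u]:
        if pre[v] == -1:      # visit only unvisited children
            dfs_recursive(adj, v, pre, post, counts)
    post[u] = counts[1]       # post-order step: work done after the children
    counts[1] += 1
```

Work placed where the pre-order number is set runs before the children; work placed where the post-order number is set runs after them.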

The pre-order and post-order numbers are the key to making DFS iterative. We will use an explicit stack for this algorithm.

First, we add the starting vertex to the stack. Then, while the stack is non-empty, we pop the last vertex off of the stack and do the following:

  1. If the vertex has a post-order number, skip it.

  2. Otherwise, if the vertex has a pre-order number, we are in the post-child step:

    a. Set the post-order number and do any post-processing.

  3. Otherwise, we are visiting the vertex for the first time:

    a. Set the pre-order number.

    b. Add the vertex back to the stack.

    c. Add all of its unvisited children to the stack.

Let's informally check if this makes sense.

  • A vertex is added to the stack only if it has not been visited (pre-order is not set), or while it is being visited, before its children are added to the stack (the add-back and add-children sub-steps of step 3).

  • A vertex only hits step 3 once. After step 3, it has a pre-order number, so the next time the vertex is processed, it will go to step 2.

  • The vertex only makes it to step 2 after its children have been processed.

  • So, step 2 is the post-order step.

Finally, here is some code:

class DFS(object):
    def __init__(self, adj):
        self.adj = adj
        self.N = len(adj)
        self.pre = [-1 for _ in adj]
        self.post = [-1 for _ in adj]
        self.pre_count = 0
        self.post_count = 0

    def dfs(self, i):
        stack = [i]
        while stack:
            u = stack.pop()
            if self.post[u] > -1:
                continue                        # step 1: already finished
            elif self.pre[u] > -1:
                self.post[u] = self.post_count  # step 2: post-order
                self.post_count += 1
            else:
                stack.append(u)                 # step 3: first visit
                self.pre[u] = self.pre_count
                self.pre_count += 1
                for v in self.adj[u]:
                    if self.pre[v] == -1:
                        stack.append(v)

This code only handles starting a search tree from a specific vertex. To search the entire graph, the dfs method needs to be called for every vertex. That order is up to the caller. Pre-order or post-order steps can be added to this code, or it could be generalized to provide hooks for sub-classes to use.
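For example, a whole-graph driver might look like the following sketch, which folds the iterative dfs into one self-contained function and starts a new search tree at every unvisited vertex (dfs_forest is a hypothetical name, not part of the class above):

```python
def dfs_forest(adj):
    """Iterative DFS over the whole graph: start a new search tree
    at every vertex that has not been reached yet."""
    n = len(adj)
    pre = [-1] * n
    post = [-1] * n
    pre_count = post_count = 0
    for start in range(n):
        if pre[start] != -1:
            continue
        stack = [start]
        while stack:
            u = stack.pop()
            if post[u] > -1:
                continue               # already finished
            elif pre[u] > -1:
                post[u] = post_count   # post-order step
                post_count += 1
            else:
                stack.append(u)        # come back later for the post step
                pre[u] = pre_count     # pre-order step
                pre_count += 1
                for v in adj[u]:
                    if pre[v] == -1:
                        stack.append(v)
    return pre, post
```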

JavaScript QuickSort update

The code in this post was a quicksort that used the less-than operator for comparison. Here are some timings of a quicksort that takes a comparison function, the same way Array.sort does.

The times are quite a bit slower, and I need to dig into it further. These are the means of five runs each on one million values. Some of the cases (triangular, random) had a very large variation between runs.

Timing results on 1 million values (in seconds)

data type        qsort2   Array.sort   qsort2cmp
random double    0.14     0.95         0.38
sawtooth [1]     0.07     0.06         0.08
triangular [2]   0.14     0.71         0.42
ordered          0.03     0.46         0.13
reversed         0.03     0.59         0.10
shuffled         0.13     0.65         0.28

[1] The values in the array are 0, 1, 2, 3, 0, 1, 2, 3, … the whole way to the end of the array.

[2] The values in the array are 0, 1, …, N/2-1, N/2, N/2-1, …, 1, 0.

What does this tell us? Not much yet. I have a lot to learn about the performance of JavaScript engines.

Raytracer

The last couple of days I have been working on a raytracer in JavaScript. Years ago I wrote one in C to learn how to do OO in C, and now I am doing it in JavaScript to see what is possible.


Quick select

Why would a programmer want to write their own sort function?

First, and most importantly, to learn about algorithms and programming. Also, a language or library sometimes has an inadequate sort function for a specific situation.

There is also a problem in the wild where a custom sort routine comes in handy: finding the median of an array of data (or any other percentile, for that matter). Quickselect is closely related to Quicksort. Instead of recursively sorting both halves of the partition created in the partitioning step, we can just follow the one that contains the element we are looking for.

The C++ STL contains functions (std::partition, std::nth_element) that do just what we want. Sometimes, though, a language / library doesn't include this functionality.

Quickselect is an \(O(n)\) algorithm in the average case, which sounds much better than the average \(O(n \log n)\) for Quicksort. In practice, though, \(\log n\) grows so slowly that the asymptotic gap amounts to a modest factor. The bigger win is that Quickselect also has a lower constant, so on random data it ends up a few times faster than sorting.

So, here is a quickMedian function. This function keeps calling a partitioning function on the subarray that contains the middle point. (Not really the median … but it is our example.)

The partition function returns an object containing several indexes.

  • a—the start of the left partition

  • b—the end of the left partition

  • c—the start of the right partition

  • d—the end of the right partition

We also jump out early instead of breaking the array down to a single element, and use insertion sort on the last chunk of the array.

    function quickMedian(xs) {
        const N = xs.length,
              middle = Math.floor(N / 2);
        let start = 0, end = N, idxs;

        while(end - start > INSERT_SORT_THRESHOLD) {
            idxs = qsort2Partition(xs, start, end);

            if(idxs.b > middle) {
                // search left
                end = idxs.b;
            } else if (idxs.c < middle) {
                // search right
                start = idxs.c;
            } else {
                // We have contained the element!
                break;
            }
        }

        isort(xs, start, end);
        if(middle < start || middle >= end) {  // end is exclusive
            throw 'bug... partitioning failed';
        }
        return xs[middle];
    }

This function can be generalized to find a different value than the center value, or even to find several percentiles at once.
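As a language-neutral sketch of that generalization, here is a quickselect for the k-th smallest element, written in Python with a plain two-way partition and a random pivot rather than the fat partition used above (quickselect is a hypothetical helper, not code from this post):

```python
import random

def quickselect(xs, k):
    """Return the k-th smallest element of xs (0-based), reordering
    xs in place. Follows only the partition that contains index k."""
    lo, hi = 0, len(xs) - 1
    while lo < hi:
        pivot = xs[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while xs[i] < pivot:
                i += 1
            while xs[j] > pivot:
                j -= 1
            if i <= j:
                xs[i], xs[j] = xs[j], xs[i]
                i += 1
                j -= 1
        # Now xs[lo..j] <= pivot <= xs[i..hi]; follow the side holding k.
        if k <= j:
            hi = j
        elif k >= i:
            lo = i
        else:
            break              # k lies in the pivot-equal band: done
    return xs[k]
```

Passing k = len(xs) // 2 gives the same middle element as quickMedian; any other percentile is just a different k.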

Here are some non-representative timing results, from my Mac, on Chrome.

Timing results on 1 million values (in seconds)

data type       quick sort   quick select median
random double   0.15         0.02
sawtooth [1]    0.02         0.02
ordered         0.04         0.02
reversed        0.05         0.02
shuffled        0.15         0.03

[1] The values in the array are 0, 1, 2, 3, 0, 1, 2, 3, … the whole way to the end of the array.

Here is a link to the test driver that launches directly into the tests. It runs in Chrome, but hasn't been tested extensively.

Alpha-Beta Pruning

Another page I made to play with is a Tic-Tac-Toe game.

Originally, I just used the MiniMax algorithm for the player. The game is a small one, and MiniMax can solve the complete game pretty quickly.

On my very old phone, though, there was a noticeable delay if the game searched the whole way to the end from the first move.

So, I converted MiniMax to Alpha-Beta. I always manage to mess that up on the first try, but eventually got it working. For a game like this, the result is still amazing. With MiniMax, the first move required about 550,000 searches to get an answer. With Alpha-Beta, that is down to less than 20,000.
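The shape of that conversion is small: thread two bounds through the minimax recursion and cut off when they cross. Here is a minimal sketch over a toy game tree (nested lists for internal nodes, numbers for leaf scores — not the Tic-Tac-Toe code itself):

```python
def alphabeta(node, alpha=float('-inf'), beta=float('inf'), maximizing=True):
    """Minimax with alpha-beta cutoffs over a toy game tree: a leaf is
    a number, an internal node is a list of child nodes."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break          # beta cutoff: the opponent won't allow this
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break          # alpha cutoff
        return value
```

The cutoffs never change the answer, only how much of the tree gets searched, which is where the 550,000-to-20,000 drop comes from.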

And now it works fine on my phone.

Reversi in JavaScript, again

Continuing down the path of mucking with JavaScript, I have ported Peter Norvig's Reversi code from Lisp to JavaScript.

The code is in a pretty ugly state, but it is running. Besides being ugly, the code doesn't do any feature detection or graceful degradation. Though I would like to make it pretty solid, the purpose of this was to test out JavaScript for algorithmic work, not so much for the front-end work. Check it out here.

In this version, I have used Dr. Norvig's data structures, but wrapped them so the game displays on an HTML canvas, and a person can play against the computer there.

Some of the stuff going on here is:

  • Alpha-Beta game tree searching

  • Position evaluation using features from Logistello

  • Learning via Samuel's strategy of using the look-ahead to train the evaluation function

  • Some HTML5 stuff, like application caching, and maybe a webworker in the future.

JavaScript Quicksort

Quicksort, yet again. This time, in JavaScript!

I have been using JavaScript more frequently, and have been pleasantly surprised at how fast it is.

So, I naturally wanted some test cases of things that I always thought JavaScript was too slow for. Sorting was one of those cases.

To check it out, I wrote some quicksort functions and compared them to Array.sort. Here is the unscientific comparison.

Using Engineering a Sort Function as a reference, along with textbooks, etc, I wrote the following:

function qsort1(xs, start, end) {
    start = start || 0;
    if(end === undefined) { end = xs.length;}

    if(end < start + INSERT_SORT_THRESHOLD) {
        isort(xs, start, end);
        return xs;
    }

    let i = start - 1,
        j = end;
        // pidx = Math.floor(Math.random() * (end - start)) + start,

    const pivot = medianOfThree(xs[start],
                                xs[Math.floor((start + end) / 2)],
                                xs[end-1]);
    while(i < j) {
        i += 1;
        while(i < j && xs[i] < pivot) {
            i += 1;
        }

        j -= 1;
        while(i < j && pivot < xs[j]) {
            j -= 1;
        }

        if(i < j) {
            const t = xs[i];
            xs[i] = xs[j]; xs[j] = t;
        }
    }
    if(xs[i] < pivot) { throw 'invariant'; }

    qsort1(xs, start, i);
    qsort1(xs, i, end);
    return xs;
}

This method:

  • Uses a median-of-three pivot strategy

  • Leaves the pivot in place while partitioning

  • Recurses down both halves of the partition

The outer while loop (between the pivot selection and the invariant check) partitions the data.

Originally, I used a random pivot. That works well, but the median of three can help partition the data more evenly, leading to less work.

The pivot value is copied out to a temporary during the partitioning. In JavaScript this isn't a big deal: anything other than a primitive is a reference type, so copying the pivot in that case is more like a pointer copy.

A common case that I care about is when there are many equal values in the array. This version of quicksort does not handle that well.

The next version uses the fat partitioning described in the paper to group all of the values equal to the pivot.

function qsort2(xs) {
    sort(xs, 0, xs.length);
    return xs;

    // The actual sort body.
    function sort(xs, start, end) {
        if(end < start + INSERT_SORT_THRESHOLD ) {
            isort(xs, start, end);
            return;
        }

        let i = start - 1,
            j = end,
            u = i,
            v = j;
        const pivot = medianOfThree(xs[start],
                                    xs[Math.floor((start + end) / 2)],
                                    xs[end-1]);
        // Reorder the values, and maintain the indices so we have
        // the following:
        //
        // begin (inclusive)  |  end (exclusive)  |   description
        // start              |  u                |   == pivot
        // u                  |  i                |    < pivot
        // i                  |  v                |    > pivot
        // v                  |  end              |   == pivot

        while(i < j) {
            i += 1;
            while(i < j && xs[i] < pivot) {
                i += 1;
            }

            j -= 1;
            while(i < j && pivot < xs[j]) {
                j -= 1;
            }

            if(i < j) {
                const t = xs[i];
                xs[i] = xs[j]; xs[j] = t;
                /* jshint ignore:start */
                // We are ignoring the !.  Just trying to stick
                // with < for now...
                if(!(xs[i] < pivot)) {
                    u += 1;
                    const t = xs[i];
                    xs[i] = xs[u]; xs[u] = t;
                }
                if( !(pivot < xs[j])) {
                    v -= 1;
                    const t = xs[j];
                    xs[j] = xs[v]; xs[v] = t;
                }
                /* jshint ignore:end */
            }
        }

        // if(xs[i] < pivot) { throw 'invariant'; }

        // Now, 'flip' the sides around to bring the values equal
        // to the pivot to the middle.

        j = vecswap(xs, i, v, end);
        i = vecswap(xs, start, u + 1, i);

        // xs[i] >= pivot.
        sort(xs, start, i);
        sort(xs, j, end);
    }
}

The two inner if blocks swap values equal to the pivot out to the ends of the array. At the end of the partition, the vecswap calls bring the pivot values back together in the center.

These sort routines use the < operator. Array.sort requires a comparison function that returns a negative number, zero, or a positive number for less than, equal to, and greater than. The custom routines are less general (so far!).

Here are some timing results on Chrome comparing the two sorts and the system sort. These tests are unfair and likely misleading, but they are a start.

Timing results on 1 million values (in seconds)

data type       qsort1   qsort2   Array.sort   NumPy
random double   0.13     0.15     1.06         0.11
sawtooth [1]    0.09     0.02     0.04         0.02
ordered         0.04     0.04     0.59         0.01
reversed        0.02     0.03     0.59         0.02
shuffled        0.21     0.18     0.68         0.09

[1] The values in the array are 0, 1, 2, 3, 0, 1, 2, 3, … the whole way to the end of the array.

qsort1 and qsort2 are far less generic than Array.sort, and there are likely several errors in how I am doing this timing. For example, I sometimes seem to hit GC pauses during a test, but I have not confirmed that yet. Overall, I am optimistic that JavaScript can handle these cases in a reasonable amount of time. The timings are quite variable, though; that will be worth investigating before too long.

The NumPy values are in there for comparison. I am assuming that the NumPy sort is native code underneath, and it is far easier to time Python code than to write C++ wrappers just for a comparison.

Here is a link to the test driver that launches directly into the tests.

EOPL in Javascript

I have been working through EOPL again, this time programming in JavaScript. My JS experience is limited, but I finally know enough that it isn't as painful anymore.

Some things that have helped:

  • The debugging tools built into browsers

  • Node.js

  • Emacs -- js2-mode, flymake-jslint, and PEG.js

  • And finally knowing enough that I don't have to look stuff up all of the time.

Prime sieve in JavaScript

Update: After running the C version through Emscripten, the result is about the same speed as the C version when run through Firefox. Wow!

Update: Switching from a JavaScript Array to an Int8Array halved the time in Firefox, making it comparable to C and Emscripten. (Until moving C to 8 bits, then things got faster there as well …)

Ok, time for some more prime sieving. This time, we are going to try JavaScript, Python, and C.

First, the Python code

import numpy as np

def sieve1(n):
    values = np.arange(n)
    values[1] = 0
    # values[4::2] = 0
    for i in range(n):
        i2 = i*i
        if i2 >= n:
            break
        if values[i]:
            values[i2::i] = 0
    result = values[values > 0]
    return result

This is the normal sieve. The inner loop is in NumPy, and running

%timeit ps = primes_np.sieve1(15485864)

Gives me a list of the first million primes in about 425 ms on my laptop.

Next, C.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main() {
  const int n = 15485864;
  clock_t start = clock();
  int *arr = malloc(n * sizeof(int));

  for(int i = 0; i < n; ++i) { arr[i] = i; }
  arr[1] = 0;

  for(int i = 0; i < n; ++i) {
    int d = i * i;
    if(d >= n) { break; }
    if(arr[i]) {
      for(int j = d; j < n; j += i) {
        arr[j] = 0;
      }
    }
  }
  int count = 0;
  for(int i = 0; i < n; ++i) {
    if(arr[i]) { ++count; }
  }
  clock_t end = clock();
  printf("found %d primes less than %d\n", count, n);
  double duration = (end - start) / (double)CLOCKS_PER_SEC;
  printf("took %f seconds\n", duration);
  free(arr);
  return 0;
}

This code took about 280 ms. (clang -O3)

Finally, the JavaScript version.

const create_array = (() => {
    // Return a function that allocates an array of n elements,
    // preferring a typed Int8Array when it is available.
    if(typeof Int8Array === 'function') {
        return (n) => {
            return new Int8Array(n);
        };
    } else {
        return (n) => {
            return new Array(n);
        };
    }
})();

// Create an array of n values, where element i is fcn(i).
const array_init = function(fcn, n) {
    const result = create_array(n);
    for(let i = 0; i < n; i++) {
        result[i] = fcn(i);
    }
    return result;
};

// Simple sieve.
const sieve1 = function(n) {
    // alert("version 2");
    const values = array_init(() => 1, n);

    values[1] = 0;
    values[0] = 0;

    for(let i = 0; i < n; i++) {
        if(values[i] > 0) {
            if(i*i > n) { break; }
            for(let j = i*i; j < n; j += i) {
                values[j] = 0;
            }
        }
    }

    const result = [];
    for(let j = 0; j < values.length; j += 1) {
        if(values[j] > 0) {
            result.push(j);
        }
    }
    return result;
};


export {sieve1};

This takes around 650 ms on the same computer.

None of these examples were tweaked or tested rigorously …

Just for fun, you can run the JavaScript on this page. The millionth prime should be 15,485,863.

See the results here …