Sieve of Eratosthenes

A function to generate prime numbers is often useful. This sieve of eratosthenes is one of the simplest ways to find prime numbers, and it is much more efficient than the brute force method.

Here is a nice description of how to perform the sieve, and here is a previous post I wrote about it. This time, I rewrote it in OCaml, and got a little farther by only generating the odd prime numbers.

First, here is a helper function to take an array of integers, and return a new array that contains only the non-zero values. This will be useful later in all of the sieve functions.

(* return a new array of the non-zero items *)
let filter_non_zeros ar =
  let count = Array.fold_left (fun count x ->
    if x <> 0 then count + 1 else count) 0 ar in
  let result = Array.make count 0 in
  let rec go idx jdx =
    if jdx = count then
      result
    else if ar.(idx) = 0 then
      go (idx + 1) jdx
    else
      (result.(jdx) <- ar.(idx);
       go (idx + 1) (jdx + 1))
  in
  go 0 0

Next, the straightforward sieve, first written to use recursion instead of loops. Loops are not necessary in languages like OCaml, but sometimes they are nicer than the recursion, depending on the problem.

(* Sieve of eratosthenes, using all recursion for the looping *)
let sieve n =
  let result = Array.init (n+1) (fun i -> i) in
  let stop_index = int_of_float (sqrt (float_of_int n) ) + 1 in
  result.(1) <- 0;

  let rec loop idx =
    if idx = stop_index then
      ()
    else if result.(idx) = 0 then
      (* not prime *)
      loop (idx + 1)
    else
      let rec clear j =
        if j <= n then
          (result.(j) <- 0;
           clear (j + idx));
      in
      clear (idx * idx);
      loop (idx + 1)
  in
  loop 2;
  filter_non_zeros result

This function creates an array of potential prime numbers. Then, it walks through the array, looking for the next prime. When a new prime is found, the function clears all of the multiples of the prime in the rest of the array.

Next is a function that moves the inner most recursion into a loop.

(* Sieve of Eratosthenes, moving the inner loop into a loop *)
let sieve2 n =
  let result = Array.init (n+1) (fun i -> i) in
  let stop_index = int_of_float (sqrt (float_of_int n) ) + 1 in
  result.(1) <- 0;

  let rec loop idx =
    if idx = stop_index then
      ()
    else if result.(idx) = 0 then
      (* not prime *)
      loop (idx + 1)
    else
      (let j = ref (idx * idx) in
       while !j <= n do
         (result.(!j) <- 0;
          j := !j + idx);
       done;
       loop (idx + 1))
  in
  loop 2;
  filter_non_zeros result

That function takes the same time as the first one, and is just a difference in style.

Next is a function that does away with the recursion altogether.

(* Sieve of Eratosthenes, using all loops instead of recursion *)
let sieve3 n =
  let result = Array.init (n+1) (fun i -> i) in
  let stop_index = int_of_float (sqrt (float_of_int n) ) + 1 in
  result.(1) <- 0;
  let idx = ref 2 in
  while !idx <= stop_index do
    if result.(!idx) <> 0 then
      (let p = !idx in
       let j = ref (p * p) in
       while !j <= n do
         result.(!j) <- 0;
         j := !j + p;
       done);
    idx := !idx + 1;
  done;
  filter_non_zeros result

Finally, the last method only generates the odd primes. We can ignore even numbers right from the start, and halve the memory and timing requirements. This requires a little trickery to do all of the indexing correctly, but it isn't too bad.

(* This one ignores the even numbers entirely, so it is about
   twice as fast now *)
let sieve4 n =
  let m = (n / 2) + 1 in
  let result = Array.init m (fun i->2*i+1) in
  (* Remeber that 2 is a prime *)
  result.(0) <- 2;
  let stop_index = int_of_float (sqrt (float_of_int n) /. 2.) + 1 in

  let idx = ref 1 in
  while !idx <= stop_index do
    if result.(!idx) <> 0 then
      (* 2 * idx + 1 is the next prime.  Now
         remove its multiples.
         We want to start at p*p, and step by p.
         for example, If idx is 1, then p is 3.
         To get result(i) = 9, we need i to be 4.
         If idx is 2, and p = 5, then to get
         result(i) = 25, we need i to be 12.
         So, we can move out by idx * p, then step by p
      *)
      (let p = result.(!idx) in
       let j = ref (!idx * (p + 1)) in
       while !j < m do
         result.(!j) <- 0;
         j := !j + p;
       done);
    idx := !idx + 1;
  done;
  filter_non_zeros result

And finally, a test driver.

let main sieve_fcn =
  let n = 15485863 in
  let m = Array.length (sieve_fcn n) in
  Printf.printf "number of primes less than %d: %d\n" n m;
;;

main sieve4;;

Generating dynamic content for org mode

I wanted to create the indices for my plentiful posts from code, instead of by hand. In the past, I remember doing something like that in Emacs Wiki or Muse modes, and I wasn't sure how to do it in org mode.

Org Babel seemed like the way to go, but I had trouble getting it to work. Instead, I used dynamic blocks in org mode to create the listings.

In my index.org file, I have the following lines.

#+BEGIN: my-get-posts :path "posts/2012"
#+END:

This says to call the function org-dblock-write:my-get-posts, with the argument :path set to posts/2012. The function arguments are passed as a plist, which is basically a list of name/value pairs.

This code is called by typing C-c C-x u some where in the block, which calls (org-dblock-update) to update the region inside the block.

The update function is below. The function looks up the files in the path, and then for each file it looks up the title in the file, and finally generates a link that org will understand.

(defun org-dblock-write:my-get-posts (params)
  (let ((path (plist-get params :path)))
    (let ((files
           (mapcar (lambda (f)
                     (concat (file-name-as-directory path)
                             f))
                   (directory-files path nil "\.*org"))))
      (setq files (nreverse files))
      (dolist (file files)
        (let ((title (org-publish-find-title file)))
          (insert (format " - [[file:%s][%s]]" file title))
          (newline))))))

Here is what the results look like for the example above:

#+begin_example
** 2011
#+BEGIN: my-get-posts :path "posts/2011"
 - [[file:posts/2011/2011-12-18-stanford-online-classes.org][Stanford Online Classes]]
 - [[file:posts/2011/2011-12-18-ipython-on-mac-os-x.org][IPython on the mac]]
 - [[file:posts/2011/2011-12-18-ants-ai-challenge.org][Ants AI Challenge]]
 - [[file:posts/2011/2011-11-19-a-simple-octave-tip.org][A simple Octave tip]]
 - [[file:posts/2011/2011-11-03-posting-from-emacs-with-org2blog-wp.org][Posting from emacs with org2blog/wp]]

#+END:

This output then gets converted into the list of links that show up on the home page.

Extended Euclid's Method

In "The Art of Computer Programming, Volume 1", Donald Knuth describes the algorithm for Euclid's Method.

Euclid's method find the greatest common divisor for two numbers. The greatest common divisor of two integers \(m\) and \(n\) is the largest integer \(d\) such that both \(m\) and \(n\) are divisible by \(d\) .

The extended algorithm goes farther than just finding the gcd. The extended method finds integers \(a\) and \(b\) such that

\begin{equation*} am + bn = \gcd(m,n) \end{equation*}

This OCaml code is a straightforward translation from the algorithm in Knuth's book, and it is written longer than it needs to be to make the names clearer. Also, this is an example of replacing a loop with recursion.

let euclid_extended m n =
  let rec loop a b a' b' c d =
    let q = c / d in
    let r = c - q * d in
    Printf.printf "| %d | %d | %d | %d | %d | %d|\n" a a' b b' c d;
    if r = 0 then
      a,b,d
    else
       let c_new = d in
       let d_new = r in
       let a_new = a' - q * a in
       let a'_new = a in
       let b_new = b' - q * b in
       let b'_new = b in
       loop a_new b_new a'_new b'_new c_new d_new
  in
  loop 0 1 1 0 m n

let test() =
  let m, n = 1769, 551 in
  Printf.printf "| a| a' | b | b' | c| d|\n|---\n";
  let a,b,d = euclid_extended m n in
  Printf.printf "a %d:  b: %d\n" a b;
  Printf.printf "test result: %d\n" (a * m + b * n);

Running the code produces the following table.

a

a'

b

b'

c

d

0

1

1

0

1769

551

1

0

-3

1

551

116

-4

1

13

-3

116

87

5

-4

-16

13

87

29

Here we can easily verify that \(5 * 1769 - 16 * 551 = 29\). The table gives us a nice way to see that \(a m + b n = d\) during all of the steps of the algorithm. Of course, Knuth already has the same table in his book, but some people, like me, don't really learn something until they have done it themselves.

Trying out a static web site

I really need to quit thinking about how to put up a site, and worry more about writing stuff. My hope is this is the step to take to get away from worrying about stuff.

Here is what I want to put in my posts:

  • for(int ii = 0; ii < 99; ++ii)

  • \(\LaTeX\)

  • generated images

Here are the tools I want to use to edit the posts:

Octopress seems nice, but I don't want all of the ruby dependencies for now. Org-mode seemed like it should do the job. So, I finally did a Google search, and found a post from someone I met at the Strange Loop conference this year. I have shamelessly stolen his layout, and am trying it out here.

Ants AI Challenge

The AI Challenge is Ants this time around. The goal is make your your ants collect food, explore, raze opponent ant hills, and protect their own hill. New ants are earned by harvesting food, and there is a combat system. Here is a video of a game. For a better view, go to the main site to watch some games!

From past contests, I knew I could easily spend too much time competing. I also knew that I didn't want to spend most of my free time tweaking weights and settings. So, I decided to work on this for a short while, try to get a respectable bot, then quit. I succeeded at quitting, though maybe not soon enough.

Unlike the previous contests (Tron and Planet Wars), in this version the bots have imperfect information. Like the previous contests, the game is text based, making it very easy for contestants to use their language of choice, subject only to getting the language running on the contest server. This was one of the reasons I entered the Tron contest, besides how awesome it looked. Often these kind of games require players to compile or link their bot into the game engine.

My goal was to create a bot that made only local decisions for each ant because there were too many possibilities to consider every possible set of moves. With any luck, good behavior could emerge from such a bot. I also hoped to only use very simple features for these local decisions, but the features in the bot are lacking in several respects.

So, again, I wanted to treat each ant independently. That makes searching their moves easy, but also makes the bot dumb. The first bot I submitted searched for food, enemy hills, and unknown territory, It avoided enemies, but had no combat code. The bot was pretty dumb.

The second version had some simple combat code. I tried making each ant move as aggressively as possible, and then backed off any move that didn't seem safe enough. This bot also had some code to push ants away from their own hill. The second version was much stronger than the first, but it timed out on some big maps.

The timeouts occurred because the bot does a few breadth-first searches to find out the distances from the various targets. One BFS starts at each known food, one at each known enemy hill, and another at all of the grid cells that haven't been seen for a while. The way I wrote the BFS in Python was almost fast enough, but could take too long on larger maps.

The scoring function for ants that are not in combat is just a weighted combination of the inverse distance squared from each of the interesting targets. This makes the ants go towards food, but if they have a choice between close food and close enemy hill, they charge the hill. They also are attracted to enemy ants, which helps the bot get the numbers needed for attacking.

The third version was a rewrite of the second, but in C++. This made the BFS searches about 50 times faster. I tried making the combat code more aggressive, but it went too far, often walking right into an attack.

The third version is my official version. Some weaknesses it has are

  • no defense of the home base

  • no overall global decision making. (it will send every ant to battle at one hill, and leave another that is only)

  • the combat is pretty bad

  • there is no coordination at all between ants

I spend some time on the code after my official entry, but made no progress that was worth submitting. I wrote code to discover the symmetry of the map to predict where other hills would be, though it wasn't very helpful. The math was fun, though. I tried using logistic regression to tune the weights of my evaluation function. That failed, and I think it was because I was trying to predict the wrong thing, and my features did not account for any global information.

Some of the other competitors are far beyond the rest of us, and I look forward to learning how they did it. Overall though, I am happy with the time I spent, and what I have learned. With the Stanford Online classes, work, and other activities, spending more time on the AI contest was just going to happen for me.

Congratulations to the winners!

IPython on OS X

What has worked for me:

  • setup a virtualenv

  • easy_install readline

  • easy_install ipython

If python gets updated by MacPorts, remove the old virtualenv, and re-run.

Stanford Online Classes

The ML Class and AI Class from Stanford are nearly over, and they have been a lot of fun. I have not had formal classes in either of these topics, but have seen similar ideas over the years of doing numerical programming and signal processing.

There were some technical glitches, but I learned a lot. The pace was good... unlike reading a book where I just skim through pretending to understand. The quizzes and homework encouraged active learning is more efficient than passive learning.

Many of the topics were things I had heard of, but never learned anything about. Some of the topics were completely new. I really enjoyed the ML class progression from linear regression to logistic to neural networks. NN are a simple idea, until you start thinking of how to train it. The jump from logistic regression to NN made a big difference in understanding.

In the AI class, much of the material was from Norvig's book, which I have been using as a paperweight for a while. The book is extremely dense with information, so it was nice to see explanations of even a sample of the material. (That isn't meant as a slight against the book either... just a challenge with reading it.)

Again, these classes were a lot of fun, and I learned a lot. Watching these world-class lecturers, and getting quick feedback on my understand would have been worth paying for. Getting the classes for free was amazing.

A simple Octave tip

Matlab and Octave are both great for quickly trying out numeric code, especially using linear algebra. Many times code can be written to use the built in matrix operators instead of using loops. This makes the code clearer and faster. The following examples all use Octave.

First, consider this formula

\begin{equation*} S = \sum_{i=1}^n x_i y_i \end{equation*}

x and y are one dimensional vectors in Octave, so we can find \(S\) with this code:

S = 0;
N = length(x);
for i = 1:N
  S += x(i) * y(i);
end

That code is straight forward. Octave can do more, though. The =.*= operator does an element by element multiplication over matrices, and =sum= adds up the elements in a matrix. Since our matrices are one-dimensional, we can compose these functions to get our result, at the cost of creating a temporary matrix.

S = sum(x .* y)

Finally, Octave already has a function that does what we want. The formula we are looking at is the dot product of two vectors, so in Octave, we can do

S = dot(x, y)

The first attempt is the clearest when you do not know what .*, sum, and dot do. But, once you know what sum and dot do, they are much clearer. Also, they are much faster in Octave than the naive loop.

The next listing shows a test driver to compare the three methods.

fprintf("\n||||\n| method | size| result | time |\n")
fmt = "| %s | %d | %f | %f |\n";

for SIZE = [100, 1000,10000,100000,1000000]
    x = rand(SIZE, 1);
    y = rand(SIZE, 1);

    fprintf("|---\n")
    tic
    S = 0;
    for i = 1:SIZE
      S += x(i) * y(i);
    end
    t1 = toc;
    fprintf(fmt, "for", SIZE, S, t1);

    tic
    S = sum(x .* y);
    t1 = toc;
    fprintf(fmt, "sum", SIZE, S, t1);

    tic
    S = dot(x, y);
    t1 = toc;
    fprintf(fmt, "dot", SIZE, S, t1);
  end

Running this code gives us the following results.

method

size

result

time

for

100

24.266291

0.000744

sum

100

24.266291

0.000055

dot

100

24.266291

0.000018

for

1000

247.065638

0.007198

sum

1000

247.065638

0.000016

dot

1000

247.065638

0.000017

for

10000

2507.387623

0.074163

sum

10000

2507.387623

0.000083

dot

10000

2507.387623

0.000029

for

100000

25090.247927

0.736105

sum

100000

25090.247927

0.000656

dot

100000

25090.247927

0.000095

for

1000000

249863.296971

7.334975

sum

1000000

249863.296971

0.006773

dot

1000000

249863.296971

0.001246

As a general rule, we should always try to use the language and tools, instead of doing things ourselves. Often, this is mostly a matter of learning what is available.

Sometimes, though we want to learn how something works. Then, of course, it makes sense to do it ourselves instead of just making a call to existing code.

Posting from emacs with org2blog/wp

I don't blog very often for several reasons. My technical excuse was that blogging from emacs to one of the hosted services was too painful. This is an experiment to see if I can get around it.

Org2Blog allows you to post from an OrgMode file in emacs, which is pretty nice. The only prerequisites are org mode and xmlrpc.el. The setup is pretty straightforward, as described in the README.org file.

Speaking of org and github, github appears to render README.org files in the main directory when you are viewing a repository. That is pretty awesome.

Code and latex snippets show up from org2blog too! I am fine with the default formatting, but I imagine this stuff could be tweaked to look however I wanted.