Showing posts with label algorithms. Show all posts
Showing posts with label algorithms. Show all posts

Tree Graph Ordered Traversal Level by Level in C#

Recently as part of a job interview process, I was asked to solve some programming problems. This post shows the solution for one of such problems.

Problem
The problem ( or could we call it an algorithm exercise? ) is this:

Consider a tree of integers. Knowing that its root node is 0, and given its adjacency list as a two dimensional array of integers, write a function that prints out the elements/nodes in order/level by level starting from the root. That is, the root is printed in the first line, elements that can be reached from the root by a path of distance 1 in the second line, elements reached by a path of distance 2 in the third line, and so forth. For example, given the following adjacency list (draw the tree for a better view):

0 => 1, 2, 3
1 => 0, 4
2 => 0
3 => 0, 5
4 => 1, 6
5 => 3
6 => 4

The program should print:

0
1 2 3
4 5
6

Little bit of theory
If you read about Tree in Graph theory, you’ll see that we can represent a tree using a graph because a tree is an undirected graph in which any two vertices are connected by exactly one simple path. In other words, any connected graph without cycles is a tree.

The tree in this problem isn’t a binary tree, it’s a n-ary tree.

Solution
With theory in mind, here goes my proposed solution…

I’m reusing some code from past posts. In special, the Graph, AdjacencyList, Node, NodeList and EdgeToNeighbor classes.

I use this method to fill a Graph with the Tree structure:

/// <summary>
/// Fills a graph with a given tree structure.
/// </summary>
/// <param name="graph"></param>
private static void FillGraphWithTreeStructure(Graph graph)
{
    // Vertexes
    graph.AddNode("0", null);
    graph.AddNode("1", null);
    graph.AddNode("2", null);
    graph.AddNode("3", null);
    graph.AddNode("4", null);
    graph.AddNode("5", null);
    graph.AddNode("6", null);

    // Edges
    graph.AddDirectedEdge("0", "1");
    graph.AddDirectedEdge("0", "2");
    graph.AddDirectedEdge("0", "3");

    graph.AddDirectedEdge("1", "4");

    graph.AddDirectedEdge("4", "6");

    graph.AddDirectedEdge("3", "5");

    /* This is the tree:
               
            0
          / | \
         1  2  3
        /       \
       4         5
      /
     6
             
        This is the expected output:
             
        Level 1 = 0
        Level 2 = 1 2 3
        Level 3 = 4 5
        Level 4 = 6

    */
}

This is the method that does the hard work:

/// <summary>
/// Performs an ordered level-by-level traversal in a n-ary tree from top-to-bottom and left-to-right.
/// Each tree level is written in a new line.
/// </summary> 
/// <param name="root">Tree's root node</param>
public static void LevelByLevelTraversal(Node root)
{
    // At any given time each queue will only have nodes that
    // belong to a level
    Queue<Node> queue1 = new Queue<Node>();
    Queue<Node> queue2 = new Queue<Node>();

    queue1.Enqueue(root);

    while (queue1.Count != 0 || queue2.Count != 0)
    {
        while (queue1.Count != 0)
        {
            Node u = queue1.Dequeue();

            Console.Write(u.Key);

            // Expanding u's neighbors in the queue
            foreach (EdgeToNeighbor edge in u.Neighbors)
            {
                queue2.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();

        while (queue2.Count != 0)
        {
            Node v = queue2.Dequeue();

            Console.Write(v.Key);

            // Expanding v's neighbors in the queue
            foreach (EdgeToNeighbor edge in v.Neighbors)
            {
                queue1.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();
    }
}

To spice things up I have implemented a Parallel version of the above method using a ConcurrentQueue:

/// <summary>
/// Performs an ordered level-by-level traversal in a n-ary tree from top-to-bottom and left-to-right in Parallel using a ConcurrentQueue.
/// Each tree level is written in a new line.
/// </summary> 
/// <param name="root">Tree's root node</param>
public static void LevelByLevelTraversalInParallel(Node root)
{
    // At any given time each queue will only have nodes that
    // belong to a level
    ConcurrentQueue<Node> queue1 = new ConcurrentQueue<Node>();
    ConcurrentQueue<Node> queue2 = new ConcurrentQueue<Node>();

    queue1.Enqueue(root);

    while (queue1.Count != 0 || queue2.Count != 0)
    {
        while (queue1.Count != 0)
        {
            Node u;
                    
            queue1.TryDequeue(out u);

            Console.Write(u.Key);

            // Expanding u's neighbors in the queue
            foreach (EdgeToNeighbor edge in u.Neighbors)
            {
                queue2.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();

        while (queue2.Count != 0)
        {
            Node v;
                    
            queue2.TryDequeue(out v);

            Console.Write(v.Key);

            // Expanding v's neighbors in the queue
            foreach (EdgeToNeighbor edge in v.Neighbors)
            {
                queue1.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();
    }
}

Now it’s time to measure the execution time using a StopWatch:

private static void Main(string[] args)
{
    Graph graph = new Graph();

    FillGraphWithTreeStructure(graph);

    Stopwatch stopWatch = new Stopwatch();

    stopWatch.Start();

    LevelByLevelTraversal(graph.Nodes["0"]);

    stopWatch.Stop();

    // Write time elapsed
    Console.WriteLine("Time elapsed: {0}", stopWatch.Elapsed);

    //Resetting the watch...
    stopWatch.Reset();

    stopWatch.Start();

    LevelByLevelTraversalInParallel(graph.Nodes["0"]);

    stopWatch.Stop();

    // Write time elapsed
    Console.WriteLine("Time elapsed: {0}", stopWatch.Elapsed);

    Console.ReadKey();
}

Now the results:

Sequential
0
1 2 3
4 5
6
Time elapsed: 00:00:00.0040340

Parallel
0
1 2 3
4 5
6
Time elapsed: 00:00:00.0020186

As you see, time is cut by a factor of 2. I currently have a Core 2 Duo processor in my Mac mini.

Hope you enjoy it and feel free to add your 2 cents to improve this code! Of course there are other ways of solving this very problem and I would like to see those other ways. Do you have any other better idea?

Download
You can get the Microsoft Visual Studio Console Application Project at:

https://sites.google.com/site/leniel/blog/TreeLevelTraversal.rar

To try out the code you can use the free Microsoft Visual C# 2010 Express Edition that you can get at: http://www.microsoft.com/visualstudio/en-us/products/2010-editions/visual-csharp-express

A* pathfinding search in C# - Part 3

A* pathfinding search in C# - Part 1
A* pathfinding search in C# - Part 2

Code available at GitHub: https://github.com/leniel/AStar

This is the last installment in the series about A* (A star) search.

The C# source code implemented is available in the final part of this post.

As promised in the last words of A* pathfinding search in C# - Part 2 today we’re gonna run a test case using the Romania map.

Romania map

If you want to understand the whole process implemented in this solution, please start reading A* pathfinding search in C# - Part 1.

When you run the console application, you get the following screen:

A* Search console application

You start by entering a Start and a Destination city picking up the ones you want from the list of Romania cities.

When you press Enter the console app will show you the shortest or best path based on the A* search algorithm.

As you can see in the above screenshot, the app shows us that the best path to go from Arad to Bucharest is the one that goes as follows:

From Arad           to  Sibiu          -> Total cost = 223.236 km From Sibiu          to  Rimnicu Vilcea -> Total cost = 301.317 km From Rimnicu Vilcea to  Pitesti        -> Total cost = 348.536 km From Pitesti        to  Bucharest      -> Total cost = 456.108 km

Note that the Total cost is the cost calculated so far for each path, that is, in the example shown above, Total cost = 348.536 km is the distance in kilometers for travelling from Arad to Pitesti.

No doubt this is the shortest path to follow if you plan to go from Arad to Bucharest. We could choose different possible routes but the total distance traveled would be greater than the one the app calculated for the shortest path. Let’s see why this is so using the method ViewOtherPaths (I commented about it in A* pathfinding search in C# - Part 2).

The following is the output of the console app when the method ViewOtherPaths is uncommented inside the FindPath method. This helps you debug and see why the app has chosen the above shortest path.

A* Search - Sample implementation by Leniel Macaferi, June 7-20, 2009

These are the Cities you can choose as Start and Destination in Romania:

Arad
Bucharest
Craiova
Dobreta
Eforie
Fagaras
Giurgiu
Hirsova
Iasi
Lugoj
Mehadia
Neamt
Oradea
Pitesti
Rimnicu Vilcea
Sibiu
Timisoara
Urziceni
Vaslui
Zerind

Enter a Start city: Arad

Enter a Destination city: Bucharest

Possible paths:

From Arad           to Sibiu          -> Total cost = 223.236 km
Estimation          = 213.803 km
Priority Queue Cost = 437.039 km = (Total cost + Estimation)

From Arad           to Timisoara      -> Total cost = 48.459 km
Estimation          = 408.79 km
Priority Queue Cost = 457.249 km = (Total cost + Estimation)

From Arad           to Zerind         -> Total cost = 51.908 km
Estimation          = 431.034 km
Priority Queue Cost = 482.942 km = (Total cost + Estimation)

Possible paths:

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
Estimation          = 154.102 km
Priority Queue Cost = 455.419 km = (Total cost + Estimation)

From Arad           to Timisoara      -> Total cost = 48.459 km
Estimation          = 408.79 km
Priority Queue Cost = 457.249 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Fagaras        -> Total cost = 287.59 km
Estimation          = 178.296 km
Priority Queue Cost = 465.886 km = (Total cost + Estimation)

From Arad           to Zerind         -> Total cost = 51.908 km
Estimation          = 431.034 km
Priority Queue Cost = 482.942 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Lugoj          -> Total cost = 397.029 km
Estimation          = 356.126 km
Priority Queue Cost = 753.155 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Arad           -> Total cost = 446.473 km
Estimation          = 420.536 km
Priority Queue Cost = 867.009 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Oradea         -> Total cost = 444.358 km
Estimation          = 434.745 km
Priority Queue Cost = 879.104 km = (Total cost + Estimation)

Possible paths:

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Pitesti        -> Total cost = 348.536 km
Estimation          = 107.572 km
Priority Queue Cost = 456.108 km = (Total cost + Estimation)

From Arad           to Timisoara      -> Total cost = 48.459 km
Estimation          = 408.79 km
Priority Queue Cost = 457.249 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Fagaras        -> Total cost = 287.59 km
Estimation          = 178.296 km
Priority Queue Cost = 465.886 km = (Total cost + Estimation)

From Arad           to Zerind         -> Total cost = 51.908 km
Estimation          = 431.034 km
Priority Queue Cost = 482.942 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Craiova        -> Total cost = 400.614 km
Estimation          = 183.042 km
Priority Queue Cost = 583.656 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Sibiu          -> Total cost = 379.398 km
Estimation          = 213.803 km
Priority Queue Cost = 593.201 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Lugoj          -> Total cost = 397.029 km
Estimation          = 356.126 km
Priority Queue Cost = 753.155 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Mehadia        -> Total cost = 461.891 km
Estimation          = 299.853 km
Priority Queue Cost = 761.744 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Lugoj          -> Total cost = 504.328 km
Estimation          = 356.126 km
Priority Queue Cost = 860.454 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Arad           -> Total cost = 446.473 km
Estimation          = 420.536 km
Priority Queue Cost = 867.009 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Oradea         -> Total cost = 444.358 km
Estimation          = 434.745 km
Priority Queue Cost = 879.104 km = (Total cost + Estimation)

Possible paths:

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Pitesti        -> Total cost = 348.536 km
From Pitesti        to Bucharest      -> Total cost = 456.108 km
Estimation          = 0 km
Priority Queue Cost = 456.108 km = (Total cost + Estimation)

From Arad           to Timisoara      -> Total cost = 48.459 km
Estimation          = 408.79 km
Priority Queue Cost = 457.249 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Fagaras        -> Total cost = 287.59 km
Estimation          = 178.296 km
Priority Queue Cost = 465.886 km = (Total cost + Estimation)

From Arad           to Zerind         -> Total cost = 51.908 km
Estimation          = 431.034 km
Priority Queue Cost = 482.942 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Pitesti        -> Total cost = 348.536 km
From Pitesti        to Rimnicu Vilcea -> Total cost = 395.755 km
Estimation          = 154.102 km
Priority Queue Cost = 549.858 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Craiova        -> Total cost = 400.614 km
Estimation          = 183.042 km
Priority Queue Cost = 583.656 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Sibiu          -> Total cost = 379.398 km
Estimation          = 213.803 km
Priority Queue Cost = 593.201 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Pitesti        -> Total cost = 348.536 km
From Pitesti        to Craiova        -> Total cost = 452.104 km
Estimation          = 183.042 km
Priority Queue Cost = 635.146 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Pitesti        -> Total cost = 348.536 km
From Pitesti        to Fagaras        -> Total cost = 458.356 km
Estimation          = 178.296 km
Priority Queue Cost = 636.653 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Lugoj          -> Total cost = 397.029 km
Estimation          = 356.126 km
Priority Queue Cost = 753.155 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Mehadia        -> Total cost = 461.891 km
Estimation          = 299.853 km
Priority Queue Cost = 761.744 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Lugoj          -> Total cost = 504.328 km
Estimation          = 356.126 km
Priority Queue Cost = 860.454 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Arad           -> Total cost = 446.473 km
Estimation          = 420.536 km
Priority Queue Cost = 867.009 km = (Total cost + Estimation)

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Oradea         -> Total cost = 444.358 km
Estimation          = 434.745 km
Priority Queue Cost = 879.104 km = (Total cost + Estimation)

This is the shortest path based on the A* Search Algorithm:

From Arad           to Sibiu          -> Total cost = 223.236 km
From Sibiu          to Rimnicu Vilcea -> Total cost = 301.317 km
From Rimnicu Vilcea to Pitesti        -> Total cost = 348.536 km
From Pitesti        to Bucharest      -> Total cost = 456.108 km

Do you wanna try A* Search again? Yes or No?

A small change
One thing I changed in the code I posted on A* pathfinding search in C# - Part 2 was the foreach that enumerates the shortest path to write it on the screen. Before it read:

// Prints the shortest path.
foreach(Node n in shortestPath.Reverse())
{
    Console.WriteLine(n.Key);
}

Now it reads:

// Prints the shortest path.
foreach(Path<Node> path in shortestPath.Reverse())
{
    if(path.PreviousSteps != null)
    {
        Console.WriteLine(string.Format("From {0, -15}  to  {1, -15} -> Total cost = {2:#.###} {3}",
                          path.PreviousSteps.LastStep.Key, path.LastStep.Key, path.TotalCost, distanceType));
    }
}

As you can see I changed from Node to Path<Node>. To get this working I had to change the type returned by GetEnumerator in the class Path so that it returned Path<Node> instead of Node.

public IEnumerator<Path<Node>> GetEnumerator()
{
    for(Path<Node> p = this; p != null; p = p.PreviousSteps)
        yield return p;
}

This allowed me to enumerate over each path that composes the whole shortest path so that we can show the LastStep of the previous path and the LastStep of the current path. The Total cost travelled so far for each path is also available because we’re working with a path object.

Last note
A* is a really powerful search algorithm.

Hope you liked this series of posts about A* search as I liked to implement and write about it! It was a really good programming exercise.

Visual Studio 2013 Solution with C# Console Application Project
You can get the Microsoft Visual Studio Project at this GitHub repository:

https://github.com/leniel/AStar

To try out the code you can use the free Microsoft Visual C# Express Edition that you can get at: http://www.microsoft.com/express/vcsharp/

A* pathfinding search in C# - Part 2

A* pathfinding search in C# - Part 1
A* pathfinding search in C# - Part 3

Code available at GitHub: https://github.com/leniel/AStar

To illustrate the path finding problem and how it can be solved using A* (A star) I decided to use the Romania map (with the same Cities I used in Breadth and depth first search series of posts). Now I modified it adding more connections between the cities so that we have more fun while debugging the code.

This time I decided to get the real values of latitude and longitude for the cities shown in the Romania map. To accomplish this I used this Google Spreadsheet that is a geocoder. It uses geo-google-docs to get geolocalization data for the address column.

This is the Map created using the data stored in the spreadsheet with the help of a Google Map:

You can pass your mouse over each pin to see the name of the city it represents in the map.

The following picture shows the fictitious connections (paths) between the cities and the cost to move from city to city.

Romania Map

To fill the Graph that represents the Romania map above I created a static private method called FillGraphWithEarthMap inside the AStar class that does the job of creating the Nodes and adding them to the graph (bellow I show a shortened version of such a method because it is a really big method given the fact that we have 20 cities and 41 connections (paths) between them:

/// <summary>
/// Fills a Graph with Romania map information.
/// The Graph contains Nodes that represents Cities of Romania.
/// Each Node has as its key the City name and its Latitude and Longitude associated information.
/// Nodes are vertexes in the Graph. Connections between Nodes are edges.
/// </summary>
/// <param name="graph">The Graph to be filled</param>
/// <param name="distanceType">The DistanceType (KM or Miles) between neighbor cities</param>
private static void FillGraphWithEarthMap(Graph graph, DistanceType distanceType)
{
    // 20 Vertexes in total
    Node arad = new Node("Arad", null, 46.1792414, 21.3150154); // Creating a Node...
    graph.AddNode(arad); // Adding the Node to the Graph...

    Node bucharest = new Node("Bucharest", null, 44.4479237, 26.097879);
    graph.AddNode(bucharest);

    Node craiova = new Node("Craiova", null, 44.3182085, 23.8016427);
    graph.AddNode(craiova);

    .
.
.
    // 41 Edges in total
    // Arad <-> Sibiu
    graph.AddUndirectedEdge(arad, sibiu, Haversine.Distance(arad, sibiu, distanceType));
    // Arad <-> Timisoara
    graph.AddUndirectedEdge(arad, timisoara, Haversine.Distance(arad, timisoara, distanceType));
    // Arad <-> Zerind
    graph.AddUndirectedEdge(arad, zerind, Haversine.Distance(arad, zerind, distanceType));
.
.
. }

The Vertexes
You must create a vertex/Node for each City of the map, passing to the Node’s class constructor the name of the City and its Latitude and Longitude coordinates. Add each Node to the Graph passed as a parameter to the method.

The Edges
You must create an undirected edge/path for each connection between two cities, passing to the constructor the first city, the second city and the cost to go from one city to the other. In this case the cost is calculated through the Haversine class discussed in A* pathfinding search in C# - Part 1.

To help debug the code I created two methods: DistanceBetweenNodes and ViewOtherPaths. Both methods are kept in the AStar class.

This is the DistanceBetweenNodes method:

/// <summary>
/// Prints on screen the distance from a city to its neighbors.
/// </summary>
/// <param name="graph">The Graph</param>
/// <param name="distanceType">The distance type: KM or Miles</param>
private static void DistanceBetweenNodes(Graph graph, DistanceType distanceType)
{
    // First we cast the Graph.Nodes which is a NodeList to type Node and then we order the list of Nodes by the Node Key
    // so that we get the list of cities in ascending order.
    foreach(Node n in graph.Nodes.Cast<Node>().OrderBy(n => n.Key))
    {
        // For each city neighbor we gets its information and print it on screen.
        foreach(EdgeToNeighbor etn in n.Neighbors)
        {
            Console.WriteLine("Distance from {0} to {1} is -> {2:#.##} {3}", n.Key, etn.Neighbor.Key, etn.Cost, distanceType);
        }
    }
}

Inside the FindPath method presented on A* pathfinding search in C# - Part 1, I call the ViewOtherPaths method (it’s commented out):

/// <summary>
/// This is the method responsible for finding the shortest path between a Start and Destination cities using the A*
/// search algorithm.
/// </summary>
/// <typeparam name="TNode">The Node type</typeparam>
/// <param name="start">Start city</param>
/// <param name="destination">Destination city</param>
/// <param name="distance">Function which tells us the exact distance between two neighbours.</param>
/// <param name="estimate">Function which tells us the estimated distance between the last node on a proposed path and the
/// destination node.</param>
/// <returns></returns>
static public Path<TNode> FindPath<TNode>(
    TNode start,
    TNode destination,
    Func<TNode, TNode, double> distance,
    Func<TNode, double> estimate) where TNode : IHasNeighbours<TNode>
{
    var closed = new HashSet<TNode>();

    var queue = new PriorityQueue<double, Path<TNode>>();

    queue.Enqueue(0, new Path<TNode>(start));

    while(!queue.IsEmpty)
    {
        var path = queue.Dequeue();

        if(closed.Contains(path.LastStep))
            continue;

        if(path.LastStep.Equals(destination))
            return path;

        closed.Add(path.LastStep);

        foreach(TNode n in path.LastStep.Neighbours)
        {
            double d = distance(path.LastStep, n);

            var newPath = path.AddStep(n, d);

            queue.Enqueue(newPath.TotalCost + estimate(n), newPath);
        }

        //ViewOtherPaths(queue, estimate);
    }

    return null;
}

This is the ViewOtherPaths method:

/// <summary>
/// This method can be used to view the other paths inside the PriorityQueue.
/// </summary>
/// <typeparam name="TNode">The Node type</typeparam>
/// <param name="queue">The priority queue</param>
/// <param name="estimate">Function which tells us the estimated distance between the last node on a proposed path and the
/// destination node.</param>
private static void ViewOtherPaths<TNode>(PriorityQueue<double, Path<TNode>> queue, Func<TNode, double> estimate)
{
    // The priority queue is composed of KeyValuePairs which has as key a double value (the TotalCost) and
    // has as Value a Queue which contains Paths.
    foreach(KeyValuePair<double, Queue<Path<TNode>>> kvp in queue)
    {
        // For each path in the Queue...
        foreach(Path<TNode> otherPath in kvp.Value)
        {
            // Reverse the Path so that we get the order of the cities in a more meaningful way...
            var otherPathReversed = otherPath.Cast<Node>().Reverse();

            // Prints on screen the Cities that are part of this path.
            foreach(Node n in otherPathReversed)
            {
                Console.WriteLine(n.Key);
            }

            // Get the total cost of the other path.
            double otherPathTotalCost = otherPath.TotalCost;
            // Get the estimation cost of the other path.
            double otherPathEstimation = estimate(otherPath.LastStep);

            // Prints on the screen the relevant information so that it gets easier to debug the code and see how
            // the A* search algorithm really does the job...
            Console.WriteLine("Total Cost other path = {0}", otherPathTotalCost);
            Console.WriteLine("Estimation other path = {0}", otherPathEstimation);
            Console.WriteLine(@"Priority Queue Cost other path = {0} =
                            Total Cost other path + Estimation other path =
                            {1}", kvp.Key, otherPathTotalCost + otherPathEstimation);
        }

        Console.WriteLine();
    }

    Console.WriteLine();
}

I also use two additional methods to get the Start and Destination cities entered by the user when s/he runs the code. Both methods keep a loop until the user enters the name of a city that is indeed part of the Graph we’ve just created:

/// <summary>
/// Gets the Destination city.
/// </summary>
/// <param name="graph">The Graph</param>
/// <returns>Name of Destination city</returns>
private static string GetDestinationCity(Graph graph)
{
    string destinationCity;
do { Console.Write("\nEnter a Destination city: "); destinationCity = Console.ReadLine(); } while(!graph.Nodes.ContainsKey(destinationCity));
return destinationCity; }
/// <summary>
/// Gets the Destination city.
/// </summary>
/// <param name="graph">The Graph</param>
/// <returns>Name of Destination city</returns>
private static string GetDestinationCity(Graph graph)
{
    string destinationCity;
do { Console.Write("\nEnter a Destination city: "); destinationCity = Console.ReadLine(); } while(!graph.Nodes.ContainsKey(destinationCity));
return destinationCity; }

This is the main entry point of the Console Application I created:

static void Main(string[] args)
{
    do
    {
        // Creating the Graph...
        Graph graph = new Graph();

        //FillGraphWithGridMap(graph);

        FillGraphWithEarthMap(graph, DistanceType.Kilometers);

        // Prints on the screen the distance from a city to its neighbors.
        // Used mainly for debug information.
        // DistanceBetweenNodes(graph, DistanceType.Kilometers);

        Console.WriteLine("A* Search - Sample implementation by Leniel Macaferi, June 7-20, 2009\n");
        Console.WriteLine("These are the Cities you can choose as Start and Destination in Romania: \n");

        // Prints on screen the cities that you can choose as Start and Destination.
        foreach(Node n in graph.Nodes.Cast<Node>().OrderBy(n => n.Key))
        {
            Console.WriteLine(n.Key);
        }

        string startCity = GetStartCity(graph);

        string destinationCity = GetDestinationCity(graph);

        Node start = graph.Nodes[startCity];

        Node destination = graph.Nodes[destinationCity];

        // Function which tells us the exact distance between two neighbours.
        Func<Node, Node, double> distance = (node1, node2) => node1.Neighbors.Cast<EdgeToNeighbor>().Single(etn => etn.Neighbor.Key == node2.Key).Cost;

        // Estimation/Heuristic function (Manhattan distance)
        // It tells us the estimated distance between the last node on a proposed path and the destination node.
        //Func<Node, double> manhattanEstimation = n => Math.Abs(n.X - destination.X) + Math.Abs(n.Y - destination.Y);

        // Estimation/Heuristic function (Haversine distance)
        // It tells us the estimated distance between the last node on a proposed path and the destination node.
        Func<Node, double> haversineEstimation = n => Haversine.Distance(n, destination, DistanceType.Kilometers);

        //Path<Node> shortestPath = FindPath(start, destination, distance, manhattanEstimation);
        Path<Node> shortestPath = FindPath(start, destination, distance, haversineEstimation);

        // Prints the shortest path.
        foreach(Node n in shortestPath.Reverse())
        {
            Console.WriteLine(n.Key);
        }

        Console.Write("\nDo you wanna try A* Search again? Yes or No? ");
    }
    while(Console.ReadLine().ToLower() == "yes");
}

The main entry point above calls all the other methods I showed in this post.

This is all you need to run the A* pathfinding search algorithm! :-)

I think the code is well documented through comments.

I wish this helps you understand how to put together the working pieces necessary to run a test case.

Next time: I’ll run a test case using the Graph we have assembled in this post.

A* pathfinding search in C# - Part 1

A* pathfinding search in C# - Part 2
A* pathfinding search in C# - Part 3

Code available at GitHub: https://github.com/leniel/AStar

Last Sunday after I got back from church I read Kirill Osenkov’s post Algorithms in C#: shortest path around a polygon (polyline routing). Kirill mentions Eric Lippert’s excellent series of posts titled Path Finding Using A* in C# 3.0. I’ve read that series of posts a long time ago (2007 - I was on the last year of the computer engineering course).

See you how life is interesting: in 2007 after reading Lippert’s posts I wanted to see a running sample of the code but didn’t try to implement that until the time I read Osenkov’s post. Why did it took so long? I don’t know how to answer this.

I studied A* (A star) in the 9th term of the computer engineering course  (Feb-Jun, 2007). The discipline was Artificial Intelligence. At that time I didn’t implement any code for this specific algorithm (I think this is due the so famous excuse during the classes - we just don’t have enough time to do that. We have lots of topics to see yet.) . Therefore, we’ve just run test cases on sheets of paper. Not so good but a start.

Well, Eric wrote the base structure of the A* pathfinding algorithm but didn’t provide a complete running sample. He wrote the following:

I have a nice example of using A* to find the shortest path around a lake (but not the longest shortest path!) but I'm not going to post it here until Orcas ships. I'll just put up the whole project file at that time and you can check it out all at once.

Still on Sunday night I took the time to write some code that put up a running A* search.

I basically merged some code I had already posted on the series of posts titled Breadth and depth first search and created some properties here and there to get the whole thing running. It’s nothing more than code reuse.

The following is the basic project’s structure I created:

A* Search Solution Explorer

In Path Finding Using A* in C# 3.0, Part One the A* algorithm is described.

In Path Finding Using A* in C# 3.0, Part Two the data structures necessary to implement the A* algorithm are defined.

In this part Eric writes:

Let’s make an immutable stack of nodes which tracks the total cost of the whole path. Later on, we’ll see that we need to enumerate the nodes of this thing, so we’ll do a trivial implementation of IEnumerable while we’re at it. And since we don’t know exactly what a node will look like, let’s make the thing generic.

In part two we are presented to an immutable stack that I’ve put inside the Path class.

In Path Finding Using A* in C# 3.0, Part Three we are presented to a priority queue. This structure is responsible for putting the best paths so far in priority order, that is, the path with a lesser cost has a major priority.

In this part Eric writes:

In order to make the A* algorithm work we need to get the lowest-estimated-cost-path-discovered-so-far out of the list of paths under consideration. The standard data structure for doing so is called a “priority queue”. Priority queues are so-called because they are typically used to store a list of jobs where each job has an associated priority.

The PriorityQueue lies within its namesake class.

In Path Finding Using A* in C# 3.0, Part Four we get the FindPath method that is the one responsible for finding the shortest path possible. This method uses all of the remaining classes shown on the project structure above.

I put the FindPath method inside the AStar class.

In this part Eric writes:

What do we need to make the algorithm run? We need a start node, a destination node, a function which tells us the exact distance between two neighbours, and a function which tells us the estimated distance between the last node on a proposed path and the destination node.

This is the beautiful PathFind method:

static public Path<TNode> FindPath<TNode>(
    TNode start,
    TNode destination,
    Func<TNode, TNode, double> distance,
    Func<TNode, double> estimate) where TNode : IHasNeighbours<TNode>
{
    var closed = new HashSet<TNode>();

    var queue = new PriorityQueue<double, Path<TNode>>();

    queue.Enqueue(0, new Path<TNode>(start));

    while(!queue.IsEmpty)
    {
        var path = queue.Dequeue();

        if(closed.Contains(path.LastStep))
            continue;

        if(path.LastStep.Equals(destination))
            return path;

        closed.Add(path.LastStep);

        foreach(TNode n in path.LastStep.Neighbours)
        {
            double d = distance(path.LastStep, n);

            var newPath = path.AddStep(n, d);

            queue.Enqueue(newPath.TotalCost + estimate(n), newPath);
        }
    }

    return null;
}

When you get at the fourth part of Eric’s posts you may get thinking: How can I run this thing? That’s what I try to explain in this post.

The immutable stack implemented in part two of Eric’s posts is a generic one, that is, it accepts an object custom made to represent a Node in the path.

I already had a class Node that I used sometime ago while studying Artificial Intelligence. Take a look at Breadth and depth first search series of posts.

I also needed a Graph that represents the map formed with the connections between the nodes.

The Graph has a NodeList that stores all the nodes currently in the graph.

A node can have neighbors, that is, other nodes connected to it and such connections are represented in the code through the class AdjacencyList. Each node has an AdjacencyList that stores its neighbors.

A connection (edge) between two nodes (vertexes) is represented by the EdgeToNeighbor class that stores the neighbor of a node and the cost to get to that neighbor.

The AdjacencyList, EdgeToNeighbor, Graph, Node and NodeList classes are the outcome of an excellent series of articles by Scott Mitchell titled An Introduction to Data Structures. I got those classes and added them to the project.

This way I had all set to pass in the first two parameters of the PathFind method.

Now I needed to figure out how to create the distance and estimate functions to pass as the third and fourth parameters.

For the distance function I did this:

// Function which tells us the exact distance between two neighbours.
Func<Node, Node, double> distance = (node1, node2) => node1.Neighbors.Cast<EdgeToNeighbor>().Single(etn => etn.Neighbor.Key == node2.Key).Cost;

The distance function receives two nodes as parameters as can be seen in the PathFind method and returns a double value. I then use a lambda expression to find the Cost, (distance) in this case, between the two nodes. I get the neighbors of node1 and cast those to EdgeToNeighbor and execute a query using a lambda expression again to return the Cost only if the key of one of the neighbors of node1 is equal to the key of node2.

For the estimation function I did this:

// Estimation/Heuristic function (Haversine distance)
// It tells us the estimated distance between the last node on a proposed path and the destination node.
Func<Node, double> haversineEstimation = n => Haversine.Distance(n, destination, DistanceType.Kilometers);

Now I’m using as the estimation the Haversine formula. This formula gives the great-circle distances between two points on a sphere (Earth) from their longitudes and latitudes.

I got the Haversine formula code from the post Haversine Formula in C# and in SQL by Seth Long. I just adapted it a little bit so that I could use it with the Node data structure.

The haversine generic delegate receives as input a Node and return a double. Using a lambda expression I get the Haversine distance from node n to the destination node and I set that I want the distance in Kilometers.

With the distance and estimate functions on hand we’re OK to call the PathFind method.
Wait! Don’t we need something before that? Of course we need. We need to populate the graph with nodes. After all we’re after the shortest path between two nodes and we need some nodes to test the A* star pathfinding search. :-)

Next time: I’ll show you how to create a complete scenario using the same Romania map I used in the Breadth and depth first search series of posts.

References
[1] Lippert, Eric. Path Finding Using A* in C# 3.0, Part One. 2007. Available at <http://blogs.msdn.com/ericlippert/archive/2007/10/02/path-finding-using-a-in-c-3-0.aspx>. Accessed on June 13, 2009.

[2] Lippert, Eric. Path Finding Using A* in C# 3.0, Part Two. 2007. Available at <http://blogs.msdn.com/ericlippert/archive/2007/10/04/path-finding-using-a-in-c-3-0-part-two.aspx>. Accessed on June 13, 2009.

[3] Lippert, Eric. Path Finding Using A* in C# 3.0, Part Three. 2007. Available at <http://blogs.msdn.com/ericlippert/archive/2007/10/08/path-finding-using-a-in-c-3-0-part-three.aspx>. Accessed on June 13, 2009.

[4] Lippert, Eric. Path Finding Using A* in C# 3.0, Part Four. 2007. Available at <http://blogs.msdn.com/ericlippert/archive/2007/10/10/path-finding-using-a-in-c-3-0-part-four.aspx>. Accessed on June 13, 2009.

[5] Mitchell, Scott. An Extensive Examination of Data Structures. 2005. Available at <http://msdn.microsoft.com/en-us/library/ms379570(VS.80).aspx>. Accessed on June 13, 2009.

[6] Long, Seth. Haversine Formula in C# and in SQL. 2008. Available at <http://megocode3.wordpress.com/2008/02/05/haversine-formula-in-c>. Accessed on June 13, 2009.

Calculating prime numbers with LINQ in C#

If you want to see how to use PLINQ (Parallel LINQ) to get performance gains when performing this calculation, take a look at the post titled
Parallel LINQ (PLINQ) with Visual Studio 2010.

To serve as a processing test in a follow up post I’m posting how to calculate prime numbers using a LINQ query expression.

In mathematics, a prime number (or a prime) is a natural number which has exactly two distinct natural number divisors: 1 and itself. The first twenty-five prime numbers are:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

The following code[1] shows how to calculate the prime numbers that are less equal ( <= ) a given number “max”:

Func<int, IEnumerable<int>> primeNumbers = max =>
     from i in Enumerable.Range(2, max - 1)
     where Enumerable.Range(2, i - 2).All(j => i % j != 0)
     select i;

IEnumerable<int> result = primeNumbers(10);
foreach(int i in result)
{
  Console.WriteLine(i);
}

In my final computer engineering assignment I presented LINQ - Language Integrated Query and its constructs. With such constructs it’s possible to achieve a high degree of method chaining.

LINQ is declarative, not imperative. It allows us to simply state what we want to do without worrying about how it is done.

In the code above we declare a Func<(Of <(T, TResult>)>) generic delegate which has an int as input parameter and an IEnumerable<int> as the result.
Func delegates are very useful for encapsulating user-defined expressions that are applied to each element in a set of source data.

Using a lambda expression ( => ) we assign a query expression to the delegate.
A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

The query expression states that from each value i in the Enumerable.Range(2, max - 1) where all elements of the range Enumerable.Range(2, i – 2) satisfy the condition All(j => i % j != 0), we select i.

max is the delegate input parameter.

The % symbol is the modulus operator in C#. It computes the remainder after dividing its first operand by its second.

For example: considering max = 10 we’d have the following…

The range in the from clause is: { 2, 3, 4, 5, 6, 7, 8, 9, 10 }

Taking the 1st value out of the from range, we have i = 2.

The range in the where clause is Range (2, 2 - 2) = Range (2 , 0) = {  }.

Since i = 2 is by default included in the result, what must be evaluated lies where the variable max is used. So, taking the 2nd value and assigning it to i we have i = 3.

The range in the where clause is Range (2, 3 - 2) = Range (2 , 1) = { 2 }.

Evaluating j => i % j we have 3 % 2 = 1, and so, 3 is a prime number.

Taking the 3rd value and assigning it to i we have i = 4.

The range in the where clause is Range (2,  4 - 2) = Range (2, 2) = { 2, 3 }.

Evaluating j => i % j we have 4 % 2 = 0 and 4 % 3 = 1, and so 4 is not a prime number because all elements of the where range must yield a result != 0 for the expression i % j != 0.

Now we have i = 5.

The range in the where clause is Range (2, 5 - 2) = Range (2 , 3) = { 2, 3, 4 }.

Evaluating j => i % j we have 5 % 2 = 1, 5 % 3 = 2 and 5 % 4 = 1. From this we have that 5 is a prime number.

Now we have i = 6.

The range in the where clause is Range (2, 6 - 2) = Range (2 , 4) = { 2, 3, 4, 5 }.

Evaluating j => i % j we have 6 % 2 = 0, 6 % 3 = 0, 6 % 4 = 2 and 6 % 5 = 1. From this we have that 6 is not prime number.

Now we have i = 7.

The range in the where clause is Range (2, 7 - 2) = Range (2 , 5) = { 2, 3, 4, 5, 6 }.

Evaluating j => i % j we have 7 % 2 = 1, 7 % 3 = 1, 7 % 4 = 3, 7 % 5 = 2 and 7 % 6 = 1. From this we have that 7 is prime number.

DebuggingPrimeNumbersLINQQueryExpression

Now we have i = 8.

The range in the where clause is Range (2, 8 - 2) = Range (2 , 6) = { 2, 3, 4, 5, 6, 7 }.

Evaluating j => i % j we have 8 % 2 = 0, 8 % 3 = 2, 8 % 4 = 0, 8 % 5 = 3, 8 % 6 = 2 and 8 % 7 = 1. 8 isn’t a prime number.

Now we have i = 9.

The range in the where clause is Range (2, 9 - 2) = Range (2 , 7) = { 2, 3, 4, 5, 6, 7, 8 }.

Evaluating j => i % j we have 9 % 2 = 1, 9 % 3 = 0, 9 % 4 = 1, 9 % 5 = 4, 9 % 6 = 3, 9 % 7 = 2 and 9 % 8 = 1. 9 isn’t a prime number.

Now we have i = 10.

The range in the where clause is Range (2, 10 - 2) = Range (2 , 8) = { 2, 3, 4, 5, 6, 7, 8, 9 }.

Evaluating j => i % j we have 10 % 2 = 0, 10 % 3 = 1, 10 % 4 = 2, 10 % 5 = 0, 10 % 6 = 4, 10 % 7 = 3, 10 % 8 = 2 and 10 % 9 = 1. 10 isn’t a prime number.

Finally we have 4 prime numbers <= 10; they are: { 2, 3, 5, 7 }.

The following table illustrates the result:

i where Range(2, i - 2) All(j => i % j != 0)
2 { }  
3 { 2 } 3 % 2 = 1
4 { 2, 3 } 4 % 2 = 0
4 % 3 = 1
5 { 2, 3, 4 } 5 % 2 = 1
5 % 3 = 2
5 % 4 = 1
6 { 2, 3, 4, 5 } 6 % 2 = 0
6 % 3 = 0
6 % 4 = 2
6 % 5 = 1
7 { 2, 3, 4, 5, 6 } 7 % 2 = 1
7 % 3 = 1
7 % 4 = 3
7 % 5 = 2
7 % 6 = 1
8 { 2, 3, 4, 5, 6, 7 } 8 % 2 = 0
8 % 3 = 2
8 % 4 = 0
8 % 5 = 3
8 % 6 = 2
8 % 7 = 1
9 { 2, 3, 4, 5, 6, 7, 8 } 9 % 2 = 1
9 % 3 = 0
9 % 4 = 1
9 % 5 = 4
9 % 6 = 3
9 % 7 = 2
9 % 8 = 1
10 { 2, 3, 4, 5, 6, 7, 8, 9 } 10 % 2 = 0
10 % 3 = 1
10 % 4 = 2
10 % 5 = 0
10 % 6 = 4
10 % 7 = 3
10 % 8 = 2
10 % 9 = 1

I’ll use the primeNumbers delegate in a future post to evaluate PLINQ (Parallel LINQ) and measure the performance gains when the same calculation is done in parallel instead of in sequence.

References
[1] Perfetti, Michel. LINQ Quiz: The list of prime numbers in 3 clauses? 2007. Available at <http://blogs.developpeur.org/raptorxp/archive/2007/11/26/quizz-linq-la-liste-des-nombres-premiers-en-3-clauses.aspx>. Accessed on May 31, 2009.

Quicksort and Binary Search algorithms in C++

This post is related to one of the best coursework I've done during the computer engineering course. I'm really proud of it. It was the cornerstone in my programming career and helped me choose an area to put my efforts from that moment on. I was in the 5th term out of 10 (3rd year of the course out of 5 more exactly). The discipline was Programming Languages.

This subject is fantastic and is used extensively throughout the the computer science field.

First I'll give a short description about the Quicksort and Binary Search algorithms and then I'll present the work that I and my dear brother in faith Wellington Magalhães Leite did.

Quicksort

Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare. Typically, quicksort is significantly faster in practice than other sorting algorithms, because its inner loop can be efficiently implemented on most architectures.

Binary Search

A binary search algorithm (or binary chop) is a technique for finding a particular value in a sorted list. It makes progressively better guesses, and closes in on the sought value by selecting the median element in a list, comparing its value to the target value, and determining if the selected value is greater than, less than, or equal to the target value. A guess that turns out to be too high becomes the new top of the list, and a guess that is too low becomes the new bottom of the list. Pursuing this strategy iteratively, it narrows the search by a factor of two each time, and finds the target value.

Our paper

Our paper is entitled Quicksort and Binary Search Algorithms. You can get a copy at the end of this post.

Without more ado, see its abstract bellow:

Sorting and searching algorithms are a core part of the computer science area. They are used throughout the programming work when you need to sort a set of data and when you need to search for a specific record (key) present in such set of data.

Quicksort is one of the fastest (quick) sorting algorithms and is most used in huge sets of data. It performs really well in such situations.

Binary search tree is one of the fastest searching algorithms and is applied in a sorted set of data. It reduces the search space by 2 in each iteration, hence its name (binary).

In this paper we present the intrinsic nature of each algorithm as well as a functional implementation of such algorithms in the C++ programming language.

Keywords: quicksort, binary search, sorting algorithms, searching algorithms, c++ programming language

CONTENTS
1 INTRODUCTION 6
  1.1 Objective 6
  1.2 Definition 6
      1.2.1 Sorting algorithms 6
      1.2.2 Searching algorithms 7
2 DEVELOPMENT 8
  2.1 Quicksort algorithm (swapping and partitioning) 8
      2.1.1 Detailed steps 8
  2.2 Binary search algorithm 9
  2.3 Studying the efficiency of the methods 9
      2.3.1 The big O notation 9
      2.3.2 Quicksort efficiency 10
            2.3.2.1 The best and the worst case 10
            2.3.2.2 Comparison with other sorting algorithms 11
      2.3.3 Binary search efficiency 11
3 APPLICATION 12
  3.1 Quicksort implementation 12
  3.2 Binary search implementation 13
4 CONCLUSION 14
5 REFERENCES 15

Words of wisdom

As I mentioned, during the 5th term of the computer engineering course our teacher Marcus Vinicius Carvalho Guelpeli selected some sorting and searching algorithms to pass to the class as a coursework.

The coursework should be done in pairs and each pair should select a sorting and searching algorithm to compose a paper about it. We selected the quicksort and the binary search algorithms. The teacher advised us that these weren't the easy ones. I just thought: that's what I want. I don't want the easy ones. Why? Because if you get just the easy problems I'll never understand something that demands a more deep approach and every time a difficult task is given you'll tend to refuse it. What's the best thing to do? Just accept the challenge and go for it. Chances are you'll succeed. That's just what happened with us.

We haven't just written about the quicksort and the binary search, we implemented it and presented it to the class in a power point presentation. The teacher liked it so much that our grade was the highest possible! :-) In the end what did we feel? An amazing feeling. Something such that the work had been done and we learned a lot from it. That’s what a college and even an online computer science degree program is supposed to do. Give you the subjects and motivate you; teaching the basic so that you can dig up the more difficult aspects of the subject being taught.

So what's next? Well, I'll explain how we implemented the quicksort and the binary search algorithms.

Quicksort and Binary search algorithms implementation in C++

One of the things that I've always listened to was about code reuse. You search for something already implemented so that you just haven't to reinvent the wheel. Of course you'll complement or even adapt the available code to your situation. That was what we did. I found some code for the quick sort at MSDN and implemented the binary search one. Unfortunately the page at MSDN isn't available anymore. It's been three years since I hit that page.

I wanted a way of measuring the time elapsed so that we could compare the efficiency of both methods when they were fed with different input data sets. The input is nothing more than a text file .txt full of numbers in this case. Can be any data you want. Each test case we passed a text file with different random numbers and different quantity of numbers. For example, in a test case we passed a file named 2500.txt, that means 2500 random numbers. In another test case we passed other file named 7500.txt as so on. I think you got it. Doing so we could compare how well the algorithms were performing.

To generate the random numbers we used an Excel spreadsheet with the formula =RAND()*1000000. For each set of data we generated new numbers and copied and pasted those numbers into the text files that are the input for our program. During a coursework as this one we get to learn everywhere, even a new formula in Excel. It's really good. ;-)

Again, I searched for a timing class that I could reuse with the code and for sure I found it. I didn't use it at all but I used it to learn about how to measure time in C++. It's amazing how fast you can implement something. Much of the things you need related to programming are already implemented. You just have to search for it as is what you're doing here, I think! You searched for the subject of this post and here you are seeing something implemented. Try to learn from it and just don't copy the entire work and think that you know about it. It's wrong. Try to understand what the code is doing. Dive into the theory because it explains the inner essence.

The code that follows is well commented which is something every developer should do. You see, it was three years ago when we worked with this code. Today it's difficult to remember every step I took. The comments helped me to remember almost everything.

Bellow I present the quick sort method we borrowed from MSDN (we adapted it to fit our case). Note the use of the Partition method (explained in the accompanying paper):

// QuickSort implementation
void QuickSort(char** szArray, int nLower, int nUpper)
{
 // Check for non-base case
 if(nLower < nUpper)
 {
   // Split and sort partitions
   int nSplit = Partition(szArray, nLower, nUpper);
   QuickSort(szArray, nLower, nSplit - 1);
   QuickSort(szArray, nSplit + 1, nUpper);
 }
}

// QuickSort partition implementation
int Partition (char** szArray, int nLower, int nUpper)
{
 // Pivot with first element
 int nLeft = nLower + 1;
 char* szPivot = szArray[nLower];
 int nRight = nUpper;

 // Partition array elements
 char* szSwap;
 while(nLeft <= nRight)
 {
   // Find item out of place
   while(nLeft <= nRight && strcmp (szArray[nLeft], szPivot) <= 0)
     nLeft = nLeft + 1;
   while (nLeft <= nRight && strcmp (szArray[nRight], szPivot) > 0)
     nRight = nRight - 1;

   // Swap values if necessary
   if(nLeft < nRight)
   {
     szSwap = szArray[nLeft];
     szArray[nLeft] = szArray[nRight];
     szArray[nRight] = szSwap;
     nLeft = nLeft + 1;
     nRight = nRight - 1;
   }
 }

 // Move pivot element
 szSwap = szArray[nLower];
 szArray[nLower] = szArray[nRight];
 szArray[nRight] = szSwap;
 return nRight;
}

Now see the binary search method implementation that we did:

int BinarySearch(char** szArray, char key[], int nLower, int nUpper)
{
 // Termination case
 if(nLower > nUpper)
   return 0;

 int middle = (nLower + nUpper) / 2;

 if(strcmp(szArray[middle], key) == 0)
   return middle;
 else
 {
   if(strcmp(szArray[middle], key) > 0)
     // Search left
     return BinarySearch(szArray, key, nLower, middle - 1);
   // Search right
   return BinarySearch(szArray, key, middle + 1, nUpper);
 }
}

The next ones are the method prototypes and the main entry point that calls a menu. According to the user passed parameters we call the quicksort and the binary search methods:

// Function prototypes
void Menu(void);
void QuickSort(char** szArray, int nLower, int nUpper);
int Partition(char** szArray, int nLower, int nUpper);
int BinarySearch(char** szArray, char key[], int nLower, int nUpper);

// Application initialization
void main(void)
{
 char op;

 do
 {
   Menu();
   printf("\n\nDo you wanna a new QuickSort? Y/N");
   op = getche();
   if(islower(op))
     op = toupper(op);
 }
 while(op == 'Y');
}

void Menu(void)
{
 // Clear screen
 system("CLS");

 // Control execution time
 clock_t initial, final;

 // Print startup banner
 printf("\nQuickSort C++ Sample Application\n");
 printf("Copyright (c)2001-2002 Microsoft Corporation. All rights reserved.\n\n");
 printf("MSDN ACADEMIC ALLIANCE [http://www.msdnaa.net/]\n\n");
 printf("BinarySearch C++ Sample Application\n");
 printf("Copyright (c)2005 Leniel Braz de Oliveira Macaferi & Wellington Magalhaes Leite.\n");
 printf("UBM COMPUTER ENGINEERING - 5TH SEMESTER [http://www.ubm.br/]\n\n");

 // Describe program function
 printf("This program example demonstrates the QuickSort and BinarySearch algorithms by\n");
 printf("reading an input file, sorting its contents, writing them to a new file and\n");
 printf("searching on them.\n\n");

 // Prompt user for filenames
 char szSrcFile[1024], szDestFile[1024];
 printf("Source: ");
 gets(szSrcFile);
 printf("Output: ");
 gets(szDestFile);

 // Read contents of source file
 const long nGrow = 8;
 long nAlloc = nGrow;
 long nSize = 0;
 char** szContents = new char* [nAlloc];
 char szSrcLine[1024];
 FILE* pStream = fopen(szSrcFile, "rt");

 while(fgets(szSrcLine, 1024, pStream))
 {
   // Trim newline character
   char* pszCheck = szSrcLine;
   while(*pszCheck != '\0')
   {
     if(*pszCheck == '\n' && *(pszCheck + 1) == '\0')
       *pszCheck = '\0';
     pszCheck++;
   }

   // Append to array
   szContents[nSize] = new char [strlen(szSrcLine) + 1];
   strcpy(szContents[nSize], szSrcLine);
   nSize = nSize + 1;

   if(nSize % nGrow == 0)
   {
     // Resize the array
     char** szPrev = szContents;
     nAlloc += nGrow;
     szContents = new char* [nAlloc];
     memcpy(szContents, szPrev, nSize * sizeof(char*));
     delete szPrev;
   }
 }
 fclose (pStream);

 initial = clock();

 // Pass to QuickSort function
 QuickSort(szContents, 0, nSize - 1);

 final = clock();

 // Write sorted lines
 pStream = fopen (szDestFile, "wt");
 for(int nIndex = 0; nIndex < nSize; nIndex++)
 {
   // Write line to output file
   fprintf (pStream, "%s\n", szContents[nIndex]);
 }
 fclose (pStream);

 // Report program success
 printf("\nThe sorted lines have been written to the output file.\n\n");

 // QuickSort execution time
 double duration = (double)(final - initial) / CLOCKS_PER_SEC;

 printf("The QuickSort execution time was: %2.9lf s = %.0lf ms = %.0lf \xE6s\n\n", duration, duration * 1000, duration * 1000000);

 char op = '\0';

 do
 {
   printf("Do you wanna a BinarySearch to locate a specific key? Y/N");

   op = getche();
   if(islower(op))
     op = toupper(op);
   if(op == 'Y')
   {
     printf("\n\nType the key you want to search for: ");
     char key[1024];
     gets(key);

     initial = clock();

     if(BinarySearch(szContents, key, 0, nSize - 1))
     {
       final = clock();

       duration = (double)(final - initial) / CLOCKS_PER_SEC;

       printf("\nKey found!\n\n");

       printf("The BinarySearch execution time was: %2.9lf s = %.0lf ms = %.0lf \xE6s\n\n", duration, duration * 1000, duration * 1000000);

     }
     else
     {
       final = clock();

       duration = (double)(final - initial) / CLOCKS_PER_SEC;

       printf("\nKey not found!\n\n");

       printf("The BinarySearch execution time was: %2.9lf s = %.0lf ms = %.0lf \xE6s\n\n", duration, duration * 1000, duration * 1000000);

     }
   }
   else
   {
     // Deallocate entire array
     for(int nIndex = 0; nIndex < nSize; nIndex++)
       // Delete current array element
       delete szContents[nIndex];

     delete szContents;
     szContents = NULL;
   }
 }
 while(op == 'Y'); 
}

Visual Studio C++ Console Application

You can get the project files at:
http://leniel.googlepages.com/QuicksortBinarySearchCPlusPlus.zip

Random number generator

You can get the spreadsheet responsible for this task at:
http://leniel.googlepages.com/QuicksortBinarySearchRandomNumGen.xls

How to use it?

To use the program:
  1. Enter the name of a file that contains unsorted data;
  2. Use the sample files included in the .ZIP package as: 1000.txt and 2500.txt;
  3. In the command line "Source", type: 1000.txt;
  4. In the command line "Output", type a name to the file that will be sorted. e.g.: sorted.txt;
  5. After the sorting process, choose if you want or not to execute a Binary Search. If yes, provide a value to be searched. If not, choose if it is or not desired to execute a new Quicksort.

Postscript:
- To generate random numbers, use the file Random numbers generator.xls file;
- The file QuicksortBinarySearch.cpp contains the source code. The same can be used freely. Mention the authors.

Efficiency comparison

For the sake of comparison I've run some test cases with different input files. See the result in the table that follows:

Quicksort and Binary search performance
n File name File size (bytes) Timing (milliseconds)
Quicksort Binary search
10000 10000.txt 122.880 16 0
25000 25000.txt 200.704 78 0
50000 50000.txt 401.408 219 0
75000 75000.txt 602.112 360 0
100000 100000.txt 802.816 516 0

It's important to note that the time the quicksort takes appears to be longer but it is not. Why? Because the the program needs to read the file content and write the sorted data back to the output file so that it appears to take longer than the milliseconds shown on the above table. The timing functions just operate while the quicksort is running.

For the the binary search key I've input a value localized in the beginning of the sorted file, in the middle and in the end. There was no time changes. The binary search found the key I entered with a time less than (0 µs - microsecond). I have an AMD Athlon XP 2400 with 512 MB RAM.

See a screenshot of the last test case:

QuicksortBinarySearchCPlusPlusTestCase

The paper

You can get a copy of the paper in the .PDF format at:
http://leniel.googlepages.com/QuicksortAndBinarySearchAlgorithms.pdf