Showing posts with label parallel programming. Show all posts

Tree Graph Ordered Traversal Level by Level in C#

Recently, as part of a job interview process, I was asked to solve some programming problems. This post shows my solution to one of them.

The problem (or could we call it an algorithm exercise?) is this:

Consider a tree of integers. Knowing that its root node is 0, and given its adjacency list as a two dimensional array of integers, write a function that prints out the elements/nodes in order/level by level starting from the root. That is, the root is printed in the first line, elements that can be reached from the root by a path of distance 1 in the second line, elements reached by a path of distance 2 in the third line, and so forth. For example, given the following adjacency list (draw the tree for a better view):

0 => 1, 2, 3
1 => 0, 4
2 => 0
3 => 0, 5
4 => 1, 6
5 => 3
6 => 4

The program should print:

0
1 2 3
4 5
6

A little bit of theory
If you read about trees in graph theory, you’ll see that a tree can be represented as a graph: a tree is an undirected graph in which any two vertices are connected by exactly one simple path. In other words, any connected graph without cycles is a tree.

The tree in this problem isn’t a binary tree; it’s an n-ary tree.
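To make the definition concrete, here is a minimal sketch that checks whether an adjacency list describes a tree. It is not part of the solution below; the TreeCheck class and IsTree method are names made up for this illustration, and the sketch assumes the adjacency list is symmetric (each undirected edge is listed on both endpoints) and contains only valid vertex indexes.

```csharp
using System;
using System.Collections.Generic;

public static class TreeCheck
{
    // Returns true when the undirected graph described by 'adjacency'
    // (adjacency[i] lists the neighbors of vertex i) is a tree:
    // it must have exactly n - 1 edges and be connected.
    public static bool IsTree(int[][] adjacency)
    {
        int n = adjacency.Length;
        if (n == 0) return false;

        // Each undirected edge appears twice in a symmetric adjacency list.
        int degreeSum = 0;
        foreach (int[] neighbors in adjacency) degreeSum += neighbors.Length;
        if (degreeSum != 2 * (n - 1)) return false;

        // Connectivity check: breadth-first search starting at vertex 0.
        var visited = new bool[n];
        var queue = new Queue<int>();
        visited[0] = true;
        queue.Enqueue(0);
        int reached = 1;

        while (queue.Count > 0)
        {
            int u = queue.Dequeue();
            foreach (int v in adjacency[u])
            {
                if (!visited[v])
                {
                    visited[v] = true;
                    reached++;
                    queue.Enqueue(v);
                }
            }
        }

        return reached == n;
    }

    public static void Main()
    {
        // The adjacency list from the problem statement.
        int[][] sample =
        {
            new[] { 1, 2, 3 }, new[] { 0, 4 }, new[] { 0 }, new[] { 0, 5 },
            new[] { 1, 6 }, new[] { 3 }, new[] { 4 }
        };
        Console.WriteLine(IsTree(sample)); // prints True
    }
}
```

A connected graph with n vertices and n - 1 edges cannot contain a cycle, which is why counting edges and checking connectivity is enough here.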

With theory in mind, here goes my proposed solution…

I’m reusing some code from past posts, in particular the Graph, AdjacencyList, Node, NodeList and EdgeToNeighbor classes.

I use this method to fill a Graph with the Tree structure:

/// <summary>
/// Fills a graph with a given tree structure.
/// </summary>
/// <param name="graph">The graph to fill</param>
private static void FillGraphWithTreeStructure(Graph graph)
{
    // Vertexes
    graph.AddNode("0", null);
    graph.AddNode("1", null);
    graph.AddNode("2", null);
    graph.AddNode("3", null);
    graph.AddNode("4", null);
    graph.AddNode("5", null);
    graph.AddNode("6", null);

    // Edges (directed from parent to child)
    graph.AddDirectedEdge("0", "1");
    graph.AddDirectedEdge("0", "2");
    graph.AddDirectedEdge("0", "3");

    graph.AddDirectedEdge("1", "4");

    graph.AddDirectedEdge("4", "6");

    graph.AddDirectedEdge("3", "5");

    /* This is the tree:
            0
          / | \
         1  2  3
        /       \
       4         5
      /
     6

        This is the expected output:
        Level 1 = 0
        Level 2 = 1 2 3
        Level 3 = 4 5
        Level 4 = 6
    */
}


This is the method that does the hard work:

/// <summary>
/// Performs an ordered level-by-level traversal in an n-ary tree from top-to-bottom and left-to-right.
/// Each tree level is written in a new line.
/// </summary>
/// <param name="root">Tree's root node</param>
public static void LevelByLevelTraversal(Node root)
{
    // At any given time each queue will only have nodes that
    // belong to a level
    Queue<Node> queue1 = new Queue<Node>();
    Queue<Node> queue2 = new Queue<Node>();

    queue1.Enqueue(root);

    while (queue1.Count != 0 || queue2.Count != 0)
    {
        while (queue1.Count != 0)
        {
            Node u = queue1.Dequeue();

            // Key and Neighbor are assumed to be the members exposed by the
            // Node and EdgeToNeighbor classes from the past posts
            Console.Write("{0} ", u.Key);

            // Expanding u's neighbors in the queue
            foreach (EdgeToNeighbor edge in u.Neighbors)
            {
                queue2.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();

        while (queue2.Count != 0)
        {
            Node v = queue2.Dequeue();

            Console.Write("{0} ", v.Key);

            // Expanding v's neighbors in the queue
            foreach (EdgeToNeighbor edge in v.Neighbors)
            {
                queue1.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();
    }
}


To spice things up I have implemented a Parallel version of the above method using a ConcurrentQueue:

/// <summary>
/// Performs an ordered level-by-level traversal in an n-ary tree from top-to-bottom and left-to-right using a ConcurrentQueue.
/// Each tree level is written in a new line.
/// </summary>
/// <param name="root">Tree's root node</param>
public static void LevelByLevelTraversalInParallel(Node root)
{
    // At any given time each queue will only have nodes that
    // belong to a level
    ConcurrentQueue<Node> queue1 = new ConcurrentQueue<Node>();
    ConcurrentQueue<Node> queue2 = new ConcurrentQueue<Node>();

    queue1.Enqueue(root);

    while (queue1.Count != 0 || queue2.Count != 0)
    {
        while (queue1.Count != 0)
        {
            Node u;
            if (!queue1.TryDequeue(out u)) continue;

            // Key and Neighbor are assumed to be the members exposed by the
            // Node and EdgeToNeighbor classes from the past posts
            Console.Write("{0} ", u.Key);

            // Expanding u's neighbors in the queue
            foreach (EdgeToNeighbor edge in u.Neighbors)
            {
                queue2.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();

        while (queue2.Count != 0)
        {
            Node v;
            if (!queue2.TryDequeue(out v)) continue;

            Console.Write("{0} ", v.Key);

            // Expanding v's neighbors in the queue
            foreach (EdgeToNeighbor edge in v.Neighbors)
            {
                queue1.Enqueue(edge.Neighbor);
            }
        }

        Console.WriteLine();
    }
}
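
As a side note, here is one possible way to expand a whole level in parallel while still keeping the left-to-right order, using PLINQ instead of a ConcurrentQueue. This is only a sketch under a few assumptions: it works on a plain int[][] array where children[i] lists the children of node i (instead of the Graph classes), and the ParallelLevels class and Levels method are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelLevels
{
    // children[i] lists the child nodes of node i (directed, parent -> child).
    // Returns one list of nodes per tree level, from top to bottom.
    public static List<List<int>> Levels(int[][] children, int root)
    {
        var levels = new List<List<int>>();
        var current = new List<int> { root };

        while (current.Count > 0)
        {
            levels.Add(current);

            // Expand every node of the current level in parallel.
            // AsOrdered() asks PLINQ to preserve the source order,
            // so the next level keeps its left-to-right order.
            current = current
                .AsParallel()
                .AsOrdered()
                .SelectMany(node => children[node])
                .ToList();
        }

        return levels;
    }

    public static void Main()
    {
        // The tree used in this post, as parent -> children.
        int[][] children =
        {
            new[] { 1, 2, 3 }, new[] { 4 }, new int[0], new[] { 5 },
            new[] { 6 }, new int[0], new int[0]
        };

        foreach (List<int> level in Levels(children, 0))
        {
            Console.WriteLine(string.Join(" ", level));
        }
    }
}
```

For a tree this small the cost of setting up the parallel query is likely to outweigh any gain; the approach only starts to make sense when each level holds many nodes or when expanding a node is expensive.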


Now it’s time to measure the execution time using a Stopwatch:

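private static void Main(string[] args)
{
    Graph graph = new Graph();

    FillGraphWithTreeStructure(graph);

    Stopwatch stopWatch = new Stopwatch();

    stopWatch.Start();

    // Assumes NodeList exposes an indexer by key,
    // as in the Graph classes from the past posts
    LevelByLevelTraversal(graph.Nodes["0"]);

    stopWatch.Stop();

    // Write time elapsed
    Console.WriteLine("Time elapsed: {0}", stopWatch.Elapsed);

    // Resetting the watch...
    stopWatch.Reset();

    stopWatch.Start();

    LevelByLevelTraversalInParallel(graph.Nodes["0"]);

    stopWatch.Stop();

    // Write time elapsed
    Console.WriteLine("Time elapsed: {0}", stopWatch.Elapsed);
}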


Now the results:

1 2 3
4 5
Time elapsed: 00:00:00.0040340

1 2 3
4 5
Time elapsed: 00:00:00.0020186

As you can see, the elapsed time is cut roughly in half. I currently have a Core 2 Duo processor in my Mac mini.

I hope you enjoy it, and feel free to add your 2 cents to improve this code! Of course there are other ways of solving this very problem and I would like to see them. Do you have a better idea?
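
One such alternative, sketched here only as an illustration, uses a single queue and counts how many nodes are waiting at the start of each level. It reads the adjacency list directly as an int[][] array, as in the original problem statement, and uses a visited array because that list is undirected (each child also lists its parent). The SingleQueueTraversal class and LevelLines method are hypothetical names, not part of the project mentioned in this post.

```csharp
using System;
using System.Collections.Generic;

public static class SingleQueueTraversal
{
    // adjacency[i] lists the neighbors of node i (undirected).
    // Returns one line of text per tree level, starting at 'root'.
    public static List<string> LevelLines(int[][] adjacency, int root)
    {
        var lines = new List<string>();
        var visited = new bool[adjacency.Length];
        var queue = new Queue<int>();

        visited[root] = true;
        queue.Enqueue(root);

        while (queue.Count > 0)
        {
            // Everything in the queue right now belongs to the same level.
            int levelSize = queue.Count;
            var level = new List<int>(levelSize);

            for (int i = 0; i < levelSize; i++)
            {
                int u = queue.Dequeue();
                level.Add(u);

                // Enqueue unvisited neighbors; they form the next level.
                foreach (int v in adjacency[u])
                {
                    if (!visited[v])
                    {
                        visited[v] = true;
                        queue.Enqueue(v);
                    }
                }
            }

            lines.Add(string.Join(" ", level));
        }

        return lines;
    }

    public static void Main()
    {
        // The adjacency list from the problem statement.
        int[][] adjacency =
        {
            new[] { 1, 2, 3 }, new[] { 0, 4 }, new[] { 0 }, new[] { 0, 5 },
            new[] { 1, 6 }, new[] { 3 }, new[] { 4 }
        };

        foreach (string line in LevelLines(adjacency, 0))
        {
            Console.WriteLine(line);
        }
    }
}
```

The level boundary comes from the queue size at the start of each pass, so a second queue is not needed.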

You can get the Microsoft Visual Studio Console Application Project at:

To try out the code you can use the free Microsoft Visual C# 2010 Express Edition that you can get at:

Parallel LINQ (PLINQ) with Visual Studio 2010/2012 - Perf testing

On the last day of May I wrote about how to calculate prime numbers with LINQ in C#. To close that post I said that I’d use the PrimeNumbers delegate to evaluate PLINQ (Parallel LINQ) and measure the performance gains when the same calculation is done in parallel instead of in a sequential fashion.

PLINQ is LINQ executed in Parallel, that is, using as much processing power as you have in your current computer.

If you have a computer with 2 processor cores, such as a dual-core processor, the Language Integrated Query operators will do their work in parallel using both cores.

Using "only" LINQ you won't get as much performance because the standard Language Integrated Query operators won't parallelize your code. That means your code will run in a serial fashion not taking advantage of all your available processor cores.

There are lots of PLINQ query operators capable of executing your code using well known parallel patterns.
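
To give a feel for a few of these operators, here is a small sketch that is separate from the prime numbers sample. AsParallel, AsOrdered and WithDegreeOfParallelism are PLINQ extension methods from System.Linq; the PlinqOperators class, the SquaresOfEvens method and its parameters are made up for this illustration.

```csharp
using System;
using System.Linq;

public static class PlinqOperators
{
    // Squares the even numbers in 1..count using at most 'degree' cores,
    // returning the results in the original order.
    public static int[] SquaresOfEvens(int count, int degree)
    {
        return Enumerable.Range(1, count)
            .AsParallel()                     // opt in to parallel execution
            .AsOrdered()                      // preserve the source order in the output
            .WithDegreeOfParallelism(degree)  // upper bound on concurrent tasks
            .Where(n => n % 2 == 0)
            .Select(n => n * n)
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", SquaresOfEvens(10, 2)));
        // prints 4, 16, 36, 64, 100
    }
}
```

Without AsOrdered() the same query may return its results in any order, which is often fine and usually a little faster.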

After this brief introduction to PLINQ let’s get to the code.

As promised, today I show the performance gains when the PrimeNumbers delegate is run in 2 cores (parallel) instead of only 1 core (sequential).

Here’s the delegate code:

Func<int, IEnumerable<int>> PrimeNumbers = max =>
from i in Enumerable.Range(2, max - 1)
where Enumerable.Range(2, i - 2).All(j => i % j != 0)
select i;  

To make it a candidate for parallelization, we just call the AsParallel() extension method on the data source:

Func<int, IEnumerable<int>> PrimeNumbers = max =>
from i in Enumerable.Range(2, max - 1).AsParallel()
where Enumerable.Range(2, i - 2).All(j => i % j != 0)
select i;  

I set up a simple test to measure the time elapsed when using the two possible ways of calling the delegate function, that is, sequentially in one core and parallelized in my two available cores (I have an Intel Pentium Dual Core E2180 @ 2.00 GHz / 2.00 GHz).

Let’s calculate the prime numbers that are less than 50000 sequentially and in parallel:

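IEnumerable<int> result = PrimeNumbers(50000);
Stopwatch stopWatch = new Stopwatch();

stopWatch.Start();

// The query is deferred: it only runs while it is being enumerated
foreach (int i in result)
{
    // Assumed loop body: print each prime number
    Console.WriteLine(i);
}

stopWatch.Stop();

// Write time elapsed
Console.WriteLine("Time elapsed: {0}", stopWatch.Elapsed);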

Now the results:

1 core
Time elapsed: 00:00:06.0252929

2 cores
Time elapsed: 00:00:03.2988351

8 cores*
Time elapsed: 00:00:00.8143775

* read the Update addendum below

When running in parallel using the #2 cores, the result was great - almost half the time it took to run the app in a sequential fashion, that is, in only #1 core.
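
The "almost half" can be quantified as a speedup (sequential time divided by parallel time) and an efficiency (speedup divided by the number of cores). The sketch below only redoes that arithmetic with the two timings listed above; the Speedup class and its methods are names introduced for this example.

```csharp
using System;
using System.Globalization;

public static class Speedup
{
    // Speedup = sequential time / parallel time.
    public static double Of(TimeSpan sequential, TimeSpan parallel)
    {
        return sequential.TotalSeconds / parallel.TotalSeconds;
    }

    // Efficiency = speedup / number of cores (1.0 would be a perfect split).
    public static double Efficiency(TimeSpan sequential, TimeSpan parallel, int cores)
    {
        return Of(sequential, parallel) / cores;
    }

    public static void Main()
    {
        // Timings copied from the 1-core and 2-core results above.
        TimeSpan oneCore = TimeSpan.Parse("00:00:06.0252929", CultureInfo.InvariantCulture);
        TimeSpan twoCores = TimeSpan.Parse("00:00:03.2988351", CultureInfo.InvariantCulture);

        Console.WriteLine("Speedup: {0:F2}", Of(oneCore, twoCores));
        Console.WriteLine("Efficiency: {0:P0}", Efficiency(oneCore, twoCores, 2));
    }
}
```

With these numbers the speedup comes out at roughly 1.83, which is an efficiency of about 91% on the 2 cores.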

The whole work gets divided into two worker threads/tasks as shown in Figure 1:

Prime Numbers PLINQ Parallel Stacks Window ( #2 cores )
Figure 1 - The Parallel Stacks window in Visual Studio 2010 ( #2 cores )

You can see that each thread is responsible for a range of values (data is partitioned among the available cores). Thread 1 is evaluating the value 32983 and Thread 3 is evaluating 33073. Both threads run concurrently.

If I had a computer with 4 cores, the work would be divided into 4 threads/tasks and so on. If the time kept halving, the app would run in about 1.5 seconds. Fantastic, isn’t it?

The new Microsoft Visual Studio 2010 (currently in Beta 2) comes with great debugging tooling for parallel applications, such as the Parallel Stacks window shown in Figure 1 and the Parallel Tasks window shown in Figure 2:

Prime Numbers PLINQ Parallel Tasks Window ( #2 cores )
Figure 2 - The Parallel Tasks window in Visual Studio 2010 ( #2 cores )

This post gives you a quick overview of PLINQ and how it can leverage the power of your current and future hardware.

The future, as foreseen by hardware industry specialists, is a multicore future. So why not get ready for it right now? You certainly can with PLINQ. It abstracts away all the low-level code needed to run in parallel and lets you focus on what’s important: your business domain.

If you want to dig deeper into PLINQ, I advise you to read Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4 by Stephen Toub.

Updated on February 15, 2013

Running this same sample app on an Intel Core i7-3720QM 2.6GHz quad-core processor (with #4 cores and #8 threads), this is the result:

Time elapsed: 00:00:00.8143775

This is consistent with the #1 core and #2 cores tests shown above. The work is being divided "almost" evenly by 8 if we compare with the first benchmark ( only #1 core ).

00:00:06.0252929 / 8 = 0.7532

Of course there have been lots of improvements between these different processor generations. The software algorithms used to parallelize the work have also been improved (now I’m running on Visual Studio 2012 with .NET 4.5). Both hardware/software specs are higher now. Nonetheless these numbers give a good insight into the performance gains both in terms of hardware and software. Software developers like me have many reasons to celebrate!

Prime Numbers PLINQ Parallel Tasks Window ( #8 threads )
Figure 3 - The Parallel Stacks window in Visual Studio 2012 ( #8 threads )

If I take out the .AsParallel() operator, the program runs on a single core and the time increases substantially:

Time elapsed: 00:00:03.4362160

If we divide this by the processor's #4 cores and compare it with the #8 threads benchmark above, we have:

00:00:03.4362160 / 4 = 0.8591

0.8591 – 0.8144 = 0.0447 (practically no difference)

Note: this faster processor running on a single core has a performance equivalent to the old Intel #2 core processor. Pretty interesting.

Features new to parallel debugging in VS 2010
Debugging Task-Based Parallel Applications in Visual Studio 2010 by Daniel Moth and Stephen Toub

Great lecture on what to expect from the multicore and parallel future…
Slides from Parallelism Tour by Stephen Toub

PLINQ documentation on MSDN

Parallel Computing Center on MSDN

Daniel Moth’s blog

Microsoft Visual Studio 2010

New job at ITA-Petrobras

Last month was a busy one. On February 11th I started working on a project called Galileu. This project is being implemented by a joint venture between Petrobras and a group of universities. Amongst those universities is the Aeronautics Technological Institute (ITA).

I got this job opportunity through ITA's mechanical engineering department, more specifically the Computational Transport Phenomena Laboratory. I received an email from its group leader Marcelo de Lemos. I answered the email right away, sending him my resume. They needed someone with expertise in programming languages and high performance computing (HPC). I was eagerly awaiting a job opportunity because I had finished the computer engineering course and there was no place to work. This opportunity was just what I needed.

My work is really exciting. First I was given material about the Message Passing Interface (MPI) and the Rocks cluster distribution for the Linux platform. After that I started playing with the Linux cluster, which was already installed. It was installed by a great friend I met at the lab, Arkady Petchenko.

As a lab we experiment with the available HPC platforms. For the last two weeks I've been playing with Windows Compute Cluster Server 2003. I was responsible for installing the Windows cluster.

I'm also developing code in C/C++ and Fortran. I compile and build applications that can exploit the full potential of a cluster through the MPI API.

It's fantastic to see how a parallel program performs on different numbers of processors.

In future posts I'd like to discuss aspects of the technologies I work with. It's great stuff! I love programming languages and the parallel computing world is a fascinating one. With the increasing number of processors in a single chip, we're going to see a vast amount of parallelized code being executed on distributed computing systems. Parallel APIs such as the Microsoft Parallel Extensions for the .NET Framework are evolving rapidly. This will surely change the way we think when working with code, especially on clusters and supercomputers.