
How awesome is the System.Threading.Tasks.Parallel class?

 

The short answer is “profoundly awesome”. This class was introduced in .NET 4.0 and lets you run work in parallel on multiple threads without managing the threads yourself. Recently I worked on a tool which fetches data from a database and persists that data as files in the file system. Following is a simplified version of the data in the database tables (Folder and Content).

Folder table :-
image 
Above data suggests that Folder2 is created under Folder1.

Content table :-
image

And the requirement is to persist the content of “MyContent” as MyContent.Html under Folder1/Folder2. This is a simplified example and, of course, the real data had much deeper levels of folders.

Now, one way to go about this is :-
a) First, create all the folders.
b) Get the contents one by one from the database and persist each of them as an Html file under the corresponding folder.
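
Both steps need a full file path and a folder that exists. A minimal sketch of that part, assuming the Path column holds a relative path such as Folder1/Folder2/MyContent. The class and method names below are hypothetical and are not part of the tool :-

```csharp
using System;
using System.IO;

static class PathSketch
{
    // Joins the root folder, the relative path from the database and the file extension.
    public static string BuildFilePath(string rootFolderPath, string relativePath, string extension)
    {
        // Path.Combine would drop the root if the second part started with a separator.
        string trimmed = relativePath.TrimStart('/', '\\');
        return Path.Combine(rootFolderPath, trimmed) + extension;
    }

    // Creates the folder (and any missing parent folders) that will hold the file.
    public static void EnsureFolderFor(string fileFullPath)
    {
        string folder = Path.GetDirectoryName(fileFullPath);
        if (!string.IsNullOrEmpty(folder))
        {
            Directory.CreateDirectory(folder);
        }
    }

    static void Main()
    {
        string root = Path.Combine(Path.GetTempPath(), "ContentExportSketch");
        string file = BuildFilePath(root, "Folder1/Folder2/MyContent", ".Html");
        EnsureFolderFor(file);
        File.WriteAllText(file, "<html></html>");
        Console.WriteLine("File created :- " + file);
    }
}
```

Directory.CreateDirectory() does nothing when the folder already exists, so it is safe to call it once per file instead of creating all folders up front.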

For point b), the usual way to write this is something like the following :-

DataSet ContentIdsAndPathsDS = GetAllContentIdsAndPaths();
if (ContentIdsAndPathsDS != null && ContentIdsAndPathsDS.Tables.Count > 0)
{
    DataRowCollection rows = ContentIdsAndPathsDS.Tables[0].Rows;
    // StartNew() returns a stopwatch that is already running.
    Stopwatch watch = Stopwatch.StartNew();
    foreach (DataRow row in rows)
    {
        // Skip rows that lack an id or a path.
        if (row["ContentId"] != DBNull.Value && row["Path"] != DBNull.Value)
        {
            string contentId = row["ContentId"].ToString();
            string fileFullPath = rootFolderPath + row["Path"].ToString() + extensionOfFilesCreated;
            OutputSingleContent(contentId, fileFullPath);
        }
    }
    watch.Stop();
    string message = rows.Count.ToString() + " contents have been written to the disk and It took " + (watch.ElapsedMilliseconds / 1000).ToString() + " seconds";
}

Here is how you would achieve the same thing much faster (we will see how much faster) using the Parallel.ForEach() method :-

DataSet ContentIdsAndPathsDS = GetAllContentIdsAndPaths();
if (ContentIdsAndPathsDS != null && ContentIdsAndPathsDS.Tables.Count > 0)
{
    DataRowCollection rows = ContentIdsAndPathsDS.Tables[0].Rows;
    // StartNew() returns a stopwatch that is already running.
    Stopwatch watch = Stopwatch.StartNew();
    // AsEnumerable() is an extension method from System.Data.DataSetExtensions.
    Parallel.ForEach
    (
        ContentIdsAndPathsDS.Tables[0].AsEnumerable(), row =>
        {
            // Skip rows that lack an id or a path.
            if (row["ContentId"] != DBNull.Value && row["Path"] != DBNull.Value)
            {
                string contentId = row["ContentId"].ToString();
                string fileFullPath = rootFolderPath + row["Path"].ToString() + extensionOfFilesCreated;
                OutputSingleContent(contentId, fileFullPath);
            }
        }
    );
    watch.Stop();
    string message = rows.Count.ToString() + " contents have been written to the disk and It took " + (watch.ElapsedMilliseconds / 1000).ToString() + " seconds";
}

The loop body is identical in both cases: Parallel.ForEach() executes the same code as our regular foreach. By the way, following is the definition of the method which both foreach and Parallel.ForEach() call :-

static void OutputSingleContent(string contentId, string fileFullPath)
{
    // Exceptions are not handled here; they propagate to the caller.
    WriteContentToFile(fileFullPath, GetFormattedContent(contentId));
    //WriteContentToFile(logFileFullPath, "File created :- " + fileFullPath, true);
    Console.WriteLine("File created :- " + fileFullPath);
}
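
Since OutputSingleContent() lets exceptions escape, it helps to know how they surface once the method is called from Parallel.ForEach(): the loop stops starting new iterations and throws an AggregateException that wraps whatever the iterations threw. A minimal sketch, with a hypothetical FailingOutput() standing in for the real method :-

```csharp
using System;
using System.Threading.Tasks;

static class ExceptionSketch
{
    // Hypothetical stand-in for OutputSingleContent: fails for even ids.
    static void FailingOutput(int contentId)
    {
        if (contentId % 2 == 0)
        {
            throw new InvalidOperationException("Could not write content " + contentId);
        }
    }

    // Returns how many exceptions the parallel loop reported (0 if none).
    public static int CountFailures(int[] contentIds)
    {
        try
        {
            Parallel.ForEach(contentIds, id => FailingOutput(id));
            return 0;
        }
        catch (AggregateException ex)
        {
            // Each inner exception comes from one failed iteration.
            foreach (Exception inner in ex.InnerExceptions)
            {
                Console.WriteLine("Failed :- " + inner.Message);
            }
            return ex.InnerExceptions.Count;
        }
    }

    static void Main()
    {
        Console.WriteLine(CountFailures(new[] { 1, 2, 3, 4 }) + " exception(s) reported");
    }
}
```

The number of reported exceptions can vary between runs, because iterations that have not started when the first failure happens may never run.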

The result:-
Following is the result on my Intel(R) Core 2 Duo 2.20 GHz machine:-
image

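As you can see, Parallel.ForEach is almost 4 times faster than the regular foreach. The numbers might be different on your machine, since they depend on the number of cores and on how much time each iteration spends waiting on the database and the disk. For independent work like this, however, Parallel.ForEach should still be noticeably faster.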
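
If you want to try the comparison without the database, here is a minimal sketch in which a hypothetical SimulatedOutput() just sleeps for 50 ms to imitate waiting on the database and the disk. The names and the 50 ms figure are assumptions, and the timings it prints will differ from the ones above :-

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class TimingSketch
{
    // Hypothetical stand-in for OutputSingleContent: only waits, writes nothing.
    static void SimulatedOutput(int contentId)
    {
        Thread.Sleep(50);
    }

    // Runs the work on the calling thread, one id after another.
    public static TimeSpan TimeSequential(int[] contentIds)
    {
        Stopwatch watch = Stopwatch.StartNew();
        foreach (int id in contentIds)
        {
            SimulatedOutput(id);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Runs the same work through Parallel.ForEach.
    public static TimeSpan TimeParallel(int[] contentIds)
    {
        Stopwatch watch = Stopwatch.StartNew();
        Parallel.ForEach(contentIds, id => SimulatedOutput(id));
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        int[] contentIds = Enumerable.Range(1, 40).ToArray();
        Console.WriteLine("foreach          :- " + TimeSequential(contentIds).TotalSeconds.ToString("F2") + " seconds");
        Console.WriteLine("Parallel.ForEach :- " + TimeParallel(contentIds).TotalSeconds.ToString("F2") + " seconds");
    }
}
```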

Why it is faster:-
When the program runs, there is only one thread, and with a regular foreach only that main thread performs the operation in each iteration, one after another. When you use Parallel.ForEach(), the iterations are spread over worker threads from the thread pool (the calling thread takes part too), and those threads “independently” do what they have been asked to do. Several calls to OutputSingleContent() can then be in progress at the same time, so one thread can write a file while another is still waiting on the database.
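
To see how the iterations end up on threads, the following sketch records which managed thread ran each iteration. The class and method names are hypothetical, and the 10 ms sleep is only there to imitate some work :-

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThreadIdSketch
{
    // Runs `count` iterations and returns, per managed thread id,
    // how many iterations that thread executed.
    public static ConcurrentDictionary<int, int> IterationsPerThread(int count)
    {
        ConcurrentDictionary<int, int> perThread = new ConcurrentDictionary<int, int>();
        Parallel.ForEach(Enumerable.Range(0, count), i =>
        {
            Thread.Sleep(10); // imitate some work
            int threadId = Thread.CurrentThread.ManagedThreadId;
            perThread.AddOrUpdate(threadId, 1, (id, n) => n + 1);
        });
        return perThread;
    }

    static void Main()
    {
        Console.WriteLine("Main thread id :- " + Thread.CurrentThread.ManagedThreadId);
        foreach (KeyValuePair<int, int> entry in IterationsPerThread(40))
        {
            Console.WriteLine("Thread " + entry.Key + " ran " + entry.Value + " iteration(s)");
        }
    }
}
```

On a typical run the list is much shorter than the number of iterations, because each thread handles many of them.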

Debugging parallel programs in Visual Studio 2010 is fun. So just for fun, put a breakpoint on the line where you have Parallel.ForEach and another at the start of OutputSingleContent(), and debug. It would first hit Parallel.ForEach().

image 
and while you are in debugging mode, open Debug > Windows > Parallel Tasks [note that this window is available only when you are in debugging mode]
image
Look at the Parallel Tasks window; it should show something like this. Note that, as the execution is stopped at the Parallel.ForEach() breakpoint, no task has been started in parallel yet.
image 

The above Parallel Stacks window indicates that there is only one thread running as of now (the 3 threads are for running internal methods of .NET). If you right-click on the main thread box and click on “Show External Code”, it would show you all those method calls. This is all regular stuff which happens even if you don’t use threads explicitly or the Parallel class.
image
But if you step into OutputSingleContent() twice or more, you should see more worker threads in the Parallel Stacks window.

image 
The Parallel Stacks window indicates that there are 2 threads executing the OutputSingleContent() method independently.

When do you use this :-
Have a look at this document, which recommends that one should execute “independent” tasks with Parallel.ForEach(). In the above example, getting the data and saving it to a file is an independent operation for each content.
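
To illustrate why independence matters, here is a sketch of what can go wrong once iterations share state. It counts “written” contents in a shared variable, first without any synchronization and then with Interlocked.Increment(). The counter and all names are hypothetical and are not part of the tool :-

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class IndependenceSketch
{
    // Every iteration updates the same variable without synchronization.
    // This is a data race: increments can be lost.
    public static int CountUnsynchronized(int iterations)
    {
        int written = 0;
        Parallel.ForEach(Enumerable.Range(0, iterations), i =>
        {
            written++;
        });
        return written;
    }

    // Same loop, but the shared counter is updated atomically.
    public static int CountWithInterlocked(int iterations)
    {
        int written = 0;
        Parallel.ForEach(Enumerable.Range(0, iterations), i =>
        {
            Interlocked.Increment(ref written);
        });
        return written;
    }

    static void Main()
    {
        Console.WriteLine("Unsynchronized :- " + CountUnsynchronized(1000000));
        Console.WriteLine("Interlocked    :- " + CountWithInterlocked(1000000));
    }
}
```

The unsynchronized version may print less than the number of iterations on a multi-core machine, while the Interlocked version always reports all of them. Iterations that touch no shared state, like the file writes above, avoid the problem altogether.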

Parallel.ForEach() is just one piece of it. You can look at MSDN to explore more about parallel computing.

Hope this helps you in getting started with parallel programming in .NET.
