Systematic Gaming

January 4, 2009

Performance: C# vs C++ – Revisited

Filed under: c++ vs c#, optimization, profiling — systematicgaming @ 2:18 am

I thought I’d comment on my own post about C# vs C++ performance. The purpose of the test was to compare the time to sort 128-byte objects in C++ vs C#, nothing more. Again, if you understood that, then you can probably see that the performance measured has more to do with execution environments than language differences.

That said, I’d like to address the following issues with my testing:

  1. I used a char array in C# to simulate a similar char array in C++. This is incorrect, since C# chars are 16 bits while in C++ they are only 8 bits. I admit this was pretty sloppy. Replacing char with byte fixes this, and the result is that the C# version runs about 15% faster. Faster, yes, but still not fast enough to make a significant difference. The most striking thing is that halving the memory usage only gives a 15% difference; in C++ the memory size had a direct, linear impact on runtime.
  2. Some people complained about using the fixed keyword to make the array part of the Data struct. It’s trivial to replace it with a number of int or float fields to create a 128-byte struct, and it makes no difference in performance. The point was to approximate C++ and to represent a larger class; fixed was a simple way to do this. (A sketch of the kind of struct I mean follows this list.)
  3. Some people complained this isn’t an apples-to-apples comparison – I disagree. It’s about as direct a way as possible of comparing two very different languages and operating environments – perform the same task and see how long it takes.
  4. I was running Visual Studio 2005 SP1 for both tests; newer/different versions may have varying results.
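
Since much of the argument is about what that struct looks like, here is a minimal sketch of the kind of thing being sorted. It is not the exact code from the original post (the field names are illustrative), but it shows the idea: the 128-byte payload lives inline in the struct, so an array of them is one contiguous block.

// Requires compiling with /unsafe because of the fixed-size buffer.
public unsafe struct Data : IComparable<Data>
{
    public int key;
    public fixed byte data[128];   // payload stored inline, not a heap reference

    public int CompareTo(Data other)
    {
        return key.CompareTo(other.key);
    }
}

// Sorting moves the whole 128+ byte element on every swap:
// Data[] items = new Data[N];
// Array.Sort(items);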

The purpose of the previous post wasn’t to claim that C++ is better than C#, but many people seem to think that’s what I’m doing – I suppose this is what I get for posting such an inflammatory title. I use C# on a daily basis and like it.

There were a number of comments on my last post claiming to “debunk” my tests, referring to what is posted here. This debunking itself admits it’s comparing apples to oranges – however, it completely misses the point of creating a fixed-size array in the first place. Replacing the fixed array with a heap pointer shows nothing about C# performance – only that copying a smaller struct is faster than copying a larger one. This is a fact I addressed previously.

The claim is made that this test isn’t representative of what a C# programmer would write. True, a C# programmer would very rarely use a struct with a fixed array. However, it’s fairly easy to create objects with 128 bytes or more of data. That’s 32 4-byte variables, which isn’t a huge number of variables to have in a class, especially one with a few parent classes each having a few member variables.
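
To make the distinction concrete, here is a hedged sketch (illustrative, not the exact “debunk” code) of the heap-pointer variant. Sorting an array of these only shuffles a key and a reference per element; the 128 bytes of payload never move and are no longer contiguous with each other.

public struct DataByRef : IComparable<DataByRef>
{
    public int key;
    public byte[] data;   // 128-byte payload allocated separately on the heap

    public DataByRef(int key)
    {
        this.key = key;
        this.data = new byte[128];
    }

    public int CompareTo(DataByRef other)
    {
        return key.CompareTo(other.key);
    }
}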

Anyway, other than the char/byte mistake, I stand by the original results – C# takes about 10 times longer to sort an array on my computer, compared to a similar C++ version.

37 Comments

  1. When benchmarking it’s important to set a goal, and have each language pursue its own implementation toward that goal. So for instance “sort this data”. The means by which that is done will vary from language to language.

    One can consider going the other way and producing a .NET implementation of a solution that would be more challenging to implement *literally* in C++.

    I thought the “debunk” post was a very fair answer to the issue you outlined. You’re concentrating on making a .NET implementation match as exactly as possible; this isn’t meaningful because it’s not something anybody does.

    You were given a better solution for sorting an array on the .NET platform. One that produced interesting results. Was the method suggested to you 10 times slower than the C++ implementation on your computer?

    Comment by Guy — January 4, 2009 @ 12:38 pm

    • When benchmarking it’s important to set a goal, and have each language pursue its own implementation toward that goal. So for instance “sort this data”. The means by which that is done will vary from language to language.

      No, benchmarking has nothing to do with goals, it’s a process of measuring and comparing results. The purpose of my test was not to provide an optimal sort but to compare two very similar algorithms in C++ and C#. The “means by which that is done” may vary from language to language, but the output of the tests cannot or the comparison is invalid.

      I performed near identical algorithms in C++ and C# – sorting a contiguous array of objects. The “debunk” does not output a contiguous array of objects and therefore is not a useful comparison. The “debunk” code outputs a sorted list of keys that point to objects – a very different data structure.

      You’re concentrating on making a .NET implementation match as exactly as possible; this isn’t meaningful because it’s not something anybody does.

      It is meaningful, because having your data in a single, contiguous block of memory is something that people do, both in C++ and C#. If you want some examples of why, I suggest you read some of my posts on optimization, caching and memory.
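
      To illustrate the difference, here is a rough sketch (it assumes a Data struct like the one in the post, with a public int key and implementing IComparable<Data>). Sorting keys or indices is a perfectly good technique, but it does not produce a sorted contiguous block unless you add a gather pass at the end:

      // Option A: sort the contiguous array in place - what my test measures.
      static Data[] SortContiguous(Data[] items)
      {
          Array.Sort(items);   // the full 128+ byte struct moves on every swap
          return items;
      }

      // Option B: sort an index array, leaving the data where it is.
      static Data[] SortByIndex(Data[] items)
      {
          int[] order = new int[items.Length];
          for (int i = 0; i < items.Length; i++) order[i] = i;
          Array.Sort(order, delegate(int a, int b)
          {
              return items[a].key.CompareTo(items[b].key);
          });

          // A gather pass is still needed to get a sorted contiguous block back.
          Data[] sorted = new Data[items.Length];
          for (int i = 0; i < items.Length; i++) sorted[i] = items[order[i]];
          return sorted;
      }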

      Comment by systematicgaming — January 4, 2009 @ 1:18 pm

  2. You are trying to program C# as C++ and that is the mistake. The same shit happened with Java: C++ programmers who went to Java wanted to program C++ in Java, and that is a mess. Their code sucks and of course the performance is really bad. The bad reputation of Swing in the past was because of the same problem, trying to program C or C++ in Java. They really did not understand how to program Swing, but after a few years everything changed.

    Don’t live in the past, my friend. All this has happened before, again and again. Assembly was king for game programming and they said C and C++ were too slow for real game programmers, but that changed and it will change again. Check out LLVM: it is a great VM with high performance and you can target it from C++, Java, Python, etc. It will all change, believe me.

    Adapt yourself to C# patterns and use the garbage collector and you will see your code will be great, with great performance, maybe approaching the speed of native code like C++.

    Comment by OtengiM — January 4, 2009 @ 2:00 pm

  3. > True, a C# programmer would very rarely use a struct with a fixed array. However, it’s fairly easy to create objects with 128 bytes or more of data. That’s 32 4-byte variables, which isn’t a huge number of variables to have in a class, especially one with a few parent classes each having a few member variables.

    Note the difference between a class and a struct. As you say, it’s pretty easy to get an object (that is, a class) over 128 bytes using inheritance (31 fields, actually, not 32; you need 4 bytes for the type tag).

    But 1) there is no inheritance for structs, so you can’t get a struct that big “accidentally”; 2) every introductory article about structs will tell you never to have structs over 32 bytes; 3) in the .Net version you use, methods which do anything with structs can’t be inlined at all; 4) many other optimizations available for classes are not available for structs.

    Comment by Alexey Romanov — January 4, 2009 @ 4:23 pm

    • There is a need for contiguous arrays of structs in C#. You need to look no further than VertexBuffers in XNA. (Example: Here)
      There is a need to sort these arrays to get proper results (back-to-front rendering for alpha blending).
      Therefore the time taken sorting these arrays is important.

      Do you have any reference for #3? I’ve never heard that methods dealing with structs cannot be inlined.
      What optimizations can be applied to classes that can’t be applied to structs? (Other than the obvious pass-by-value/reference)

      Comment by systematicgaming — January 4, 2009 @ 11:30 pm

  4. Because leaving your data alone in a single block of memory and sorting the keys/pointers to the data is a bad idea?

    Genuine question, not baiting.

    Optimising caching and data is truthfully beyond me, but I suspect that most .NET implementations approaching this are going to be playing with memory streams, no?

    The way you approach this problem in anything from LISP to Erlang is going to be fundamentally different because the platforms are very different. Just because C# looks a lot like C++ doesn’t mean porting algorithms directly to solve problems is any more meaningful than doing so in Python, or you end up fighting the target platform instead of working with it… The .NET platform is quite different.

    Describe a problem and goal, then look for how that might be best accomplished in C#, otherwise you are engaging in a very elaborate straw man argument… Then compare the best solutions in C# and C++.

    And I’m sorry, but benchmarking most definitely is goal oriented or you’re way off in the long grass; I suggest you think about your assertion there… You revised your expressed goals in your reply here.

    Comment by Guy — January 4, 2009 @ 6:25 pm

    • Because leaving your data alone in a single block of memory and sorting the keys/pointers to the data is a bad idea?

      No, it’s usually a good idea, and I had an entire post confirming that idea (the C++ Cache Pressure post). However, it’s clearly not the same algorithm, and may not be appropriate in all cases – sometimes you need a single block of memory. (Also, the “debunk” C# version doesn’t do this – it’s not indexing a single array, but has multiple arrays; however, it could be trivially fixed to do this.)

      Describe a problem and goal, then look for how that might be best accomplished in C#, otherwise you are engaging in a very elaborate straw man argument… Then compare the best solutions in C# and C++.

      Umm, I suggest you look back at my Cache Pressure post, the “best” C# solution given in the “debunk” post gets smacked around by the same solution in C++. I really have no language bias – but the fact remains, C++ is much faster for sorting. I’m not sure why that seems to anger people, the facts speak for themselves.

      Also, I think you’re confusing benchmarking with design or optimization. Benchmarking is just a small part of a bigger picture, the only goal is the timing.

      Comment by systematicgaming — January 4, 2009 @ 11:25 pm

  5. I’m sorry, but you’ve got to use the native idioms of the languages in question. The “apples to apples” point that you try to make is specious; language implementations and runtimes are optimized for idiomatic code, not “C++ converted into C#”.

    Comment by Barry Kelly — January 4, 2009 @ 9:11 pm

    • I took clear requirements – output a contiguous array of data sorted by the key – and implemented this in C++ and C# and timed the results. While the C# may not be optimized for such a task, it is sometimes required in a C# environment, and thus the performance of C# in this task is useful information. It has nothing to do with “using the language” properly or not.

      Comment by systematicgaming — January 4, 2009 @ 11:46 pm

  6. I’m afraid not many are going to be on your side in this one. Being this picky about “apples-to-apples” invites many issues. Imagine what would happen if you tried to apply the standard C# idioms used in the “debunking” to C++ with this level of stringency!

    1. You’d be either using an add-on garbage collector or using SharedPtr everywhere.

    2. You’d have to add a type tag/vtable to your “array” type because introspectable arrays are not apples when compared to C++ arrays.

    3. The array wouldn’t live in the class. You’d have to use something like SharedPtr to get the copy semantics closer.

    4. You’d have to either use or avoid using virtual dispatch to invoke the sort function in both cases. If templates are apples, then interfaces are barely fruit.

    You’re kind of coming off as the douchebag here. You published something, it was discredited, so publish something better. There’s a lot of C++ bias going on in this article:

    1. Using esoteric, uncommonly used C# features (ones known to have an adverse impact on performance and primarily provided for interop with C/C++) in a pure managed code benchmark isn’t fair.

    2. Using 2005 isn’t fair. Microsoft is putting way more effort into their compilers/runtime for managed code than C++ these days. It stands to reason that from 2005-2008, .net would have improved considerably more than C++ in performance.

    3. If you’re going to write it the idiomatic C++ way and then do a half-assed job of converting that to C# so it resembles it, at least do the reverse test too: write some idiomatic C# code, do a half-assed job of converting it into unrealistically bad C++ code, and benchmark that as well.

    Comment by Brian L. — January 4, 2009 @ 11:50 pm

    • I don’t think I’m being too picky – I’ve set clear requirements: given a contiguous array of data, output a sorted contiguous array of data. For a real-world use case you only have to look at vertex buffers in XNA – sorted for proper alpha blending. The “debunk” version does not meet these requirements.

      This has nothing to do with garbage collection, shared pointers or anything of the sort. It’s about data structures (arrays) and algorithms (sorting), two very language agnostic topics.

      1. Using esoteric, uncommonly used C# features (ones known to have an adverse impact on performance and primarily provided for interop with C/C++) in a pure managed code benchmark isn’t fair.

      It is trivial to replace the fixed array with a bunch of ints or floats – I saw no performance difference.

      2. Using 2005 isn’t fair. Microsoft is putting way more effort into their compilers/runtime for managed code than C++ these days. It stands to reason that from 2005-2008, .net would have improved considerably more than C++ in performance.

      There’s nothing unfair here. The results may be “old”, but I’ve done nothing unfair. I would also prefer that instead of claiming “it stands to reason” that we’d see improvement, an actual comparison, with actual code and results, be done. Since I don’t have VS2008 I can’t run the tests on that platform. However, I would be interested in seeing what improvements were made.

      3. If you’re going to write it the idiomatic C++ way and then do a half-assed job of converting that to C# so it resembles it, at least do the reverse test too: write some idiomatic C# code, do a half-assed job of converting it into unrealistically bad C++ code, and benchmark that as well.

      If you have a better way of meeting the requirements than my implementation, please post it. We are all programmers here, code speaks volumes over the written word.

      Comment by systematicgaming — January 5, 2009 @ 12:48 am

  7. The problem is with improper use of the fixed keyword. Pinning causes fragmentation and hurts performance.

    If you use managed C++ with a pinning pointer, you will probably see similar results to C#.

    http://www.codinghorror.com/blog/archives/000299.html

    It is correct that managed code with pinned objects performs much worse than unmanaged code. In that case, either the developer must know what he is doing and accept the consequences, or he does not know how managed code works.

    Comment by Anonymous — January 5, 2009 @ 4:18 am

  8. I think you still have no idea what “fixed” does in C#. Also, have you got any reason for not simply using a class rather than a struct?

    Comment by pelister — January 5, 2009 @ 8:37 am

    • I believe my reasons for using a struct have been well explained: I want a single contiguous array of data, not objects spread out on the heap.

      I also don’t know why people are so hung up on the use of fixed; if it bothers you, just replace the Data struct with this version and run the tests for yourself:

      struct Data
      {
          int key;
          int a0, a1, a2, a3, a4, a5, a6, a7;
          int b0, b1, b2, b3, b4, b5, b6, b7;
          int c0, c1, c2, c3, c4, c5, c6, c7;
          int d0, d1, d2, d3, d4, d5, d6, d7;
      }

      I saw no performance difference.

      Comment by systematicgaming — January 5, 2009 @ 8:47 am

  9. Reference for no inlining for structs in framework version 2.0: http://blogs.msdn.com/clrcodegeneration/archive/2007/11/02/how-are-value-types-implemented-in-the-32-bit-clr-what-has-been-done-to-improve-their-performance.aspx

    This has been greatly improved in 3.5SP1.

    Note especially “Try not to create value types that contain more than 4 fields.”

    I don’t really understand your objection to classes here. If you use

    class Data {
        byte[] array;
    }

    you _do_ have a “single contiguous array of data”. This array is itself on the heap, but it will still be sorted in the same way.

    Comment by Alexey Romanov — January 5, 2009 @ 10:51 am

  10. Try doing the same benchmark where the array is a local variable instead.

    Comment by Alexey Romanov — January 5, 2009 @ 10:54 am

  11. An array of value types is contiguous, even without using the fixed statement.

    Since char is a value type, a char array will be contiguous.

    For an array of objects, the array itself is contiguous, but the objects pointed to by the references in the array may not be.

    From msdn http://msdn.microsoft.com/en-us/library/aa289148.aspx, “The contents of an array are stored in contiguous memory.”
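
    A small illustration of that difference (the types here are made up for the example):

    struct PointS { public int X, Y; }   // value type
    class PointC { public int X, Y; }    // reference type

    // One contiguous block holding 1000 X,Y pairs:
    PointS[] valueArray = new PointS[1000];

    // A contiguous block of 1000 references; each PointC object, once
    // allocated, lives wherever the GC placed it on the heap:
    PointC[] refArray = new PointC[1000];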

    Comment by Anonymous — January 6, 2009 @ 4:44 am

    • My goal was to have a single, contiguous array containing N Data structs, each being 128 bytes (ignoring the sorting key) in size. Or, simply put, an array of 128 * N bytes, representing N Data structs.

      If Data contained a “normal” array, I would have N Data structs each pointing to 128 bytes of data, which would no longer be a single contiguous block of memory. The “fixed” array gave me the desired layout. Alternatively I could have used 32 ints, which at least would have made people stop focusing on the “fixed” keyword, and gives the same result.

      Comment by systematicgaming — January 6, 2009 @ 7:08 am

  12. Actually, now I see what you are trying to do here: you are trying to create a fat structure inside an array and move it around.

    I saw your test code and there are a few comments that I would like to make. First of all, you are correct that people are hung up on the fixed keyword, but it was a valid issue.

    After removing the fixed keyword, the performance increased quite significantly.

    I also replaced the fixed array with a list of integers, just like you said, and the perf also increased (I ran the tests three times).

    The problem is with moving the structure around. A structure is a value type and lives on the thread stack. One guideline when developing .Net code is not to create fat structures, as they become more expensive to copy and to box/unbox. If you have a fat structure, you should use a reference (class). If you have a value type and you need to pass it to a method, you should pass it as a reference (ref parameter).

    In your test, Array.Sort() performs horribly because the structure was a fat structure, and Array.Sort() boxes the structure.
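
    For what it is worth, a hedged sketch of keeping the comparison on the generic path (assuming the Data struct from the post, with a public int key). This avoids boxing each element for the comparison, though the cost of copying fat structs around remains either way:

    // Generic overload: elements are compared without boxing.
    Array.Sort(array, delegate(Data a, Data b)
    {
        return a.key.CompareTo(b.key);
    });

    // The non-generic path, e.g. Array.Sort((Array)array), goes through
    // IComparable and boxes each value-type element per comparison.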

    Now, I have to agree that the test is valid, but I disagree with these points:
    – It is not C++ vs C#, it is the way the .Net Framework works. If you write Managed C++, I think you will see similar results to C#. If you say managed vs unmanaged, that is more acceptable.
    – The C# code is not optimal; replace the struct with a class, and I am sure you will see a huge perf improvement.

    I understand that you tried to have a close structure between the two tests, but unfortunately, the way things work in unmanaged code may not be the best way to work in the managed world. (Have you seen people write finalizers liberally in C#? It is common in C++, but it is bad for perf in C#.)

    I would say a more ‘fair’ test would be like this. You have a data structure of 256 bytes in an array, and this array has to be sorted. With C++ and C#, please write the optimum solution. I think C# will see similar perf to C++, if it is developed correctly.

    The most valuable thing that I think all developers should have is the ability to ‘unlearn’ old stuff while learning new stuff. .Net is a different platform; things that you know work well elsewhere may not work well on this platform.

    For you and whoever reads your blog, please take a look at these links: http://msdn.microsoft.com/en-us/library/ms998541.aspx#scalenetchapt03_topic14
    http://blogs.msdn.com/scottholden/archive/2005/01/27/362084.aspx

    The key reason why managed code performed badly in your test was the bad data structure choice. You may have done that on purpose, but a solid .Net dev would not do that.

    Anyway this has been a good exercise. Thanks for the article, the exercise, and your openness.

    Comment by Anonymous — January 7, 2009 @ 5:05 am

  13. I totally agree with the author. Everyone should accept the reality that C# is much slower than C++. And that is normal, because C# is an interpreted language. If C# were as fast as C++, it would be much stranger. If you don’t want to accept this, just write a video decoder in these 2 languages; you can get the result very easily from the FPS value. I think the real result would be much more than 10x.

    Comment by codefuns — January 7, 2009 @ 9:10 pm

  14. Maybe I made a mistake; C# seems not to be an interpreted language like I thought. I made a small program that does some math operations 1,000,000,000 times, and got almost the same result, which is impossible for an interpreted language. I think maybe it uses a JIT by default on the Windows platform? Someone said C# is 10x slower on the Xbox 360 than on Windows, so does that mean C# is interpreted on the Xbox 360?

    I was thinking C# is an interpreted language because in the earlier days, everyone compared it with Java.

    Comment by codefuns — January 8, 2009 @ 4:23 pm

    • I’m pretty sure C# has always been JIT compiled; it’s generally slower than C++ because of the managed environment of the CLR. As to why it’s slower on the 360, my guess would be
      1) Less effort put into optimization on non-x86 platforms
      2) The 360’s PowerPC CPU is slower and less forgiving than an average home PC, so it generally runs any code slower.

      Comment by systematicgaming — January 8, 2009 @ 10:52 pm

  15. I am an ex-C++ programmer who jumped this year into a C# program. What I’ve seen from a performance standpoint (I work on a CAD system, so performance in some areas is pretty critical):
    – C# code puts bigger pressure on memory because most programmers create small objects, which causes the GC to be called often
    – the allocation operation, on the other hand (at least after the program has been running a while, with existing memory fragmentation, etc.), is much slower in C++. In a specific case I got allocations more than 10 times slower in C++ than in C#.

    The JIT, starting from C# 2.0 (using its features… mostly generics), is like 15-20% slower than C++. The reason is that most of the good optimizations are made by the JIT compiler. The optimizations that take too much time are not done by the JIT, but most of them are there.

    Using Linq and Generics can lead to high performance code. In a specific sorting benchmark using Mono/GCC I got a ratio of 1/3 (in favor of GCC) timewise with the built-in code, but with exceptions in use, things turned into only a minimal performance win for GCC.

    Last but not least, there was a program specific to a company I’ve worked at: there was a need for fast parsing of a protocol because the application had close to real-time requirements. Three developers implemented it: 2 in C++ and one in C#. The timings were: C++ (slowest): 12 seconds, C#: 1.2 seconds, C++ (mostly C code, fastest): 0.3 seconds.

    The slowest C++ code was put in a profiler, we saw the bottlenecks, and we were able to reduce the time of the C++ (slowest) version to 1.1 seconds. The point is: you can really get better performance using C++ (or at least at the level of C), but at what cost? How experienced do you have to be? Do you like debugging dangling pointers? How about the fact that the hardest part of C++ code is memory management? You have 3 allocators (new, malloc/calloc, the stack), which are accessible even in C#, yet most programmers just use the heap with new.

    The extra overhead that was in C# 1.0 (auto-boxing) is greatly reduced by the generics of C# 2.0 and Linq. In Linq, to sort an array all you have to write is: var linqExpr = from a in A orderby a.x select a; where A is the array and the ordering is on the x field.
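
    (A hedged sketch of that in full, with illustrative names; the query itself is deferred, so ToArray() is what actually performs the sort:)

    // Requires: using System.Linq;
    var sortedArray = (from a in A orderby a.x select a).ToArray();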

    Comment by Ciprian Mustiata — April 24, 2009 @ 7:00 pm

    • Assembly is faster than C (or C++) if coded properly – even the best compilers cannot generate code as fast as a good assembly programmer can – but you don’t see anyone programming in assembly any more because the speed benefit is offset by the dramatic decrease in productivity, except in very special cases like embedded applications (and because every generation of hardware is 10x faster than the last anyway).

      Think of it – MS-DOS booted on an 8086 PC-XT in about 15 seconds. Today’s processors are approximately 10,000 times faster, but even the best Linux boots in about 30 seconds.

      Comment by Shanky — December 18, 2009 @ 6:25 am

  16. Actually it depends on what the code is for. If we use the code to program a microcontroller, C is simpler and easier to implement. But for OOP code, C# wins the game.

    Comment by farhad — July 7, 2010 @ 1:19 pm

  17. I’m a novice programmer, using C# as my first language of choice. Out of curiosity I did the performance comparison too. My results are rather different from those in the original article. I’ve used standard C#, nothing optimized, as I wouldn’t have a clue how to do that anyway.

    Here’s the struct in C# I used:

    struct LargeObject : IComparable<LargeObject>
    {
        public LargeObject(int key)
        {
            this.Key = key;
            Data = new byte[128];
        }
        public int Key;
        public byte[] Data;

        public int CompareTo(LargeObject other)
        {
            return Key.CompareTo(other.Key);
        }
    }

    Here’s the code that was used to time the sorting of the array, using the static Array.Sort(array) method:

    static double TimeTestArray(int size)
    {
        LargeObject[] array = new LargeObject[size];
        Random rnd = new Random();

        for (int i = 0; i < size; i++)
        {
            array[i] = new LargeObject(rnd.Next());
        }

        Stopwatch timer = new Stopwatch();
        timer.Start();
        Array.Sort(array);
        timer.Stop();
        return timer.Elapsed.TotalMilliseconds;
    }

    The C++ code used was as in the original code, built for 'Release' and not 'Debug'. For both C# and C++ the Visual Studio Express Editions 2010 were used, with the .NET 4.0 framework for C#. No special compiler optimization flags were set for C++.

    My results for the 'Direct Sort' comparison, average of 8 iterations, in milliseconds:

    Size C++ C#
    1024 0.364173 0.5741375
    2048 0.878638 0.3914672
    4096 1.7784 0.7326459
    8192 3.78749 1.5477432
    16384 8.52266 3.2872429
    32768 19.7161 7.1406929
    65536 43.297 15.1832366
    131072 92.9336 34.4461296
    262144 201.216 71.5098787
    524288 404.462 141.903659
    1048576 851.317 289.326945

    So, it is rather untrue that C++ is 10x faster than C# for this particular test. In fact, C# is considerably faster.

    Anyone else able to reproduce this result? If so, any explanations?

    (System used: Windows XP Pro, Intel E7300 processor, 2GB of RAM)

    Comment by Novice — July 8, 2010 @ 11:25 am

  18. Something went wrong with the copy/paste.

    the struct implements IComparable<LargeObject>
    and not IComparable, the bit between the brackets was lost due to HTML…

    Comment by Novice — July 8, 2010 @ 11:31 am

    • You missed the point of the original experiment – your struct doesn’t stress the cache, since the byte array is allocated on the heap and is not part of the LargeObject struct itself.

      Comment by systematicgaming — July 8, 2010 @ 1:52 pm

  19. OKAY, fair enough, let’s define
    cache pressure: “a term that refers to overworking the memory cache with too many or wasteful memory accesses. Basically putting pressure on the memory cache means you’re wasting time accessing more memory than you need to. The main cause is inefficient data access, often by using data structures that are too large, or too sparse reducing any benefit from memory locality.” (from this site)

    So, correct me if I’m wrong, your test was to see how both languages dealt with using inefficient data access.

    Or was it?

    Your remark that “The purpose of the test was to compare the time to sort 128-byte objects in C++ vs C#, nothing more.” should be ignored in that case, I suppose.

    The same goes for your other remark that “C# takes about 10 times longer to sort an array on my computer, compared to a similar C++ version”.

    (Similar? My version is in fact rather similar to the C++ version. It uses a struct of the same size that implements an interface used for sorting. But that’s completely irrelevant, right?)

    “Sorting an array” and “sorting 128-byte objects” is not by definition equivalent to “inefficient data access“, is it?

    What were you trying to say:
    a) C++ sorts a lot faster, or
    b) C++ is a lot faster when you use really inefficient data access?

    Please clarify..I’m confused…

    Comment by Novice — July 8, 2010 @ 2:57 pm

    • Your remark that “The purpose of the test was to compare the time to sort 128-byte objects in C++ vs C#, nothing more.” should be ignored in that case, I suppose.

      The same goes for your other remark that “C# takes about 10 times longer to sort an array on my computer, compared to a similar C++ version”

      What you’re missing is that the C# version puts the array on the heap, not as part of the struct. So you’re not using a 128-byte struct.

      “Sorting an array” and “sorting 128-byte objects” is not by definition equivalent to “inefficient data access“, is it?

      What were you trying to say:
      a) C++ sorts a lot faster, or
      b) C++ is a lot faster when you use really inefficient data access?

      I wasn’t really “saying” anything, I was reporting empirical data gathered during my research into a performance problem I had encountered. I thought it was interesting to compare the different built-in methods of sorting in C#, and how the object size affected (or rather didn’t affect) the sort time. I used C++’s sorting speed as a baseline to see how fast the sorting could have been, hence the title of the blog.

      Comment by systematicgaming — July 8, 2010 @ 10:56 pm

      • Yeah, OK. You were simply reporting your findings.

        Now, if you will allow, I’ll present some of my empirical data:

        Managed C# several thousand times faster than native C++!

        Let’s use a struct that is made up of value types only, so that all the memory is handled on the stack. Thus, as you propose:

        struct LargeObject
        {
            public LargeObject(int key)
            {
                this.Key = key;
                a0 = a1 = a2 = a3 = a4 = a5 = a6 = a7 = 0;
                b0 = b1 = b2 = b3 = b4 = b5 = b6 = b7 = 0;
                c0 = c1 = c2 = c3 = c4 = c5 = c6 = c7 = 0;
                d0 = d1 = d2 = d3 = d4 = d5 = d6 = d7 = 0;
            }
            public int Key;

            int a0, a1, a2, a3, a4, a5, a6, a7;
            int b0, b1, b2, b3, b4, b5, b6, b7;
            int c0, c1, c2, c3, c4, c5, c6, c7;
            int d0, d1, d2, d3, d4, d5, d6, d7;
        }

        And, then, let’s run the following sorting test:

        (reference the System.Linq namespace)

        static double TimeTestLinq(int size)
        {
            LargeObject[] array = new LargeObject[size];
            Random rnd = new Random();

            for (int i = 0; i < size; i++)
            {
                array[i] = new LargeObject(rnd.Next());
            }

            Stopwatch timer = new Stopwatch();
            timer.Start();
            array.OrderBy(q => q.Key);
            timer.Stop();
            return timer.Elapsed.TotalMilliseconds;
        }

        My results, average of 8 runs, in ms.:

        Size____C++_______C#
        1024____0.364173__0.0589375
        2048____0.878638__0.0076047
        4096____1.7784____0.0014131
        8192____3.78749___0.0009016
        16384___8.52266___0.0012252
        32768___19.7161___0.0035032
        65536___43.297____0.0038754
        131072__92.9336___0.0045594
        262144__201.216___0.0041449
        524288__404.462___0.0042181
        1048576_851.317___0.0038273

        For the results it doesn’t matter whether the data is placed on the heap or the stack, by the way.

        It makes one thing clear, though, and that is that your test has been rendered irrelevant: the managed environment circumvents inefficient data structures, unless you rather foolishly force it to by using keywords like unsafe, fixed and such. But I know no one who’d be so silly as to do that. Do you?

        Comment by Novice — July 9, 2010 @ 11:55 am

  20. Oops, my bad, again!

    The code for the sorting method should read:

    static double TimeTestLinq(int size)
    {
        LargeObject[] array = new LargeObject[size];
        Random rnd = new Random();

        for (int i = 0; i < size; i++)
        {
            array[i] = new LargeObject(rnd.Next());
        }

        Stopwatch timer = new Stopwatch();
        timer.Start();
        array.OrderBy(q => q.Key);
        timer.Stop();
        return timer.Elapsed.TotalMilliseconds;
    }

    Comment by Novice — July 9, 2010 @ 12:25 pm

    • As should have been obvious from the timings, you’re not actually sorting anything.

      array.OrderBy(q => q.Key);

      You may want to read up on how Linq works. OrderBy creates an enumerator and doesn’t change the original array.

      If you want to actually sort the array, try something like this:
      array.OrderBy(q => q.Key).ToArray();
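
      A small sketch of a timing loop that forces the sort (illustrative; needs using System.Linq and System.Diagnostics):

      Stopwatch timer = Stopwatch.StartNew();
      LargeObject[] sorted = array.OrderBy(q => q.Key).ToArray();   // ToArray() executes the sort
      timer.Stop();
      Console.WriteLine(timer.Elapsed.TotalMilliseconds);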

      Comment by systematicgaming — July 9, 2010 @ 12:42 pm

      • Well, I’m a novice after all.

        I actually meant this:
        var mySortedArray = array.OrderBy(q => q.Key);
        long val = mySortedArray.ElementAt(4).Key + mySortedArray.ElementAt(size - 4).Key;

        instead of:
        array.OrderBy(q => q.Key);

        Now I have an array I can do stuff with, how cool! And it is still just as fast, but I’m missing the point of the experiment, I know.

        However, I still can’t understand your logic. As far as I can see, your experiment is as follows:

        – Cache pressure is bad
        – Let’s write code that creates tremendous cache pressure
        – the c++ code submitted is a first step towards reducing this pressure (as you demonstrate in your article on cache pressure)
        – the C# code is written using contrived code, that runs contrary to the intended usage of the language (e.g. using keywords like ‘unsafe’ and ‘fixed’ to block out the managed environment and creating structs many times larger than the 16 byte limit recommended by the official C# reference)
        – any C# code that uses standard language constructs to improve performance misses the point of the experiment
        – the c++ code runs a lot faster

        Based on this, and this only, you conclude that “you will think twice about using C# for high-performance code”?

        Comment by Novice — July 9, 2010 @ 2:42 pm

      • – the C# code is written using contrived code, that runs contrary to the intended usage of the language (e.g. using keywords like ‘unsafe’ and ‘fixed’ to block out the managed environment and creating structs many times larger than the 16 byte limit recommended by the official C# reference)
        – any C# code that uses standard language constructs to improve performance misses the point of the experiment

        Everyone who complains about my comparison gets hung up on the “unsafe” and “fixed” stuff, but removing them made no difference in performance.

        People also seem to miss that there are actually valid use cases for having large structs. A particle system in XNA is one such possible use case – where you have thousands of large structs that need to be sorted. A struct holding a pointer to a heap-allocated array isn’t a memory layout the GPU can handle.

        – the c++ code runs a lot faster

        The original test had a 10x speed difference, which is HUGE. The follow-up with Visual Studio 2008 showed that the difference had improved, but it was still around 2.5 times slower. 2.5x is still a lot, especially in a real-time application. Following through with the XNA particle example, it means a C# game would probably only be able to handle half (or fewer) of the particles of the same game in C++.

        Comment by systematicgaming — July 9, 2010 @ 11:42 pm

  21. “Everyone who complains about my comparison gets hung up on the “unsafe” and “fixed” stuff, but removing them made no difference in performance.”

    Yes, it did. Without those keywords, the managed environment handles it on the heap, optimizing away time consuming stuff that has no bearing on the outcome of the program anyway, making the C# version actually 2-3 times faster.

    You can call that cheating, but it is more reasonable to construct an experiment where you put forward the best solution each language has for dealing with a certain problem.

    The way it is constructed now reminds me of an anecdote told by Bjarne Stroustrup (I cannot remember the source), where they measured performance between C++ and another language. C++ was a lot faster, but the proponents of the other language claimed the test was unfair because their language needed the hard disk for additional memory, whereas C++ did not. Stroustrup rightly refuted this claim, as C++ handled memory so efficiently it simply did not need to use the hard disk.

    Your experiment is tantamount to saying that C++ should have been obligated to use the hard drive, because otherwise it would miss the point of the experiment.

    Comment by Novice — July 13, 2010 @ 10:58 am

    • You can call that cheating, but it is more reasonable to construct an experiment where you put forward the best solution each language has for dealing with a certain problem.

      The experiment was derived from a real-life use case that required contiguous memory (a GPU vertex buffer). That has nothing at all to do with the language of implementation. The hardware requires a specific memory layout. Putting the payload of the struct into numerous, non-contiguous, heap-allocated arrays is a completely different data structure, one which is unusable by the hardware.

      To put it simply: I wrote the code to meet a fixed hardware specification. Everyone who attempted to use the “proper C# way” of a heap based pointer did not meet these specifications.

      In hindsight, I should have put the contiguous memory condition in big bold letters at the top, but given the target audience of this blog (people interested in low-level game programming) I didn’t feel it was necessary as the need to work with contiguous memory is common.

      Comment by systematicgaming — July 13, 2010 @ 11:57 am

