Tuesday, December 30, 2008

Performance tips while using loops(for, while, do)

Reference :- http://codebetter.com/blogs/gregyoung/archive/2006/06/11/146343.aspx

There are a lot of posts on the Internet discussing varying methods of looping using for loops and which perform best. These posts also generally give advice as to how you should handle your looping based upon performance metrics. Most of these types of posts suffer from an affliction I have discussed previously; they look at IL (Intermediate Language) to rationalize performance which is just flat out wrong as subtle differences can cause JIT (Just in Time) optimizations to go haywire.

In this post we will look at the unmanaged code produced by varying constructs to come up with a definitive answer to the performance question. We will also look at some other optimization methods that people claim happen but do not always happen and discuss some reasons why they may not be happening and of course set some rules for fast looping.

In trying to apply this the version of your framework is important as the JIT may vary from version to version, I do not believe anything here will change based upon the platform but that is a possibility. I used version 2.0.50727 for all examples in this document. Other versions such as Rotor, Mono, or 1.x  will most likely show differing behaviors.

Hoisting?

Before I start getting into code I would like to discuss the concept of hoisting as it will be the focal point of this entire discussion. When dealing with loops we can break any given loop into three sections

1)      The preamble (setting up for the loop, doing things such as setting our counter to 0)

2)      The code within the loop

3)      The code to increment our counter and check for the end of iteration possibly jumping back to step two

For the rest of this post I will color code the disassembled assembly as section one (green), section two (blue), section 3 (yellow) to make things easier to follow

Any code that is in step two or three will be run N times as the loop is executed. The code in step one will only be executed a single time when we first enter the loop. The concept of hoisting involves moving code out of steps two and three and into step one. Hoisting obviously gives you a huge performance gain as the code will only be run once. 

Simple Hoisting Example

If you read alot articles on for loop performance they will tell you to not manually hoist because the JIT will end up doing it for you. In order to test this I have created the following test code (which can be used in conjunction with the harness discussed in the previous Performance Measurement post)

    class Program {

        static int[] foo = new int[100000];

        static int Dummy;

        static void Test1NoHoist() {

            int total = 0;

            for (int i = 0; i <>

                total|=i;

            }

            Dummy = total;

        }

 

        static void Test1WithHoist() {

            int total = 0;

            int length = foo.Length;

            for (int i = 0; i <>

                total|=i;

            }

            Dummy = total;

        }

 

        static void Main(string[] args) {

            TestHarness.Test("Test1NoHoist", 10000, newTestHandler(Test1NoHoist));

            TestHarness.Test("Test1WithHoist", 10000, newTestHandler(Test1WithHoist));

            Console.ReadLine();

        }

    }

Listing 1: Basic manual hoisting example

 

You will quickly notice that in Test1WithHoist we manually took the check Length property on foo and moved it to before the loop; this is a common optimization if you come from a C/C++ world. The thought behind this is that since Length translates to get_Length() that we are saving ourselves from having to call the method n times (remember that this would be in the second or third section and everything in the third section is called n times).

An astute reader will notice an oddity dealing with the “Dummy” variable that is assigned after the loop. We will come back this a bit later on… good catch ;)

The by running our performance test, we can see that the code executes in roughly equivalent time. 

Test

Total Time (ns)

Average (ns)

Test1NoHoist

820583612.35

82058.36

Test1WithHoist

812735363.70

81273.53

 

Let’s take a look at the generated code to offer proof that the two bits of code should run in equivalent time (if you want to see the results for yourself remember to follow the instructions for getting JIT optimized code from Viewing Unmanaged Code in VS.NET.

Test1NoHoist

Test1WithHoist

00000000  xor         ecx,ecx

00000002  xor         edx,edx

00000004  mov         eax,dword ptr ds:[022B1EC4h]

00000009  mov         eax,dword ptr [eax+4]

0000000c  test        eax,eax

0000000e  jle         00000019

00000010  or          ecx,edx

00000012  add         edx,1

00000015  cmp         eax,edx

00000017  jg          00000010

00000019  mov         dword ptr ds:[00912FE8h],ecx

0000001f  ret             

 

 

00000000  xor         ecx,ecx

00000002  mov         eax,dword ptr ds:[022B1EC4h]

00000007  mov         edx,dword ptr [eax+4]

0000000a  xor         eax,eax

0000000c  test        edx,edx

0000000e  jle         00000019

00000010  or          ecx,eax

00000012  add         eax,1

00000015  cmp         eax,edx

00000017  jl          00000010

00000019  mov         dword ptr ds:[00912FE8h],ecx

0000001f  ret             

Listing 2: Disassembled versions of our functions

 

 

A kind of neat item in these two bits of code is that the only real difference is that EAX and EDX are interchanged … This makes them look a bit more different than they are but rest assured they are pretty much identical. In both cases, the access to the stop variable has been hoisted out of the loop (in section three, both are comparing their counter to a register that was preloaded in section one). To better illustrate the point, here is a disassembly of the first for loop (without manual hoisting) when run without JIT optimizations (i.e. it will not be hoisted).

0000002a  xor         esi,esi

0000002c  nop             

0000002d  jmp         00000034

0000002f  nop             

00000030  or          edi,esi

00000032  nop             

00000033  inc         esi 

00000034  mov         eax,dword ptr ds:[02275A34h] ;inlined length

00000039  cmp         esi,dword ptr [eax+4]

0000003c  setl        al  

0000003f  movzx       eax,al

00000042  mov         ebx,eax

00000044  test        ebx,ebx

00000046  jne         0000002F

 

Listing 3: First loop (no hoisting with optimizations disabled) non-important code removed

 

If this does not convince you to NEVER release code that is not being optimized I don’t know what will. It should also give you an idea of how much work the JIT optimizer really does. What has happened here is that the property was inlined, but it was inlined into section three. The optimizer was smart enough to realize this and to move it up to the preamble.  +1 for the optimizer

Non-Trivial Hoisting

Now that we have gone through a simple example, let’s try a more difficult one for the optimizer. In this example we will use a method GetUpperBound(0) which will not be inlined for us, let’s take a look at how well the JIT handles it. Here is the testing code to add to our previous code.

        static void Test1NoHoistNoInlining() {

            int total = 0;

            for (int i = 0; i <>

                total |= i;

            }

            Dummy = total;

        }

 

        static void Test1WithHoistNoInlining() {

            int total = 0;

            int length = foo.GetUpperBound(0);

            for (int i = 0; i <>

                total |= i;

            }

            Dummy = total;

        }

Listing 4: Tests without inlining available

 

Next we should add the following to our Main to run the tests.

            TestHarness.Test("Test1NoHoist", 10000, newTestHandler(Test1NoHoistNoInlining));

            TestHarness.Test("Test1WithHoist", 10000, newTestHandler(Test1WithHoistNoInlining));

 

Listing 5: Calling our new methods

 

When we run the tests in release mode, we get quite different results than previously. I am including the previous results in the table for comparison.

Test

Total Time (ns)

Average (ns)

Test1NoHoist

820583612.35

82058.36

Test1WithHoist

812735363.70

81273.53

Test1NoHoistNoInlining

25457518879.48

2545751.88

Test1WithHoistNnInlining

824641533.67

82464.15

 

Interesting, not manually hoisting makes our function 32x slower. Before we even get into code on this one I will put my money on the optimizer not hoisting the call. I believe however there are some reasons for this that we will discuss after looking through the code.

Test1NoHoistNoInlining

Test1WithHoistNoInlining

00000004  xor         esi,esi

00000006  mov         ecx,dword ptr ds:[022B1EC4h]

0000000c  xor         edx,edx

0000000e  cmp         dword ptr [ecx],ecx

00000010  call        792661F8

00000015  test        eax,eax

00000017  jle         00000031

00000019  or          edi,esi

0000001b  add         esi,1

0000001e  mov         ecx,dword ptr ds:[022B1EC4h]

00000024  xor         edx,edx

00000026  cmp         dword ptr [ecx],ecx

00000028  call        792661F8

0000002d  cmp         eax,esi

0000002f  jg          00000019

 

 

00000003  mov         ecx,dword ptr ds:[022B1EC4h]

00000009  xor         edx,edx

0000000b  cmp         dword ptr [ecx],ecx

0000000d  call        792661A8

00000012  mov         edx,eax

00000014  xor         eax,eax

00000016  test        edx,edx

00000018  jle         00000023

0000001a  or          esi,eax

0000001c  add         eax,1

0000001f  cmp         eax,edx

00000021  jl         0000001A             

Listing 6: Disassembled versions

 

As we suspected the code on the left hand side is not automatically hoisting the call of the function for us.  It would seem that the JIT will not automatically hoist method calls for us, that instead we have to explicitly state that we want them hoisted on our own.

It seems that that the JIT cannot get to the point of having a value in a register or memory that it can consider its result and as such feels the need to make the call on every iteration. This would make sense in general as I may be depending upon the behavior of being called at every interval. Consider the following code.

        static int t = 0;

        static bool KeepGoing() {

            Console.WriteLine("Still Going!");

            t++;

            return t <>

        }

 

 

        static void TestShortMethod() {

            for (; KeepGoing();) {

                System.Threading.Thread.Sleep(100);

            }

        }

Listing 7: Odd but valid code

 

Obviously this is nasty code but it is still valid. This is a simplified example it appears that it was a conscious decision to not support these types of situations as they can quickly become extremely complex from the JIT point of view, my guess is that it won’t deal with anything beyond simply returning a variable which will be inlined into a simple read anyway.  In my opinion, something like this should still be inlined (and left in section 3) to avoid the overhead of setting up the call on every iteration but I may be missing some other case that makes this more difficult. Based upon these results we can create the following rule.

If you wish to use a method call for your stop condition that does anything more complex than simply returning a variable or is not inlinable for other reasons such as being virtual; you should hoist it manually by placing it in the first part of your for loop in order to make it explicit to the JIT that you do not wish the behavior to be called on every iteration.

 Mark Lubischer brought up an excellent point here. We can in fact also hoist the call by using for like this.

        static int MarkWithoutInlining() {

            int total = 0;

            int[] length = new int[10000];

 

            for (int i = 0, j = length.GetUpperBound(0); i <>

                total |= i;

            }

            return total;

        }

 

This is a much better way of handling our hoisting as it better defines the scope of our variable while the behavior is in fact equivalent to our original hoist.

        static int MarkWithoutInlining() {

            int total = 0;

00000000  push        edi 

00000001  push        esi 

00000002  push        ebx 

00000003  xor         ebx,ebx

            int[] length = new int[10000];

00000005  mov         edx,2710h

0000000a  mov         ecx,7915982Ah

0000000f  call        FFB21D98

00000014  mov         edi,eax

 

            for (int i = 0, j = length.GetUpperBound(0); i <>

00000016  xor         esi,esi

00000018  mov         ecx,edi

0000001a  xor         edx,edx

0000001c  cmp         dword ptr [ecx],ecx

0000001e  call        792664C8

00000023  test        eax,eax

00000025  jle         00000039

00000027  mov         edx,dword ptr [edi+4]

                total |= lengthIdea [I];

0000002a  cmp         esi,edx

0000002c  jae         0000003F

0000002e  or          ebx,dword ptr [edi+esi*4+8]

            for (int i = 0, j = length.GetUpperBound(0); i <>

00000032  add         esi,1

00000035  cmp         esi,eax

00000037  jl          0000002A

            }

            return total;

00000039  mov         eax,ebx

0000003b  pop         ebx 

0000003c  pop         esi 

0000003d  pop         edi 

0000003e  ret             

0000003f  call        792B42E9

00000044  int         3   

 

A Bit About Foreach

Foreach is not an IL concept, it is a compiler concept. Compilers generate normal for loops when they see a foreach being used to iterate an array. If you are interested in seeing how foreach loops work I would highly recommend taking a look with ILDASM or Reflector. I will not discuss heavily foreach loops as they have at this moment no difference when they reach the native level.

Foreach could in the future offer many benefits over a general for loop. Since the foreach loop is explicitly stating that you want to iterate through the array (not allowing things like addition and subtraction to your counter), other things such as hoisting array bounds checks (which we will discuss shortly) would be much more easily accomplished. I would imagine that in the future foreach will be the preferred iteration construct.

I will assure you though that as of now foreach is not faster than the equivalent for loop when dealing with arrays, in fact both VB.NET and C# insure that they are identical. There is one case where foreach will be slower than a for loop and that is if you do not actually use the item within your loop, the for version will obviously be faster since it never loads the variable where as the foreach does this implicitly on every iteration. Foreach can however offer you great performance increases when dealing with other types that implement IEnumerable (not including collections as they simply perform an indexed lookup in their enumerator).

Logically one can come up with a great example for this when looking at a linked list. The for loop will cause ∑ n total operations on the iteration where as the foreach will only cause n. The reason for this is every index operation would have to start at the beginning of the list and iterate n nodes in order to return the nth node where as the enumerator will simply remember the last node it was on and give the next node. For this reason we will add our second rule.

When dealing with items that are enumerable as opposed to dealing with arrays or collections; prefer foreach to a for loop as the enumerator will often offer a much faster way of enumerating than using an index.

Array Bounds Check Hoisting

The check to find out whether or not we are valid to continue is not the only thing that can be hoisted from inside of a loop. In fact every time the variable is used in a comparison within the loop is an opportunity for hoisting to occur. The first example of this type of hoist we will look at is array bounds checking.

Array bounds are checked automatically by the CLR in order to prevent things like buffer overflows from occurring. Every time that you access an array in safe code you will in fact have a comparison occur to insure that you are within the range of the array. If you are not within a valid range an IndexOutOfRangeException will occur as opposed to writing happily beyond the end of your array as many other language such as C would do.

The problem with array bounds checks is that they are extremely redundant when dealing with loops. Consider the following code that shows what is happening.

        static void SampleArrayBoundsCheck() {

            int total = 0;

            for (int i = 0; i <>

                if (i <>

                    total |= foo[ i ];

                }

            }

        }

Listing 9: Sample Array Bounds Code

 

Naturally you are not issuing these checks in your code, but this example helps to make what’s really going on a bit clearer. When looking at this code anyone who has read the first few chapters of a C# book would scratch their head and wonder why all of the redundancy has been placed into the code. It is obvious that if our counter has a constraint to stay below foo.Length that it will in fact always succeed the conditional of being below foo.Length. To test how intelligent the JIT is with handling this situation we can use the following code. Note that for this test we will simply be looking at the native code generated as opposed to measuring performance.

        static int [] SampleArrayBoundsCheck() {

            int [] Destination = new int[foo.Length];

            for (int i = 0; i <>

                Destination[ i ]= foo[ i ];

            }

            return Destination;

        }

Listing 10: Test to see if array bound checks hoist

 

This code is simply copying the static array that we have been using previously to another array. This code should be particularly good to test as it in fact has two bounds checks occurring within it (one for each array).

        static int [] SampleArrayBoundsCheck() {

            int [] Destination = new int[foo.Length];

00000000  push        edi 

00000001  push        esi 

00000002  push        ebx 

00000003  push        ebp 

00000004  push        eax 

00000005  mov         edi,dword ptr ds:[022B1EC4h]

0000000b  mov         ebx,dword ptr [edi+4]

0000000e  mov         edx,ebx

00000010  mov         ecx,7915982Ah

00000015  call        FFB21FC0

0000001a  mov         esi,eax

            for (int i = 0; i <>

0000001c  xor         edx,edx

0000001e  test        ebx,ebx

00000020  jle         00000045

00000022  mov         ebp,dword ptr [edi+4]

00000025  mov         eax,dword ptr [esi+4]

00000028  mov         dword ptr [esp],eax

                Destination[ i ]= foo[ i ];

0000002b  cmp         edx,ebp

0000002d  jae         0000004D

0000002f  mov         ecx,dword ptr [edi+edx*4+8]

00000033  mov         eax,dword ptr [esp]

00000036  cmp         edx,eax

00000038  jae         0000004D

0000003a  mov         dword ptr [esi+edx*4+8],ecx

            for (int i = 0; i <>

0000003e  add         edx,1

00000041  cmp         ebx,edx

00000043  jg          0000002B

            }

            return Destination;

00000045  mov         eax,esi

00000047  pop         ecx 

00000048  pop         ebp 

00000049  pop         ebx 

0000004a  pop         esi 

0000004b  pop         edi 

0000004c  ret              

0000004d  call        792B4511

00000052  int         3   

Listing 11: Disassembly of simple array bound checks

 

Unfortunately even this most trivial of examples does not hoist the array bounds checks. The jumps can clearly be seen on lines 2D and 38.  As to why this does not work I am not sure but perhaps it is because code like this could exist which could cause a buffer overflow? Perhaps the amount of time detect this situation has been deemed too much.

        static int [] SampleArrayBoundsCheck() {

            int [] Destination = new int[foo.Length];

            for (int i = 0; i <>

                i += 100000000;

                Destination[ i ]= foo[ i ];

            }

            return Destination;

        }

Listing 12: Buffer overfow example

 

Checking before applying the optimization would require a good amount of overhead and if it did not check to make sure anyone was messing with our counter in the interior of the loop then we could run into code like this which would cause a buffer overflow (who knows what we just wrote over, maybe we would get an exception or maybe we just caused a funny character in a string some place, boy do I miss the days of C/Pascal in embedded systems J).  Let’s try something even simpler, Brad Abrams says it works so it must?

        static int SampleArrayConstantBoundsCheck() {

            int foo2 = 0;

            for (int i = 0; i <>

              foo2 = foo[ i ];

            }

            return foo2;

        }

        static int SampleArrayConstantBoundsCheck() {

            int foo2 = 0;

00000000  push        esi 

            for (int i = 0; i <>

00000001  xor         edx,edx

00000003  mov         ecx,dword ptr ds:[022B1EC4h]

00000009  mov         esi,dword ptr [ecx+4]

              foo2 = foo[ i ];

0000000c  cmp         edx,esi

0000000e  jae         00000021

00000010  mov         eax,dword ptr [ecx+edx*4+8]

            for (int i = 0; i <>

00000014  add         edx,1

00000017  cmp         edx,3E8h

0000001d  jl          0000000C

0000001f  pop         esi 

            }

            return foo2;

00000020  ret             

00000021  call        792B44A9

00000026  int         3   

Listing 13: Simplest possible example

 

Again, no love from the JIT optimizer, we cannot possibly make this example any simpler but it still has the bounds checks firmly placed within our loop. The JIT optimizer does not hoist array bounds checks for you. My guess is that it will not do this do to threading reasons, I would assume the process to actually be thread safe (as it is only copying something of reference size which is assured to be atomic but I am probably missing something. There is some similar functionality which happens that I think may cause the confusion that array bounds hoist are actually occurring. Let’s take a look at this other optimization, but first we can make a new rule.

If you are dealing with array accesses in a highly performant area and the bounds checks being in the loop are too much. You will have to handle your iteration in unsafe code to remove the bounds checks.

Local Array Bounds Check Removal

The optimization that the JIT does support is removing bounds checks for locally created arrays. Since the array has been created locally and is only known by a local reference, the JIT can be sure that the array cannot possibly change. In these cases the JIT will completely remove bounds checks. Let’s take a look at an example.

        static int TestLocalArrayBoundsCheckRemoval() {

            int [] Test = new int[10000];

            int total = 0;

            for(int i=0;i

                total |= Test[ i ];

            }

            return total;

        }

        static int TestLocalArrayBoundsCheckRemoval() {

            int[] Test = new int[10000];

00000000  push        esi 

00000001  mov         edx,2710h

00000006  mov         ecx,7915982Ah

0000000b  call        FFB21FF0

00000010  mov         ecx,eax

            int total = 0;

00000012  xor         esi,esi

            for(int i=0;i

00000014  xor         edx,edx

00000016  mov         eax,dword ptr [ecx+4]

00000019  test        eax,eax

0000001b  jle         00000028

                total |= Test[ i ];

0000001d  or          esi,dword ptr [ecx+edx*4+8]

            for(int i=0;i

00000021  add         edx,1

00000024  cmp         eax,edx

00000026  jg          0000001D

            }

            return total;

00000028  mov         eax,esi

0000002a  pop         esi 

0000002b  ret             

Listing 14: Removal of local array bounds

 

What is really neat here is that we have in fact created unmanaged code which does not even check for out of bounds access. This code will never throw an exception (it doesn’t even have code to throw an exception). This optimization is even better than hoisting as it has no overhead, the hoisting would make it a single comparison; this had no comparison. My guess is that people were getting confused between this and hoisting. If you do not use a length property or a constant in your loop this optimization will not occur, as an example the following code will not remove the bounds checks

        static int SampleArrayConstantBoundsCheckRemoval() {

            int foo2 = 0;

            int size = foo.GetUpperBound(0) ;

            for (int i = 0; i <>

                foo2 = foo[ i ];

            }

            return foo2;

        }

 

        static int TestLocalArrayBoundsCheckRemoval() {

            int[] Test = new int[10000];

00000000  push        edi 

00000001  push        esi 

00000002  mov         edx,2710h

00000007  mov         ecx,7915982Ah

0000000c  call        FFB21FF0

00000011  mov         esi,eax

            int total = 0;

00000013  xor         edi,edi

            int size = Test.GetUpperBound(0);

00000015  mov         ecx,esi

00000017  xor         edx,edx

00000019  cmp         dword ptr [ecx],ecx

0000001b  call        79266720

00000020  mov         edx,eax

            for(int i=0;i

00000022  xor         eax,eax

00000024  test        edx,edx

00000026  jle         0000003A

00000028  mov         ecx,dword ptr [esi+4]

                total |= Test[ i ];

0000002b  cmp         eax,ecx

0000002d  jae         0000003F

0000002f  or          edi,dword ptr [esi+eax*4+8]

            for(int i=0;i

00000033  add         eax,1

00000036  cmp         eax,edx

00000038  jl          0000002B

            }

            return total;

0000003a  mov         eax,edi

0000003c  pop         esi 

0000003d  pop         edi 

0000003e  ret             

0000003f  call        792B4541

00000044  int         3   

Listing 15: Manual hoisting causes Array Bounds Check Removal to fail

 

As we can see the manual hoisting caused the optimization to fail. So manual hoisting is not always a good thing as it can break some JIT patterns such as this. This optimization will also work for writes.

        static int [] TestLocalArrayBoundsCheckWrite() {

            int[] Test = new int[10000];

            for(int i=0;i

                Test[ i ] = i;

            }

            return Test;

        }

        static int [] TestLocalArrayBoundsCheckWrite() {

            int[] Test = new int[10000];

00000000  mov         edx,2710h

00000005  mov         ecx,7915982Ah

0000000a  call        FFB21FF0

0000000f  mov         ecx,eax

            for(int i=0;i

00000011  xor         edx,edx

00000013  mov         eax,dword ptr [ecx+4]

00000016  test        eax,eax

00000018  jle         00000025

                Test[ i ] = i;

0000001a  mov         dword ptr [ecx+edx*4+8],edx

            for(int i=0;i

0000001e  add         edx,1

00000021  cmp         eax,edx

00000023  jg          0000001A

            }

            return Test;

00000025  mov         eax,ecx

00000027  ret             

Listing 16: Bounds Check Removal for writes

 

As we can see, it again removed all of the bounds checking. Of course as we have seen earlier, this will only work for an array that is created in the same method. If we move the creation of the array out of the method this optimization will no longer occur. This leads us to our third rule.

Try to keep creation and initialization of arrays within the same method as the JIT can often times remove the bounds checks during your initialization.

Another oddity

Were you one of the people I said “good catch” to earlier? Remember those odd “Dummy” variables were using in the first set of examples

The JIT is extremely smart in some optimizations. For example it realizes if you are doing calculations and never use the result, it will remove the code for you. It does however have a slight problem when dealing with loops. Let’s try removing those Dummy calls from our previous code and see what happens. Let’s take a look at only one of them since we have already shown that they produce identical results.

        static void Test1NoHoist() {

            int total = 0;

            for (int i = 0; i <>

                total|=i;

            }

        }

              static void Test1NoHoist() {

            int total = 0;

00000000  xor         edx,edx

            for (int i = 0; i <>

00000002  mov         eax,dword ptr ds:[022B1EC4h]

00000007  mov         eax,dword ptr [eax+4]

0000000a  test        eax,eax

0000000c  jle         00000015

0000000e  add         edx,1

00000011  cmp         eax,edx

00000013  jg          0000000E

00000015  ret             

Figure 17: Our method without the dummy value being set

 

I have color coded the example as with the rest of the example.  You will notice that the blue section is missing. The JIT optimizer has correctly realized that we were in fact never using our total variable after we performed all of the calculation on it. It has however failed to realize that by taking out our code, it has in fact left a loop which does nothing. Nothing super interesting here, this is just something I came across while writing this.

Summary

We have looked through quite a few JIT optimizations in this post and have made a few rules which that can help us in situations where measuring becomes very difficult as we need to know what to measure against. Let’s go back through our rules, keep in mind that these rules are not version agnostic so using 1.x, mono, or rotor you will likely get different results!

1)      If you wish to use a method call for your stop condition that does anything more complex than simply returning a variable or is not inlinable for other reasons such as it being virtual, you should hoist it manually in order by placing it in the first part of your for loop to make it explicit to the JIT that you do not wish the method to be called on every iteration.

2)      When dealing with items that are enumerable as opposed to dealing with arrays or collections; prefer foreach to a for loop as the enumerator will often offer a much faster way of enumerating than using an index.

3)      If you are dealing with array access in a highly performant area and the bounds checks being in the loop are too much overhead. You will have to handle your iteration in unsafe code to remove the bounds checks (in other words, it is impossible to write performant looping code in VB.NET)

4)      Try to keep creation and initialization of arrays within the same method as the JIT can often times remove the bounds checks during your initialization.

Hopefully by following these rules you can save yourself some time the next time you are optimizing code as many of these rules deal with things that are difficult to measure, but remember measurement is key.

I hope you enjoyed reading about these optimizations as much as I enjoyed writing about them.

Reference :- http://codebetter.com/blogs/gregyoung/archive/2006/06/11/146343.aspx