Reference :- http://codebetter.com/blogs/gregyoung/archive/2006/06/11/146343.aspx
There are a lot of posts on the Internet discussing varying methods of looping using for loops and which perform best. These posts also generally give advice as to how you should handle your looping based upon performance metrics. Most of these types of posts suffer from an affliction I have discussed previously; they look at IL (Intermediate Language) to rationalize performance which is just flat out wrong as subtle differences can cause JIT (Just in Time) optimizations to go haywire.
In this post we will look at the unmanaged code produced by varying constructs to come up with a definitive answer to the performance question. We will also look at some other optimization methods that people claim happen but do not always happen and discuss some reasons why they may not be happening and of course set some rules for fast looping.
When applying any of this, the version of your framework matters, as the JIT may vary from version to version. I do not believe anything here will change based upon the platform, but that is a possibility. I used version 2.0.50727 for all examples in this document. Other versions such as Rotor, Mono, or 1.x will most likely show differing behaviors.
Hoisting?
Before I start getting into code I would like to discuss the concept of hoisting, as it will be the focal point of this entire discussion. When dealing with loops we can break any given loop into three sections:
1) The preamble (setting up for the loop, doing things such as setting our counter to 0)
2) The code within the loop
3) The code to increment our counter and check for the end of iteration possibly jumping back to step two
For the rest of this post I will color code the disassembled assembly as section one (green), section two (blue), and section three (yellow) to make things easier to follow.
Any code that is in step two or three will be run N times as the loop is executed. The code in step one will only be executed a single time when we first enter the loop. The concept of hoisting involves moving code out of steps two and three and into step one. Hoisting can give you a large performance gain, as the hoisted code runs once instead of N times.
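To make the three sections concrete, here is a small sketch (not from the original post) that writes the same loop twice: once as an ordinary for loop and once spelled out by hand with goto so each section sits on its own line. The class and method names are made up for this illustration.

```csharp
using System;

static class LoopSections {
    // Ordinary for loop: the three sections are packed into the header and body.
    public static int SumFor(int[] data) {
        int total = 0;
        for (int i = 0; i < data.Length; i++) {
            total += data[i];
        }
        return total;
    }

    // The same loop written out by hand so each section is visible.
    public static int SumBySections(int[] data) {
        int total = 0;
        int i = 0;                          // section 1: preamble, runs once
        goto Check;
    Body:
        total += data[i];                   // section 2: loop body, runs N times
        i++;                                // section 3: increment the counter ...
    Check:
        if (i < data.Length) goto Body;     // ... and check for the end of iteration
        return total;
    }
}
```

Anything that can be moved from the lines marked section 2 or 3 up to the line marked section 1 is a candidate for hoisting.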
Simple Hoisting Example
If you read a lot of articles on for loop performance, they will tell you not to hoist manually because the JIT will end up doing it for you. In order to test this I have created the following test code (which can be used in conjunction with the harness discussed in the previous Performance Measurement post).
|     class Program {         static int[] foo = new int[100000];         static int Dummy;         static void Test1NoHoist() {             int total = 0; for (int i = 0; i < foo.Length; i++) {                 total|=i;             }             Dummy = total;         }         static void Test1WithHoist() {             int total = 0;             int length = foo.Length; for (int i = 0; i < length; i++) {                 total|=i;             }             Dummy = total;         }         static void Main(string[] args) {             TestHarness.Test("Test1NoHoist", 10000, new TestHandler(Test1NoHoist));             TestHarness.Test("Test1WithHoist", 10000, new TestHandler(Test1WithHoist));             Console.ReadLine();         } } | 
| Listing 1: Basic manual hoisting example | 
You will quickly notice that in Test1WithHoist we manually took the read of the Length property on foo and moved it before the loop; this is a common optimization if you come from a C/C++ world. The thought behind this is that since Length translates to get_Length(), we are saving ourselves from having to call the method n times (remember that the call would otherwise sit in the third section, and everything in the second and third sections runs n times).
An astute reader will notice an oddity dealing with the “Dummy” variable that is assigned after the loop. We will come back to this a bit later on… good catch ;)
By running our performance test, we can see that the two versions execute in roughly equivalent time.
| Test | Total Time (ns) | Average (ns) | 
| Test1NoHoist | 820583612.35 | 82058.36 | 
| Test1WithHoist | 812735363.70 | 81273.53 | 
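The TestHarness and TestHandler types come from the earlier Performance Measurement post and are not reproduced here. As a rough stand-in, assuming nothing about how the original harness was written, a minimal Stopwatch-based version could look like the sketch below. The names SimpleHarness and Measure are hypothetical, and the delegate is declared only so the snippet is self-contained.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the delegate type used in Listing 1.
public delegate void TestHandler();

public static class SimpleHarness {
    // Runs the handler `iterations` times and returns the average time per call in nanoseconds.
    public static double Measure(string name, int iterations, TestHandler handler) {
        handler();                                   // warm-up call so JIT compilation is not timed
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) {
            handler();
        }
        watch.Stop();
        // Convert stopwatch ticks to nanoseconds using the timer frequency (ticks per second).
        double totalNs = watch.ElapsedTicks * (1e9 / Stopwatch.Frequency);
        double averageNs = totalNs / iterations;
        Console.WriteLine("{0}: total {1:F2} ns, average {2:F2} ns", name, totalNs, averageNs);
        return averageNs;
    }
}
```

A call such as SimpleHarness.Measure("Test1NoHoist", 10000, new TestHandler(Test1NoHoist)) would mirror the shape of the calls in Listing 1. Numbers from a sketch like this will not match the table above and are only useful for comparing two methods on the same machine and build.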
Let’s take a look at the generated code to offer proof that the two bits of code should run in equivalent time (if you want to see the results for yourself, remember to follow the instructions for getting JIT optimized code from Viewing Unmanaged Code in VS.NET).
| Test1NoHoist | Test1WithHoist | 
| 00000000  xor         ecx,ecx 00000002  xor         edx,edx 00000004  mov         eax,dword ptr ds:[022B1EC4h] 00000009  mov         eax,dword ptr [eax+4] 0000000c  test        eax,eax 0000000e  jle         00000019 00000010  or          ecx,edx 00000012  add         edx,1 00000015  cmp         eax,edx 00000017  jg          00000010 00000019  mov         dword ptr ds:[00912FE8h],ecx 0000001f  ret                | 00000000  xor         ecx,ecx 00000002  mov         eax,dword ptr ds:[022B1EC4h] 00000007  mov         edx,dword ptr [eax+4] 0000000a  xor         eax,eax 0000000c  test        edx,edx 0000000e  jle         00000019 00000010  or          ecx,eax 00000012  add         eax,1 00000015  cmp         eax,edx 00000017  jl          00000010 00000019  mov         dword ptr ds:[00912FE8h],ecx 0000001f  ret              | 
| Listing 2: Disassembled versions of our functions | 
A kind of neat item in these two bits of code is that the only real difference is that EAX and EDX are interchanged … This makes them look a bit more different than they are but rest assured they are pretty much identical. In both cases, the access to the stop variable has been hoisted out of the loop (in section three, both are comparing their counter to a register that was preloaded in section one). To better illustrate the point, here is a disassembly of the first for loop (without manual hoisting) when run without JIT optimizations (i.e. it will not be hoisted).
| 0000002a  xor         esi,esi 0000002c  nop              0000002d  jmp         00000034 0000002f  nop              00000030  or          edi,esi 00000032  nop              00000033  inc         esi  00000034  mov         eax,dword ptr ds:[02275A34h] ;inlined length 00000039  cmp         esi,dword ptr [eax+4] 0000003c  setl        al   0000003f  movzx       eax,al 00000042  mov         ebx,eax 00000044  test        ebx,ebx 00000046  jne         0000002F   | 
| Listing 3: First loop (no hoisting with optimizations disabled) non-important code removed | 
If this does not convince you to NEVER release code that is not being optimized I don’t know what will. It should also give you an idea of how much work the JIT optimizer really does. What has happened here is that the property was inlined, but it was inlined into section three. The optimizer was smart enough to realize this and to move it up to the preamble. +1 for the optimizer
Non-Trivial Hoisting
Now that we have gone through a simple example, let’s try a more difficult one for the optimizer. In this example we will use a method, GetUpperBound(0), which will not be inlined for us, and see how well the JIT handles it. Here is the testing code to add to our previous code.
|         static void Test1NoHoistNoInlining() {             int total = 0; for (int i = 0; i < foo.GetUpperBound(0); i++) {                 total |= i;             }             Dummy = total;         }         static void Test1WithHoistNoInlining() {             int total = 0;             int length = foo.GetUpperBound(0); for (int i = 0; i < length; i++) {                 total |= i;             }             Dummy = total; } | 
| Listing 4: Tests without inlining available | 
Next we should add the following to our Main to run the tests.
|             TestHarness.Test("Test1NoHoistNoInlining", 10000, new TestHandler(Test1NoHoistNoInlining));             TestHarness.Test("Test1WithHoistNoInlining", 10000, new TestHandler(Test1WithHoistNoInlining)); | 
| Listing 5: Calling our new methods | 
When we run the tests in release mode, we get quite different results than previously. I am including the previous results in the table for comparison.
| Test | Total Time (ns) | Average (ns) | 
| Test1NoHoist | 820583612.35 | 82058.36 | 
| Test1WithHoist | 812735363.70 | 81273.53 | 
| Test1NoHoistNoInlining | 25457518879.48 | 2545751.88 | 
| Test1WithHoistNoInlining | 824641533.67 | 82464.15 | 
Interesting: not manually hoisting makes our function roughly 31x slower (an average of 2545751.88 ns versus 82464.15 ns). Before we even get into code on this one I will put my money on the optimizer not hoisting the call. I believe however there are some reasons for this that we will discuss after looking through the code.
| Test1NoHoistNoInlining | Test1WithHoistNoInlining | 
| 00000004  xor         esi,esi 00000006  mov         ecx,dword ptr ds:[022B1EC4h] 0000000c  xor         edx,edx 0000000e  cmp         dword ptr [ecx],ecx 00000010  call        792661F8 00000015  test        eax,eax 00000017  jle         00000031 00000019  or          edi,esi 0000001b  add         esi,1 0000001e  mov         ecx,dword ptr ds:[022B1EC4h] 00000024  xor         edx,edx 00000026  cmp         dword ptr [ecx],ecx 00000028  call        792661F8 0000002d  cmp         eax,esi 0000002f  jg          00000019   | 00000003  mov         ecx,dword ptr ds:[022B1EC4h] 00000009  xor         edx,edx 0000000b  cmp         dword ptr [ecx],ecx 0000000d  call        792661A8 00000012  mov         edx,eax 00000014  xor         eax,eax 00000016  test        edx,edx 00000018  jle         00000023 0000001a  or          esi,eax 0000001c  add         eax,1 0000001f  cmp         eax,edx 00000021  jl         0000001A              | 
| Listing 6: Disassembled versions | 
As we suspected, the code on the left hand side does not hoist the function call for us. It would seem that the JIT will not automatically hoist method calls; instead we have to hoist them explicitly ourselves.
It seems that the JIT cannot get to the point of having a value in a register or memory that it can consider the call's result, and as such it feels the need to make the call on every iteration. This would make sense in general, as I may be depending upon the method being called on every iteration. Consider the following code.
|         static int t = 0;         static bool KeepGoing() {             Console.WriteLine("Still Going!");             t++; return t <>         }         static void TestShortMethod() {             for (; KeepGoing();) {                 System.Threading.Thread.Sleep(100);             } } | 
| Listing 7: Odd but valid code | 
Obviously this is nasty code but it is still valid. This is a simplified example, but it appears that it was a conscious decision not to support these types of situations, as they can quickly become extremely complex from the JIT's point of view. My guess is that it won’t deal with anything beyond simply returning a variable, which will be inlined into a simple read anyway. In my opinion, something like this should still be inlined (and left in section three) to avoid the overhead of setting up the call on every iteration, but I may be missing some other case that makes this more difficult. Based upon these results we can create the following rule.
If you wish to use a method call for your stop condition that does anything more complex than simply returning a variable, or that is not inlinable for other reasons such as being virtual, you should hoist it manually by placing it in the first part of your for loop. This makes it explicit to the JIT that you do not wish the method to be called on every iteration.
Mark Lubischer brought up an excellent point here. We can in fact also hoist the call by using for like this.
|         static int MarkWithoutInlining() {             int total = 0;             int[] length = new int[10000]; for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {                 total |= i;             }             return total;         } | 
This is a much better way of handling our hoisting as it better defines the scope of our variable while the behavior is in fact equivalent to our original hoist.
|         static int MarkWithoutInlining() {             int total = 0; 00000000  push        edi  00000001  push        esi  00000002  push        ebx  00000003  xor         ebx,ebx             int[] length = new int[10000]; 00000005  mov         edx,2710h 0000000a  mov         ecx,7915982Ah 0000000f  call        FFB21D98 00000014  mov         edi,eax for (int i = 0, j = length.GetUpperBound(0); i < j; i++) { 00000016  xor         esi,esi 00000018  mov         ecx,edi 0000001a  xor         edx,edx 0000001c  cmp         dword ptr [ecx],ecx 0000001e  call        792664C8 00000023  test        eax,eax 00000025  jle         00000039 00000027  mov         edx,dword ptr [edi+4]                 total |= length[i]; 0000002a  cmp         esi,edx 0000002c  jae         0000003F 0000002e  or          ebx,dword ptr [edi+esi*4+8] for (int i = 0, j = length.GetUpperBound(0); i < j; i++) { 00000032  add         esi,1 00000035  cmp         esi,eax 00000037  jl          0000002A             }             return total; 00000039  mov         eax,ebx 0000003b  pop         ebx  0000003c  pop         esi  0000003d  pop         edi  0000003e  ret              0000003f  call        792B42E9 00000044  int         3 | 
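One way to see why the JIT has to be careful about hoisting a call out of the stop condition is to count how often the call actually runs. The sketch below is an illustration only: Limit is a made-up method with a visible side effect, standing in for any stop-condition method the JIT cannot reduce to a simple read.

```csharp
using System;

static class StopConditionCalls {
    static int calls;

    // Stands in for a stop-condition method with an observable side effect.
    static int Limit(int[] data) {
        calls++;
        return data.GetUpperBound(0);
    }

    // Stop condition evaluated on every pass: Limit runs once per check.
    public static int CountCallsNotHoisted(int[] data) {
        calls = 0;
        for (int i = 0; i < Limit(data); i++) { }
        return calls;
    }

    // Stop condition hoisted into the loop initializer: Limit runs exactly once.
    public static int CountCallsHoisted(int[] data) {
        calls = 0;
        for (int i = 0, end = Limit(data); i < end; i++) { }
        return calls;
    }
}
```

Because the two loops differ in observable behavior, the JIT cannot turn the first into the second on its own; the programmer has to state which behavior is wanted.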
A Bit About Foreach
Foreach is not an IL concept, it is a compiler concept. Compilers generate normal for loops when they see a foreach being used to iterate an array. If you are interested in seeing how foreach loops work I would highly recommend taking a look with ILDASM or Reflector. I will not discuss foreach loops heavily, as at this moment they show no difference from for loops once they reach the native level.
Foreach could in the future offer many benefits over a general for loop. Since the foreach loop is explicitly stating that you want to iterate through the array (not allowing things like addition and subtraction to your counter), other things such as hoisting array bounds checks (which we will discuss shortly) would be much more easily accomplished. I would imagine that in the future foreach will be the preferred iteration construct.
I will assure you though that as of now foreach is not faster than the equivalent for loop when dealing with arrays; in fact both VB.NET and C# ensure that they are identical. There is one case where foreach will be slower than a for loop, and that is if you do not actually use the item within your loop: the for version will obviously be faster since it never loads the variable, whereas the foreach does this implicitly on every iteration. Foreach can however offer you great performance increases when dealing with other types that implement IEnumerable (not including collections as they simply perform an indexed lookup in their enumerator).
Logically one can come up with a great example for this when looking at a linked list. The for loop will cause 1 + 2 + … + n = n(n+1)/2 total node visits over the iteration, whereas the foreach will only cause n. The reason for this is that every index operation has to start at the beginning of the list and walk n nodes in order to return the nth node, whereas the enumerator simply remembers the last node it was on and gives the next node. For this reason we will add our second rule.
When dealing with items that are enumerable, as opposed to arrays or collections, prefer foreach to a for loop, as the enumerator will often offer a much faster way of enumerating than using an index.
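The linked list argument can be checked by counting node visits. The sketch below is only an illustration: LinkedList<T> has no indexer, so ValueAt is a hypothetical helper that imitates index-style access by walking from the head each time.

```csharp
using System;
using System.Collections.Generic;

static class LinkedListWalk {
    // Index-style access: walk from the head every time, counting each node touched.
    static int ValueAt(LinkedList<int> list, int index, ref long visits) {
        LinkedListNode<int> node = list.First;
        visits++;
        for (int i = 0; i < index; i++) {
            node = node.Next;
            visits++;
        }
        return node.Value;
    }

    // for loop over indexes: 1 + 2 + ... + n node visits in total.
    public static long VisitsWithIndexing(LinkedList<int> list) {
        long visits = 0;
        for (int i = 0; i < list.Count; i++) {
            ValueAt(list, i, ref visits);
        }
        return visits;
    }

    // foreach: the enumerator remembers its position, so each node is visited once.
    public static long VisitsWithForeach(LinkedList<int> list) {
        long visits = 0;
        foreach (int value in list) {
            visits++;
        }
        return visits;
    }
}
```

For a list of 100 nodes the indexed version touches 100 * 101 / 2 = 5050 nodes, while the foreach version touches 100.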
Array Bounds Check Hoisting
The check to find out whether or not we should continue is not the only thing that can be hoisted out of a loop. In fact, every comparison within the loop that involves the loop variable is an opportunity for hoisting. The first example of this type of hoist we will look at is array bounds checking.
Array bounds are checked automatically by the CLR in order to prevent things like buffer overflows from occurring. Every time that you access an array in safe code, a comparison occurs to ensure that you are within the range of the array. If you are not within a valid range an IndexOutOfRangeException is thrown, as opposed to writing happily beyond the end of your array as many other languages such as C would do.
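As a small illustration of that behavior (the helper name TryRead is made up for this sketch), an out-of-range read in safe code surfaces as an exception rather than as a read of neighboring memory:

```csharp
using System;

static class BoundsCheckDemo {
    // Returns false when the runtime bounds check rejects the index.
    public static bool TryRead(int[] data, int index, out int value) {
        try {
            value = data[index];        // the CLR compares index against data.Length here
            return true;
        } catch (IndexOutOfRangeException) {
            value = 0;
            return false;
        }
    }
}
```

Both an index equal to the array length and a negative index are rejected by the same check.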
The problem with array bounds checks is that they are extremely redundant when dealing with loops. Consider the following code that shows what is happening.
|         static void SampleArrayBoundsCheck() {             int total = 0; for (int i = 0; i < foo.Length; i++) { if (i < foo.Length) {                     total |= foo[ i ];                 }             } } | 
| Listing 9: Sample Array Bounds Code | 
Naturally you are not issuing these checks in your code, but this example helps to make what’s really going on a bit clearer. When looking at this code anyone who has read the first few chapters of a C# book would scratch their head and wonder why all of the redundancy has been placed into the code. It is obvious that if our counter has a constraint to stay below foo.Length that it will in fact always succeed the conditional of being below foo.Length. To test how intelligent the JIT is with handling this situation we can use the following code. Note that for this test we will simply be looking at the native code generated as opposed to measuring performance.
|         static int [] SampleArrayBoundsCheck() {             int [] Destination = new int[foo.Length]; for (int i = 0; i < foo.Length; i++) {                 Destination[ i ]= foo[ i ];             }             return Destination; } | 
| Listing 10: Test to see if array bound checks hoist | 
This code is simply copying the static array that we have been using previously to another array. This code should be particularly good to test as it in fact has two bounds checks occurring within it (one for each array).
|         static int [] SampleArrayBoundsCheck() {             int [] Destination = new int[foo.Length]; 00000000  push        edi  00000001  push        esi  00000002  push        ebx  00000003  push        ebp  00000004  push        eax  00000005  mov         edi,dword ptr ds:[022B1EC4h] 0000000b  mov         ebx,dword ptr [edi+4] 0000000e  mov         edx,ebx 00000010  mov         ecx,7915982Ah 00000015  call        FFB21FC0 0000001a  mov         esi,eax for (int i = 0; i < foo.Length; i++) { 0000001c  xor         edx,edx 0000001e  test        ebx,ebx 00000020  jle         00000045 00000022  mov         ebp,dword ptr [edi+4] 00000025  mov         eax,dword ptr [esi+4] 00000028  mov         dword ptr [esp],eax                 Destination[ i ]= foo[ i ]; 0000002b  cmp         edx,ebp 0000002d  jae         0000004D 0000002f  mov         ecx,dword ptr [edi+edx*4+8] 00000033  mov         eax,dword ptr [esp] 00000036  cmp         edx,eax 00000038  jae         0000004D 0000003a  mov         dword ptr [esi+edx*4+8],ecx for (int i = 0; i < foo.Length; i++) { 0000003e  add         edx,1 00000041  cmp         ebx,edx 00000043  jg          0000002B             }             return Destination; 00000045  mov         eax,esi 00000047  pop         ecx  00000048  pop         ebp  00000049  pop         ebx  0000004a  pop         esi  0000004b  pop         edi  0000004c  ret               0000004d  call        792B4511 00000052  int         3 | 
| Listing 11: Disassembly of simple array bound checks | 
Unfortunately even this most trivial of examples does not hoist the array bounds checks. The jumps can clearly be seen at offsets 2D and 38. As to why this does not work I am not sure, but perhaps it is because code like the following could exist, which could cause a buffer overflow? Perhaps the amount of time needed to detect this situation has been deemed too much.
|         static int [] SampleArrayBoundsCheck() {             int [] Destination = new int[foo.Length]; for (int i = 0; i < foo.Length; i++) {                 i += 100000000;                 Destination[ i ]= foo[ i ];             }             return Destination; } | 
| Listing 12: Buffer overflow example | 
Checking before applying the optimization would require a good amount of overhead, and if it did not check whether anyone was messing with our counter in the interior of the loop then we could run into code like this, which would cause a buffer overflow (who knows what we just wrote over; maybe we would get an exception or maybe we just caused a funny character in a string some place, boy do I miss the days of C/Pascal in embedded systems :) ). Let’s try something even simpler; Brad Abrams says it works, so it must?
|         static int SampleArrayConstantBoundsCheck() {             int foo2 = 0; for (int i = 0; i < 1000; i++) {               foo2 = foo[ i ];             }             return foo2; } | 
|         static int SampleArrayConstantBoundsCheck() {             int foo2 = 0; 00000000  push        esi  for (int i = 0; i < 1000; i++) { 00000001  xor         edx,edx 00000003  mov         ecx,dword ptr ds:[022B1EC4h] 00000009  mov         esi,dword ptr [ecx+4]               foo2 = foo[ i ]; 0000000c  cmp         edx,esi 0000000e  jae         00000021 00000010  mov         eax,dword ptr [ecx+edx*4+8] for (int i = 0; i < 1000; i++) { 00000014  add         edx,1 00000017  cmp         edx,3E8h 0000001d  jl          0000000C 0000001f  pop         esi              }             return foo2; 00000020  ret              00000021  call        792B44A9 00000026  int         3    | 
| Listing 13: Simplest possible example | 
Again, no love from the JIT optimizer. We cannot possibly make this example any simpler, but it still has the bounds check firmly placed within our loop. The JIT optimizer does not hoist array bounds checks for you. My guess is that it will not do this due to threading reasons. I would assume the process to actually be thread safe (as it is only copying something of reference size, which is assured to be atomic), but I am probably missing something. There is a similar optimization that I think may cause the confusion that array bounds check hoisting is actually occurring. Let’s take a look at this other optimization, but first we can make a new rule.
If you are dealing with array accesses in a highly performance-sensitive area and the bounds checks in the loop are too much overhead, you will have to handle your iteration in unsafe code to remove the bounds checks.
Local Array Bounds Check Removal
The optimization that the JIT does support is removing bounds checks for locally created arrays. Since the array has been created locally and is only known by a local reference, the JIT can be sure that the array cannot possibly change. In these cases the JIT will completely remove bounds checks. Let’s take a look at an example.
|         static int TestLocalArrayBoundsCheckRemoval() {             int [] Test = new int[10000];             int total = 0;             for(int i=0;i<Test.Length;i++) {                 total |= Test[ i ];             }             return total; } | 
|         static int TestLocalArrayBoundsCheckRemoval() {             int[] Test = new int[10000]; 00000000  push        esi  00000001  mov         edx,2710h 00000006  mov         ecx,7915982Ah 0000000b  call        FFB21FF0 00000010  mov         ecx,eax             int total = 0; 00000012  xor         esi,esi             for(int i=0;i<Test.Length;i++) { 00000014  xor         edx,edx 00000016  mov         eax,dword ptr [ecx+4] 00000019  test        eax,eax 0000001b  jle         00000028                 total |= Test[ i ]; 0000001d  or          esi,dword ptr [ecx+edx*4+8]             for(int i=0;i<Test.Length;i++) { 00000021  add         edx,1 00000024  cmp         eax,edx 00000026  jg          0000001D             }             return total; 00000028  mov         eax,esi 0000002a  pop         esi  0000002b  ret | 
| Listing 14: Removal of local array bounds | 
What is really neat here is that we have in fact created unmanaged code which does not even check for out of bounds access. This code will never throw an exception (it doesn’t even have code to throw an exception). This optimization is even better than hoisting as it has no overhead: hoisting would reduce the check to a single comparison, whereas this has no comparison at all. My guess is that people were getting confused between this and hoisting. If you do not use a Length property or a constant in your loop this optimization will not occur; as an example, the following code will not remove the bounds checks.
|         static int SampleArrayConstantBoundsCheckRemoval() {             int foo2 = 0;             int size = foo.GetUpperBound(0) ; for (int i = 0; i < size; i++) {                 foo2 = foo[ i ];             }             return foo2;         } | 
|         static int TestLocalArrayBoundsCheckRemoval() {             int[] Test = new int[10000]; 00000000  push        edi  00000001  push        esi  00000002  mov         edx,2710h 00000007  mov         ecx,7915982Ah 0000000c  call        FFB21FF0 00000011  mov         esi,eax             int total = 0; 00000013  xor         edi,edi             int size = Test.GetUpperBound(0); 00000015  mov         ecx,esi 00000017  xor         edx,edx 00000019  cmp         dword ptr [ecx],ecx 0000001b  call        79266720 00000020  mov         edx,eax             for(int i=0;i<size;i++) { 00000022  xor         eax,eax 00000024  test        edx,edx 00000026  jle         0000003A 00000028  mov         ecx,dword ptr [esi+4]                 total |= Test[ i ]; 0000002b  cmp         eax,ecx 0000002d  jae         0000003F 0000002f  or          edi,dword ptr [esi+eax*4+8]             for(int i=0;i<size;i++) { 00000033  add         eax,1 00000036  cmp         eax,edx 00000038  jl          0000002B             }             return total; 0000003a  mov         eax,edi 0000003c  pop         esi  0000003d  pop         edi  0000003e  ret              0000003f  call        792B4541 00000044  int         3    | 
| Listing 15: Manual hoisting causes Array Bounds Check Removal to fail | 
As we can see the manual hoisting caused the optimization to fail. So manual hoisting is not always a good thing as it can break some JIT patterns such as this. This optimization will also work for writes.
|         static int [] TestLocalArrayBoundsCheckWrite() {             int[] Test = new int[10000];             for(int i=0;i<Test.Length;i++) {                 Test[ i ] = i;             }             return Test; } | 
|         static int [] TestLocalArrayBoundsCheckWrite() {             int[] Test = new int[10000]; 00000000  mov         edx,2710h 00000005  mov         ecx,7915982Ah 0000000a  call        FFB21FF0 0000000f  mov         ecx,eax             for(int i=0;i<Test.Length;i++) { 00000011  xor         edx,edx 00000013  mov         eax,dword ptr [ecx+4] 00000016  test        eax,eax 00000018  jle         00000025                 Test[ i ] = i; 0000001a  mov         dword ptr [ecx+edx*4+8],edx             for(int i=0;i<Test.Length;i++) { 0000001e  add         edx,1 00000021  cmp         eax,edx 00000023  jg          0000001A             }             return Test; 00000025  mov         eax,ecx 00000027  ret              | 
| Listing 16: Bounds Check Removal for writes | 
As we can see, it again removed all of the bounds checking. Of course as we have seen earlier, this will only work for an array that is created in the same method. If we move the creation of the array out of the method this optimization will no longer occur. This leads us to our third rule.
Try to keep creation and initialization of arrays within the same method as the JIT can often times remove the bounds checks during your initialization.
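As a sketch of what this rule looks like in practice (the names ArrayFactory, CreateSequence, and Fill are made up for this example), compare a method that creates and fills its array with one that fills an array handed in from outside. Going by the findings above for 2.0.50727, only the first matches the pattern where the bounds checks were removed; other JIT versions may well behave differently.

```csharp
using System;

static class ArrayFactory {
    // Creation and initialization together: the array is only reachable through a local
    // and the loop is bounded by its Length, the same shape as Listing 16.
    public static int[] CreateSequence(int size) {
        int[] result = new int[size];
        for (int i = 0; i < result.Length; i++) {
            result[i] = i;
        }
        return result;
    }

    // Initialization split out: the array arrives as a parameter, so this is no longer
    // the locally created array case described above.
    public static void Fill(int[] target) {
        for (int i = 0; i < target.Length; i++) {
            target[i] = i;
        }
    }
}
```

Both methods produce the same contents; the difference is only in what the JIT can prove about the array. Checking the disassembly on your own runtime is the only way to know which form it actually optimizes.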
Another oddity
Were you one of the people I said “good catch” to earlier? Remember those odd “Dummy” variables we were using in the first set of examples?
The JIT is extremely smart in some optimizations. For example it realizes if you are doing calculations and never use the result, it will remove the code for you. It does however have a slight problem when dealing with loops. Let’s try removing those Dummy calls from our previous code and see what happens. Let’s take a look at only one of them since we have already shown that they produce identical results.
|         static void Test1NoHoist() {             int total = 0; for (int i = 0; i < foo.Length; i++) {                 total|=i;             } } | 
|               static void Test1NoHoist() {             int total = 0; 00000000  xor         edx,edx for (int i = 0; i < foo.Length; i++) { 00000002  mov         eax,dword ptr ds:[022B1EC4h] 00000007  mov         eax,dword ptr [eax+4] 0000000a  test        eax,eax 0000000c  jle         00000015 0000000e  add         edx,1 00000011  cmp         eax,edx 00000013  jg          0000000E 00000015  ret              | 
| Listing 17: Our method without the dummy value being set | 
I have color coded the example as with the rest of the examples. You will notice that the blue section (section two, the loop body) is missing. The JIT optimizer has correctly realized that we were in fact never using our total variable after we performed all of the calculation on it. It has however failed to realize that by taking out our code, it has left a loop which does nothing. Nothing super interesting here, this is just something I came across while writing this.
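For benchmarks, the practical consequence is that the result of the loop has to end up somewhere observable, which is what the Dummy field did in Listing 1. A minimal sketch of the two usual ways to do that follows; the names KeepAlive, Sink, LoopWithSink, and LoopWithReturn are hypothetical.

```csharp
using System;

static class KeepAlive {
    // Written after the loop so the computation has an observable result.
    public static int Sink;

    public static void LoopWithSink(int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total |= i;
        }
        Sink = total;   // without this store, the loop body is a candidate for removal
    }

    // Alternative: return the value and let the caller consume it.
    public static int LoopWithReturn(int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total |= i;
        }
        return total;
    }
}
```

Either form keeps the work in the loop tied to a value that escapes the method, so the measurement reflects the loop body rather than an empty loop.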
Summary
We have looked through quite a few JIT optimizations in this post and have made a few rules that can help us in situations where measuring becomes very difficult, as we need to know what to measure against. Let’s go back through our rules. Keep in mind that these rules are not version agnostic, so using 1.x, Mono, or Rotor you will likely get different results!
1) If you wish to use a method call for your stop condition that does anything more complex than simply returning a variable, or that is not inlinable for other reasons such as being virtual, you should hoist it manually by placing it in the first part of your for loop, to make it explicit to the JIT that you do not wish the method to be called on every iteration.
2) When dealing with items that are enumerable, as opposed to arrays or collections, prefer foreach to a for loop, as the enumerator will often offer a much faster way of enumerating than using an index.
3) If you are dealing with array access in a highly performance-sensitive area and the bounds checks in the loop are too much overhead, you will have to handle your iteration in unsafe code to remove the bounds checks (in other words, it is impossible to write performant looping code in VB.NET).
4) Try to keep creation and initialization of arrays within the same method as the JIT can often times remove the bounds checks during your initialization.
Hopefully by following these rules you can save yourself some time the next time you are optimizing code as many of these rules deal with things that are difficult to measure, but remember measurement is key.
I hope you enjoyed reading about these optimizations as much as I enjoyed writing about them.
