C# Collection : Behind the Scene

Author
Recent Posts

Atul Sharma

An extrovert by nature, Atul has been sharing his opinions and views on various social network platforms. He always had been thinking about having his own platform and here it is. Creative and tech-savvy Atul has a good sense of humor and when he is not programming or writing blogs or traveling, you can find him cooking, Netflixing or sleeping.

1) Array-Based Collection –

Before going further, let us first understand the Array.

This is the basic collection and we are talked about this in Chapter #1, so we will just relook some of its characteristics and a bit more –

An array is a type-safe collection – Having type safety in a collection is great.
Memory allocation is contiguous – This makes the iteration faster.
Size of an Array is fixed – Most of the things are decided at run time so it could not be a good option.
To increase the size of an array, we will have to copy that to another bigger array – Another issue when we don’t know the size in the beginning.

Memory Allocation –

List uses Array to store the items. When it needs to increase the size, it creates a new array with double size and it is known as Capacity. This memory allocation and expansion is true for ArrayList as well.

How does List<T> work?

Default Capacity is 4. which is evident from this code.

using System;
using System.Collections;
using System.Collections.Generic;

namespace ArrayBasedCollectionsBasic
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] sValues = new string[6];
            ArrayList sVals = new ArrayList();

            Array.IndexOf(sValues, "Atul1");

            List<string> sVals1 = new List<string>();

            for (int i = 0; i < 17; i++)
            {
                sVals.Add("Atul" + i);
                Console.WriteLine($"Item Count {sVals.Count} - Capacity {sVals.Capacity}");

            }

            Console.WriteLine("Deleting");
            for (int i = sVals.Count - 1; i >= 0; i--)
            {
                sVals.RemoveAt(i);
                Console.WriteLine($"Item Count {sVals.Count} - Capacity {sVals.Capacity}");
            }
            Console.WriteLine("Hello World!");
        }
    }
}

So, when we are adding up to four items, its capacity remains at 4. As soon as the 5^th item is requested to insert, .NET CLR copies this whole array to a new array with a size of 8. Old memory allocation is freed later. Now for the addition of the 9th item, it does the same thing and creates a new array of size 16 which gives the capacity as 16. This process goes on and on.

This output window proves this

Doubling the size is part of the optimal memory allocation algorithm by .NET CLR. If it increases the size by 1, it will be an overload on the processor and memory as it will have to copy and recreate and destroy again and again for each new item added.

From our analysis, it is evident that even if we remove the items from the List/ArrayList, its capacity remains the same.

Conclusion –

Lists are better than Array, as there is no size restriction and they are strongly typed as well.
Increase of Array, size has to be done by the programmer while for List and ArrayList .NET CLR takes care of this.
By Default, insertion in List/ArrayList is done in the last.
Inserting the item, at any middle index, will be costly.
1. If after insertion, it remained within the capacity, additional work of shifting items will have to happen
2. If after insertion in middle, goes beyond the capacity, then it will indulge in the work of new collection creation along with shifting of elements.

2) LinkedList Collection

As we see the last point in the List, inserting in the middle, creates a big problem and performance hit and for this we get LinkedList.

Based on the structure, it is evident that

Searching in LinkedList is costly.
Adding items in the middle is faster.

3) Hash Table Based Collections

Before getting further in these collections, let us first try to understand the Hash Set. What are they and how are they calculated?

Here hash value is calculated based on any certain logic. It is because of GetHashCode() method of System.Object base class.

For this, illustration, I have used ITEM%4 to get my hash. And this how items will be stored.

For search, The first Hash of the item will be calculated and then it will get it.

Supported Collections are HashTable and HashSet.

Conclusion –

Searching is very fast, almost O(1)
No duplicate or null key is allowed.
This is not sorted so any order of insertion or searching is the same.

4) Binary Tree Based Collection

Now next category of collections uses the binary tree for storing items. SortedDictionary<T, T> and SortedSet<T> are in this category.

Without getting in to further details of Binary tree , we can easily say that any collection implementing this will have better Search, Insert, Delete operation than any other collection.

Characteristics of this type of collections are –

Key cant be null and should be unique in SortedDictionary <X, Y> and SortedList<T, T>
Search for these collections is faster than any other collection.

With this, I assume I have given conceptual details of C# Collection implementation and I hope this will help you next time when you think about type, speed and memory.

Happy Coding !!

1) Array-Based Collection –

Memory Allocation –

How does List<T> work?

Conclusion –

2) LinkedList Collection

3) Hash Table Based Collections

Conclusion –

4) Binary Tree Based Collection

Share this:

You may like these too...