Understanding C# Collection

C# Collections – Behind the Scene: A Must for Every C# Programmers

Atul Sharma

In this article, we saw which C# Collection should we use in which scenario. That is all about the business requirements and implementation, not giving much attention to performance. So in this article, we will learn more about the internal implementation of collections. Yes, it is not too much, to be a good programmer, you need to have some basic level of understanding of how that is implemented.

All C# collections are implemented using some basic data structure in the background so based on this, I have categorized them into 4 parts as

  1. Array-Based Collections
  2. LinkedList Collection
  3. Hash Table Based Collections
  4. Binary Tree-Based Collection

Here we will look in more detail and see how they may impact performance and what care should we take for each of them?

1) Array-Based Collection –

Before going further, let us first understand the Array.

This is the basic collection and we are talked about this in Chapter #1, so we will just relook some of its characteristics and a bit more –

  1. An array is a type-safe collection – Having type safety in a collection is great.
  2. Memory allocation is contiguous – This makes the iteration faster.
  3. Size of an Array is fixed – Most of the things are decided at run time so it could not be a good option.
  4. To increase the size of an array, we will have to copy that to another bigger array – Another issue when we don’t know the size in the beginning.
See also  Events & Delegates in C#

To address the problem mentioned in #3 and #4, ArrayList was introduced. An ArrayList can store as many as elements we want so NO SIZE restriction.

But it doesn’t have the good characteristics of Array as mentioned in #1.

So now we have a new collection List<T>, it has all the good things from Array and ArrayList, i.e.

  • It is type-safe.
  • NO Size restrictions.

So, let us examine and learn more about List<T>.

Memory Allocation –

List uses Array to store the items. When it needs to increase the size, it creates a new array with double size and it is known as Capacity. This memory allocation and expansion is true for ArrayList as well.

How does List<T> work?

Default Capacity is 4. which is evident from this code.

using System;
using System.Collections;
using System.Collections.Generic;

namespace ArrayBasedCollectionsBasic
{
    class Program
    {
        static void Main(string[] args)
        {
            string[] sValues = new string[6];
            ArrayList sVals = new ArrayList();

            Array.IndexOf(sValues, "Atul1");

            List<string> sVals1 = new List<string>();

            for (int i = 0; i < 17; i++)
            {
                sVals.Add("Atul" + i);
                Console.WriteLine($"Item Count {sVals.Count} - Capacity {sVals.Capacity}");

            }

            Console.WriteLine("Deleting");
            for (int i = sVals.Count - 1; i >= 0; i--)
            {
                sVals.RemoveAt(i);
                Console.WriteLine($"Item Count {sVals.Count} - Capacity {sVals.Capacity}");
            }
            Console.WriteLine("Hello World!");
        }
    }
}

So, when we are adding up to four items, its capacity remains at 4. As soon as the 5th item is requested to insert, .NET CLR copies this whole array to a new array with a size of 8. Old memory allocation is freed later. Now for the addition of the 9th item, it does the same thing and creates a new array of size 16 which gives the capacity as 16. This process goes on and on.

See also  Implement Multiple Inheritance in C#

This output window proves this

Resizing of ArrayList/ List

Doubling the size is part of the optimal memory allocation algorithm by .NET CLR. If it increases the size by 1, it will be an overload on the processor and memory as it will have to copy and recreate and destroy again and again for each new item added.

From our analysis, it is evident that even if we remove the items from the List/ArrayList, its capacity remains the same.

Conclusion –

  1. Lists are better than Array, as there is no size restriction and they are strongly typed as well.
  2. Increase of Array, size has to be done by the programmer while for List and ArrayList .NET CLR takes care of this.
  3. By Default, insertion in List/ArrayList is done in the last.
  4. Inserting the item, at any middle index, will be costly.
    1. If after insertion, it remained within the capacity, additional work of shifting items will have to happen
    2. If after insertion in middle, goes beyond the capacity, then it will indulge in the work of new collection creation along with shifting of elements.

2) LinkedList Collection

As we see the last point in the List, inserting in the middle, creates a big problem and performance hit and for this we get LinkedList.

Based on the structure, it is evident that

  1. Searching in LinkedList is costly.
  2. Adding items in the middle is faster.
See also  Angular Tutorial for Beginners – Part 4 : Interfaces

3) Hash Table Based Collections

Before getting further in these collections, let us first try to understand the Hash Set. What are they and how are they calculated?

Here hash value is calculated based on any certain logic. It is because of GetHashCode() method of System.Object base class.

For this, illustration, I have used ITEM%4 to get my hash. And this how items will be stored.

Hash Table Implementation

For search, The first Hash of the item will be calculated and then it will get it.

Supported Collections are HashTable and HashSet.

Conclusion –
  1. Searching is very fast, almost O(1)
  2. No duplicate or null key is allowed.
  3. This is not sorted so any order of insertion or searching is the same.

4) Binary Tree Based Collection

Now next category of collections uses the binary tree for storing items. SortedDictionary<T, T> and SortedSet<T> are in this category.

Simple Binary Tree
Simple Binary Tree

Without getting in to further details of Binary tree , we can easily say that any collection implementing this will have better Search, Insert, Delete operation than any other collection.

Characteristics of this type of collections are –

  1. Key cant be null and should be unique in SortedDictionary <X, Y> and SortedList<T, T>  
  2. Search for these collections is faster than any other collection.

With this, I assume I have given conceptual details of C# Collection implementation and I hope this will help you next time when you think about type, speed and memory.

Happy Coding !!