C# String Interning for Efficient String Comparison

Atul Sharma

The source code used in this article is available here for experimentation.

When it comes to string comparison, we must think about performance in terms of both memory and time. Sometimes a lack of understanding of the basic concepts leads to performance penalties.

In this article, we are going to cover string interning – a very important feature of the .NET Framework from the perspective of string comparison. I will cover the following topics to make this a one-stop article on string interning, its benefits, and its associated issues –

  • Introduction to String Interning
    • String intern at Compile time
    • String intern at Run time
  • Methods in String Interning
  • Performance Analysis with and without String interning (time and memory)
  • Issues with String Interning

Introduction to String Interning

If the same string literal appears multiple times in an assembly, the Common Language Runtime (CLR) keeps only one instance of that string. Internally, the CLR maintains a table known as the intern pool, which stores a single reference to each unique literal string declared or created programmatically in the program.

String Interning at Compile Time

Code Example
            string myName = "Atul";
            string YourName = "Atul";
            string name1 = "A" + "t" + "u" + "l";
            string name2 = "A" + "tul";
            
            Console.WriteLine(object.ReferenceEquals(myName, YourName));
            Console.WriteLine(object.ReferenceEquals(myName, name1));
            Console.WriteLine(object.ReferenceEquals(myName, name2));

In this example, the string variable myName is created first. YourName holds the same literal, so it refers to the same string object as myName. The expressions assigned to name1 and name2 consist only of literals, so the compiler folds each of them into the constant "Atul", and they refer to that same object as well. The output confirms this: all three comparisons print True, which means all the variables refer to the same memory location.

By default, every unique string literal is stored once at compile time, and its reference is returned to each new string variable that has the same value. Please note that string comparison in C# is case-sensitive, so Atul and ATUL are treated as different string literals.

Important Catch – A string has to exist before it can be interned. For the literals above this costs nothing extra, because only a single copy of "Atul" is stored. For a string built at run time, however, the string is allocated first; if its value is already found in the intern pool, that newly allocated copy is NOT referenced anymore after interning and gets cleaned up in a later garbage collection run.
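The difference between literals folded at compile time and strings concatenated at run time can be sketched as below. The class and method names are hypothetical and are not part of the article's source code; the sketch assumes all the literals live in the same assembly.

```csharp
using System;

public static class CompileTimeInterningDemo
{
    // Both operands are compile-time constants, so the compiler folds
    // "At" + "ul" into the single literal "Atul".
    public static bool FoldedLiteralSharesInstance()
    {
        string whole = "Atul";
        string folded = "At" + "ul";
        return object.ReferenceEquals(whole, folded);
    }

    // Literals that differ only in case are different strings.
    public static bool DifferentCaseSharesInstance()
    {
        string mixed = "Atul";
        string upper = "ATUL";
        return object.ReferenceEquals(mixed, upper);
    }

    // The prefix is only known at run time, so the concatenation
    // allocates a new string instead of reusing the literal.
    public static bool RuntimeConcatSharesInstance(string prefix)
    {
        string built = prefix + "ul";
        return object.ReferenceEquals("Atul", built);
    }
}
```

Calling FoldedLiteralSharesInstance() should return true, while DifferentCaseSharesInstance() and RuntimeConcatSharesInstance("At") should return false.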

String Interning at Run Time

As we saw, string interning happens by default for compile-time literals, but strings built at run time are a different story. If we run the following code example, every comparison prints False.

            StringBuilder sb1 = new StringBuilder("A");
            StringBuilder sb2 = new StringBuilder("Atul");
            StringBuilder sb3 = new StringBuilder();

            string name3 = string.Format("{0}", "Atul");
            string name4 = string.Format($"{"Atul"}");

            string name5 = sb1.Append("t").Append("ul").ToString();
            string name6 = sb2.ToString();
            string name7 = sb3.Append("Atul").ToString();

            Console.WriteLine(object.ReferenceEquals(myName, name3));
            Console.WriteLine(object.ReferenceEquals(myName, name4));
            Console.WriteLine(object.ReferenceEquals(myName, name5));
            Console.WriteLine(object.ReferenceEquals(myName, name6));
            Console.WriteLine(object.ReferenceEquals(myName, name7));

And here we run into trouble. Each string has the same value, but each one is stored as a separate object. If a program holds many strings with the same value, the duplicated copies can waste a significant amount of memory.

To rescue us from this scenario, we can use string.Intern to get the same intern pool behavior at run time. The code is as below –

            string name31 = string.Intern(string.Format("{0}", "Atul"));
            string name41 = string.Intern(string.Format($"{"Atul"}"));
            string name51 = string.Intern(name5);
            string name61 = string.Intern(sb2.ToString());
            string name71 = string.Intern(sb3.ToString());

            Console.WriteLine(object.ReferenceEquals(myName, name31));
            Console.WriteLine(object.ReferenceEquals(myName, name41));
            Console.WriteLine(object.ReferenceEquals(myName, name51));
            Console.WriteLine(object.ReferenceEquals(myName, name61));
            Console.WriteLine(object.ReferenceEquals(myName, name71));

And here every comparison prints True. Amazed? Yes, now all the string variables reference the same memory location as the first variable, myName. The temporary strings that were created and passed to string.Intern for name31, name41, name51, name61 and name71 are no longer referenced and will be freed by the garbage collection process.
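To illustrate the memory effect, here is a minimal sketch that builds many equal strings at run time, with and without string.Intern, and checks whether they all share one instance. The class and method names are hypothetical and not from the article's source code.

```csharp
using System;
using System.Collections.Generic;

public static class RuntimeInterningDemo
{
    // Builds `count` strings with the value "Atul" at run time.
    // new string(char[]) always allocates a fresh object for a non-empty array.
    public static List<string> BuildCopies(int count, bool intern)
    {
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            string s = new string(new[] { 'A', 't', 'u', 'l' });
            result.Add(intern ? string.Intern(s) : s);
        }
        return result;
    }

    // True when every element is the very same object as the first one.
    public static bool AllSameInstance(IReadOnlyList<string> values)
    {
        for (int i = 1; i < values.Count; i++)
        {
            if (!object.ReferenceEquals(values[0], values[i]))
                return false;
        }
        return true;
    }
}
```

With interning, AllSameInstance(BuildCopies(1000, true)) should return true, because the list holds 1000 references to one pooled string. Without it, the list holds 1000 separate objects and the check returns false.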

IsInterned and Intern Methods

Within the string interning family, we get two methods: string.IsInterned and string.Intern.

string.IsInterned returns a reference to the interned string if a string with the given value is already in the intern pool.

Caution – Do NOT get confused by the name of the method; it does NOT return a boolean.

string.Intern also returns the reference to the interned string.

The difference between the two is that string.IsInterned returns null if the string is not interned, while string.Intern creates a new entry in the intern pool and returns that reference.
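The difference can be sketched with a string that is very unlikely to be in the pool already. The class name, method name and "key-" prefix are hypothetical; the sketch assumes a freshly generated GUID has never been interned in the process.

```csharp
using System;

public static class InternLookupDemo
{
    // Returns: was the value pooled before Intern, is it pooled after,
    // and do Intern and IsInterned hand back the same pooled reference?
    public static (bool Before, bool After, bool SamePooledReference) Probe()
    {
        // Built at run time from a fresh GUID, so no literal with this value exists.
        string candidate = "key-" + Guid.NewGuid().ToString();

        bool before = string.IsInterned(candidate) != null; // lookup only, adds nothing
        string pooled = string.Intern(candidate);           // adds an entry if missing
        string found = string.IsInterned(candidate);        // now finds the entry

        return (before, found != null, object.ReferenceEquals(pooled, found));
    }
}
```

Probe() should return (false, true, true): the lookup alone never adds an entry, while string.Intern does.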

Let us examine the code and verify the facts said above with output –

            Console.WriteLine("String Intern methods ...");
            Console.WriteLine("IsInterned Static");
            Console.WriteLine(string.IsInterned(YourName));
            Console.WriteLine(string.IsInterned(name1));
            Console.WriteLine(string.IsInterned(name2));
            Console.WriteLine(string.IsInterned(name3));

            Console.WriteLine("IsInterned Dynamic");
            Console.WriteLine(string.IsInterned(name3));
            Console.WriteLine(string.IsInterned(name4));
            Console.WriteLine(string.IsInterned(name5));
            Console.WriteLine(string.IsInterned(name6));
            Console.WriteLine(string.IsInterned(name7));

            Console.WriteLine("string.IsInterned");
            Console.WriteLine(string.IsInterned(name31));
            Console.WriteLine(string.IsInterned(name41));
            Console.WriteLine(string.IsInterned(name51));
            Console.WriteLine(string.IsInterned(name61));
            Console.WriteLine(string.IsInterned(name1 + "Sharma"));
            Console.WriteLine(string.IsInterned(name71));
            
            Console.WriteLine("string.Intern");
            Console.WriteLine(string.Intern(name31));
            Console.WriteLine(string.Intern(name41));
            Console.WriteLine(string.Intern(name51));
            Console.WriteLine(string.Intern(name61));
            Console.WriteLine(string.Intern(name71));
            Console.WriteLine(string.Intern(name1 + "Sharma"));

And here we get the output –

Here we see that the value "Atul" is already interned, so string.IsInterned returns that same string in every case except one: the call at line # 20, i.e. Console.WriteLine(string.IsInterned(name1 + "Sharma"));, returns null, which Console.WriteLine prints as an empty line.

Here I am building the string at run time (concatenating name1 + "Sharma"), and the result is not in the intern pool, so null is returned. Had I used a hard-coded literal such as "Sharma" instead (making it a compile-time constant), it would already have an entry in the intern pool, and that "Sharma" reference would have been returned. This confirms our statement that string.IsInterned returns null when it is passed a string that has not been interned.

On the other side, look at the code at line # 29, i.e. Console.WriteLine(string.Intern(name1 + "Sharma"));, and the last line of the output.

Even though the string is not in the intern pool, string.Intern creates a new entry and returns that reference.

So now we understand the clear difference between the two methods.

Performance Comparison with and without String Interning

To evaluate the scenario, I am going to compare two string values with and without string interning and then discuss the time and memory used in both scenarios. I will run the comparison 100 million times. For that purpose, I have this code –

        static void CompareWithStringIntern()
        {
            Console.WriteLine("CompareWithStringIntern()");
            string source = "Atul";
            string target = string.Intern(string.Format($"{"Atul"}"));
            bool isEqual = false;
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (int i = 0; i < 100000000; i++)
            {
                if (source == target)
                    isEqual = true;
                else
                    isEqual = false;
            }
            sw.Stop();
            Console.WriteLine($"Time - {sw.ElapsedTicks}");
            Console.WriteLine($"Memory - {GC.GetTotalMemory(true)}");
        }

        static void CompareWithoutStringIntern()
        {
            Console.WriteLine("CompareWithoutStringIntern()");
            string source = "Atul";
            string target = string.Format($"{"Atul"}");
            bool isEqual = false;
            Stopwatch sw = new Stopwatch();
            sw.Start();
            for (int i = 0; i < 100000000; i++)
            {
                if (source == target)
                    isEqual = true;
                else
                    isEqual = false;
            }

            sw.Stop();
            Console.WriteLine($"Time - {sw.ElapsedTicks}");
            Console.WriteLine($"Memory - {GC.GetTotalMemory(true)}");

        }

And here is the output –

Figure: Performance evaluation of string comparison with and without string interning

Conclusion –

It is evident from the output that, for comparison execution time, we get a huge boost with interning (more than 2 times faster).

For memory, the interned version consumes slightly more in this test, because everything lives in a single scope and garbage collection runs only once. In a real-life scenario, the unused memory (allocated while creating the strings) is cleaned up over several GC runs and may pass through all generations.

Hence, it is good practice to use string interning for string comparison.
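The speed-up is consistent with how ordinal string equality works: the reference check comes first, so two variables pointing at the same interned instance never reach the character-by-character comparison. Below is a simplified illustration of that logic. It is not the actual framework source, and the class and method names are hypothetical.

```csharp
using System;

public static class EqualitySketch
{
    // Simplified model of an ordinal equality check between two strings.
    public static bool OrdinalEquals(string a, string b)
    {
        // Fast path: the same object (for example, two interned references).
        if (object.ReferenceEquals(a, b)) return true;

        // Different objects: null or a length mismatch settles it quickly.
        if (a == null || b == null) return false;
        if (a.Length != b.Length) return false;

        // Slow path: equal lengths, so every character must be compared.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

For a non-interned target, every iteration of the benchmark loop goes through the slow path. For an interned target, it returns at the first check.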

Issues with String Interning

The first issue is lifetime. The intern pool is not scoped to a single assembly or application domain, so interned strings are unlikely to be cleaned up by garbage collection; the memory is typically held until the CLR itself terminates.

The second issue refers to the fact mentioned in the Important Catch above, i.e. memory is still allocated for the new string first, and that copy is abandoned as soon as a matching entry is found in the intern pool.
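If the lifetime of the CLR intern pool is a concern, one possible alternative is an application-owned pool that can be cleared or dropped when it is no longer needed. The sketch below is only an illustration under that assumption: LocalStringPool is a hypothetical class, not a .NET API, and it is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical application-level pool; its entries live only as long as the pool does.
public sealed class LocalStringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Returns the pooled instance for this value, adding the value if it is new.
    public string GetOrAdd(string value)
    {
        if (_pool.TryGetValue(value, out string existing))
            return existing;

        _pool[value] = value;
        return value;
    }

    public int Count => _pool.Count;

    // Dropping the entries makes the pooled strings collectable again,
    // provided nothing else still references them.
    public void Clear() => _pool.Clear();
}
```

Two equal strings passed to GetOrAdd come back as the same instance, just as with string.Intern, but the caller decides when the pool is released.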

With this, I hope I have explained the concept clearly and left no confusion. Should you have any doubts, please write to me. The source code used in this article is available here for experimentation.

Reference –

https://docs.microsoft.com/en-us/dotnet/api/system.string.intern?view=netframework-4.7.2