The source code used in this article is available here for experimentation.
When it comes to string comparison, we must think about performance in terms of both memory and time. Sometimes a gap in basic understanding leads to performance penalties.
In this article, we are going to cover string interning, a very important feature of the .NET Framework from the perspective of string comparison. I will cover the following topics to make this a one-stop article on string interning, its benefits and its associated issues:
- Introduction to String Interning
- String intern at Compile time
- String intern at Run time
- Methods in String Interning
- Performance Analysis with and without String interning (time and memory)
- Issues with String Interning
Introduction to String Interning
If the same string literal appears more than once in an assembly, the Common Language Runtime (CLR) keeps only one instance of that string, and every occurrence refers to it. Internally, the CLR maintains a table known as the intern pool, which stores a single reference to each unique literal string in the program.
String Interning at Compile Time
Code Example
string myName = "Atul";
string YourName = "Atul";
string name1 = "A" + "t" + "u" + "l";
string name2 = "A" + "tul";
Console.WriteLine(object.ReferenceEquals(myName, YourName));
Console.WriteLine(object.ReferenceEquals(myName, name1));
Console.WriteLine(object.ReferenceEquals(myName, name2));
In this example, the string variable myName is created first and refers to the literal "Atul". YourName is then assigned the same literal, so it refers to the same string object as myName; no second copy is allocated. Similarly, name1 and name2 are built only from literals, so the compiler folds each concatenation into the single literal "Atul" and both variables refer to that same object. The output of the program confirms this: every line prints True, which means all four variables refer to the same memory location.
By default, every unique string literal is created only once, and a reference to that single instance is handed to every variable initialized with the same value. Please note that string comparison in C# is case-sensitive, so Atul and ATUL are treated as different string literals.
Important catch: a string has to exist before it can be interned. The literals in the example above cost nothing extra, because the compiler has already reduced YourName, name1 and name2 to the same literal. A string built at run time is different: it is allocated first, and if an equal value is found in the intern pool, the newly allocated copy is no longer referenced and is cleaned up in a later garbage collection run.
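To make the compile-time behaviour concrete, here is a minimal sketch that contrasts a constant expression, which the C# compiler folds into a single literal, with a concatenation that involves an ordinary variable and therefore runs at run time. The class and method names (ConstantFoldingDemo, Folded, Concatenated) are illustrative only and are not part of the article's source code.

```csharp
using System;

static class ConstantFoldingDemo
{
    // Constant expression: the compiler folds it into the literal "Atul".
    public static string Folded()
    {
        return "A" + "tul";
    }

    // `prefix` is an ordinary parameter, so this concatenation happens at
    // run time and normally produces a new, non-interned string.
    public static string Concatenated(string prefix)
    {
        return prefix + "tul";
    }

    static void Main()
    {
        string literal = "Atul";

        // Same interned instance as the literal.
        Console.WriteLine(object.ReferenceEquals(literal, Folded()));  // True

        string concatenated = Concatenated("A");

        // Same value as the literal...
        Console.WriteLine(literal == concatenated);                    // True

        // ...but whether it is the same instance is what this line reveals.
        Console.WriteLine(object.ReferenceEquals(literal, concatenated));
    }
}
```

The last line is left without an expected result on purpose: run it to see how a run-time string behaves, which is the topic of the next section.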
String Interning at Run Time
As we saw, literals are interned by default at compile time, but strings created at run time are a different story. If we run the following code example, every line of output is False.
StringBuilder sb1 = new StringBuilder("A");
StringBuilder sb2 = new StringBuilder("Atul");
StringBuilder sb3 = new StringBuilder();
string name3 = string.Format("{0}", "Atul");
string name4 = string.Format($"{"Atul"}");
string name5 = sb1.Append("t").Append("ul").ToString();
string name6 = sb2.ToString();
string name7 = sb3.Append("Atul").ToString();
Console.WriteLine(object.ReferenceEquals(myName, name3));
Console.WriteLine(object.ReferenceEquals(myName, name4));
Console.WriteLine(object.ReferenceEquals(myName, name5));
Console.WriteLine(object.ReferenceEquals(myName, name6));
Console.WriteLine(object.ReferenceEquals(myName, name7));
And here we get into trouble. Every string has the same value, yet each one is stored separately. If there are many strings with the same value, this can waste a significant amount of memory.
To rescue us from this scenario, we can use the string.Intern method:
string name31 = string.Intern(string.Format("{0}", "Atul"));
string name41 = string.Intern(string.Format($"{"Atul"}"));
string name51 = string.Intern(name5);
string name61 = string.Intern(sb2.ToString());
string name71 = string.Intern(sb3.ToString());
Console.WriteLine(object.ReferenceEquals(myName, name31));
Console.WriteLine(object.ReferenceEquals(myName, name41));
Console.WriteLine(object.ReferenceEquals(myName, name51));
Console.WriteLine(object.ReferenceEquals(myName, name61));
Console.WriteLine(object.ReferenceEquals(myName, name71));
And here every line of output is True. Surprised? All of these variables now refer to the same memory location as the first variable, myName. The temporary strings that were created only to be passed to string.Intern are no longer referenced and will be freed by the garbage collector.
IsInterned and Intern Methods
string.IsInterned returns a reference to the interned string if the given value is already in the intern pool; otherwise it returns null.
Caution: do NOT be misled by the name of the method. It does NOT return a boolean.
string.Intern also returns a reference to the interned string, but it first adds the string to the intern pool if it is not already there.
Difference between string.IsInterned and string.Intern
Let us examine the code and verify the facts stated above with the output:
Console.WriteLine("String Intern methods ...");
Console.WriteLine("IsInterned Static");
Console.WriteLine(string.IsInterned(YourName));
Console.WriteLine(string.IsInterned(name1));
Console.WriteLine(string.IsInterned(name2));
Console.WriteLine(string.IsInterned(name3));
Console.WriteLine("IsInterned Dynamic");
Console.WriteLine(string.IsInterned(name3));
Console.WriteLine(string.IsInterned(name4));
Console.WriteLine(string.IsInterned(name5));
Console.WriteLine(string.IsInterned(name6));
Console.WriteLine(string.IsInterned(name7));
Console.WriteLine("string.IsInterned");
Console.WriteLine(string.IsInterned(name31));
Console.WriteLine(string.IsInterned(name41));
Console.WriteLine(string.IsInterned(name51));
Console.WriteLine(string.IsInterned(name61));
Console.WriteLine(string.IsInterned(name1 + "Sharma"));
Console.WriteLine(string.IsInterned(name71));
Console.WriteLine("string.Intern");
Console.WriteLine(string.Intern(name31));
Console.WriteLine(string.Intern(name41));
Console.WriteLine(string.Intern(name51));
Console.WriteLine(string.Intern(name61));
Console.WriteLine(string.Intern(name71));
Console.WriteLine(string.Intern(name1 + "Sharma"));
And here we get the output –
Here we see that all of these values are already in the intern pool, so string.IsInterned returns the pooled string for each of them. The one exception is name1 + "Sharma".
Since name1 + "Sharma" is concatenated at run time and the result is not in the intern pool, string.IsInterned returns null for it (Console.WriteLine prints an empty line). Had I passed a hard-coded literal instead (making it a compile-time constant), it would already have an entry in the intern pool and string.IsInterned would have returned that entry. So this proves our statement that string.IsInterned returns null if the string is not in the intern pool.
On the other side, look at the last statement of the listing, Console.WriteLine(string.Intern(name1 + "Sharma"));, and at the last line of the output.
Even though the string is not in the intern pool, string.Intern creates a new entry for it and returns that reference.
So now we understand the clear difference between the two methods.
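The difference can also be shown in isolation. The following is a minimal sketch, assuming a string that is not yet in the intern pool; the GUID suffix is only a convenient way to get such a value, and the names InternVsIsInternedDemo and CreateFreshString are hypothetical, not taken from the article's source code.

```csharp
using System;

static class InternVsIsInternedDemo
{
    // Builds a string at run time whose value is unique, so it cannot
    // already be in the intern pool.
    public static string CreateFreshString()
    {
        return "customer-" + Guid.NewGuid().ToString("N");
    }

    static void Main()
    {
        string fresh = CreateFreshString();

        // IsInterned only looks the value up; it never adds to the pool.
        Console.WriteLine(string.IsInterned(fresh) == null);   // True

        // Intern adds the value to the pool and returns the pooled reference.
        string pooled = string.Intern(fresh);

        // A separately allocated copy with the same characters is a
        // different object...
        string copy = new string(fresh.ToCharArray());
        Console.WriteLine(object.ReferenceEquals(fresh, copy)); // False

        // ...but IsInterned now resolves it to the pooled instance.
        Console.WriteLine(object.ReferenceEquals(pooled, string.IsInterned(copy))); // True
    }
}
```

In short, IsInterned is a read-only lookup, while Intern is a lookup that inserts when nothing is found.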
Performance Comparison with and without String Interning
To evaluate the scenario, I am going to compare two string values with and without string interning and then discuss the time and memory used in both cases. Each comparison runs 100 million times. For that purpose, I have this code:
// Requires: using System; using System.Diagnostics;

static void CompareWithStringIntern()
{
    Console.WriteLine("CompareWithStringIntern()");
    string source = "Atul";
    // target is built at run time, then replaced by the pooled instance.
    string target = string.Intern(string.Format($"{"Atul"}"));
    bool isEqual = false;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < 100000000; i++)
    {
        if (source == target)
            isEqual = true;
        else
            isEqual = false;
    }
    sw.Stop();
    Console.WriteLine($"Time - {sw.ElapsedTicks}");
    Console.WriteLine($"Memory - {GC.GetTotalMemory(true)}");
}

static void CompareWithoutStringIntern()
{
    Console.WriteLine("CompareWithoutStringIntern()");
    string source = "Atul";
    // target is built at run time and is NOT interned.
    string target = string.Format($"{"Atul"}");
    bool isEqual = false;
    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < 100000000; i++)
    {
        if (source == target)
            isEqual = true;
        else
            isEqual = false;
    }
    sw.Stop();
    Console.WriteLine($"Time - {sw.ElapsedTicks}");
    Console.WriteLine($"Memory - {GC.GetTotalMemory(true)}");
}
And here is the output –
Conclusion –
The output shows that interning gives a large boost in comparison time (more than 2 times). The == operator on strings checks reference equality first, so two references to the same interned instance are found equal without comparing any characters.
For memory, the version with interning consumes slightly more in this test, because everything is allocated within a single scope and garbage collection runs only once. In a real-life scenario, the unused memory (created while instantiating the strings) would be cleaned up over several GC runs and could pass through all generations.
Hence, it is good practice to consider string interning for strings that are compared many times.
Issues with String Interning
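As we saw here, the intern pool is not scoped to a single method or object: the CLR keeps its own reference to every interned string, so these strings are unlikely to be cleaned up during garbage collection, and their memory is released only when the CLR terminates.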
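The second issue is the one described in the important catch earlier: a string must first be created before it can be interned, so its memory is still allocated even though it is eventually garbage collected.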
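The lifetime issue can be observed with a weak reference. The sketch below is illustrative only: it interns a unique string, drops every strong reference the program holds, forces a collection, and checks whether the string is still alive. The names InternLifetimeDemo and InternAndForget are hypothetical, and the forced GC.Collect calls are there purely for the demonstration.

```csharp
using System;
using System.Runtime.CompilerServices;

static class InternLifetimeDemo
{
    // Creates a unique string, interns it, and returns only a weak reference,
    // so the caller holds no strong reference to the string itself.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference InternAndForget()
    {
        string value = "session-" + Guid.NewGuid().ToString("N");
        return new WeakReference(string.Intern(value));
    }

    static void Main()
    {
        WeakReference weak = InternAndForget();

        // Force a full, blocking collection.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // The intern pool still references the string, so it survives.
        Console.WriteLine(weak.IsAlive); // True
    }
}
```

This is why interning values that are unique or short-lived (request IDs, for example) tends to grow memory rather than save it.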
With this, I hope that I have explained the concept clearly and that it leaves no confusion. Should you have any doubt, please write. The source code used in this article is available here for experimentation.
Reference –
https://docs.microsoft.com/en-us/dotnet/api/system.string.intern?view=netframework-4.7.2