Java Strings
Java Strings are probably the most common objects you will work with, so they are worth understanding well. Although they seem simple, used improperly they can hurt a system's performance and memory usage.
String basics
The String class is designed to represent text; each String object is backed by an immutable array of chars (since Java 9, a byte array plus an encoding flag). The JVM treats Strings somewhat differently from other objects, as this post will explain.
// string literal: the JVM keeps it in the string pool
String fasterBike = "BT Ultra";
// the literal is pooled, then new String(...) creates a second, separate copy
String fastBike = new String("Cervelo T5");
// reuses the pooled "BT Ultra" object; no new object is created!
String copyOfFasterBike = "BT Ultra";
The first method shows the standard way to create String objects: a string literal. This is the first special thing about String objects: unlike other objects, they can be created from a literal without calling a constructor.
The second method also creates a String, but it should almost always be avoided (I can’t imagine why one would do it this way). The literal “Cervelo T5” is already a String object, so passing it into the String constructor creates yet another object with the same text. It is simply best to create the String using the first method.
The third method is worth considering. Remember, the String is a special case as far as the JVM is concerned. Because String objects are immutable, the JVM can safely keep string literals in a shared pool (often called the string constant pool). When the same literal appears again, the JVM reuses the pooled object rather than creating a new one. Thus, in this case, copyOfFasterBike refers to the very same object as fasterBike. Note that this applies to literals and compile-time constants; Strings built at runtime (for example with new, or by concatenating variables) are not pooled unless you call intern().
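The pooling behaviour described above can be checked directly. The sketch below compares references with == and contents with equals(); the class and method names are made up for illustration. The Java Language Specification guarantees that identical string literals refer to the same instance and that intern() returns the pooled instance, so these results should hold on any conforming JVM.

```java
/**
 * Demonstrates string literal pooling: identical literals share one object,
 * while new String(...) always allocates a distinct object.
 * (Hypothetical demo class, not part of any library.)
 */
class StringPoolDemo {

    // Two identical literals resolve to the same pooled String instance.
    static boolean literalsShareObject() {
        String fasterBike = "BT Ultra";
        String copyOfFasterBike = "BT Ultra";
        return fasterBike == copyOfFasterBike;
    }

    // new String(...) creates a separate object, even though the text is the same.
    static boolean constructorSharesObject() {
        String fasterBike = "BT Ultra";
        String anotherBike = new String("BT Ultra");
        return fasterBike == anotherBike;
    }

    // equals() compares characters, so it is true however the String was created.
    static boolean contentsEqual() {
        return "BT Ultra".equals(new String("BT Ultra"));
    }

    // intern() returns the pooled instance for this text.
    static boolean internReturnsPooled() {
        String anotherBike = new String("BT Ultra");
        return anotherBike.intern() == "BT Ultra";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());     // true
        System.out.println(constructorSharesObject()); // false
        System.out.println(contentsEqual());           // true
        System.out.println(internReturnsPooled());     // true
    }
}
```

The second line is the reason == is unreliable for comparing text: whether two Strings happen to share an object depends on how they were created.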
Strings are immutable
There are a number of reasons why immutability is attractive in a range of situations. Immutable objects are inherently simple: they have exactly one state, and they can be reused as many times as needed without worrying about that state changing. Also, immutable objects are thread-safe by nature, because no thread can change their state. This is an easy approach to thread safety.
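As a small illustration of what immutability means in practice, String methods that appear to modify text actually return a new String and leave the original untouched. This is a minimal sketch; the class and helper names are hypothetical.

```java
import java.util.Locale;

/**
 * Shows that String methods which appear to modify text return new String objects.
 * (Hypothetical demo class.)
 */
class ImmutabilityDemo {

    // toUpperCase returns a new String; the argument is left as it was.
    static String shout(String text) {
        return text.toUpperCase(Locale.ROOT);
    }

    // concat also returns a new String rather than changing 'text'.
    static String withSuffix(String text, String suffix) {
        return text.concat(suffix);
    }

    public static void main(String[] args) {
        String bike = "Cervelo T5";
        String loud = shout(bike);
        String labelled = withSuffix(bike, " frameset");

        System.out.println(bike);     // Cervelo T5 (unchanged)
        System.out.println(loud);     // CERVELO T5
        System.out.println(labelled); // Cervelo T5 frameset
    }
}
```

Because bike can never change, it can be handed to any number of threads without locking, which is the thread-safety benefit described above.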
String is an immutable class, so any apparent change (e.g. concatenation) gives rise to a new object being created. Consider the following code:
String fasterBike = "BT Ultra";
String fastBike = "Cervelo T5";
String concatenatedString = fastBike + " is fast but " + fasterBike + " is faster";
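Taken literally, this code creates more objects than it needs: each + produces a new intermediate String, because String objects are immutable and can’t be changed once created. In practice, the Java compiler turns a single expression like this into one efficient operation (a StringBuilder chain in older compilers, an invokedynamic call since Java 9), so this one line costs very little. The real problem appears when concatenation is repeated, for example if concatenatedString were built up inside a loop over 1,000 items: every iteration creates a brand-new String and copies all of the text accumulated so far.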
Use StringBuilder for concatenation
Remember, String objects are immutable, so we need a way of efficiently concatenating text. For this we use the StringBuilder class, which appends into a mutable buffer and can be dramatically faster for repeated concatenation. Try running the code below.
/**
 * An example of why StringBuilder is more efficient than += for concatenating Strings.
 */
public class StringConcatenation {
    public static void main(String[] args) {
        final int iterations = 100_000;

        // Very slow: each += copies the whole string built so far into a new String
        long startTime = System.nanoTime();
        String slowString = "";
        for (int i = 0; i < iterations; i++) {
            slowString += "test";
        }
        long durationSlow = (System.nanoTime() - startTime) / 1_000_000;
        System.out.println("Slow: duration: " + durationSlow + "ms, length: " + slowString.length());

        // Much faster: StringBuilder appends into a resizable internal buffer
        startTime = System.nanoTime();
        StringBuilder fastStringBuilder = new StringBuilder();
        for (int i = 0; i < iterations; i++) {
            fastStringBuilder.append("test");
        }
        String fastString = fastStringBuilder.toString();
        long durationFast = (System.nanoTime() - startTime) / 1_000_000;
        System.out.println("Fast: duration: " + durationFast + "ms, length: " + fastString.length());

        System.out.println("Fast code is " + (durationSlow - durationFast) + "ms faster");
    }
}
As an extreme example, with 100,000 concatenations the StringBuilder version provided a 17,768ms improvement over naive concatenation in my test run (exact timings depend on the JVM and hardware). In many applications this won’t be a problem; however, for a high-traffic web application where both CPU and memory are at a premium, the StringBuilder technique is important.
In performance terms, using the String concatenation operator (+=) repeatedly to concatenate n String objects requires time quadratic in n (Bloch, 2008). This is because each concatenation copies the contents of both strings into a new String, so the text accumulated so far is copied again on every step. Just have a look at the results from the sample code above to see for yourself.
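To see where the quadratic cost comes from, assume each of the n appended pieces has k characters (k = 4 for "test" in the sample). The i-th += must allocate a new String of length ik and copy all of it, so the total number of characters copied is a simple arithmetic series. Plugging in roughly the sample's numbers:

```latex
\begin{aligned}
\text{characters copied} &= \sum_{i=1}^{n} i\,k = k\,\frac{n(n+1)}{2} \in O(n^2) \\
n = 100\,000,\ k = 4:\quad & 4 \cdot \frac{100\,000 \cdot 100\,001}{2} = 20\,000\,200\,000 \approx 2 \times 10^{10}
\end{aligned}
```

StringBuilder avoids this because its buffer grows geometrically (in OpenJDK, roughly doubling when full), so the total copying stays proportional to the final length, about nk characters plus a constant factor.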
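Note: StringBuffer was previously used for String concatenation. StringBuilder, introduced in Java 5 as an unsynchronised alternative with the same API, is preferable because StringBuffer’s methods are synchronised, which adds needless overhead in the common case where the builder is used by a single thread.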
Avoid using Strings for other purposes
Knowing that a String is in fact an object wrapping an array of chars, we know that a String is not a primitive type of the Java language. This is quite different from how strings are represented in JavaScript, for example. JavaScript treats string literals (text wrapped in quote marks, “ ”) as a primitive string type, so two strings with the same text compare equal with ===, which makes working with strings easy in JavaScript.
Because this is not the case in Java, where possible String objects should not be used for purposes other than representing text. That is, they should not stand in for other value types such as numbers or flags. The ‘==’ operator compares String references rather than contents, so we cannot compare strings the way we compare primitives; we must remember to call equals() instead, which is easy to get wrong.
Let’s consider how we receive data from databases, networks and other systems. Often it comes in the form of text. For example, consider this JSON:
{
  "id": "10",
  "name": "Bobby Digitial"
}
It would be better to convert the ID to a long or int primitive rather than leave it as a String. This makes the value far easier to work with: comparisons are simple because == works on primitives, so no equals() or compareTo() call is needed. What about binary values that have either an “on” or “off” state? Would converting to a boolean primitive (or Boolean object) be easier to work with? I think so.
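A minimal sketch of converting such fields as early as possible, assuming the JSON has already been parsed and the raw field values are available as Strings (the JSON parsing step is out of scope here). The class and helper names are hypothetical. Note that Boolean.parseBoolean only recognises "true", so “on”/“off” needs an explicit mapping.

```java
/**
 * Converts textual fields from an incoming record into typed values.
 * (Hypothetical helper class; assumes the raw values were already extracted from the JSON.)
 */
class RecordParsing {

    // Parses the textual id once; afterwards it is compared with == like any long.
    static long parseId(String idText) {
        return Long.parseLong(idText.trim());
    }

    // Maps "on"/"off" (case-insensitive) to a boolean; anything else is rejected.
    static boolean parseSwitch(String stateText) {
        if ("on".equalsIgnoreCase(stateText)) {
            return true;
        }
        if ("off".equalsIgnoreCase(stateText)) {
            return false;
        }
        throw new IllegalArgumentException("Expected \"on\" or \"off\" but got: " + stateText);
    }

    public static void main(String[] args) {
        long id = parseId("10");
        System.out.println(id == 10L);          // true: plain primitive comparison
        System.out.println(parseSwitch("on"));  // true
        System.out.println(parseSwitch("OFF")); // false
    }
}
```

Rejecting unexpected values at the boundary means the rest of the program never has to handle strings like “maybe”.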
String objects also do not work particularly well as compound keys, that is, keys made up of multiple values. For example, this is not ideal:
String myCompoundKey = id + "_" + SomeClass.class.getName();
There are several problems here. Firstly, as discussed earlier, String concatenation is slow when repeated. Secondly, getting the individual values back out of the key requires text parsing, which is generally slow and fragile (for instance, Java class names may themselves contain an underscore). In a simple example like this it is no problem, but for hundreds or thousands of records it’s not good. Thirdly, the String class is marked as final and cannot be extended, so how will you add any system-specific functionality to this key? It would be better to write a class which represents the key. This class would aggregate the two values and provide standard methods for the program to work with (e.g. toString(), equals() together with hashCode(), and compareTo()).
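A sketch of such a key class, assuming the key combines a numeric id with a class name as in the example above. CompoundKey is a hypothetical name. It adds hashCode() alongside equals(), because Java requires the two to agree for a key to work correctly in a HashMap or HashSet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * A compound key that aggregates an id and a class name instead of
 * gluing them into one String. (Hypothetical class for illustration.)
 */
final class CompoundKey implements Comparable<CompoundKey> {
    private final long id;
    private final String className;

    CompoundKey(long id, Class<?> type) {
        this.id = id;
        this.className = type.getName();
    }

    long getId() { return id; }
    String getClassName() { return className; }

    // Two keys are equal when both parts are equal; no text parsing is needed.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CompoundKey)) return false;
        CompoundKey that = (CompoundKey) other;
        return id == that.id && className.equals(that.className);
    }

    // Must agree with equals so the key behaves correctly in hash-based collections.
    @Override
    public int hashCode() {
        return Objects.hash(id, className);
    }

    // Orders by id first, then by class name.
    @Override
    public int compareTo(CompoundKey other) {
        int byId = Long.compare(id, other.id);
        return byId != 0 ? byId : className.compareTo(other.className);
    }

    // Readable form for logging only; the program never parses it back.
    @Override
    public String toString() {
        return id + "_" + className;
    }

    public static void main(String[] args) {
        Map<CompoundKey, String> cache = new HashMap<>();
        cache.put(new CompoundKey(10L, String.class), "cached value");
        // A separately constructed but equal key finds the same entry.
        System.out.println(cache.get(new CompoundKey(10L, String.class))); // cached value
    }
}
```

Since the class is our own, system-specific behaviour can be added to it later, which is not possible with the final String class.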
Conclusion
The String class is best used for representing text. Although this class is a cornerstone of many programs, it can cause performance issues when not used properly. Sometimes it is best to replace a String variable with primitive types, enums or aggregates as appropriate.