Chicken Scratches

When to use ToArray()

Some thoughts on when to return a List<> object and when to return
an array in C#.

The analogy: strings

Many times in C#, a function that needs to build a string from
constituent parts will use a StringBuilder internally, and then
on the last line of the function call ToString() on the
StringBuilder object. This is because strings in C# are
immutable. That is, once they are set they can never be
changed. Code that looks like it is changing a string in place
is actually allocating a new string to hold the new
value. For a function that builds a string incrementally by
repeatedly appending to it, this can lead to many allocations
and a lot of copying, which hurts performance.
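
To make that cost concrete, here is a minimal sketch of the naive approach, building the string with += instead of a StringBuilder. The names ConcatDemo and NaiveCountdown are made up for illustration. The ReferenceEquals check shows that appending produces a new string object and leaves the original untouched.

```csharp
using System;

public static class ConcatDemo
{
    // Hypothetical naive version: every += allocates a brand-new string
    // and copies the old contents into it.
    public static string NaiveCountdown(int num)
    {
        string s = "Initiating countdown: ";
        for (int i = num; i > 0; i--)
        {
            s += i + "...";
        }
        return s + "Blastoff!\n";
    }

    public static void Main()
    {
        string before = "abc";
        string after = before;
        after += "def";   // looks like an in-place change, but is not

        Console.WriteLine(before);                                // abc
        Console.WriteLine(object.ReferenceEquals(before, after)); // False
        Console.Write(NaiveCountdown(3));
    }
}
```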

Here’s an example of how it’s normally done:

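    // Requires: using System.Text;
    public string GimmeString(int num)
    {
        // Accumulate the pieces in a mutable buffer instead of concatenating strings.
        StringBuilder sb = new StringBuilder("Initiating countdown: ");

        for (int i = num; i > 0; i--)
        {
            sb.AppendFormat("{0}...", i);
        }
        sb.Append("Blastoff!\n");

        // Convert to an immutable string once, at the end.
        return sb.ToString();
    }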
    

This is a pretty obvious design pattern. It’s rare you would
want to return the StringBuilder object itself, so almost always
it is converted to a string before returning. The problem gets
trickier, however, when we try to use a similar design pattern
for arrays.

Arrays

Arrays in C# are not immutable, at least not in the way strings
are. It could be said they are “partially immutable” (my own
made-up term). You can swap out elements in the array as much as
you like, but the array’s length is fixed. Therefore, if you
need to build up an array from constituent parts, it makes sense
to use a List<> object for doing the work.
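
A minimal sketch of what "fixed length" means in practice. The class name FixedLengthDemo is made up for illustration. Elements can be overwritten freely, but "resizing" an array with Array.Resize actually allocates a new array and copies into it, whereas a List<> takes care of the growing internally.

```csharp
using System;
using System.Collections.Generic;

public static class FixedLengthDemo
{
    public static void Main()
    {
        int[] arr = { 1, 2, 3 };
        arr[0] = 42;              // replacing an element is fine
        int[] original = arr;     // keep a second reference to the same array

        // Array.Resize does not grow the array in place: it allocates a new
        // array, copies the elements over, and reassigns the variable.
        Array.Resize(ref arr, 4);
        Console.WriteLine(original.Length);                       // 3
        Console.WriteLine(arr.Length);                            // 4
        Console.WriteLine(object.ReferenceEquals(original, arr)); // False

        // A List<> grows as needed when elements are added.
        List<int> list = new List<int>();
        list.Add(1);
        list.Add(2);
        Console.WriteLine(list.Count);                            // 2
    }
}
```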

The analogy then becomes, in SAT terms:

StringBuilder : string :: List<> : array

The question is then whether or not to call ToArray() on the
List<> before returning it. In this case, the answer’s not as
obvious. Let’s examine the pros and cons of each approach:

Reasons to call ToArray()

  • If the returned collection is not meant to grow or shrink,
    returning it as an array makes that fact a bit clearer (the
    caller can still overwrite individual elements).
  • If the caller is expected to perform many non-sequential
    accesses to the data, there can be a performance benefit to an
    array over a List<>.
  • If you know the returned value will be passed to a
    third-party function that expects an array.
  • Compatibility with callers that need to work with .NET
    version 1.0 or 1.1. These versions don’t have the List<> type
    (or any generic types, for that matter).

Reasons not to call ToArray()

  • If the caller ever needs to add or remove elements, a List<>
    supports that directly. With an array, the caller would first
    have to copy the data into a new collection.
  • The performance benefits are not guaranteed, especially if the
    caller is accessing the data in a sequential fashion. There is
    also the additional step of converting from List<> to array,
    which copies every element and so takes processing time.
  • The caller can always convert the list to an array themselves.
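
The last point can be sketched as follows. ConvertDemo and SumArray are hypothetical names; SumArray just stands in for some third-party function that only accepts arrays. Note that the conversion copies in both directions, so the list and the array are independent afterwards.

```csharp
using System;
using System.Collections.Generic;

public static class ConvertDemo
{
    // Hypothetical stand-in for a third-party function that only takes arrays.
    public static int SumArray(int[] values)
    {
        int total = 0;
        foreach (int v in values)
        {
            total += v;
        }
        return total;
    }

    public static void Main()
    {
        List<int> list = new List<int> { 1, 2, 3 };

        int[] arr = list.ToArray();  // copies the elements into a new array
        list.Add(4);                 // the list can still grow; the copy is unaffected

        Console.WriteLine(arr.Length);     // 3
        Console.WriteLine(list.Count);     // 4
        Console.WriteLine(SumArray(arr));  // 6

        // Going the other way also copies.
        List<int> back = new List<int>(arr);
        back.Add(5);
        Console.WriteLine(back.Count);     // 4
        Console.WriteLine(arr.Length);     // 3
    }
}
```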

ToArray() or not ToArray()? That is the question.

Based on these points, it seems to make the most sense as a
general rule to simply return the List<> object directly, rather than
converting it to an array before returning. Let me know if you
disagree.

Here’s an example:

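    // Requires: using System; using System.Collections.Generic;
    // A contrived example. Similar to Python's "range" function, but only
    // supports positive step.
    public List<int> GimmeInts(int start, int end, int step)
    {
        // A non-positive step could never advance toward end, so reject it up front.
        if (step <= 0)
        {
            throw new ArgumentOutOfRangeException("step", "step must be positive");
        }

        List<int> ret = new List<int>();

        for (int i = start; i < end; i += step)
        {
            ret.Add(i);
        }
        // Here you could have (after changing the return type to int[]):
        // return ret.ToArray();
        return ret;
    }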
    