Grouping data: the GroupBy() Method

So far, we have worked mostly with lists of data. We have sorted them, limited them and shaped them into new objects, but one important operation is still missing: grouping. When you group data, you take a list of items and divide it into several groups, based on one or more properties. Imagine that we have a data source like this one:

var users = new List<User>()
{
    new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
    new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
    new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
    new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
    new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
};

This is a flat list of User objects, but it could be interesting to group these users by, for example, their home country or their age. With LINQ, this is very easy, even though the GroupBy() method can be a bit confusing in the beginning. Let's have a look at how it works:

using System;    
using System.Collections.Generic;    
using System.Linq;    

namespace LinqGroup    
{    
    class Program    
    {    
        static void Main(string[] args)    
        {    
            var users = new List<User>()    
            {    
                new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },    
                new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },    
                new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },    
                new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },    
                new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },    
            };    
            var usersGroupedByCountry = users.GroupBy(user => user.HomeCountry);    
            foreach(var group in usersGroupedByCountry)    
            {    
                Console.WriteLine("Users from " + group.Key + ":");    
                foreach(var user in group)    
                    Console.WriteLine("* " + user.Name);
            }    
        }    

        public class User    
        {    
            public string Name { get; set; }    

            public int Age { get; set; }    

            public string HomeCountry { get; set; }    
        }    
    }    
}

The resulting output will look something like this:

Users from USA:
* John Doe
* Jane Doe
* James Doe
Users from Germany:
* Joe Doe
* Jenna Doe

The example might seem a bit long, but as you'll soon realize, most of it is just preparing the data source. Remember that all of the data could just as well come from an XML document or a database - it's just easier to demonstrate with an object data source that you can use as-is.

The interesting part is when we create the usersGroupedByCountry variable. We make it by calling the GroupBy() method on our data source, supplying a lambda expression that selects the value we want to group the data by. In this case, I want the users grouped by their home country, so that's the property the lambda returns. The result is a sequence of group objects. Each group has a Key property, which holds the value we grouped by (HomeCountry, in this case), and the group itself contains all the objects which belong to it. We use that in the next lines to iterate over the groups we just created, and for each group, we print the Key (HomeCountry) and then iterate over and print all the User objects from the group.
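To make the shape of the result more concrete, here is a minimal sketch of the same grouping with the types written out instead of var. The GroupTypesDemo class and its SampleUsers() and Summarize() helpers are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string HomeCountry { get; set; }
}

public static class GroupTypesDemo
{
    // Hypothetical helper: the same five users as in the example above.
    public static List<User> SampleUsers()
    {
        return new List<User>
        {
            new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
            new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
            new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
        };
    }

    // Hypothetical helper: builds one summary line per group.
    public static List<string> Summarize(IEnumerable<User> users)
    {
        // GroupBy() returns a sequence of groups; the key type is string
        // because HomeCountry is a string.
        IEnumerable<IGrouping<string, User>> groups = users.GroupBy(user => user.HomeCountry);

        var lines = new List<string>();
        foreach (IGrouping<string, User> group in groups)
        {
            // group.Key is the country; the group itself is a sequence of User objects.
            lines.Add(group.Key + ": " + group.Count() + " user(s)");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Summarize(SampleUsers()))
            Console.WriteLine(line);
        // Prints:
        // USA: 3 user(s)
        // Germany: 2 user(s)
    }
}
```

Since each group is itself a sequence, the usual LINQ methods such as Count() can be called directly on it.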

Custom group keys

As you can see, grouping by an existing property is easy-peasy, but as you might have come to know by now, the LINQ methods are very flexible. It's just as simple to create your own, custom groups, based on whatever you like - an example of that could be the following, where we create groups based on the first two letters of the user's name:

using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqGroup
{
    class Program
    {
        static void Main(string[] args)
        {
            var users = new List<User>()
            {
                new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
                new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
                new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
                new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
                new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
            };
            var usersGroupedByFirstLetters = users.GroupBy(user => user.Name.Substring(0, 2));
            foreach(var group in usersGroupedByFirstLetters)
            {
                Console.WriteLine("Users starting with " + group.Key + ":");
                foreach(var user in group)
                    Console.WriteLine("* " + user.Name);
            }
        }

        public class User
        {
            public string Name { get; set; }

            public int Age { get; set; }

            public string HomeCountry { get; set; }
        }
    }
}

We simply call the Substring() method on the name to get the first two letters, and LINQ then creates the groups of users based on that value. The result will look something like this:

Users starting with Jo:
* John Doe
* Joe Doe
Users starting with Ja:
* Jane Doe
* James Doe
Users starting with Je:
* Jenna Doe
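
One thing to be aware of: Substring(0, 2) throws an ArgumentOutOfRangeException if a name is shorter than two characters. All names in our list are long enough, but if your data might not be, a small guard helps. The following is only a sketch. The Prefix() helper and the PrefixGroupingDemo class are hypothetical, and it groups plain strings instead of User objects to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixGroupingDemo
{
    // Hypothetical helper: returns the first 'length' characters,
    // or the whole name if it is shorter than that.
    public static string Prefix(string name, int length)
    {
        return name.Length >= length ? name.Substring(0, length) : name;
    }

    public static void Main()
    {
        // "J" is too short for Substring(0, 2), so it gets its own group.
        var names = new List<string> { "John Doe", "Joe Doe", "J", "Jane Doe" };

        foreach (var group in names.GroupBy(name => Prefix(name, 2)))
            Console.WriteLine(group.Key + ": " + string.Join(", ", group));
        // Prints:
        // Jo: John Doe, Joe Doe
        // J: J
        // Ja: Jane Doe
    }
}
```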

So as you can see, we are free to call a method inside the GroupBy() method - in fact, we can do pretty much whatever we want in there, as long as we return something that LINQ can use to group the items. We can even create a method which returns a new piece of information about the item and then use it to create a group, as we do in the next example:

using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqGroup
{
    class Program
    {
        static void Main(string[] args)
        {
            var users = new List<User>()
            {
                new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
                new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
                new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
                new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
                new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
            };
            var usersGroupedByAgeGroup = users.GroupBy(user => user.GetAgeGroup());
            foreach(var group in usersGroupedByAgeGroup)
            {
                Console.WriteLine(group.Key + ":");
                foreach(var user in group)
                    Console.WriteLine("* " + user.Name + " [" + user.Age + " years]");
            }
        }

        public class User
        {
            public string Name { get; set; }

            public int Age { get; set; }

            public string HomeCountry { get; set; }

            public string GetAgeGroup()
            {
                if (this.Age < 13)
                    return "Children";
                if (this.Age < 20)
                    return "Teenagers";
                return "Adults";
            }
        }
    }
}

Notice how I have implemented a GetAgeGroup() method on the User class. It returns a string that defines the age group of the user and we simply call it in the GroupBy() method to use it as a group key. The result will look like this:

Adults:
* John Doe [42 years]
* Jane Doe [38 years]
Teenagers:
* Joe Doe [19 years]
* Jenna Doe [19 years]
Children:
* James Doe [8 years]

I chose to implement the GetAgeGroup() method on the User class because it might be useful in other places, but sometimes you just need a quick piece of logic to create the groups, not to be re-used elsewhere. In those situations, you are free to supply the logic directly to the GroupBy() method, as a lambda expression, like this:

var usersGroupedByAgeGroup = users.GroupBy(user =>
{
    if (user.Age < 13)
        return "Children";
    if (user.Age < 20)
        return "Teenagers";
    return "Adults";
});

The result is of course the same!
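
For short rules like this one, the logic can also be written as a single expression with the conditional operator. Here is a minimal sketch of that variation. The AgeGroupDemo class and its SampleUsers() and GroupNames() helpers are hypothetical names, not part of the examples above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string HomeCountry { get; set; }
}

public static class AgeGroupDemo
{
    // Hypothetical helper: the same five users as in the examples above.
    public static List<User> SampleUsers()
    {
        return new List<User>
        {
            new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
            new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
            new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
        };
    }

    // Hypothetical helper: returns each group name with its member count.
    public static List<string> GroupNames(IEnumerable<User> users)
    {
        return users
            // Same age rules as GetAgeGroup(), written as one expression.
            .GroupBy(user => user.Age < 13 ? "Children" : user.Age < 20 ? "Teenagers" : "Adults")
            .Select(group => group.Key + " (" + group.Count() + ")")
            .ToList();
    }

    public static void Main()
    {
        foreach (var name in GroupNames(SampleUsers()))
            Console.WriteLine(name);
        // Prints:
        // Adults (2)
        // Teenagers (2)
        // Children (1)
    }
}
```

Which form you prefer is mostly a matter of readability. With more than a couple of conditions, the block version with if statements is usually easier to follow.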

Grouping by a composite key

So far, the keys of our groups have just been a single value, e.g. a property or the result of a method call. However, you are free to create your own keys which contain several values - these are called composite keys. For example, we might want to group our users by both their home country and their age, like this:

using System;
using System.Collections.Generic;
using System.Linq;

namespace LinqGroup2
{
    class Program
    {
        static void Main(string[] args)
        {
            var users = new List<User>()
            {
                new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
                new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
                new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
                new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
                new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
            };

            var usersGroupedByCountryAndAge = users.GroupBy(user => new { user.HomeCountry, user.Age });
            foreach(var group in usersGroupedByCountryAndAge)
            {
                Console.WriteLine("Users from " + group.Key.HomeCountry + " at the age of " + group.Key.Age + ":");
                foreach (var user in group)
                    Console.WriteLine("* " + user.Name + " [" + user.Age + " years]");
            }
        }

        public class User
        {
            public string Name { get; set; }

            public int Age { get; set; }

            public string HomeCountry { get; set; }

        }
    }
}

Notice the syntax we use in the GroupBy() method - instead of supplying a single property, we create a new anonymous object, which contains the HomeCountry and the Age properties. This works because two anonymous objects with the same property values are considered equal, so users with the same country and age end up in the same group. LINQ will now create groups based on these two properties and attach the anonymous object to the Key property of the group. We are free to use both properties when we iterate over the groups, as you can see. The result will look something like this:

Users from USA at the age of 42:
* John Doe [42 years]
Users from USA at the age of 38:
* Jane Doe [38 years]
Users from Germany at the age of 19:
* Joe Doe [19 years]
* Jenna Doe [19 years]
Users from USA at the age of 8:
* James Doe [8 years]
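
If you are curious why an anonymous object works as a key, the sketch below checks that two anonymous objects with the same values compare as equal. It also tries a named value tuple as an alternative composite key, which is my own addition and requires C# 7 or newer. The CompositeKeyDemo class and its helper methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string HomeCountry { get; set; }
}

public static class CompositeKeyDemo
{
    // Hypothetical helper: the same five users as in the example above.
    public static List<User> SampleUsers()
    {
        return new List<User>
        {
            new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
            new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
            new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
        };
    }

    // Two separate anonymous objects with the same property values:
    // Equals() and GetHashCode() are based on the values, not the reference.
    public static bool AnonymousKeysAreEqual()
    {
        var first = new { HomeCountry = "Germany", Age = 19 };
        var second = new { HomeCountry = "Germany", Age = 19 };
        return first.Equals(second) && first.GetHashCode() == second.GetHashCode();
    }

    // Alternative composite key: a named value tuple, which also compares by value.
    public static int CountGroupsByTuple(IEnumerable<User> users)
    {
        return users.GroupBy(user => (Country: user.HomeCountry, Age: user.Age)).Count();
    }

    public static void Main()
    {
        Console.WriteLine(AnonymousKeysAreEqual());              // True
        Console.WriteLine(CountGroupsByTuple(SampleUsers()));    // 4
    }
}
```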

As always, we have used the LINQ Method syntax throughout this article, but allow me to supply you with a comparison example of how it could be done with the LINQ Query syntax:

// Method syntax
var usersGroupedByCountryAndAge = users.GroupBy(user => new { user.HomeCountry, user.Age });
// Query syntax
var usersGroupedByCountryAndAgeQ = from user in users
                                   group user by new { user.HomeCountry, user.Age } into userGroup
                                   select userGroup;
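
To check that the two syntaxes really give the same groups, here is a small sketch that runs both and compares the keys. The SyntaxComparisonDemo class and its helper methods are hypothetical names. Note that the query version here ends directly with the group clause, which is a valid, shorter way to write the same query without the into ... select part.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class User
{
    public string Name { get; set; }
    public int Age { get; set; }
    public string HomeCountry { get; set; }
}

public static class SyntaxComparisonDemo
{
    // Hypothetical helper: the same five users as in the examples above.
    public static List<User> SampleUsers()
    {
        return new List<User>
        {
            new User { Name = "John Doe", Age = 42, HomeCountry = "USA" },
            new User { Name = "Jane Doe", Age = 38, HomeCountry = "USA" },
            new User { Name = "Joe Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "Jenna Doe", Age = 19, HomeCountry = "Germany" },
            new User { Name = "James Doe", Age = 8, HomeCountry = "USA" },
        };
    }

    // Group keys produced by the Method syntax, formatted as "Country/Age".
    public static List<string> KeysWithMethodSyntax(IEnumerable<User> users)
    {
        return users
            .GroupBy(user => new { user.HomeCountry, user.Age })
            .Select(group => group.Key.HomeCountry + "/" + group.Key.Age)
            .ToList();
    }

    // Group keys produced by the Query syntax, formatted the same way.
    public static List<string> KeysWithQuerySyntax(IEnumerable<User> users)
    {
        var groups = from user in users
                     group user by new { user.HomeCountry, user.Age };
        return groups
            .Select(group => group.Key.HomeCountry + "/" + group.Key.Age)
            .ToList();
    }

    public static void Main()
    {
        var methodKeys = KeysWithMethodSyntax(SampleUsers());
        var queryKeys = KeysWithQuerySyntax(SampleUsers());

        Console.WriteLine(string.Join(", ", methodKeys)); // USA/42, USA/38, Germany/19, USA/8
        Console.WriteLine(methodKeys.SequenceEqual(queryKeys)); // True
    }
}
```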

Summary

As you can probably see from the examples in this article, the GroupBy() method of LINQ is extremely powerful. It really allows you to use your data in new ways, with very little code. Previously this would either be very cumbersome or require a relational database, but with LINQ, you can use whatever data source you'd like and still get the same, easy-to-use functionality.