Technical specs
Our code is produced using Visual Studio 2008 Team Suite, C# 3.0, .NET Framework 3.5.
We created a Windows project, Console Application. To use LINQ features you have to make sure of two things:
1. System.Core must be included in your project’s References
2. using System.Linq; must be in your using statements (at the beginning of the source code)
We are going to explore some of the most useful LINQ operators, using a List<int> for most of our examples.
Note that LINQ queries do not modify a collection in place: they return a new IEnumerable, which you have to store somewhere. This is especially useful when using multithreading, because the query itself leaves the old collection intact during its entire execution.
In addition, the IEnumerable returned by a LINQ query is like an “abstract object”: it is not materialized (lazy evaluation, does it ring a bell?). If you want to concretize your query (which means executing the computations in it and actually storing the result) you have to use ToList() or an equivalent (ToDictionary(), ToArray()). Calling ToList() materializes the query and stores its result in memory.
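To make the difference between a lazy query and a materialized one concrete, here is a minimal sketch (the class and variable names are only illustrative). The query is defined once, the source list is changed afterwards, and only the non-materialized query notices the change:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionDemo
{
    static void Main()
    {
        List<int> source = new List<int> { 1, 2, 3 };

        // Defining the query does not run it: no element is inspected yet.
        IEnumerable<int> evens = source.Where(n => n % 2 == 0);

        // ToList() runs the query now and stores the result.
        List<int> snapshot = evens.ToList();

        // Change the source after the query has been defined.
        source.Add(4);

        Console.WriteLine(snapshot.Count);  // prints: 1 (only 2 was there at that time)
        Console.WriteLine(evens.Count());   // prints: 2 (the query runs again and sees 2 and 4)
    }
}
```

The snapshot is a separate list, while the lazy query is evaluated again every time it is enumerated.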
LINQ operators
A list of the standard LINQ operators follows:
| Operator type | Operator name |
|---|---|
| Aggregation | Aggregate, Average, Count, LongCount, Max, Min, Sum |
| Conversion | AsEnumerable, Cast, OfType, ToArray, ToDictionary, ToList, ToLookup |
| Element | DefaultIfEmpty, ElementAt, ElementAtOrDefault, First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault |
| Equality | SequenceEqual |
| Generation | Empty, Range, Repeat |
| Grouping | GroupBy |
| Joining | GroupJoin, Join |
| Ordering | OrderBy, ThenBy, OrderByDescending, ThenByDescending, Reverse |
| Partitioning | Skip, SkipWhile, Take, TakeWhile |
| Quantifiers | All, Any, Contains |
| Restriction | Where |
| Selection | Select, SelectMany |
| Set | Concat, Distinct, Except, Intersect, Union |
Most of the operators should be familiar if you have ever worked with a relational database writing queries in SQL.
Range
Our first LINQ query will be about initializing a list. In the section about C# 3.0 language extensions, we saw that we can initialize a list like this:
List<int> l = new List<int>() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
With LINQ, the initialization can be a lot less painful:
List<int> l =
    Enumerable.Range(1, 10)
              .ToList();
The Range operator generates a sequence of integers within a specified range: it takes as input the first integer value of the sequence (1 in our case) and the number of consecutive elements the sequence must contain (10 in our case). The return value of Enumerable.Range is an IEnumerable<int>, so if we want a List<int> we have to call the LINQ operator ToList(). Since Enumerable.Range returns an IEnumerable of integers, the list returned by ToList() will be made of integers. Our list, l, will be made up of ten integers, from 1 to 10, as we wanted.
Remember that, thanks to the local variable type inference, we may also write:
var l =
    Enumerable.Range(1, 10)
              .ToList();
and the result is exactly the same.
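A common slip with Range is to read its second argument as an upper bound. The short sketch below (names are illustrative) shows that it is a count, and also tries Repeat, another generation operator from the table above:

```csharp
using System;
using System.Linq;

class RangeDemo
{
    static void Main()
    {
        // The second argument is a count, not an upper bound: start at 5, take 3 values.
        int[] fromFive = Enumerable.Range(5, 3).ToArray();
        string[] asText = fromFive.Select(n => n.ToString()).ToArray();
        Console.WriteLine(string.Join(", ", asText)); // prints: 5, 6, 7

        // Repeat produces the same value a given number of times.
        int[] zeros = Enumerable.Repeat(0, 4).ToArray();
        Console.WriteLine(zeros.Length); // prints: 4
    }
}
```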
ForEach
Suppose that we want to print our list of integers. To do so, we can use ForEach. Strictly speaking, ForEach is a method of List<T> rather than a LINQ operator, which is why we call it on a list obtained with ToList(). ForEach performs the specified action on each element of the list on which it is called. The specified action can be a static function, an anonymous delegate, or (better) a lambda expression.
Using the static function your code will look like this:
static void PrintInt(int i)
{
    Console.WriteLine(i);
}
static void Main(string[] args)
{
    List<int> l = Enumerable.Range(1, 10).ToList();
    l.ForEach(PrintInt);
}
Using an anonymous delegate:
static void Main(string[] args)
{
    List<int> l = Enumerable.Range(1, 10).ToList();
    l.ForEach(
        delegate(int i)
        {
            Console.WriteLine(i);
        }
    );
}
Using a lambda expression:
static void Main(string[] args)
{
    List<int> l = Enumerable.Range(1, 10).ToList();
    l.ForEach(i => Console.WriteLine(i));
}
The lambda expression we used, i => Console.WriteLine(i), takes an integer as input (i) and prints it on the Console, just like the static function or the anonymous delegate did, but with less code!
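Since ForEach is defined on List<T> and not on IEnumerable<T>, it is not available on a query that has not been converted to a list. In that case a plain foreach statement does the same job without materializing anything, as in this small sketch (names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ForEachDemo
{
    static void Main()
    {
        // A lazy sequence: no list is created here.
        IEnumerable<int> numbers = Enumerable.Range(1, 3);

        // numbers.ForEach(...) would not compile: ForEach belongs to List<T>.
        foreach (int i in numbers)
        {
            Console.WriteLine(i); // prints 1, 2 and 3, one per line
        }
    }
}
```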
Aggregate
If we want to synthesize our list in only one element, then the Aggregate operator is exactly what we need. This operator takes two parameters:
· A seed, that is the starting value of our computation
· An accumulator function which takes
o the current seed value
o a list element
and computes the next value of the seed
Aggregate applies the accumulator function over the sequence (e.g., our List<int>); the initial accumulator value is the specified seed.
This operator can be used, for example, to find the maximum element of the list:
var max = l.Aggregate(int.MinValue, (seed, val) => Math.Max(seed, val));
The starting value of our computation (the initial seed) is int.MinValue, chosen because no integer element can be smaller than it.
The accumulator function is a lambda expression:
(seed, val) => Math.Max(seed, val)
Given the current seed value and a list element (called val), the next seed value will be the maximum between them: Math.Max(seed, val).
Let’s see another possible use of Aggregate:
var positives = l.Aggregate(true, (seed, val) => seed && (val > 0));
The variable positives contains a Boolean value, indicating whether or not all the elements in l are greater than zero. Why is that? The starting seed value is true; the accumulator function is the following lambda expression:
(seed, val) => seed && (val > 0)
You can see that, as soon as any list element is less than or equal to zero, the seed becomes false forever (because false && “whatever” = false!). If the list contains only positive integers, the seed will always remain true.
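Two more uses of Aggregate may help to fix the idea. The sketch below (names are illustrative) computes a sum with an explicit seed, then uses the overload without a seed, where the first element plays the role of the starting value. It also shows the dedicated operators Max and All, which give the same answers as the two Aggregate queries above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AggregateDemo
{
    static void Main()
    {
        List<int> l = Enumerable.Range(1, 10).ToList();

        // Sum with an explicit seed of 0.
        int sum = l.Aggregate(0, (seed, val) => seed + val);
        Console.WriteLine(sum); // prints: 55

        // Without a seed, the first element is used as the starting value.
        int factorial5 = l.Take(5).Aggregate((acc, val) => acc * val);
        Console.WriteLine(factorial5); // prints: 120

        // Dedicated operators for the two examples shown earlier.
        Console.WriteLine(l.Max());               // prints: 10
        Console.WriteLine(l.All(val => val > 0)); // prints: True
    }
}
```

Note that Aggregate always visits every element, while All can stop at the first element that fails the condition.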
Select
The Select operator projects each element of a sequence into a new form: it takes as input a transformation function, which is applied to all the elements of the sequence, thus generating a new sequence (given as output). The elements of the new sequence can have a type completely different from that of the elements of the initial sequence.
Let’s start with a very simple example: suppose that we want to create a new List<int>, containing the original elements increased by 1. Our LINQ query will be the following:
var l1 = l.Select(val => val + 1).ToList();
As you can see, the lambda expression used is:
val => val + 1
which takes an integer (val) as input and returns as output the same integer increased by one (val + 1). After Select we have to call ToList() to concretize the query, because Select returns an IEnumerable.
But with Select we can do much more complicated things:
var l2 = l.Select(val => new { item = val, itemstr = val.ToString() }).ToList();
The above query produces a list (thanks to ToList(), otherwise it would be an IEnumerable) whose elements have an anonymous type (C# 3.0 language extensions) made up of an integer called item and a string called itemstr:
val => new { item = val, itemstr = val.ToString() }
The integer contained in item is the same as in the original list, while in itemstr we put the string representation of such integer. What if we’d like to print this list?
l2.ForEach(i => Console.WriteLine(i.item + " – " + i.itemstr));
In the lambda expression used, the compiler knows the type of i, so when we write “i.” IntelliSense kicks in and suggests the public properties of that type (item and itemstr).
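Here is a small self-contained sketch of the two ideas above (names are illustrative): a projection that changes the element type, and the Select overload whose lambda also receives the zero-based position of each element:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SelectDemo
{
    static void Main()
    {
        List<int> l = Enumerable.Range(1, 3).ToList();

        // int -> string: the output element type differs from the input type.
        List<string> squares = l.Select(val => (val * val).ToString()).ToList();
        Console.WriteLine(string.Join(", ", squares.ToArray())); // prints: 1, 4, 9

        // Overload whose lambda also receives the zero-based index of the element.
        List<string> labelled = l.Select((val, index) => index + ":" + val).ToList();
        Console.WriteLine(string.Join(", ", labelled.ToArray())); // prints: 0:1, 1:2, 2:3
    }
}
```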
OrderBy
The purpose of OrderBy is pretty clear and simple: this operator sorts the elements of a sequence in ascending order according to a key. The key corresponding to each element is produced by a transformation function, the only input parameter of OrderBy.
If we want to sort our list lexicographically (using the property itemstr), the query we need is the following:
var orderedList = l2.OrderBy(i => i.itemstr).ToList();
Given an element of the list (i) we extract its key (itemstr), which is used for the sorting process. The resulting IEnumerable is then converted into a List with ToList() and saved in orderedList.
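It is worth looking at what a lexicographic sort actually produces on our data. The sketch below rebuilds the same kind of list (names are illustrative) and compares sorting by the string key with sorting by the numeric key in descending order:

```csharp
using System;
using System.Linq;

class OrderByDemo
{
    static void Main()
    {
        var l2 = Enumerable.Range(1, 10)
            .Select(val => new { item = val, itemstr = val.ToString() })
            .ToList();

        // Sorting by the string key compares text, so "10" comes before "2".
        string[] byText = l2.OrderBy(i => i.itemstr).Select(i => i.itemstr).ToArray();
        Console.WriteLine(string.Join(", ", byText));
        // prints: 1, 10, 2, 3, 4, 5, 6, 7, 8, 9

        // Sorting by the integer key, largest first.
        string[] byNumberDesc = l2.OrderByDescending(i => i.item).Select(i => i.itemstr).ToArray();
        Console.WriteLine(string.Join(", ", byNumberDesc));
        // prints: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1
    }
}
```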
GroupBy
GroupBy groups the elements of a sequence according to a specified key selector function. Let’s see an example of this operator:
var groups = l2
    .OrderBy(i => i.item % 3)
    .GroupBy(i => i.item % 3)
    .ToList();
What does this query do? It groups the elements contained in l2 according to the key extracted by the lambda expression i => i.item % 3. This means that each element is associated with a key, which is its “modulo 3” value. What is the type of groups?
The variable groups is a list of IGrouping elements; each of these elements is therefore a group. A group contains the group key (modulo 3 in our example) and all the elements in that group (that is, all the elements whose modulo 3 value is equal to the group key). We can print the groups with the following query:
groups.ForEach(
    group => Console.WriteLine(
        group.Key + " -> " +
        group.Aggregate("", (str, i) => str + i.item + ", ")));
In this query you can see a ForEach on the groups, which prints each group on the Console; the printing of the group elements, though, is a little tricky, and requires another query (specifically an Aggregate). The key of the group, instead, is easily accessible with group.Key.
You can see the result of this query in the following picture:
Each element is in the right group.
But there is an elephant in the room! All this time you may have been wondering why there is an OrderBy in our query:
var groups = l2
    .OrderBy(i => i.item % 3)
    .GroupBy(i => i.item % 3)
    .ToList();
We need the OrderBy if we want our results to be sorted according to the groups’ keys. Removing the OrderBy from the query produces the following result:
As you can see, the results are not ordered according to the group key, since the “0” group is the last one.
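The effect of the ordering can be reproduced with a short sketch on plain integers (the helper PrintGroups and the other names are illustrative, not part of the original code). Without any ordering, the groups come out in the order in which each key is first met; as an alternative to sorting the elements before grouping, the groups themselves can be sorted by key afterwards:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class GroupByDemo
{
    // Hypothetical helper: prints "key -> members" for each group.
    static void PrintGroups(IEnumerable<IGrouping<int, int>> groups)
    {
        foreach (IGrouping<int, int> group in groups)
        {
            string[] members = group.Select(n => n.ToString()).ToArray();
            Console.WriteLine(group.Key + " -> " + string.Join(", ", members));
        }
    }

    static void Main()
    {
        List<int> numbers = Enumerable.Range(1, 10).ToList();

        // No ordering: groups appear in order of first occurrence of their key.
        PrintGroups(numbers.GroupBy(n => n % 3));
        // 1 -> 1, 4, 7, 10
        // 2 -> 2, 5, 8
        // 0 -> 3, 6, 9

        // Sorting the groups by key after grouping.
        PrintGroups(numbers.GroupBy(n => n % 3).OrderBy(g => g.Key));
        // 0 -> 3, 6, 9
        // 1 -> 1, 4, 7, 10
        // 2 -> 2, 5, 8
    }
}
```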
Other operators (All, Any, Where, Skip, Take)
There are many other LINQ operators, and they are all very useful. You will find that they are not hard to learn; just in case, let’s see (very briefly this time!) five more operators.
All
This operator determines whether all elements of a sequence satisfy a condition (given as input, for example with a lambda expression). Let’s see a simple example:
var allPositives = l2.All(i => i.item > 0);
The above query determines whether all elements in l2 are positive integers (the condition is i.item > 0).
Any
This operator determines whether any element of a sequence satisfies a condition (given as input, for example with a lambda expression). Let’s see a simple example:
var anyPositive = l2.Any(i => i.item > 0);
The above query determines whether any element in l2 is a positive integer (the condition is i.item > 0).
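All and Any behave differently on an empty sequence, which is easy to forget. A minimal sketch (names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class AllAnyDemo
{
    static void Main()
    {
        List<int> empty = new List<int>();
        List<int> mixed = new List<int> { -1, 2, 3 };

        // On an empty sequence no element violates the condition, so All is true,
        // and no element satisfies it, so Any is false.
        Console.WriteLine(empty.All(n => n > 0)); // prints: True
        Console.WriteLine(empty.Any(n => n > 0)); // prints: False

        Console.WriteLine(mixed.All(n => n > 0)); // prints: False (-1 fails)
        Console.WriteLine(mixed.Any(n => n > 0)); // prints: True (2 succeeds)
    }
}
```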
Where
This operator filters a sequence of values based on a predicate (given as input, for example with a lambda expression). Let’s see a simple example:
var onlyPositives = l2.Where(i => i.item > 0);
The above query produces an IEnumerable containing only the positive elements of l2 (the condition is i.item > 0).
Skip
This operator bypasses a specified number of elements in a sequence and then returns the remaining items. Example:
var afterThird = l2.Skip(3);
The above query skips the first three elements in l2 and returns the remaining ones, starting from the fourth.
Take
This operator returns a specified number of contiguous elements from the start of a sequence. Example:
var first3 = l2.Take(3);
This query takes the first three elements in l2.
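Skip and Take are often combined to read a collection one “page” at a time. The sketch below assumes a page size of 3 and zero-based page indexes; both values, like the names used, are only illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class PagingDemo
{
    static void Main()
    {
        List<int> l = Enumerable.Range(1, 10).ToList();
        int pageSize = 3; // assumed page size

        // Page 2 (zero-based): skip two full pages, then take one page.
        List<int> page2 = l.Skip(2 * pageSize).Take(pageSize).ToList();
        Console.WriteLine(string.Join(", ", page2.Select(n => n.ToString()).ToArray()));
        // prints: 7, 8, 9

        // Take does not fail when fewer elements are left: the last page is shorter.
        List<int> page3 = l.Skip(3 * pageSize).Take(pageSize).ToList();
        Console.WriteLine(page3.Count); // prints: 1 (only the value 10)
    }
}
```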
Manipulating infinite sequences
In this section we are going to see how LINQ query operators can be used to manipulate infinite sequences (in particular we are going to consider lists).
Let’s start by writing an infinite sequence:
static IEnumerable<int> NaturalNumbers()
{
    int n = 0;
    while (true)
    {
        yield return n;
        n++;
    }
}
The sequence returned by NaturalNumbers “contains” (in a lazy evaluation kind of way) the natural numbers. Of course, no PC can hold all the natural numbers in memory: this collection is not materialized, and it won’t be until someone requires it! The yield keyword works like a “pause button”: it makes it possible to keep the collection frozen until someone needs the next element. At the beginning, the only element about to be produced is the first, 0; all the others are waiting to be asked for.
This mechanism is very powerful, since it lets you manipulate infinite sequences without any problem; you can iterate the sequence for as long as you need (just not forever!).
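The “pause button” behaviour can be observed directly by printing a message from inside the iterator. The following sketch uses a tiny finite iterator (TwoNumbers is an illustrative name) so that the order of the messages is easy to follow:

```csharp
using System;
using System.Collections.Generic;

class YieldTraceDemo
{
    // The body runs only while a caller is asking for the next element.
    static IEnumerable<int> TwoNumbers()
    {
        Console.WriteLine("producing 0");
        yield return 0;
        Console.WriteLine("producing 1");
        yield return 1;
    }

    static void Main()
    {
        // Calling the method does not execute its body yet.
        IEnumerable<int> numbers = TwoNumbers();
        Console.WriteLine("sequence created");

        foreach (int n in numbers)
        {
            Console.WriteLine("consumed " + n);
        }
        // Output order:
        // sequence created
        // producing 0
        // consumed 0
        // producing 1
        // consumed 1
    }
}
```

Production and consumption alternate: each element is computed only when the loop asks for it.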
How can we use such a sequence? How can we manipulate it?
var naturalNumbers = NaturalNumbers();
The variable naturalNumbers does not store any element in memory: it’s not a List, it’s only an IEnumerable. Obviously, you can’t do this:
var naturalNumbers = NaturalNumbers().ToList();
otherwise the program would keep looping, trying to materialize all the elements of naturalNumbers (which are infinite), until it runs out of memory!
We can use all the LINQ operators we want on the infinite sequence we built; the only constraint is that, at the end of our manipulations, we take only a finite number of values from the sequence. For example, suppose that we want to isolate ten consecutive multiples of 27, starting from 54; our code will look like this:
var multiples27 = naturalNumbers.Where(nat => nat % 27 == 0);
var first10multiples = multiples27.Skip(2).Take(10).ToList();
first10multiples.ForEach(n => Console.WriteLine(n));
In multiples27 we have the infinite sequence (not materialized, since we rightly did not call ToList()) containing the multiples of 27. This sequence has been built with a Where query on naturalNumbers, keeping only the values whose remainder modulo 27 is 0. Then we skipped the first two values (we do not want 0 and 27) and took the following 10 values. At this point (only after using a Take query!) we can materialize our query with ToList(), and then print the values on the Console; we can do so because after Take our sequence has a finite number of values. The result can be seen in the following picture:
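Take is not the only way to make an infinite sequence finite. As a variation on the query above, this sketch uses TakeWhile to stop as soon as a value reaches an upper limit (the limit of 200 and the names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class TakeWhileDemo
{
    static IEnumerable<int> NaturalNumbers()
    {
        int n = 0;
        while (true)
        {
            yield return n;
            n++;
        }
    }

    static void Main()
    {
        // TakeWhile stops enumerating at the first value that fails the condition.
        List<int> below200 = NaturalNumbers()
            .Where(nat => nat % 27 == 0)
            .TakeWhile(nat => nat < 200)
            .ToList();

        Console.WriteLine(string.Join(", ", below200.Select(n => n.ToString()).ToArray()));
        // prints: 0, 27, 54, 81, 108, 135, 162, 189
    }
}
```

Replacing TakeWhile with a second Where would not work here: Where cannot know that no later value will satisfy the condition, so it would keep scanning the infinite sequence.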
This example, though nice, was pretty useless; let’s make something more interesting, like building the Fibonacci sequence. The Fibonacci sequence definition follows:
F(0) = 0, F(1) = 1, F(n) = F(n-1) + F(n-2) for n ≥ 2
We can put the Fibonacci sequence in an infinite sequence with this code:
static IEnumerable<int> FibonacciNumbers()
{
    int prevprev = 0, prev = 1;
    // The first two values are part of the sequence too.
    yield return prevprev;
    yield return prev;
    while (true)
    {
        int curr = prev + prevprev;
        yield return curr;
        prevprev = prev;
        prev = curr;
    }
}
We specified the first two values of the Fibonacci sequence (0 and 1) and return them first; then, at each step, we compute the next value (the sum of prev and prevprev), return it, and update both prevprev and prev (shifting them ahead along the sequence).
var fibo = FibonacciNumbers();
As we know, fibo is just an IEnumerable: it is not materialized, and we cannot call ToList() on it directly. We can call ToList() only after a Take query:
var fibo21 = fibo.Take(21).ToList();
fibo21.ForEach(n => Console.WriteLine(n));
This code takes the first 21 values of the Fibonacci sequence, and then prints them on the Console:
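One practical caveat: the values grow quickly, and an int can hold Fibonacci numbers only up to F(46). The following sketch is a variant that assumes the sequence starts from 0 and uses long instead of int (FibonacciLong is an illustrative name, not part of the original code):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class FibonacciLongDemo
{
    // Variant of the generator using long; it starts from F(0) = 0.
    static IEnumerable<long> FibonacciLong()
    {
        long prevprev = 0, prev = 1;
        yield return prevprev;
        yield return prev;
        while (true)
        {
            long curr = prev + prevprev;
            yield return curr;
            prevprev = prev;
            prev = curr;
        }
    }

    static void Main()
    {
        // All the Fibonacci numbers below 100.
        string[] small = FibonacciLong()
            .TakeWhile(f => f < 100)
            .Select(f => f.ToString())
            .ToArray();
        Console.WriteLine(string.Join(", ", small));
        // prints: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

        // F(50) does not fit in an int, but fits in a long.
        Console.WriteLine(FibonacciLong().ElementAt(50)); // prints: 12586269025
    }
}
```

Even long overflows eventually, so a finite limit (Take, TakeWhile, ElementAt) is still needed.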
If you are wondering why you should care about infinite sequences, let me tell you that there are many uses for them, even in videogames:
· as random number generators
· as Perlin noise generators
· as checkpoint generators
· as generators of enemies from a spawning point
· …
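As an illustration of the first item in the list, here is a minimal sketch of a random number generator written as an infinite sequence. The method name RandomNumbers, its parameters and the seed value are assumptions made for this example; the generator simply wraps System.Random:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class RandomSequenceDemo
{
    // Hypothetical generator: an endless stream of values in [0, maxExclusive).
    static IEnumerable<int> RandomNumbers(int seed, int maxExclusive)
    {
        Random rng = new Random(seed);
        while (true)
        {
            yield return rng.Next(maxExclusive);
        }
    }

    static void Main()
    {
        // Five dice rolls: shift the values from 0..5 to 1..6, then take a finite number.
        List<int> rolls = RandomNumbers(42, 6)
            .Select(n => n + 1)
            .Take(5)
            .ToList();

        rolls.ForEach(r => Console.WriteLine(r)); // five values between 1 and 6
    }
}
```

As with the other infinite sequences, the stream can be filtered and transformed freely, as long as a Take (or a similar operator) comes before materializing it.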