Flexible Reporting with LINQ and C# 4.0 dynamic keyword

July 10, 2011

It’s commonly very difficult to question business people about reporting requirements. It’s not really their fault either – they just can’t know exactly what they want until they’re trying to answer a question and can’t easily do it with the reports you’ve given them.

This is why it’s good to make reports as flexible and updateable as possible, but with as little developer required to update the reports as possible.

If you’re operating in an environment where all database access must be via stored procedures, this is a really big problem. It’s really unlikely that the changes requested by business can be implemented with the same stored procedure you naïvely created for your first attempt. I’ve seen scenarios where a database has stored procedures with the suffixes GetReport, GetReport2, GetReport3, GetReport4, etc. Yuck.

Even if you’re using Entity Framework, LINQ to SQL, or some other data layer framework that enables more free-form access to the database, you can’t always ensure that all report queries will result in good execution costs and actually be performant.

This is why it can sometimes be a good idea to perform a very basic database query (via stored procedure if necessary) to get a base set of data, and then perform more conditional operations on it in memory with LINQ. It’s a pain to do a “Name Contains” filter in a stored procedure (especially if there are a dozen other options) but with LINQ it’s no big deal.

1
2
3
4

IEnumerable data = GetBaseData();
if (!String.IsNullOrEmpty(nameFilter))
data = data.Where(d => d.Name.IndexOf(nameFilter, StringComparison.OrdinalIgnoreCase) >= 0);

This is really great for simple filters, but gets difficult when we want to do more complex grouping and aggregating functions, such as grouping by Hourly/Daily/Weekly/Monthly and/or by other data points.

The remainder of this article will show how this can be done with static code, and then how we can drastically increase the maintainability of this same code by employing the dynamic keyword introduced in C# 4.0.

The .NET 3.5 Way

Let’s consider the following data type. It represents a simplified sort of reporting data you might see from an ad platform that is utilized on the web and also on mobile devices (iOS and Android).

1
2
3
4
5
6
7
8
9

public class ReportData
{
public DateTime Date { get; set; }
public string Platform { get; set; }
public string Size { get; set; }
public int Views { get; set; }
public int Clicks { get; set; }
}

We want the ability to do any of these things independently or together:

  • Group Date by Hour, Day, Week, Hour, or an overall Summary
  • Group all platforms together
  • Group all sizes together

Of course, no matter what we do, we will always sum the Views and Clicks.

Let’s start with some code to generate quite a bit of random data and then show it to us:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47

class Program
{
static void Main(string[] args)
{
List data = CreateRandomData();
Report("Raw Data, Top 10", data.Take(10));
Console.ReadLine();
}
private static List CreateRandomData()
{
List data = new List();
Random rand = new Random();
DateTime start = new DateTime(2011,1,1);
DateTime end = new DateTime(2011, 7, 1);
for (DateTime dt = start; dt < end; dt = dt.AddHours(1))
{
foreach (string platform in new string[] { "Web", "iOS", "Android" })
{
foreach (string size in new string[] { "Standard", "Enhanced" })
{
int views = rand.Next(0, 50);
int clicks = rand.Next(0, views / 3);
data.Add(new ReportData
{
Date = dt,
Platform = platform,
Size = size,
Views = views,
Clicks = clicks
});
}
}
}
Console.WriteLine("Generated {0} Sample Rows", data.Count);
return data;
}
private static void Report(string label, IEnumerable data)
{
Console.WriteLine();
Console.WriteLine(label);
string format = "{0:MM/dd/yyyy HH:mm} {1,-8} {2,-8} {3,6} {4,6}";
Console.WriteLine(format, "Date ", "Platform", "Size", "Views", "Clicks");
foreach (ReportData d in data)
Console.WriteLine(format,d.Date, d.Platform, d.Size, d.Views, d.Clicks);
}
}

This results in the following output. (The Views and Clicks will be different each time because they are generated randomly.)

1
2
3
4
5
6
7
8
9
10
11
12
13
14

Generated 26064 Sample Rows
Raw Data, Top 10
Date Platform Size Views Clicks
01/01/2011 00:00 Web Standard 15 2
01/01/2011 00:00 Web Enhanced 34 1
01/01/2011 00:00 iOS Standard 36 2
01/01/2011 00:00 iOS Enhanced 20 5
01/01/2011 00:00 Android Standard 44 8
01/01/2011 00:00 Android Enhanced 29 4
01/01/2011 01:00 Web Standard 45 9
01/01/2011 01:00 Web Enhanced 30 7
01/01/2011 01:00 iOS Standard 25 5
01/01/2011 01:00 iOS Enhanced 47 9

So how would we do this filtering statically? For each item we want to group by, we have to do a combination GroupBy-Select that groups on everything ELSE.

Here is the example for Platform and Size, which are a lot simpler than for Date.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37

private static IEnumerable ProcessVersion1(IEnumerable data, DateMode dateMode, bool platform, bool size)
{
if(platform)
{
data = data.GroupBy(d => new
{
Date = d.Date,
Size = d.Size,
})
.Select(g => new ReportData
{
Date = g.Key.Date,
Platform = "All",
Size = g.Key.Size,
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
}
if (size)
{
data = data.GroupBy(d => new
{
Date = d.Date,
Platform = d.Platform,
})
.Select(g => new ReportData
{
Date = g.Key.Date,
Platform = g.Key.Platform,
Size = "All",
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
}
// Date to come later
}

So to group all the Platforms together, we must group by Date and Size, then emit new data items that have “All” for the Platform, emit the Date and Size from the grouping key, and sum the Views and Clicks. To group all Sizes together, we must group by Date and Platform, then emit new data items that have “All” for the Size, emit the Date and Platform from the grouping key, and sum the Views and Clicks.

In short, to group an axis together, you must (for the most part) focus on all of the OTHER axes.

It gets worse for a Date grouping:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84

public enum DateMode
{
Hourly,
Daily,
Weekly,
Monthly,
Summary,
}
private static IEnumerable ProcessVersion1(IEnumerable data, DateMode dateMode, bool platform, bool size)
{
// Platform and Size, handled previously
switch (dateMode)
{
case DateMode.Hourly:
// No need to modify the data, we're assuming it comes from
// the database already grouped by Hour
break;
case DateMode.Daily:
data = data.GroupBy(d => new
{
Date = d.Date.Date,
Platform = d.Platform,
Size = d.Size
})
.Select(g => new ReportData
{
Date = g.Key.Date,
Platform = g.Key.Platform,
Size = g.Key.Size,
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
break;
case DateMode.Weekly:
data = data.GroupBy(d => new
{
Date = d.Date.Date.AddDays(-((int)d.Date.Date.DayOfWeek)),
Platform = d.Platform,
Size = d.Size
})
.Select(g => new ReportData
{
Date = g.Key.Date,
Platform = g.Key.Platform,
Size = g.Key.Size,
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
break;
case DateMode.Monthly:
data = data.GroupBy(d => new
{
Date = new DateTime(d.Date.Year, d.Date.Month, 1),
Platform = d.Platform,
Size = d.Size
})
.Select(g => new ReportData
{
Date = g.Key.Date,
Platform = g.Key.Platform,
Size = g.Key.Size,
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
break;
case DateMode.Summary:
data = data.GroupBy(d => new
{
Platform = d.Platform,
Size = d.Size
})
.Select(g => new ReportData
{
Date = DateTime.MinValue,
Platform = g.Key.Platform,
Size = g.Key.Size,
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
break;
}
return data;
}

Can you imagine how much of a mess it would be if we had to add a new property to this mix? If we were to add another property that was similar to Platform or Size, we would need to reference it twice for each of the 4 date cases (in the GroupBy and in the Select) and then twice each for the existing Platform and Size groupings, not to mention its own grouping logic. That’s 12 edit points needed to add one property!

That simply is not maintainable, and there must be a better way.

One answer to the Stack Overflow question Dynamic LINQ GroupBy Multiple Columns suggests a way that would make this still possible without using dynamic code. It involves making an EntryGrouper class that implements IEquatable and serves as the grouping key for the GroupBy operations. While this method should work, it is by no means short, and would have to be re-implemented for every new use case. The use of the dynamic keyword I present can be reused over and over.

Introducing the dynamic keyword

C# 4.0 introduces the dynamic keyword (and indeed a whole new dynamic runtime) that enables us to declare any properties we want on an object, and it is not evaluated until runtime.

Now, this isn’t completely magical – you could cast any object to a dynamic and it would compile, but at runtime there must be some mechanism to provide the values and methods you try to use.

Most of the time, we will use System.Dynamic.ExpandoObject to accomplish this for us. ExpandoObject by itself is another statically-typed object that we cannot declare random properties on. Only when we assign it to a dynamic variable does it get this power, but under the hood, the ExpandoObject essentially stores values in a Dictionary, and we can exploit that for our purposes.

By itself, ExpandoObject does not override GetHashCode() or Equals() from the base object, and LINQ’s GroupBy method needs this in order to do its work. So first, we must create a wrapper that enables us to use the power of the dynamic keyword, compute equivalence and a hash code, and enable us to get back at the original data for the Select afterwards.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34

public class DynamicHashWrapper
{
public dynamic Value { get; private set; }
private Dictionary dict;
private HashSet keys;
private static IEqualityComparer> keyComparer = HashSet.CreateSetComparer();
public DynamicHashWrapper(dynamic value)
{
this.Value = value;
dict = new Dictionary(value as IDictionary);
keys = new HashSet(dict.Keys);
}
private int? hashCode;
public override int GetHashCode()
{
if (!this.hashCode.HasValue)
{
int code = 0;
foreach (var entry in dict.OrderBy(p => p.Key))
code ^= entry.Value.GetHashCode();
this.hashCode = code;
}
return this.hashCode.Value;
}
public override bool Equals(object obj)
{
if (!(obj is DynamicHashWrapper))
return false;
DynamicHashWrapper other = obj as DynamicHashWrapper;
return keyComparer.Equals(this.keys, other.keys)
&& this.GetHashCode() == other.GetHashCode();
}
}

After creating a dynamic variable, we can wrap it with this class. It does the following:

  • Stores the dynamic value itself so we can retrieve its properties back later.
  • Converts the dynamic ExpandoObject to a Dictionary that we can enumerate.
  • Enables computation of a hash code by XOR-ing the values of the dictionary.
  • Determines if two values are equal by ensuring that the dictionary sizes are equal and that the hash codes are equal.

With this addition to the dynamic object, we can now re-implement our data processing routine:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

private static IEnumerable ProcessVersion2(IEnumerable data, DateMode dateMode, bool platform, bool size)
{
return data
.GroupBy(d =>
{
dynamic key = new ExpandoObject();
switch (dateMode)
{
case DateMode.Hourly: key.Date = d.Date; break;
case DateMode.Daily: key.Date = d.Date.Date; break;
case DateMode.Weekly: key.Date = d.Date.Date.AddDays(-((int)d.Date.DayOfWeek)); break;
case DateMode.Monthly: key.Date = new DateTime(d.Date.Year, d.Date.Month, 1); break;
case DateMode.Summary: key.Date = DateTime.MinValue; break;
}
if (!platform)
key.Platform = d.Platform;
if (!size)
key.Size = d.Size;
return new DynamicHashWrapper(key);
})
.Select(g => new ReportData
{
Date = g.Key.Value.Date,
Platform = platform ? "All" : g.Key.Value.Platform,
Size = size ? "All" : g.Key.Value.Size,
Views = g.Sum(d => d.Views),
Clicks = g.Sum(d => d.Clicks)
});
}

Now we have a single GroupBy and a single Select. By the way, grouping and selecting only once means we should get a performance improvement because we aren’t generating a bunch of intermediate objects each time we do a grouping – we create one dynamic grouping key and spin through the collection only once. Of course, the dynamic object probably has its own overhead from not being statically compiled, but the improvement in maintainability is WELL worth that small setback.

As a comparison, this new implementation takes 33 lines of code, whereas the original was 111 lines. Who wants to try to maintain a 111-line method!?

Now we can try a bunch of different groupings and see them in action.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

static void Main(string[] args)
{
List data = CreateRandomData();
Report("Raw Data, Top 10", data.Take(10));
Report("Group Platform", Process(data, DateMode.Hourly, true, false).Take(10));
Report("Group Size", Process(data, DateMode.Hourly, false, true).Take(10));
Report("Group Platform & Size", Process(data, DateMode.Hourly, true, true).Take(10));
Report("Group All by Day", Process(data, DateMode.Daily, true, true).Take(10));
Report("Group All by Week", Process(data, DateMode.Weekly, true, true).Take(10));
Report("Group All by Month", Process(data, DateMode.Monthly, true, true).Take(10));
Report("Group All Summary", Process(data, DateMode.Summary, true, true).Take(10));
Report("Summary By Types", Process(data, DateMode.Summary, false, false).Take(10));
Console.ReadLine();
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93

Generated 26064 Sample Rows
Raw Data, Top 10
Date Platform Size Views Clicks
01/01/2011 00:00 Web Standard 15 3
01/01/2011 00:00 Web Enhanced 45 5
01/01/2011 00:00 iOS Standard 9 1
01/01/2011 00:00 iOS Enhanced 6 1
01/01/2011 00:00 Android Standard 0 0
01/01/2011 00:00 Android Enhanced 7 0
01/01/2011 01:00 Web Standard 33 7
01/01/2011 01:00 Web Enhanced 1 0
01/01/2011 01:00 iOS Standard 28 7
01/01/2011 01:00 iOS Enhanced 25 5
Group Platform
Date Platform Size Views Clicks
01/01/2011 00:00 All Standard 24 4
01/01/2011 00:00 All Enhanced 58 6
01/01/2011 01:00 All Standard 105 14
01/01/2011 01:00 All Enhanced 49 8
01/01/2011 02:00 All Standard 133 13
01/01/2011 02:00 All Enhanced 36 6
01/01/2011 03:00 All Standard 95 16
01/01/2011 03:00 All Enhanced 94 25
01/01/2011 04:00 All Standard 58 14
01/01/2011 04:00 All Enhanced 78 3
Group Size
Date Platform Size Views Clicks
01/01/2011 00:00 Web All 60 8
01/01/2011 00:00 iOS All 15 2
01/01/2011 00:00 Android All 7 0
01/01/2011 01:00 Web All 34 7
01/01/2011 01:00 iOS All 53 12
01/01/2011 01:00 Android All 67 3
01/01/2011 02:00 Web All 55 0
01/01/2011 02:00 iOS All 42 10
01/01/2011 02:00 Android All 72 9
01/01/2011 03:00 Web All 53 11
Group Platform & Size
Date Platform Size Views Clicks
01/01/2011 00:00 All All 82 10
01/01/2011 01:00 All All 154 22
01/01/2011 02:00 All All 169 19
01/01/2011 03:00 All All 189 41
01/01/2011 04:00 All All 136 17
01/01/2011 05:00 All All 142 17
01/01/2011 06:00 All All 150 15
01/01/2011 07:00 All All 181 33
01/01/2011 08:00 All All 143 22
01/01/2011 09:00 All All 208 34
Group All by Day
Date Platform Size Views Clicks
01/01/2011 00:00 All All 3756 552
01/02/2011 00:00 All All 3370 479
01/03/2011 00:00 All All 3537 503
01/04/2011 00:00 All All 3298 469
01/05/2011 00:00 All All 3605 573
01/06/2011 00:00 All All 3617 518
01/07/2011 00:00 All All 3193 450
01/08/2011 00:00 All All 3485 513
01/09/2011 00:00 All All 3629 504
01/10/2011 00:00 All All 3628 533
Group All by Week
Date Platform Size Views Clicks
12/26/2010 00:00 All All 3756 552
01/02/2011 00:00 All All 24105 3505
01/09/2011 00:00 All All 24585 3495
01/16/2011 00:00 All All 24618 3553
01/23/2011 00:00 All All 24875 3471
01/30/2011 00:00 All All 24845 3634
02/06/2011 00:00 All All 24528 3391
02/13/2011 00:00 All All 24561 3467
02/20/2011 00:00 All All 24280 3621
02/27/2011 00:00 All All 24614 3507
Group All by Month
Date Platform Size Views Clicks
01/01/2011 00:00 All All 109101 15593
02/01/2011 00:00 All All 98057 14078
03/01/2011 00:00 All All 108231 15308
04/01/2011 00:00 All All 106527 15286
05/01/2011 00:00 All All 110096 15742
06/01/2011 00:00 All All 106354 14982
Group All Summary
Date Platform Size Views Clicks
01/01/0001 00:00 All All 638366 90989
Summary By Types
Date Platform Size Views Clicks
01/01/0001 00:00 Web Standard 105615 14983
01/01/0001 00:00 Web Enhanced 107740 15448
01/01/0001 00:00 iOS Standard 107203 15417
01/01/0001 00:00 iOS Enhanced 106457 15117
01/01/0001 00:00 Android Standard 104772 14840
01/01/0001 00:00 Android Enhanced 106579 15184

Summary and Code Download

With the static alternatives, we get a lot of spaghetti code that is impossible to maintain, and even if it was, the many GroupBy/Select iterations required for multiple axes result in unnecessary object creation, loops through the data collection, and eventual garbage collection.

Using the dynamic keyword and a small, reusable trick to enable the dynamic object to generate meaningful hash codes, we can loop through the data collection once. Our results are just as good, and the next time someone visits this code, they’re much less likely to screw it up.

The code in this article is fairly fragmented. DynamicReporting.cs - Click here to download the full source code.


Comments: