Monday, August 1, 2016

yield async - Oh the troubles

None of this should really come as news to you, but as we know, C# has intrinsic support for the trusted Iterator Pattern baked right into the language (supported by lots of compiler magic). Basically it allows us to use the blood of unicorns (sorry, the yield return and yield break keywords) to write methods that "appear" to be sequentially executed code but are actually boiled away into a compiler-generated type containing a state machine equivalent to your sequential logic. Pretty neat, and having written more than a few iterators by hand back when dinosaurs roamed the datacenter (sorry, when .NET 1.0 was a thing), it's rather nice to have around. This isn't a new thing but it's still sexy when you think about it. Basically it allows us to 1. do potentially rather complex processing on arbitrary sequences of values (in .NET this sequence concept has always been represented by the IEnumerable and IEnumerator interfaces) and 2. defer those steps until requested and, here's the really important part IMHO, do so without requiring the entirety of the sequence to be present in memory at the same time. This is absolutely crucial for high performance or multiuser systems.
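For contrast, this is roughly what rolling an iterator by hand looks like. It's only a sketch of the idea, not what the compiler actually emits (the generated type is hairier), and the CountingSequence name is made up for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-written sequence that produces 1..count, one value per request.
public sealed class CountingSequence : IEnumerable<int>
{
    private readonly int _count;

    public CountingSequence(int count) { _count = count; }

    public IEnumerator<int> GetEnumerator() => new Enumerator(_count);
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    private sealed class Enumerator : IEnumerator<int>
    {
        private readonly int _count;
        private int _current; // the entire "state machine": 0 means not started

        public Enumerator(int count) { _count = count; }

        public int Current => _current;
        object IEnumerator.Current => _current;

        // Each call advances the sequence by exactly one step.
        public bool MoveNext()
        {
            if (_current >= _count) return false;
            _current++;
            return true;
        }

        public void Reset() { _current = 0; }
        public void Dispose() { }
    }
}
```

A foreach over new CountingSequence(3) yields 1, 2 and 3, and at no point is more than one value held in memory.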

Here's a trite example that shows how the syntax works:

public IEnumerable<int> IterateFoos()
{
  foreach (var i in Enumerable.Range(1, 1000000))
  {
    yield return i;
  }
}

While the above code doesn't have any real complexity in how it iterates the items (which I've stated is nice to have), the real power here is that there aren't 1000000 integers aggregated and then returned to the consumer (technically there's not even one integer until the result is iterated, but that's beside the point). What if there were 10 million items? How about 100 million? How about a billion? And let's be fair, an integer is a pretty small object and, as it's a value type, isn't (usually) allocated on the heap, but do you really want millions or billions of them around before you start doing something with them?
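If you want to see the deferral for yourself, here's a small sketch. The Produced counter and the LazyDemo class are my own additions for illustration; the iterator body is the same as the example above apart from the counter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Counts how many items the iterator body has actually produced.
    public static int Produced;

    public static IEnumerable<int> IterateFoos()
    {
        foreach (var i in Enumerable.Range(1, 1000000))
        {
            Produced++;
            yield return i;
        }
    }

    public static void Main()
    {
        var foos = IterateFoos();    // no iterator code has run yet
        Console.WriteLine(Produced); // prints 0

        foreach (var foo in foos)
        {
            if (foo == 3) break;     // the consumer walks away early
        }
        Console.WriteLine(Produced); // prints 3
    }
}
```

The other 999997 integers never exist, because nobody asked for them.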

Let's fast forward now to the new hotness: Async. This may or may not be something you're as familiar with, but I feel it's rather safe to say that continuations and asynchronous workflows are the future of high level programming for a while to come. If you've not really got a great handle on them I will direct you to Stephen Cleary's excellent primer. To be fair, continuation-based programming wasn't exactly impossible before, but it certainly wasn't easy to follow and it was fairly easy to make mistakes with. When the TPL came along it certainly was a HUGE benefit to programmers; the syntax was still ugly and exception handling could be an annoying deal, but it was clearly better. Well, the twin keywords async / await are similar to yield for the programmer (not in the why and how, but in the type of syntax abstraction) in that they enable you to clearly write a method in a form that is similar to a sequential method (including all that exception handling!) but wherein continuations are built from your code wherever an await is encountered (the async keyword is there simply to give the compiler a hint about what to do with the syntax tree).

Another trite example:

public async Task<string> DoWorkReportAsync()
{
  var count = await DoCountAsync();
  return $"{count} items found";
}

For anyone that's done the whole APM BeginXXX/EndXXX style of programming, this really is a similar type of compiler voodoo, akin to what you see in iterators. What the above example is basically saying is that the caller will be returned a promise (a Future in patterns talk) to deliver this string report (that's the intention of the Task<string> return type). Internally what we've got is code that boils down to running a task with a continuation on its result that creates the string report, which is eventually able to fulfill the original promise to the caller. Capiche?
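To make "boils down to a continuation" a bit more concrete, here's a rough hand-written approximation using ContinueWith. Treat it as a sketch only: the real compiler output is a state machine, and this version differs in exception wrapping and context capture. The DoCountAsync stub and the ReportSketch class are stand-ins I've invented so the snippet is self-contained.

```csharp
using System.Threading.Tasks;

public static class ReportSketch
{
    // Hypothetical stand-in for the DoCountAsync used in the example above.
    public static Task<int> DoCountAsync() => Task.FromResult(42);

    // The async/await form from the example above.
    public static async Task<string> DoWorkReportAsync()
    {
        var count = await DoCountAsync();
        return $"{count} items found";
    }

    // Approximately the same intent with an explicit continuation:
    // when the count task finishes, build the report from its result.
    public static Task<string> DoWorkReportWithContinuation()
    {
        return DoCountAsync().ContinueWith(t => $"{t.Result} items found");
    }
}
```

Both methods hand the caller a Task<string> that eventually completes with "42 items found"; the await version just lets you write it top to bottom.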

Good as I don't want to get bogged down in this discussion with fundamentals.

So it doesn't take long before someone is going to try this though:

public async Task<IEnumerable<String>> IterateDoWorkReportsAsync()
{
  foreach (var i in Enumerable.Range(1, 1000000))
  {
    var count = await DoCountAsync();
    yield return $"{count} items found";
  }
}

Seems totally logical here, right? Hey, I want to perform some work, generally involving a long wait time (think some type of IO where a thread sitting around blocking for results is Teh Sux, though this is just as applicable to a scatter-gather pattern), and once we get that result, use the iterator to hand each item, potentially after some complex logic, to the consumer. Well, I'm going to tell you that doesn't work. And I am not just referring to the fact that it doesn't compile (which it doesn't) but more importantly that it doesn't even make conceptual sense (though your underlying wants are spot on).

So before I dig further into why this isn't even wrong, can we somehow make this compile, and likely in the process end up with some Frankenstein's monster of mishmashed parts, that will get the job done? Well you might be tempted to do this:

public async Task<IEnumerable<String>> IterateDoWorkReportsAsync()
{
  var results = new List<String>();
  foreach (var i in Enumerable.Range(1, 1000000))
  {
    var count = await DoCountAsync();
    results.Add($"{count} items found");
  }
  return results;
}

or even worse, this:

public IEnumerable<String> IterateDoWorkReportsAsync()
{
  foreach (var i in Enumerable.Range(1, 1000000))
  {
    var count = DoCountAsync().Result;
    yield return $"{count} items found";
  }
}

and you will find it works. As in, you get an awaited call that then vomits up a big fat set of data that's held in RAM (or in the 2nd example, a stalled thread on every single item, so WTF even bother?) and, you know... perhaps that's "good enough". I mean, if you have a problem where the set is by definition constrained to be small, it's honestly nothing you need to worry about. Yeah, it'd be nice if we could return an iterator for them, but it's just really sidestepping the issue, right?

No. It's actually not sidestepping anything. You're actually building something else (which may be perfectly fine, after all, there's multiple ways to solve a problem and our job as engineers is to deal with things like ROI). This all boils down to your original assumptions and statement of intention.

The original iterator example states, "I want a state machine that performs this logic on a source, one item at a time, when asked for it by the consumer", and the async example in turn states, "At some point in the future, I will resume the intention of this code block and eventually make good on the future promise of a result". That is absolutely NOT what the code in the workaround examples is stating whatsoever. It says, "AT SOME POINT IN THE FUTURE I GIVE YOU A SEQUENCE AND THAT'S IT". Granted, the caller isn't aware of all that, but I'm discussing this from the perspective of you, the author. So what we need to do is flip the script, proverbially, and use the correct statement of intent (really this is just an example of Intention Revealing Interface from DDD), and you get something like this instead:

public IEnumerable<Task<String>> IterateDoWorkReports()
{
  foreach (var i in Enumerable.Range(1, 1000000))
  {
    var task = DoWorkReportAsync();
    yield return task;
  }
}

private async Task<String> DoWorkReportAsync()
{
  var count = await DoCountAsync();
  return $"{count} items found";
}

See the intention-revealing difference? I'm using an iterator here to invert control to the consumer, where each individual result we return is a future (the Task) that will eventually fulfill the promise of the result. In addition, by separating concerns (the state machine iterator from the asynchronous logic) I'm going to go out on a limb here and state the code is probably going to be a hell of a lot more understandable and maintainable in the future (especially at 3AM when that logic is far, far more complex). Another subtle point is the name on the public API. It doesn't get an Async suffix because it returns each Task when you ask for it, pretty much immediately (it takes just as long as the creation and enqueue of the task).

By doing this, you also put your callers in control of wonderful things like potential concurrency: they can absolutely make decisions like running a partitioning algorithm to chunk sets of Tasks and then awaiting them (or some of them - see future posts on some interesting ways you can process work as it finishes instead of in-order), or how many to run at all (remember we're an iterator and our execution is deferred, so the consumer can stop at any point).
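As one possible illustration of that consumer-side control, here's a sketch of a caller that pulls tasks from the iterator in fixed-size chunks and awaits each chunk before asking for more. Everything here is an assumption for the sake of the example: the BatchingConsumer class, the ProcessInBatchesAsync helper, the batchSize parameter, and the Task.Delay that fakes the IO.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchingConsumer
{
    // Hypothetical stand-in for the IEnumerable<Task<String>> iterator above.
    public static IEnumerable<Task<string>> IterateDoWorkReports(int total)
    {
        for (var i = 1; i <= total; i++)
        {
            yield return DoWorkReportAsync(i); // the task starts only when requested
        }
    }

    private static async Task<string> DoWorkReportAsync(int count)
    {
        await Task.Delay(10); // simulated IO
        return $"{count} items found";
    }

    // Pulls at most batchSize tasks, awaits them together, then pulls the next chunk.
    public static async Task<List<string>> ProcessInBatchesAsync(
        IEnumerable<Task<string>> source, int batchSize)
    {
        var results = new List<string>();
        var batch = new List<Task<string>>(batchSize);
        foreach (var task in source)
        {
            batch.Add(task);
            if (batch.Count == batchSize)
            {
                results.AddRange(await Task.WhenAll(batch));
                batch.Clear();
            }
        }
        if (batch.Count > 0)
        {
            results.AddRange(await Task.WhenAll(batch)); // leftover partial chunk
        }
        return results;
    }
}
```

Calling ProcessInBatchesAsync(IterateDoWorkReports(10), 4) never has more than four tasks in flight at once, and the consumer could just as easily break out of the loop early and the remaining work would never be started.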

Now I am going to leave it to you, our intrepid readers, to find the limitation in the approach outlined above as a homework exercise. I'll be back soon in a future post to show you how to bridge the gaps and where the icky parts still are, or at least are until they close this hole with language features.

Wednesday, October 28, 2015

Making Entity Framework Not Suck-Overview

OK, this is going to assume that you are at least familiar with Entity Framework (if not, then really, climb out from under that rock and go read up). In short, this is an ORM and Query Provider wrapped up in one, used to manage the interaction of model code with a relational datastore. This isn't the only full(ish) featured ORM for .NET, nor the first, nor does it have any claim to being "the best", whatever that actually means. However, it is common and well supported (as a Microsoft product), but after a while nontrivial solutions are going to run right up against the (frequent) areas where they didn't think out any extension points. In this series we'll be specifically looking to add tools to your kit that enable more flexible and supple designs and perhaps open up your thoughts to OO patterns that help manage complexity (in the context of a solution using EF).

It's assumed we're dealing with an Entity Framework v6.1 or higher environment throughout this series. (v7 is a complete ground up rewrite so I'm sure we'll get our own new gotchas to cover).


  1. DomainEvents to the rescue for entity change notifications.
    AKA
    Making My Code That Cares About Orthogonal Concerns, Orthogonal
  2. DomainEvents Redux. Understand what's changed, not just when something's changed.
  3. CodeContracts, ISupportInitialize, and EF

Wednesday, September 16, 2015

THE BLOG IS DEAD (again)! LONG LIVE THE BLOG!

So it's been more than a few years now since I left the warm comfy confines of Microsoft, where I worked at the mothership as a Technology Architect at AdsBI. Unbeknownst to me, though admittedly I should have expected it, they'd eventually notice I didn't work there anymore and shut down access to my MSDN blog. :)

So here we are with a brand new shiny blog that I can thrill you with (OK, bore you to death with)! Likely this will eventually branch out to more than just technology, but if so, I will be sure to come up with a right brain-left brain tagging mechanism to denote it.

Cheers!