Here's a trite example that shows how the syntax works:
public IEnumerable<int> IterateFoos()
{
    foreach (var i in Enumerable.Range(1, 1000000))
    {
        yield return i;
    }
}
While the above code doesn't involve any of the complex per-item logic that I've said makes iterators nice to have, the real power here is that there aren't 1,000,000 integers aggregated and then returned to the consumer (technically there isn't even one integer until the result is iterated, but that's beside the point). What if there were 10 million items? How about 100 million? How about a billion? And let's be fair: an integer is a pretty small object and, as a value type, it isn't (usually) individually allocated on the heap, but do you really want millions or billions of them sitting around before you start doing something with them?
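To see the deferred execution for yourself, here's a minimal sketch that instruments IterateFoos with a hypothetical ProducedCount counter (it's not part of the original example). Nothing runs when the sequence is created, and when the consumer pulls three items and stops, only three integers are ever produced.

```csharp
using System.Collections.Generic;
using System.Linq;

// A minimal sketch of deferred execution. ProducedCount is a hypothetical
// counter added only to show how much of the iterator body actually ran.
public static class DeferredIterationDemo
{
    public static int ProducedCount;

    public static IEnumerable<int> IterateFoos()
    {
        foreach (var i in Enumerable.Range(1, 1000000))
        {
            ProducedCount++;          // runs once per item the consumer pulls
            yield return i;
        }
    }

    // Pulls only the first three items and then stops asking for more.
    public static List<int> TakeFirstThree()
    {
        ProducedCount = 0;
        var sequence = IterateFoos(); // no iterator code has run yet
        var taken = new List<int>();
        foreach (var value in sequence)
        {
            taken.Add(value);
            if (taken.Count == 3) break; // the remaining 999,997 are never produced
        }
        return taken;
    }
}
```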
Let's fast forward now to the new hotness: async. This may or may not be something you're familiar with, but I feel it's rather safe to say that continuations and asynchronous workflows are the future of high-level programming for a while to come. If you don't have a great handle on them yet, I'll direct you to Stephen Cleary's excellent primer. To be fair, continuation-based programming wasn't exactly impossible before, but it certainly wasn't easy to follow and it was fairly easy to make mistakes. When the TPL came along it was a HUGE benefit to programmers, but the syntax was still ugly, and exception handling, while clearly better, could still be annoying.
The twin keywords async/await do something similar for the programmer to what iterators do (not in the why and how, but in the kind of syntax abstraction): they let you write a method in a form that reads like an ordinary sequential method (including all that exception handling!), while the compiler builds continuations from your code wherever an await is encountered. The async keyword is there mainly to tell the compiler to treat await as a keyword in that method and to rewrite its body accordingly.
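For a concrete feel of the exception-handling difference, here's a minimal sketch assuming a hypothetical FailingCountAsync that always throws. With the TPL's ContinueWith, the continuation has to inspect the antecedent task and dig the real exception out of an AggregateException; with await, a plain try/catch sees the original exception type.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationStyles
{
    // Hypothetical async operation that always fails.
    public static async Task<int> FailingCountAsync()
    {
        await Task.Yield();
        throw new InvalidOperationException("count failed");
    }

    // TPL style: the continuation inspects the antecedent task by hand, and the
    // original exception arrives wrapped in an AggregateException.
    public static Task<string> ReportWithContinueWith()
    {
        return FailingCountAsync().ContinueWith(t =>
        {
            if (t.IsFaulted)
                return "error: " + t.Exception!.InnerException!.Message;
            return $"{t.Result} items found";
        });
    }

    // async/await style: reads like sequential code, and an ordinary try/catch
    // receives the original exception type.
    public static async Task<string> ReportWithAwait()
    {
        try
        {
            var count = await FailingCountAsync();
            return $"{count} items found";
        }
        catch (InvalidOperationException ex)
        {
            return "error: " + ex.Message;
        }
    }
}
```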
Another trite example:
public async Task<string> DoWorkReportAsync()
{
    var count = await DoCountAsync();
    return $"{count} items found";
}
For anyone who's done the whole APM BeginXXX/EndXXX style of programming, this really is a similar kind of compiler voodoo to what you see in iterators. In the above example, the method is basically telling the caller that it will be handed a promise (a Future, in patterns talk) to eventually return this string report; that's the intention of the Task<String> return type. Internally, what we've got boils down to running a task with a continuation on its result that creates the string report, which eventually fulfills the original promise to the caller. Capiche?
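If it helps, here's a rough hand-written approximation of that "promise plus continuation" idea using TaskCompletionSource and ContinueWith. It's a sketch only: the compiler actually generates a state machine driven by an async method builder, not this code, and DoCountAsync here is a hypothetical stub that just returns 42.

```csharp
using System.Threading.Tasks;

public static class PromiseSketch
{
    // Hypothetical stand-in for DoCountAsync.
    public static Task<int> DoCountAsync() => Task.Run(() => 42);

    // A hand-rolled approximation of DoWorkReportAsync: give the caller a
    // promise right away, then fulfil it from a continuation on the count.
    public static Task<string> DoWorkReportManually()
    {
        var promise = new TaskCompletionSource<string>();
        DoCountAsync().ContinueWith(t =>
        {
            if (t.IsFaulted) promise.SetException(t.Exception!.InnerExceptions);
            else if (t.IsCanceled) promise.SetCanceled();
            else promise.SetResult($"{t.Result} items found");
        });
        return promise.Task;   // typically handed back before the count is known
    }
}
```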
Good, because I don't want to get bogged down in the fundamentals here.
It doesn't take long, though, before someone tries this:
public async Task<IEnumerable<String>> IterateDoWorkReportsAsync()
{
    foreach (var i in Enumerable.Range(1, 1000000))
    {
        var count = await DoCountAsync();
        yield return $"{count} items found";
    }
}
Seems totally logical, right? I want to perform some work that generally involves a long wait (think some type of IO, where a thread sitting around blocking for results is Teh Sux, though this applies just as well to a scatter-gather pattern), and once I get each result, use the iterator to hand that item, potentially with complex logic, to the consumer. Well, I'm going to tell you that doesn't work. And I'm not just referring to the fact that it doesn't compile (it doesn't), but more importantly to the fact that it doesn't even make conceptual sense (though your underlying wants are spot on).
So before I dig further into why this isn't even wrong, can we somehow make it compile, probably ending up with some Frankenstein's monster of mishmashed parts in the process, and still get the job done? Well, you might be tempted to do this:
public async Task<IEnumerable<String>> IterateDoWorkReportsAsync()
{
    var results = new List<String>();
    foreach (var i in Enumerable.Range(1, 1000000))
    {
        var count = await DoCountAsync();
        results.Add($"{count} items found");
    }
    return results;
}
or, even worse, this:
public IEnumerable<String> IterateDoWorkReportsAsync()
{
    foreach (var i in Enumerable.Range(1, 1000000))
    {
        var count = DoCountAsync().Result; // blocks the thread until the count arrives
        yield return $"{count} items found";
    }
}
and you will find it works. As in, you get an awaited call that then vomits up a big fat set of data held in RAM (or, in the 2nd example, a thread stalled waiting on every single item, so WTF even bother?) and, you know... perhaps that's good "enough". I mean, if you have a problem where the set is by definition constrained to be small, it's honestly nothing you need to worry about. Yeah, it'd be nice if we could return an iterator, but this is really just sidestepping the issue, right?
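To make the difference between the two workarounds measurable, here's a minimal sketch with a hypothetical DoCountAsync stub that counts completed calls (CompletedCounts is illustration-only instrumentation). The List-building version has finished every call before the consumer sees its first report, while the .Result version is lazy again but parks the calling thread on each item.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class MaterializeVsStream
{
    public static int CompletedCounts;   // hypothetical instrumentation

    // Hypothetical stand-in for DoCountAsync that records each completed call.
    static async Task<int> DoCountAsync()
    {
        await Task.Delay(1);
        return ++CompletedCounts;
    }

    // The List-building workaround: every call is awaited and every report is
    // held in memory before the caller can look at the first one.
    public static async Task<IEnumerable<string>> ReportsMaterializedAsync(int n)
    {
        var results = new List<string>();
        for (var i = 0; i < n; i++)
        {
            var count = await DoCountAsync();
            results.Add($"{count} items found");
        }
        return results;
    }

    // The .Result workaround is lazy again, but blocks the calling thread on
    // every item (and can deadlock under a UI or classic ASP.NET context).
    public static IEnumerable<string> ReportsBlocking(int n)
    {
        for (var i = 0; i < n; i++)
        {
            var count = DoCountAsync().Result;
            yield return $"{count} items found";
        }
    }

    // How many DoCountAsync calls had finished by the time the consumer got
    // its first report from the materialized version?
    public static async Task<int> CompletedBeforeFirstMaterializedAsync(int n)
    {
        CompletedCounts = 0;
        var reports = await ReportsMaterializedAsync(n);
        _ = reports.First();
        return CompletedCounts;
    }
}
```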
No. It's actually not sidestepping anything. You're building something else entirely (which may be perfectly fine; after all, there are multiple ways to solve a problem, and our job as engineers is to deal with things like ROI). It all boils down to your original assumptions and statement of intention.
The original iterator example states, "I want a state machine that performs this logic on a source, one item at a time, when the consumer asks for it," and the async example in turn states, "At some point in the future, I will resume the intention of this code block and eventually make good on the promise of a result." That is absolutely NOT what the code in the workaround examples states whatsoever. It says, "AT SOME POINT IN THE FUTURE I GIVE YOU A SEQUENCE AND THAT'S IT." Granted, the caller isn't aware of all that, but I'm discussing this from the perspective of you, the author. So what we need to do is flip the script, proverbially, and use the correct statement of intent (really this is just an example of an Intention-Revealing Interface from DDD), and you get something like this instead:
public IEnumerable<Task<String>> IterateDoWorkReports()
{
    foreach (var i in Enumerable.Range(1, 1000000))
    {
        var task = DoWorkReportAsync();
        yield return task;
    }
}

private async Task<String> DoWorkReportAsync()
{
    var count = await DoCountAsync();
    return $"{count} items found";
}
See the intention-revealing difference? I'm using an iterator here to invert control to the consumer, and each individual result we return is a future (the Task) that will eventually fulfill the promise of a result. In addition, by separating concerns (the state machine iterator from the asynchronous logic), I'm going to go out on a limb and state that the code is probably going to be a hell of a lot more understandable and maintainable in the future (especially at 3AM, when that logic is far, far more complex). A subtler point is the name of the public API: it doesn't have an Async suffix because it isn't itself awaitable; it hands you a Task pretty much immediately when you ask for one (only as long as it takes to create and start it).
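Here's a minimal sketch that makes the deferred, non-awaiting nature of IterateDoWorkReports observable. The gate parameter and the Started counter are hypothetical additions for illustration: DoCountAsync waits on the gate, so we can check that creating the iterator starts nothing, that each item pulled starts exactly one piece of work, and that each Task comes back before its result exists.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskIteratorDemo
{
    public static int Started;   // hypothetical counter: DoWorkReportAsync calls begun

    // Hypothetical DoCountAsync stand-in: it can't finish until the gate completes.
    static async Task<int> DoCountAsync(Task gate)
    {
        await gate;
        return 42;
    }

    static async Task<string> DoWorkReportAsync(Task gate)
    {
        Started++;                              // runs synchronously when called
        var count = await DoCountAsync(gate);   // hands back an incomplete Task while the gate is closed
        return $"{count} items found";
    }

    // Same shape as IterateDoWorkReports: a plain (non-async) iterator of futures.
    public static IEnumerable<Task<string>> IterateDoWorkReports(Task gate)
    {
        foreach (var i in Enumerable.Range(1, 1000000))
        {
            yield return DoWorkReportAsync(gate);
        }
    }
}
```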
This approach also puts your callers in control of wonderful things like concurrency, because they can make decisions such as running a partitioning algorithm to chunk sets of Tasks and then awaiting them (or some of them; see future posts on some interesting ways to process work as it finishes instead of in order), or deciding how many to run at all (remember, we're an iterator and our execution is deferred, so the consumer can stop at any point).
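As one example of the kind of consumer-side decisions described above, here's a minimal sketch of a hypothetical ConsumeInBatchesAsync helper. It uses simple fixed-size batching (one possible partitioning strategy, not the only one) with Task.WhenAll, and stops pulling from the iterator once it has enough results, so no further work is ever started.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkedConsumer
{
    // Hypothetical consumer: awaits the iterator's tasks batchSize at a time and
    // stops pulling once maxItems results have been collected.
    public static async Task<List<string>> ConsumeInBatchesAsync(
        IEnumerable<Task<string>> reports, int batchSize, int maxItems)
    {
        var results = new List<string>();
        if (batchSize <= 0 || maxItems <= 0) return results;

        var batch = new List<Task<string>>(batchSize);
        foreach (var task in reports)   // each pull starts one more piece of work
        {
            batch.Add(task);
            if (batch.Count == batchSize || results.Count + batch.Count == maxItems)
            {
                results.AddRange(await Task.WhenAll(batch)); // results stay in order
                batch.Clear();
                if (results.Count == maxItems) break;        // never ask for more
            }
        }

        // The source ran out before filling the last batch.
        if (batch.Count > 0) results.AddRange(await Task.WhenAll(batch));
        return results;
    }
}
```

Because pulling from the iterator is what starts each task, batchSize effectively caps how many reports are in flight at once.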
Now I'm going to leave you, our intrepid readers, to find the limitation in the approach outlined above as a homework exercise. I'll be back soon in a future post to show you how to bridge the gaps and where the icky parts still are, or at least will be until language features close this hole.