David Boike's Blog

Using a SmartThings sensor in "garage door" mode with Home Assistant

2020-12-23T00:00:00.000Z

I started my home automation journey with SmartThings but it didn’t take long to feel constrained by the things it couldn’t do, so I switched to Home Assistant.

One thing SmartThings did well was their door/window/multipurpose sensor. (I purchased the few I have from Amazon but while I’m writing this they’re unavailable there—not sure why.)

These sensors are a standard magnetic reed switch (the little part is just a magnet) paired with a temperature sensor and accelerometer. One thing you could do with the SmartThings software was chuck the magnet part and use the sensor in “garage door” mode. You affix the sensor to the inside of the garage door and then SmartThings interprets the tilting of the sensor from vertical to horizontal as the door raises as open/closed instead of the magnetic switch.

In this post I’ll show how to get the same thing with Home Assistant to see the status of a garage door, mailbox, etc.

The SmartThings sensor

When you pair one of these sensors with Home Assistant you get four associated entities:

A binary_sensor for the accelerometer
A power sensor for the battery level
A temperature sensor
A binary_sensor named ias_zone

I have no idea what ias_zone means but what it really means the magnetic switch. So when you’re using the sensor without the magnet on something like a garage door, this is always going to be open. So the first thing I would do after pairing this device is to disable the ias_zone entity, since it’s useless.

Nowhere in the state or attributes for these entities is anything related to the position or orientation of the sensor:

However, this information is conveyed through events.

zha_event

The sensor will broadcast x-axis, y-axis, and z-axis values as separate events. I don’t know exactly how these values are defined, but it really doesn’t matter. All we need to do is install the device in the desired location and then measure.

First, let’s see the events in action:

Go to Developer Tools > Events.
Under Listen to events enter zha_event and click Start Listening.
Raise or lower the garage door.
Watch a few events come in.
Click Stop Listening.

Here’s an example event from my garage door:

{
    "event_type": "zha_event",
    "data": {
        "device_ieee": "28:6d:97:00:01:0a:a5:7a",
        "unique_id": "28:6d:97:00:01:0a:a5:7a:1:0xfc02",
        "device_id": "cd84799cc2948d8febdd5f87d12245e7",
        "endpoint_id": 1,
        "cluster_id": 64514,
        "command": "attribute_updated",
        "args": {
            "attribute_id": 18,
            "attribute_name": "x_axis",
            "value": 50
        }
    },
    "origin": "LOCAL",
    "time_fired": "2020-12-23T03:33:56.688618+00:00",
    "context": {
        "id": "91f5d475b1a27f3738ac474c87e70b2b",
        "parent_id": null,
        "user_id": null
    }
}

The important parts are:

One of the ids, either device_ieee, unique_id, or device_id. I prefer device_ieee because the same value is easily visible on the device info page.
The data.args.attribute_name, in this case x_axis.
The data.args.value which, in this case, is the value of the x coordinate.

Measuring the door

We want to measure the extremes of the X/Y/Z values when the door is all the way up and all the way down. To do that we can create helpers.

Go to Configuration > Helpers.
Click Add Helper.
Click Number.
Enter values:
- Name = Garage X
- Ignore icon, this is only temporary
- Minimum value = -1200
- Maximum value = 1200
Click Create.
Create two more helpers for Garage Y and Garage Z.

If you’re using the default auto-generated dashboard they’ll look like this:

Now to fill them with data, I created an automation:

alias: Set Garage XYZ
description: ''
trigger:
  - platform: event
    event_type: zha_event
    event_data:
      device_ieee: '28:6d:97:00:01:0a:a5:7a'
condition: []
action:
  - choose:
      - conditions:
          - condition: template
            value_template: '{{ trigger.event.data.args.attribute_name == "x_axis" }}'
        sequence:
          - service: input_number.set_value
            data:
              value: '{{ trigger.event.data.args.value }}'
            entity_id: input_number.garage_x
      - conditions:
          - condition: template
            value_template: '{{ trigger.event.data.args.attribute_name == "y_axis" }}'
        sequence:
          - service: input_number.set_value
            data:
              value: '{{ trigger.event.data.args.value }}'
            entity_id: input_number.garage_y
      - conditions:
          - condition: template
            value_template: '{{ trigger.event.data.args.attribute_name == "z_axis" }}'
        sequence:
          - service: input_number.set_value
            data:
              value: '{{ trigger.event.data.args.value }}'
            entity_id: input_number.garage_z
    default: []
mode: queued
max: 20

Some notes:

In the trigger section, the device_ieee needs to match the value for your sensor so that we filter out other sensors.
The actions section contains 3 conditions to match either x_axis, y_axis, or z_axis and then to set the appropriate helper value. Make sure the entity_id for each call to the input_number.set_value service matches the ones you created.
I set the mode to queued and the max to a relatively high value of 20. I don’t know how quickly these will get processed but I know that they come in at least 3 at a time, one event for each access. I don’t really want to discard the X coordinate event because the Z coordinate bumped it out of the queue.

Once this automation was in place, I raised and lowered the garage door. Here are two screenshots, along with the value of each helper that I added in red:

What does “open” mean?

Now that we’ve got numbers, we need to figure out how to define open and closed. The numbers won’t be exact each time, so we need one of the axes to have some daylight between them so that the transition will be completely obvious.

In this case, X is completely out as the values are only a couple dozen apart. Either Y or Z are a good candidate though, as both are about 1000 apart between the fully-open and fully-closed states.

You can’t just assume though. Depending on where you place the sensor, it could be completely different. The garage door sensor goes from being vertical when closed to horizontal (pointing down) when open. On the mailbox door, however, I affixed the sensor pointing “up” when the door was open, and when you close the mailbox the sensor is vertical but upside-down. The result is that the axis you choose and the threshold for open/closed will be different in every situation.

In the garage door case, I decided to go with the Z axis, which is partially a coin flip between Y/Z, but since garage doors go “up” and “down”, Z-axis seems to make sense to me.

As for the value, if for some reason the garage door is halfway open, I still consider that open. So rather than take the halfway point of ~500 I instead chose to define open as being any value greater than 200.

Open helper

Armed with this information, I can create a new helper of type Toggle, and call it Garage Door Open. I recommend the icon mdi:garage.

I can disable the automation we just used, or delete it, along with all the numeric helpers.

This new automation will set the state of the Garage Door Open helper:

alias: Set Garage State
description: ''
trigger:
  - platform: event
    event_type: zha_event
    event_data:
      device_ieee: '28:6d:97:00:01:0a:a5:7a'
condition:
  - condition: template
    value_template: '{{ trigger.event.data.args.attribute_name == "z_axis" }}'
action:
  - choose:
      - conditions:
          - condition: template
            value_template: '{{ trigger.event.data.args.value > 200 }}'
        sequence:
          - service: input_boolean.turn_on
            data: {}
            entity_id: input_boolean.garage_door_open
    default:
      - service: input_boolean.turn_off
        data: {}
        entity_id: input_boolean.garage_door_open
mode: queued
max: 20

Again, note that:

The device_ieee must match your device.
The entity_id (used twice) must match the Toggle helper.
We’re still filtering on one axis’s events, so the queue makes sure we don’t lose important events.

Now, you can raise and lower the garage door and watch the value of the helper flip back and forth. And now that there’s a helper, we can use the value in other automations.

For instance, I get a notification when my mailbox is opened, but ONLY when the front door is locked and the garage door is closed. Otherwise the person opening the mailbox is not the mail carrier, it’s a member of the family.

I also have an automation that announces there is someone at the front door, but only if the front door and garage door are closed.

A finishing touch

One problem is that because the helper is an input_boolean it’s always going to be editable on any dashboard you include it on, because anything input_* is designed in Home Assistant to be editable, period, while sensors are read-only.

If you wanted to display the value on a dashboard as read-only, you would need to create a template binary sensor, which basically means a sensor that gets its data from another entity according to a template. You could also keep one of the XYZ axis helpers from before and define a template binary sensor based on the numeric value.

However, I have found that it can be useful to be able to switch the helper to the incorrect state for a time, so that I can test what an automation would do with the garage door open, without actually needing to open the garage.

Wait, I live in Minnesota

I mentioned that I use one of these sensors in the mailbox, so that I can tell when the mail has been delivered. However, I live in Minnesota and it’s just a few days before Christmas.

That means it’s cold.

The mail call automation worked for just a few abnormally warm days, but then the temperature dipped and the automation stopped working. The Energizer CR2450 Lithium batteries I’ve got (and all lithium batteries generally) are not a fan of the cold weather. These Panasonic batteries boast an operating range of -30°C to +60°C (or -22ºF to +140ºF) but I have not tried them yet.

Summary

This post showed how to capture the X/Y/Z axis values of a SmartThings door/window multipurpose sensor and interpret those values as open or closed for a garage door, mailbox door, or other application where the magentic reed switch is not viable.

It’s possible the same procedure could be used on other types of sensors that include an accelerometer as well.

I have not tested it yet, but I wonder if this would be useful for applications like:

French patio doors, or casement windows that crank out. Unlike a garage/mailbox door, the orientation of the sensor relative to gravity never changes, so I don’t know if the reported coordinates would vary enough to trigger on.
Sliding porch doors, where the sensor would only slide left or right, but never really change orientation.

If you try any of those, let me know!

Distance learning alerts with Google Calendar, Alexa, and Home Assistant

2020-12-03T00:00:00.000Z

My kids have been distance learning since March. When Fall came, we picked the Distance Choice option in our local school district. We wanted to support teachers in any way we could, we both have jobs that enable us to work from home, and we figured if cases continued to get worse then all students would be distance learning eventually and then our kids wouldn’t have to make that transition mid-year.

For what it’s worth, we were right. After Thanksgiving, everyone in our district is distance learning too.

My oldest is in 3rd grade, my youngest is in kindergarten. The district is pretty on top of things. Both kids have school-issued iPads, have sync lessons over Zoom primarily on Mondays and Wednesdays, and get and submit assignments through SeeSaw on async days.

Two of the biggest problems at the start were schedule and Zoom links. Both kids have different schedules of when they need to be on Zoom, and needed an easy way to know which Zoom link they needed to join. I also didn’t want to be typing Zoom meeting IDs into an iPad keyboard that I was reading off my computer or phone. I needed a way to be able to manage schedule and links from my computer, make them available to each kid, and then make sure their butt was in the seat at the appropriate time.

I have now brought all this together by combining Google Calendar and Alexa with the power of Home Assistant.

Google Calendar

The first thing I did, even before I had Home Assistant, was to set up a Google Calendar for each kid.

I set up a new Google Calendar for each kid on my personal Google account, and that’s where I manage their schedule. The appropriate class Zoom URL goes in the Location. In some cases such as “Specialists” there are different links for Music, Tech, P.E., etc. which are in a Google Doc provided by the school. In those cases I just use the link to the Google Doc. Luckily, my daughter in 3rd grade is old enough to read and figure out some of those things without my help. With my son in Kindergarten I have to be a little more explicit.

Each kid has a Google sign-in through the school, so I manage the calendar and then I share it with them.

There was a trick to that, as at first the calendar didn’t want to show up on their device. Google has a semi-secret link for Sync Settings which I had to visit on each of their iPads before the calendar would sync to the iOS Calendar app.

With the calendar, whenever I get communication from the school that affects the schedule, I can make changes to the kid’s Google calendar and the changes are automatically synced to them. So as long as they know to be on their tablet, they can figure out where to go.

The challenge is making sure they’re in front of the tablet.

Alexa

Each kid has an Alexa device near them. One is a 3rd-generation Echo Dot, the other is a Sonos One with Alexa enabled.

The natural thing to try was Alexa reminders. If the teacher says “Be back on Zoom after lunch at 1:00” then my kid would just say “Alexa, remind me to be back on Zoom at 1:00.”

That worked for my eldest. For the kindergartener, it was a shit-show. He doesn’t understand time yet. I would hear him outside my office door in the afternoon saying “Alexa, remind me to be back on Zoom at 9:50.” Except it was already after noon so instead of being reminded to be back on Zoom with his teacher at 1:50pm, my wife and I would get a reminder while watching TV long after the children had been put to bed.

So I tried scheduling reminders, but make no mistake, this was a giant pain in the butt. The only one that was really valuable was “Sign on for morning meeting” because it happened every weekday. The rest were impossible to keep synced with the calendar. Did I mention that there’s still the concept of a “Day 1” through “Day 5” in the school schedule that relates to when the kids have class with different specialists? You can’t set this up with Alexa when every recurring reminder takes about 150 taps in the Alexa app.

Enter Home Assistant.

Home Assistant

The reason I decided to get Raspberry Pi running Home Assistant wasn’t for this problem. I was (actually still am) running SmartThings, but I’ve started to get cranky with the lack of flexibility that SmartThings allows. That’s really a topic for another post.

After doing the basic HomeAssistant setup, this is what I had to set up.

First, I installed the File Editor from the Home Assistant Add-On Store. This is useful for editing Home Assisstant’s configuration.yaml file.

In the main menu, go to Supervisor > Add-on Store.
Under Official add-ons find and click on File editor and click INSTALL. When install is done, it goes to teh details page for the add-on.
Click START.
Now you have File editor in the main menu.

Next, I set up the Google Calendar Event which is bundled with Home Assistant, but just has to have configuration added. As part of that process, a google_calendars.yaml file was created, which includes details on all of my calendars, including one for each kid. The result is that I now have a binary sensor (something that is either on or off) for each calendar.

Next, I needed to be able to have Home Assistant command a specific Alexa device to say a specific phrase on command. This…isn’t really supported, but it is possible.

HACS & Alexa

For all the crazy things that aren’t direclty supported by Home Assistant, there’s HACS, which is short for Home Assistant Community Store. To install HACS you need a GitHub account (check), and access to the Home Assistant filesystem. For the latter, I added the Samba share add-on, the same way I added the File editor. That allowed me to open the file system from my computer, and would be another way to edit the configuration.yaml file as well.

After installing HACS, a HACS menu item appears in the Home Assistant main menu. From HACS > Integrations, I installed the Alexa Media Player component according to the installation instructions. This is not official so Amazon could cut off access at any time…hopefully not until this distance learning fiasco is long over.

I believe Alexa also needs a TTS (text-to-speech) service registered in order to translate your words to speech. At any rate, I have this in my configuration.yaml and I’m unwilling to remove it to see if it’s really required or not:

1
2
3

# Text to speech
tts:
  - platform: google_translate

Now, to test Alexa’s ability to say stuff:

Go to Developer Tools > Services.
In the Service dropdown, select notify.alexa_media_DEVICE_NAME. One of these is generated for each of your Alexa devices. So for instance, the one in my office is notify.alexa_media_office. The auto-complete is key here.

In the Service Data editor, enter this:

1
2
3

data:
  type: tts
message: Hi there, I can make Alexa do my bidding.

Click Call Service.

No words can describe how giddy I was when this worked.

Automation

Now to bring it all together. Go to Configuration > Automations and create a new automation using the + in the lower-right. Then press the SKIP button as this is not a simple “turn off the lights” style automation.

These are the settings for my daughter Ellie’s notifications. If I don’t mention a field, leave it blank.

Triggers:
- Trigger type: State
- Entity: calendar.ellie_school
- To: on
Conditions: None
Actions:
- Action type: Call service
- Service: notify.alexa_media_david_s_2nd_sonos_one_second_edition
  - This is not the nicest name, but this is the device in the dining room where my daughter is set up.
- Service data: As shown below

data:
  type: tts
message: >-
  {{ state_attr('calendar.ellie_school', 'description') if
  state_attr('calendar.ellie_school', 'description') != '' else
  state_attr('calendar.ellie_school', 'message') }}

I have another similar automation set up for my son, where the name of the calendar (in both the trigger and the script) and the device name in the Service are customized to him.

The Google Calendar integration exposes the calendar as an entity, similar to a sensor, and the properties of the current calendar entry as state attributes, similar to the temperature and humidity that the sensor would return. The message for Alexa to play is constructed from the Description field of the calendar event, or if that is missing, from the event title. That way I can have a title like “Morning Meeting” but have Alexa say something more useful like “Ellie, time to sign on to morning meeting.”

Caveats

The Google Calendar integration has some limitations that limit how I can use it.

The integration gets its data from Google by polling every 5-15 minutes, which is no surprise. My initial assumption was that on each poll, Home Assistant would download the next hour or so of events, and then dutifully trigger the sensor precisely at the beginning of every event in that range, until the data is updated by the next poll.

That’s just not at all how it works.

Remember that the calendar is surfaced like a sensor, and really more of a binary sensor, which is triggered when it turns on. That means if you have back to back events (like Reading directly after Number Corner) the sensor will transition from “on” to “on” and, as a result, the automation won’t be triggered.

The sensor, along with its collection of state attributes, is also the only place any data is stored that comes from Google. So it can only store data for one event at a time.

In practice, this means that I need to make sure events that are adjoining in real life are separated on the calendar by about 15 minutes, so that between events a poll of the server can get the next event’s data and reset the sensor to off until the next event starts.

I’ve also noticed that the Alexa notifications don’t happen right at the appointed time and can take up to about 45 seconds to happen. I ensured that all devices involved use a network time server and that clock drift is not an issue.

I’m not sure about the root cause. It could be some sort of refresh interval within Home Assistant. It could be that it takes some time for the text of the message to be translated to speech before it’s played. It could be many things.

Because I don’t want my kids to be consistently late, I set the calendar appointments to start a minute or two before the true start time.

The sensor attributes include an offset_reached that is supposed to be used to trigger before the true event start time. Unfortunately, it requires including !!-2 in the event title in order to have the offset reached 2 minutes before start time, and I thought that would confuse my kids. I wish there was a calendar-level setting so that all events on that calendar used a specific offset.

I get the feeling that this integration was meant for situations like not running certain automations when you’re on vacation, where these minute timing details wouldn’t matter.

Summary

With Home Assistant, Google Calendar, and Alexa, I’m able to manage both my kids’ distance learning schedules, so that they have easy access to the Zoom links they need for school, and to be notified by Alexa when they need to use them.

This isn’t how I thought I’d be using Home Assistant when I ordered the Raspberry Pi. I haven’t even set up Z-Wave or ZigBee integration yet. But a few nights of wondering what I could get away with resulted in something that makes managing a nearly impossible situation a lot easier.

In the future I plan to use the Google Calendar integration more, for stuff like turning on the lights in my office in the morning automatically, but only on weekdays when I’m not on vacation.

Path.Combine() isn't as cross-platform as you think it is

2020-06-17T00:00:00.000Z

I started using .NET pretty close to the beginning, in either 2002 or 2003. It’s hard to accurately remember things that happened before I had kids.

Ever since that time, using Path.Combine() has been a best practice. You shouldn’t just concatenate paths together with \\ after all, one day it’s possible that .NET could be cross-platform and then all that Windows-specific code will be broken! With each passing year, I grew less and less convinced that cross-platform .NET would ever happen but dutifully continued using Path.Combine() anyway.

Well now with .NET Core, cross-platform .NET is a reality, but as it turns out, Path.Combine() isn’t quite the cross-platform panacea I feel I was promised.

In this article, I’ll tell you what to look out for when using Path.Combine() on multiple platforms so you won’t get burned the same way I was.

Background

The point of Path.Combine() is pretty simple on the surface.

Let’s say you have a base path C:\base\path and you want to add the filename myfile.txt to it.

1 2	var basePath = @"C:\base\path"; var filename = "myfile.txt";

You could just concatenate the strings:

1	var fullPath = basePath + "\\" + filename;

Or now that we have string interpolation you could concatenate it this way:

1	var fullPath = $"{basePath}\\{filename}";

But that’s bad when we enter the realm of cross-platform because if I were executing this on macOS or Linux, or anything UNIX-like, my path separator would be different:

1 2	var basePath = "/Users/david"; var filename = "myfile.txt";

And so, using either of the concatenation options above, my result would be /Users/david\myfile.txt. That will not end the way you want it to.

That’s where Path.Combine() comes in. Instead of string concatenation, you call this instead:

1
2
3

var fullPath = Path.Combine(basePath, filename);
// Windows Result: C:\base\path\myfile.txt
// Mac/Linux Result: /Users/david/myfile.txt

So all we have to do is use Path.Combine() and our apps will be 100% ready to run cross-platform. Hooray!

If only it were that simple.

Problem 1: Multi-segment paths

Turns out, a lot of people use Path.Combine() wrong and there’s no feedback to tell you it’s wrong.

At a basic level, Path.Combine(a,b) simply concatenates a and b with whatever the local path separator is, as determined by Path.DirectorySeparatorChar. You can kind of think of it like this:

public static string Combine(string path1, string path2)
{
    return path1 + Path.DirectorySeparatorChar + path2;
}

There is absolutely zero checking for whether those two parameters contain existing directory separator characters for any platform. No sort of cross-platform normalization of directory separators going on there.

So what happens if you do this?

1	Path.Combine(basePath, @"a\b\c");

Keeping with our same basePath values for each platform above, for Windows you get C:\base\path\a\b\c which works great. But everywhere else, you get /Users/david/a\b\c which is not what you’re angling for.

But lots of developers do this, because there’s really no hint anywhere that a multi-segement path as one of those parameters is a bad idea. Let’s take a look at the method signature, with the xmldoc that defines what you get in Intellisense:

/// Combines two strings into a path.</summary>
/// The first path to combine.</param>
/// The second path to combine.</param>
/// 
/// The combined paths. If one of the specified paths is a zero-length string,
/// this method returns the other path. If path2 contains an absolute path,
/// this method returns path2.
/// </returns>
public static string Combine(string path1, string path2);

Now, the first parameter is usually an established path that’s known to exist, so I have no qualms about path1 here. But path2 is extremely misleading. The definition of a path is a potentially really long string containing multiple directory names. That’s clearly not what is expected. Perhaps path2 should be renamed to pathSegment or something else, but path2 and the totally unhelpful parameter description “The second path to combine” are the exact opposite of what the method implementation expects.

The only real clue that something could be amiss (short of looking at the source code and understanding what it does…or reading this post) is that the Combine method has additional overloads that accept more parameters…

1
2
3

public static string Combine(string path1, string path2, string path3)
public static string Combine(string path1, string path2, string path3, string path4)
public static string Combine(params string[] paths)

…but really, these all continue the sins of the first.

So instead of this…

1	Path.Combine(basePath, @"a\b\c");

…we should really be using this instead:

1	Path.Combine(basePath, "a", "b", "c");

But unfortunately, it’s pretty common to see a lot more of the former than the latter.

Problem 2: Windows is too forgiving

I’ve seen Path.Combine(…) used as sort of a low-rent version of Server.MapPath(string path) method, a staple of my (thankfully long-over) ASP.NET Web Forms days.

For those not familiar, Server.MapPath(string path) is part of the System.Web assembly and its purpose is to return a physical path that corresponds to a specific virtual path. So if you start out with a path from a web request, like /path/to/file.html, then Server.MapPath(…) understands what the root folder of the website is, as well as (if I recall correctly) any virtual directories set up in IIS as well. So then if your webroot is C:\inetpub\wwwroot and your virtual path is /path/to/my-file.txt, then Server.MapPath("/path/to/my-file.txt") will return that the file physically lives at C:\inetpub\wwwroot\path\to\my-file.txt.

All well and good, but living in HttpServerUtility in the monolithic System.Web assembly meant tight coupling to IIS. If you were building something with a different web framework, you didn’t have that.

So now, if you Google aspnetcore MapPath, what do you get? My first search result says what?

It says use Path.Combine(webRoot, "test.txt").

OK, that works. What if your controller action is a catch-all like this?

string webRoot;

[HttpGet("{*filePath}")]
public async Task Get(string filePath)
{
    var physicalPath = Path.Combine(webRoot, filePath);
    // Do stuff
}

If you try accessing something a few directories deep, you’ll end up with effectively this:

1	var physicalPath = Path.Combine(@"C:\root", "virtual/path-to/file.html");

And the result is: C:\root\virtual/path/to/file.html. That’s right, you’ll get mixed path separators.

But because Windows is too forgiving, File.Exists() on this path will return true, and you can happily return a FileResponse using that path. Maybe if it were a little more strict, people would get the memo that you aren’t supposed to have existing delimiters in Path.Combine() parameters.

For the record, the next few search results right at this moment:

A StackOverflow question where the top-rated and accepted answer points out IWebHostEnvironment to get the root directory but not how to safely combine paths. That answer links to a different answer on the same page which…uses Path.Combine().
Another StackOverflow question, same focus on how to get the web root and same use of Path.Combine().
Blog post that ignores the “combine” part entirely.
A DZone scrape of the article in #4.
Another blog post that ignores combining.
Anotehr blog post that uses Path.Combine().
An aspnetcore GitHub issue titled “Server.MapPath in AspNetCore” that ends with “We don’t have plans to implement this.”

You get the idea? How many developers would search farther than this? Maybe I’ll get lucky and this post will crack the top 5 and help somebody out. Maybe that person is you!

Problem 3: Root paths

Consider these two examples:

1 2	Console.WriteLine(Path.Combine(Environment.CurrentDirectory, "\\abc\\abc")); Console.WriteLine(Path.Combine(Environment.CurrentDirectory, "/abc/abc"));

The Path.Combine(…) method has some kinda-sorta nods to trying to maybe a little bit be cross-platform, but it doesn’t work out too well in practice. In an internal IsPathRooted() method, a check is made to see if the first character of the second parameter is a directory separator or volume separator character.

On Windows, \\ is considered the primary directory separator, while / is considered an alternate directory separator. So the result is this:

1
2
3

// Windows
\abc\abc
/abc/abc

The beginning character was taken to represent the “root” of a filesystem, and so the first parameter wasn’t used at all. The answer in both cases was whatever the second path was.

Now here’s the result on my Mac:

1 2	/Users/david/testapp/bin/Debug/netcoreapp3.1/\abc\abc /abc/abc

Well, that’s interesting. On macOS (and I assume on Linux as well, though I did not check) the primary directory separator AND the alternate directory separator are both / and the character \\ is never considered, ever.

This is a bit of a corner-case, but still, drastically different results from code executing on different platforms. All the more reasons that Path.Combine() parameters should not be allowed to contain directory separators of any kind.

Perhaps one day I’ll get around to writing a Roslyn analyzer to make that a compile-time error.

Summary

For a method that was created more than a decade before the framework was made cross-platform, it’s kind of amazing that Path.Combine(…) works at all. It does was it does, but you need to be aware of its idiosyncracies if you plan to use it in a cross-platform application or library.

There are really three basic, interrelated rules of thumb to keep in mind:

The first parameter of Path.Combine(…) should be thought of as a base path, and you should always be absolutely sure that path already exists on the system.
Every other parameter (because there are multiple overloads for different numbers of parameters) should not contain any path separator characters, from any platform.
When using Path.Combine(…) with user input, arbitrary inputs from a web request, or basically anything that isn’t a literal string, you should take care to split it apart based on all the different platform-specific directory separator characters (in practice, / and \\) and then feed the results of that into Path.Combine(params string[] paths).

One example of how to do #3 is this method:

using System.IO;
using System.Linq;

public static class CrossPlatform
{
    public static string PathCombine(string basePath, params string[] additional)
    {
        var splits = additional.Select(s => s.Split(pathSplitCharacters)).ToArray();
        var totalLength = splits.Sum(arr => arr.Length);
        var segments = new string[totalLength + 1];
        segments[0] = basePath;
        var i = 0;
        foreach(var split in splits)
        {
            foreach(var value in split)
            {
                i++;
                segments[i] = value;
            }
        }
        return Path.Combine(segments);
    }

    static char[] pathSplitCharacters = new char[] { '/', '\\' };
}

Unfortunately, all the string splitting and then recombining allocates a lot more memory and will be quite a bit slower than Path.Combine(…) on a hot path, but more performant code will be inherently less readable and may need to re-implement some of the base assumptions that you take for granted in Path.Combine().

Overriding the NServiceBus ConversationId

2020-04-15T00:00:00.000Z

UPDATE: Starting in NServiceBus version 7.4 you can create a new ConversationId using sendOptions.StartNewConversation(). No more need to create a custom pipeline behavior as I explain here.

The purpose of the ConversationId header included with every NServiceBus message is to relate a whole bunch of messages together as all having started from the same action. It’s generally a very bad idea to mess with the ConversationId in a message handler, so if you try, you’ll get this exception:

System.Exception: Cannot set the NServiceBus.ConversationId header to ‘9203ecb1-d2ed-46eb-ae99-fbeb7a5db387’ as it cannot override the incoming header value (‘a1c91a87-2db9-493f-a638-ab9d016a1305’).

But there are some times when it might be a good idea to override this id, if you know what you’re doing. This article shows you how to do that.

What `ConversationId` is for

When you click in a web application (for example) a message gets sent. This is the very first message in the “conversation” so a new ConversationId is generated. From that point on, every message that is sent as a result of that original message (from message handlers sending or publishing still more messages) copies the same ConversationId from the incoming message.

When these messages get successfuly processed, we can send a copy of them to an auditing store, like ServiceControl.

Then ServiceInsight can query our auditing store for messages having the same ConversationId, and with that information build a flow diagram like this:

Or, a sequence diagram like this:

The problem

The problem is when you use a never-ending saga, something like the CustomerHasBecomePreferred saga I wrote about in Death to the batch job, or the sample scheduler saga I wrote in my last post. There’s never an event to say “This is the start of a new conversation, please come up with a new ID.”

If you try to look at a saga like this using ServiceInsight, the diagrams would get larger and more complex the longer the saga lived, and it wouldn’t take long for the diagrams to become completely unusable.

Solution

Let’s take another look at the exception if we try to change the ConversationId within a message handler. This time I’ll include a couple lines from the stack trace.

System.Exception: Cannot set the NServiceBus.ConversationId header to '9203ecb1-d2ed-46eb-ae99-fbeb7a5db387' as it cannot override the incoming header value ('a1c91a87-2db9-493f-a638-ab9d016a1305').
   at NServiceBus.AttachCausationHeadersBehavior.SetConversationIdHeader(IOutgoingLogicalMessageContext context, IncomingMessage incomingMessage)
   at NServiceBus.AttachCausationHeadersBehavior.Invoke(IOutgoingLogicalMessageContext context, Func`2 next)
   ...

I include the first couple lines of the stack trace because that’s a clue for how to get around this quandary. Specifically, the AttachCausationHeadersBehavior where the method takes an IOutgoingLogicalMessageContext.

This is a pipeline behavior, one of many built into NServiceBus that do things to messages as they’re either processed (incoming behaviors) or sent out (outgoing behaviors).

In this case, IOutgoingLogicalMessageContext tells us that we’re operating on the part of the outgoing message pipeline where we have a logical message—in other words, we’re still dealing with a class and haven’t serialized the message to bytes to send to the message transport yet.

We can operate later in the pipeline by creating our own behavior operating on the IOutgoingPhysicalMessageContext.

public class ModifyConversationIdBehavior : Behavior
{
    public const string OverrideHeader = "Temp.OverrideConversationId";

    public override Task Invoke(IOutgoingPhysicalMessageContext context, Func next)
    {
        // If a temporary override header has been set, move THAT value into the real header
        if(context.Headers.TryGetValue(OverrideHeader, out string overridingConversationId))
        {
            context.Headers[Headers.ConversationId] = overridingConversationId;
            context.Headers.Remove(OverrideHeader);
        }

        // Execute the rest of the pipeline
        return next();
    }
}

We also have to register this new pipeline behavior when we configure the endpoint containing the scheduler saga:

1	endpointConfiguration.Pipeline.Register(new ModifyConversationIdBehavior(), "Modifies the ConversationId of an outgoing message if necessary.");

Now, from wherever point you want to cut the conversation into two (in my scheduler saga, it’s the point where the scheduler fires off a new execution of the task) you can do this:

var command = new WhateverCommand();

var sendOptions = new SendOptions();
sendOptions.SetHeader(ModifyConversationIdBehavior.OverrideHeader, Guid.NewGuid().ToString());
await context.Send(command, sendOptions);

When intercepted by the behavior the value stored in the OverrideHeader will override the value copied from the previous message in the chain, effectively starting a brand new conversation.

Summary

Overriding ConversationId isn’t something to be done lightly, as you can break your auditing and message visualizations. That’s why the NServiceBus API tries to prevent you from doing it. But with a framework as extensible as NServiceBus, there’s almost always a way to break the rules, and pipeline behaviors are a common outlet for well-meaning rule-breakers to do just about anything you can dream up.

For more on useful behaviors, you might want to check out my post Infrastructure soup on the Particular Software blog.

Creating a scheduler with NServiceBus

2020-04-15T00:00:00.000Z

Sometimes a system needs a kind of rudimentary scheduler to tell it when it’s time to do things. Run a weekly report. Poll a legacy system for changes every night. Run some script every 4 hours. The examples are plentiful but usually quite boring.

In a message-based system with NServiceBus, starting these tasks is usually as simple as sending a command, but something still has to send that command on the right schedule. We could use a Windows Scheduled Task, but that feels gross for a couple of reasons. First, we have to create a dedicated app just to spin up a send-only endpoint, send one command, and then die. Second, it feels wrong somehow that using scheduled tasks would take an important part of the system (the schedule) and place it entirely outside the system, complicating the deployment, especially if we (the developers) don’t have direct access to the production infrastructure, either because of our sysadmins or because there is no infrastructure because we’re using Platform-as-a-Service in the cloud.

So we’re dealing with NServiceBus and time, and NServiceBus is supposed to be able to model time through the use of sagas. So couldn’t we implement a simple scheduler using a saga?

Well sure of course we could. But should we? It’s a classic it depends scenario. So let’s delve a little further. I’ll explain how you could do it, and then we’ll be in a better place to talk about whether that’s even a good idea in the first place.

The saga

Remember that a saga is basically a message-driven state machine (see the saga basics tutorial) where the state is stored (usually in a database of some kind) between messages. Some of those messages can have a delay (see the timeouts tutorial) to wake it up at some point in the future.

Ideally, a scheduler saga would be able to handle multiple schedules, otherwise, a saga is actually quite a bit of code for something so simple, especially if you have to create multiple of them. Sagas are hard to use as a singleton process anyway—they want to use a CorrelationId to tell between different instances of the saga that all have independent data.

We can take advantage of that - each saga instance will be for a different schedule, and the CorrelationId will be the type of message we want to send when the schedule comes due.

First, let’s look at the message we’ll send to start the scheduler saga:

using System;
using System.Collections.Generic;
using System.Text;
using NServiceBus;

namespace Messages
{
    public class StartSchedule : ICommand
    {
        public string CommandTypeFullName { get; set; }
        public TimeSpan Interval { get; set; }
        public WeeklySchedule Weekly { get; set; }

        public StartSchedule() { }

        public StartSchedule(Type type, TimeSpan interval)
            : this()
        {
            CommandTypeFullName = type.FullName;
            Interval = interval;
        }

        public StartSchedule(Type type, WeeklySchedule weekly)
        {
            CommandTypeFullName = type.FullName;
            Weekly = weekly;
        }
    }

    public class WeeklySchedule
    {
        public DayOfWeek[] DaysOfWeek { get; set; }
        public TimeSpan TimeOfDay { get; set; }
    }
}

This allows us two variations on schedule. Either we can do a weekly schedule, on multiple days if necessary but all at the same time of day, by sending a message like this:

await endpoint.Send(new StartSchedule(typeof(DoFirstThingWeekly), new WeeklySchedule
{
    DaysOfWeek = new [] { DayOfWeek.Friday },
    TimeOfDay = new TimeSpan(12, 0 ,0)
}));

Or, we can do a regular time span, like every 8 hours for instance, like this:

1	await endpoint.Send(new StartSchedule(typeof(DoSecondThingEvery8Hours), TimeSpan.FromHours(8)));

These calls should be made from your application’s startup. This way, the saga will be “reminded” of the schedule every single time your app starts up, and also covers the use cases of bootstrapping the first run or changing the schedule later on.

Now let’s get to the actual saga, which I’ll attempt to mark up with comments:

using System;
using System.Linq;
using System.Threading.Tasks;
using NServiceBus;
using Messages;

public class Scheduler : Saga,
    IAmStartedByMessages,
    IHandleTimeouts
{
    public class ScheduleData : ContainSagaData
    {
        public string CommandTypeFullName { get; set; }
        public TimeSpan Interval { get; set; }
        public WeeklySchedule Weekly { get; set; }
        public DateTime NextRun { get; set; }
    }

    protected override void ConfigureHowToFindSaga(SagaPropertyMapper mapper)
    {
        // The mapper is responsible for finding the saga data based on a message.
        // So in SQL this would generate something like:
        //     select from ScheduleData 
        //     where CommandTypeFullName = @MessageValueOfCommandTypeFullName
        mapper.ConfigureMapping(m => m.CommandTypeFullName)
           .ToSaga(s => s.CommandTypeFullName);
    }

    public Task Handle(StartSchedule message, IMessageHandlerContext context)
    {
        // Either we're starting the saga for the first time, or "reprogramming" it 
        // with new schedule data, so we store the new values and then go to Run() 
        // to do the dirty work
        Data.Interval = message.Interval;
        Data.Weekly = message.Weekly;
        return Run(context);
    }

    public Task Timeout(NextTimeTimeout state, IMessageHandlerContext context)
    {
        // When a timeout message comes due, we just want to see whether 
        // it's time to act
        return Run(context);
    }

    async Task Run(IMessageHandlerContext context)
    {
        var now = DateTime.UtcNow;

        // May be because the app just started up or we're "reprogramming" the saga
        // We don't necessarily want to spring into action unless it's really time.
        if (Data.NextRun > now)
        {
            return;
        }

        // Depending on whether we're doing interval/weekly scheduling, we want to 
        // figure out when the next time to run is.
        if (Data.Interval > TimeSpan.Zero)
        {
            Data.NextRun = now + Data.Interval;
        }
        else if (Data.Weekly != null)
        {
            // Cycle through the next 7 days until we find the next scheduled date
            var start = now.Date.AddDays(1);
            for (var day = start; day < start.AddDays(7); day = day.AddDays(1))
            {
                if (Data.Weekly.DaysOfWeek.Contains(day.DayOfWeek))
                {
                    Data.NextRun = day + Data.Weekly.TimeOfDay;
                }
            }
        }

        // Get a type from the command name. Must be in same assembly as StartSchedule
        var commandType = typeof(StartSchedule).Assembly
            .GetType(Data.CommandTypeFullName);
        
        // Create & send instance of the command. Must be parameterless constructor
        var command = Activator.CreateInstance(commandType) as ICommand;
        await context.Send(command);

        // Request the timeout for the next activation
        await RequestTimeout(context, Data.NextRun);
    }

    public class NextTimeTimeout { }
}

Does this work? Sure! Well, mostly. It’s got some issues, but depending on the project, these may be anything from easy-to-ignore minor details all the way up to showstoppers. Let’s look at each of these “it depends” scenarios.

Clock drift

Arguably the biggest problem with this whole saga implementation are these 4 lines, which at first glance look logically correct:

if (Data.NextRun > now)
{
    return;
}

Because we want to be able to “reprogram” the saga at any time, we want to be able to ignore some messages, like if a StartSchedule message comes in that’s only the result of your app restarting.

Logically, the timeout message should occur after now and should skip this check—after all that’s the point of a timeout.

But what if the timeout is handled by a message transport like Azure Service Bus that has native scheduled messages, and what if the clock on the processing server has drifted such that it is a few seconds behind official “Azure Time”?

The result will be that the message scheduled for 12:00:00 UTC may arrive, as far as the processing server is concerned, at 11:59:57 UTC. At that point, Data.NextRun of noon is 3 seconds in the “future”, so the message will be ignored.

Not only will the scheduled task not fire, but because the next schedule is set at the same time, the scheduler is essentially in limbo until the next time your app starts up to send a StartSchedule that will be after the Data.NextRun time.

One way to try to fix this is to have a bit of tolerance for clock drift in the code, like this:

if (Data.NextRun > now.AddSeconds(-30))
{
    return;
}

This way, the clock can drift up to 30 seconds and still get the behavior we want. However, this saga is now pretty useless for doing anything with sub-minute times with much accuracy. In the cases where I’ve done this, I only care that the task happens once a day, or maybe every 8 hours, and I’m not too fussed if it happens 30 seconds off its scheduled time. I just want it done.

Reprogramming is limited

In this saga, you can reprogram the timer, but only by deploying new code. Because we’re storing the next run time and not taking any action before that time, you basically can’t reprogram it to anything faster than what it was doing before. If it’s not going to run until 8 hours from now, and you reprogram it to be an hourly timer, then sorry, it’s not going to run for 8 hours, but then it will run for every hour after that.

This becomes more of a problem if you accidentally have a deployment set a next run time on the order of days/months, and then want it to be hourly for some reason.

It would be possible to change the code to always recalculate the next run time on receipt of StartSchedule, but then the code can get much more complex, and you run the risk of upsetting the schedule if for some reason you encounter more frequent app restarts.

In my situation, my schedules are completely arbitrary and I have no intention of changing them…ever. So no need to worry.

Auditing

If you’re auditing messages to ServiceControl, and want to be able to view diagrams in ServiceInsight, then this scheduler has a bit of a problem because all the messages coming from this saga are going to be viewed as part of the conversation because they all share the same ConversationId header. So there wouldn’t be any way to look at the message flow from just the most recent execution, or the one that happened 3 days ago. The diagrams would get bigger and more complex, and it wouldn’t take very long for the diagrams to become unusable.

This would be really frustrating if each schedule kicks off some process, such as a file import, with many substeps that you’d want to be able to visually debug later.

Well, there are ways around such things, like changing the ConversationId so that the conversations are separate. It’s not possible to directly set the ConversationId within a handler though.

This post was getting pretty long, and changing the ConversationId has uses outside of this scheduler saga, so I wrote a separate post on overriding the NServiceBus ConversationId. If you create the behavior outlined in that article, then you can change the saga code a bit.

With the behavior outlined in that article in place, you can change this code in the saga:

1
2
3

// Create an instance of the command and send it. Must be parameterless constructor
var command = Activator.CreateInstance(commandType) as ICommand;
await context.Send(command);

To this:

// Create an instance of the command and send it. Must be parameterless constructor
var command = Activator.CreateInstance(commandType) as ICommand;

var sendOptions = new SendOptions();
sendOptions.SetHeader(ModifyConversationIdBehavior.OverrideHeader, Guid.NewGuid().ToString());
await context.Send(command, sendOptions);

Now every command sent out by the scheduler will be a fresh conversation you can look at individually in ServiceInsight.

Of course, if you’re not auditing messages at all, then you don’t need to do anything like that.

Date and time is just hard

This is a super-simple scheduler, doing only weekly and interval scheduling. For me, that’s all I need. If you wanted to do any of the fancy things your calendar app can do, like:

Repeat every N days
Repeat every N weeks
Repeat every N months
Repeat on the Nth Tuesday of the month
Repeat every N years
Ending after N occurrences
Ending on X date
Anything to do with time zones
Anything where you care about Daylight Saving
Anything where you care about leap days
Anything involving Easter

…well then you’re out of luck. Date and time are hard, and if you don’t believe me, go read some blog posts by Matt Johnson-Pint. I don’t recommend trying to extend the code to handle any of these other scenarios. It’s not worth it. Keep reading and pick a different option.

Scheduling alternatives

I’ve presented a simple scheduler saga, and a bunch of problems it might cause you, all of which you can address to some varying degree. But is it worth it?

If you’re running a small project with very simple scheduling needs, and none of the sections above gives you pause, then you’ll probably be fine. Otherwise, you might want to think about one of these alternatives, any of which you can use to send an NServiceBus message.

Quartz.NET is a full-featured scheduler library for .NET. It can be complex, but it can do anything you need it to.
Hangfire is a way to perform background processing on .NET without a Windows service or separate process. It can handle recurring jobs using a CRON schedule and persists the jobs in a database, so you don’t have to worry about app restarts mucking up your schedule.
Azure Functions has a timer trigger that runs a CRON schedule.

Summary

A bad consultant will say “it depends” and leave it at that. A good one will say “it depends” but then tell you the things it depends on.

My aim in writing this article is not so much to share the code for the scheduler saga, but to highlight all the “it depends” around whether or not it’s a good tool for your particular job.

I do use this exact scheduler saga in a project. It’s a small project, more of a proof of concept, that runs as a single NServiceBus endpoint embedded in an Azure App Service run using Visual Studio Azure credits. I’m already using NServiceBus in this project, and so I’m looking for reliable scheduling (i.e. not System.Timers.Timer) with the lowest possible barrier to entry.

None of the caveats in this article apply to me, so I’m happy to use it. Though I must advmit, if I had to do it over, I would strongly consider using an Azure Functions timer trigger instead.

Creating a RavenDB cluster in Docker

2019-06-11T00:00:00.000Z

At Particular we support a bunch of different technologies, so it seems there’s no end to the infrastructure software I might have to use on any given day. SQL Server, RabbitMQ, MongoDB, MySQL, PostgreSQL, MariaDB, even (shudder) Oracle.

I don’t want all that crap installed on my machine. In fact, I don’t want to install infrastructure on my machine again, like…ever.

So when I needed to work with a RavenDB cluster, I Dockerized it, and here’s how I did it. Maybe it’s not perfect, maybe it could be better? If you think so, let me know! I feel like I stumbled through this, but the result appears to work well.

Docker networking is fun

NOTE:: It’s helpful to know that I run Windows on macOS with Parallels, and that my Windows host file contains a hostos entry that is always addressable to the Mac, so I can use that like a localhost except it’s locahost on macOS, not on the Windows virtual machine. I hope to blog more about this in the near future.

When dealing with Docker and networking, it seems if you’re not already a network engineer (which I am not) you’re already at a bit of a disadvantage.

There are two pretty easy modes of operation:

If you declare mapped ports, you can talk to the application on those ports on localhost without much fuss.
If you give the containers names, the containers can talk to each other using those names.

In either case, all the Docker stuff gets its own little island, and you have very defined bridges (the exposed ports) onto that island.

But a RavenDB cluster has a few different wrinkles. One RavenDB server will normally communicate on port 8080 (HTTP) and port 38888 (TCP), and need to communicate both externally and amongst themselves, but you can’t use the same address for both. This becomes a problem when the Raven cluster gives its internal addresses to the client, which then wants to verify that they’re all alive and can’t even find an address.

Let me give an example. If you create containers named raven1, raven2, and raven3 and set them up as a cluster, raven1 can see and talk to raven2 and raven3, but then reports those names to the client, in this case code running in Visual Studio, and the Windows environment has no idea how to resolve raven1.

The Raven team knew this (they are much better at network engineering than me) and provided configuration options to deal with it by providing environment variables:

RAVEN_ServerUrl - The internal port 8080 address. This is always http://0.0.0.0:8080, the 0s mean that it can respond on any host name you throw at it. It’s always port 8080 because this is local to the container - nothing else will be vying for this.
RAVEN_ServerUrl_Tcp - Same deal but for the TCP port. Always tcp://0.0.0.0:38888.
RAVEN_PublicServerUrl - This is the external address for the 8080 address, or in other words, how you’d get onto the Docker island. Here I provide one of the following, one for each node:
- http://hostos:8080
- http://hostos:8081
- http://hostos:8082
RAVEN_PublicServerUrl_Tcp - Same deal but for TCP. Either:
- tcp://hostos:38888
- tcp://hostos:38889
- tcp://hostos:38890

With this setup of public/private URLs, Raven reports its server topology using the public URLs, which my code is able to look up, and everything just works.

Docker compose

So given the networking aspects above, running docker-compose up --detach with the contents below in docker-compose.yml ramps up the 3 server nodes:

version: '3'
services:
  raven1:
    container_name: raven1
    image: ravendb/ravendb
    ports:
      - 8080:8080
      - 38888:38888
    extra_hosts:
      - "hostos:10.211.55.2"
    environment:
      - RAVEN_Security_UnsecuredAccessAllowed=PublicNetwork
      - RAVEN_Setup_Mode=None
      - RAVEN_License_Eula_Accepted=true
      - "RAVEN_ServerUrl=http://0.0.0.0:8080"
      - "RAVEN_PublicServerUrl=http://hostos:8080"
      - "RAVEN_ServerUrl_Tcp=tcp://0.0.0.0:38888"
      - "RAVEN_PublicServerUrl_Tcp=tcp://hostos:38888"
  raven2:
    container_name: raven2
    image: ravendb/ravendb
    ports:
      - 8081:8080
      - 38889:38888
    extra_hosts:
      - "hostos:10.211.55.2"
    environment:
      - RAVEN_Security_UnsecuredAccessAllowed=PublicNetwork
      - RAVEN_Setup_Mode=None
      - RAVEN_License_Eula_Accepted=true
      - "RAVEN_ServerUrl=http://0.0.0.0:8080"
      - "RAVEN_PublicServerUrl=http://hostos:8081"
      - "RAVEN_ServerUrl_Tcp=tcp://0.0.0.0:38888"
      - "RAVEN_PublicServerUrl_Tcp=tcp://hostos:38889"
  raven3:
    container_name: raven3
    image: ravendb/ravendb
    ports:
      - 8082:8080
      - 38890:38888
    extra_hosts:
      - "hostos:10.211.55.2"
    environment:
      - RAVEN_Security_UnsecuredAccessAllowed=PublicNetwork
      - RAVEN_Setup_Mode=None
      - RAVEN_License_Eula_Accepted=true
      - "RAVEN_ServerUrl=http://0.0.0.0:8080"
      - "RAVEN_PublicServerUrl=http://hostos:8082"
      - "RAVEN_ServerUrl_Tcp=tcp://0.0.0.0:38888"
      - "RAVEN_PublicServerUrl_Tcp=tcp://hostos:38890"

Just a few other notes:

extra_hosts defines my macOS host entry on each of the Docker containers as well. Essentially this gives network traffic a way to get off the Docker island and then return.
There are extra environment variables that make sure I don’t have to go through a lot of Raven setup mumbo-jumbo on each server node. You can look up exactly what they do in the RavenDB documentation.

Setting up the cluster

Running docker-compose only gets you so far. When it’s complete you get 3 Raven nodes that aren’t connected in any way, and don’t even have a license applied. In order to set up a cluster you must have a license, and it must be applied only to the node you intend to be the leader. The remaining nodes are then joined to the already-licensed leader and are allotted a number of assigned cores from the license’s maximum limit. Because a (free) development license allows up to 3 cores, that’s 1 core per node.

So I actually have a bash script (remember I’m on a Mac) that runs docker-compose and then executes a series of curl commands to configure the cluster.

First, I apply the license to raven1. This is back to using localhost becuase I execute it on the Mac:

1 2	echo "Applying licenses..." curl 'http://localhost:8080/admin/license/activate' -H 'Content-Type: application/json; charset=UTF-8' --data-binary '{LICENSE_JSON}' --compressed ;

You’ll need to provide your own LICENSE_JSON stripped of any prettified whitespace. However instead of copying my script and trying to do this manually, the easiest way is to do it using RavenDB Studio in Chrome, while watching with Chrome developer tools. On the Network tab, you can take any request, right-click, and get a bunch of options. On Windows, you can copy a request as PowerShell or cURL.

So pick your scripting poison, and then just remove any irrelevant headers. The RavenDB server doesn’t really care what your user agent is.

Next I want to tell raven1, the cluster leader, that it only gets to use 1 core, in order to leave 2 cores remaining for the rest of the cluster:

1
2

echo "Assigning 1 core for leader node..."
curl 'http://localhost:8080/admin/license/set-limit?nodeTag=A&newAssignedCores=1' -X POST -H 'Content-Type: application/json; charset=utf-8' -H 'Content-Length: 0' --compressed ;

And lastly, I want to join raven2 and raven3 to the cluster as watcher nodes, allotting only 1 assigned core for each. Again, I discovered these URLs using the Chrome network tools. Note that the url-encoded url parameter uses hostos as the host. I don’t know why but using raven2 and raven3 didn’t work for me. This is also why my compose file needed to specify the extra_hosts parameter:

echo "Adding raven2 to the cluster..."
curl 'http://localhost:8080/admin/cluster/node?url=http%3A%2F%2Fhostos%3A8081&watcher=true&assignedCores=1' -X PUT -H 'Content-Type: application/json; charset=utf-8' -H 'Content-Length: 0' --compressed

echo "Adding raven3 to the cluster..."
curl 'http://localhost:8080/admin/cluster/node?url=http%3A%2F%2Fhostos%3A8082&watcher=true&assignedCores=1' -X PUT -H 'Content-Type: application/json; charset=utf-8' -H 'Content-Length: 0' --compressed

So putting it all together, assuming Docker is already running on my Mac, here is the script that launches my cluster for me:

echo "Running docker-compose up"
docker-compose up --detach

sleep 2

echo "Applying license..."
curl 'http://localhost:8080/admin/license/activate' -H 'Content-Type: application/json; charset=UTF-8' --data-binary '{LICENSE_JSON}' --compressed ;

echo "Assigning 1 core for leader node..."
curl 'http://localhost:8080/admin/license/set-limit?nodeTag=A&newAssignedCores=1' -X POST -H 'Content-Type: application/json; charset=utf-8' -H 'Content-Length: 0' --compressed ;

echo "Adding raven2 to the cluster..."
curl 'http://localhost:8080/admin/cluster/node?url=http%3A%2F%2Fhostos%3A8081&watcher=true&assignedCores=1' -X PUT -H 'Content-Type: application/json; charset=utf-8' -H 'Content-Length: 0' --compressed

echo "Adding raven3 to the cluster..."
curl 'http://localhost:8080/admin/cluster/node?url=http%3A%2F%2Fhostos%3A8082&watcher=true&assignedCores=1' -X PUT -H 'Content-Type: application/json; charset=utf-8' -H 'Content-Length: 0' --compressed

The result is this in the Cluster view in Raven Studio: A 3-node cluster with one Leader node and two Watcher nodes:

Summary

So that’s how you create a 3-node RavenDB cluster in Docker containers. Hopefully it will be useful to somebody. Probably that somebody will be me 6 months from now when I google it and find this post.

By no means do I find this perfect. If you can do better, please use the Edit button at the top of this post and send me a PR!

Sure, you can just use RabbitMQ

2017-12-13T00:00:00.000Z

Note: This post was adapted from an answer I originally posted to a Stack Overflow question.

People ask (frequently) why they need NServiceBus. “I’ve got RabbitMQ and that has built-in Pub/Sub,” they might say. “Isn’t NServiceBus just a wrapper around RabbitMQ? I could probably write that in less than a weekend. After all, how hard could it be?

Well sure, you can definitely just use pure RabbitMQ. I’ll even help you get started writing that wrapper. You just have to keep a couple things in mind.

First you should read Enterprise Integration Patterns cover to cover and make sure you understand it well. It is 736 pages, and a bit dry, but extremely useful information. It also wouldn’t hurt to become an expert in all the peculiarities of RabbitMQ.

Messaging

Then you just have to decide how you’ll define messages, how to define message handlers, how to send messages and publish events. Before you get too far you’ll want a good logging infrastructure. You’ll need to create a message serializer and infrastructure for message routing. You’ll need to include a bunch of infrastructure-related metadata with the content of each business message. You’ll want to build a message dequeuing strategy that performs well and uses broker connections efficiently, keeping concurrency needs in mind.

Next you’ll need to figure out how to retry messages automatically when the handling logic fails, but not too many times. You have to have a strategy for dealing with poison messages, so you’ll need to move them aside so your handling logic doesn’t get jammed preventing valid messages from being processed. You’ll need a way to show those messages that have failed and figure out why, so you can fix the problem. You’ll want some sort of alerting options so you know when that happens. It would be nice if that poison message display also showed you where that message came from and what the exception was so you don’t need to go digging through log files. After that you’ll need to be able to reroute the poison messages back into the queue to try again. In the event of a bad deployment you might have a lot of failed messages, so it would be really nice if you didn’t have to retry the messages one at a time.

Transactions

Since you’re using RabbitMQ, there are no transactions on the message broker, so ghost messages and duplicate entities are very real problems. You’ll need to code all message handling logic with idempotency in mind or your RabbitMQ messages and database entities will begin to get inconsistent. Alternatively you could design infrastructure to mimic distributed transactions by storing outgoing messaging operations in your business database and then executing the message dispatch operations separately. That results in duplicate messages (by design) so you’ll need to deduplicate messages as they come in, which means you need well a well-defined strategy for consistent message IDs across your system. Be careful, as anything dealing with transactions and concurrency can be extremely tricky.

Workflow

You’ll probably want to do some workflow type stuff, where an incoming message starts a process that’s essentially a message-driven state machine. Then you can do things like trigger an action once 2 required messages have been received. You’ll need to design a storage system for that data. You’ll probably also need a way to have delayed messages, so you can do things like the buyer’s remorse pattern. RabbitMQ has no way to have an arbitrary delay on a message, so you’ll have to come up with a way to implement that.

You’ll probably want some metrics and performance counters on this system to know how it’s performing. You’ll want some way to be able to have tests on your message handling logic, so if you need to swap out some dependencies to make that work you might want to integrate a dependency injection framework.

Documentation

Because these systems are decentralized by nature it can get pretty difficult to accurately picture what your system looks like. If you send a copy of every message to a central location, you can write some code to stitch together all the message conversations, and then you can use that data to build message flow diagrams, sequence diagrams, etc. This kind of living documentation based on live data can be critical for explaining things to managers or figuring out why a process isn’t working as expected.

Speaking of documentation, make sure you write a whole lot of it for your message queue wrapper, otherwise it will be pretty difficult for other developers to help you maintain it. Of if someone else on your team is writing it, you’ll be totally screwed when they get a different job and leave the company. You’re also going to want a ton of unit tests on the RabbitMQ wrapper you’ve built. Infrastructure code like this should be rock-solid. You don’t want losing a message to result in lost sales or anything like that.

So if you keep those few things in mind, you can totally use pure RabbitMQ without NServiceBus.

Hopefully, when you’re done, your boss won’t decide that you need to switch from RabbitMQ to Azure Service Bus or Amazon SQS.

Introduction to NServiceBus

2017-03-17T00:00:00.000Z

If you’re looking at my blog right now (hint: you are) then you might already know that I am the author of Learning NServiceBus which focused on NServiceBus 4, and Learning NServiceBus - Second Edition which focused on NServiceBus 5.

Now NServiceBus 6 has been released, and with the change to a fully async API, quite a bit has changed. In fact, enough has changed that if you tried to use my latest book, which is called Learning NServiceBus, to actually, you know, learn NServiceBus, it might not be the greatest experience in the world.

The problem is, writing books sucks. A lot. So I’m not doing that anymore.

What’s wrong with books?

To publish a book you spend countless hours, writing, editing, rewriting, editing, proofing, proofing some more, then proofing again. By the time you get done and you get the physical copy of the book in your hands, you really don’t want to look at that content probably ever again. So when the first version of my book was published, and my copies arrived in the mail, here’s what I did: I cracked one open just to make sure there were actually words in it, and then I closed it and put every single copy on my bookshelf.

I seriously did not want to look at those words one more time. I stopped.

But NServiceBus didn’t stop. NServiceBus just kept going. By the time my book was published, NServiceBus 4 had already been out for a month and a half. Within another two months, NServiceBus 4.1 was out. Then 4.2, and so on. Each release made my book a little more wrong.

Just a bit over a year after I published, NServiceBus 5 was released, making my book completely wrong. I figured that, at the very least, updating the book to a new major version would require much less effort than went into the first version.

Well, I was wrong. Granted, I didn’t just update the code samples and call it a day, because I wanted the book to be the best it could possibly be. Although I didn’t keep track of hours, I’m fairly certain that the updated took every bit as much effort as the one that came before.

And here we are again. Now that NServiceBus 6 has been released, the second book is also completely wrong. Putting words to actual paper is just no good way to actually educate people how to use a software product that’s changing and evolving.

So now that I have the advantage of working direclty for the makers of NServiceBus, we’ve decided to go in a bit of a different direction. I’ve been working on a new Introduction to NServiceBus tutorial, hosted within our documentation site. That means we’ll be able to keep the tutorial up-to-date and relevant using the same tech we use for our documentation and samples, in a way a paper book could never hope to accomplish.

The new tutorial

I designed the new tutorial as a series of 5 lessons, each designed to be accomplished in a half hour or less. The tutorial teaches the basics of working with NServiceBus but a lot more than that as well. It also covers the messaging theory to help you get your NServiceBus projects off on the right foot.

Each lesson contains an exercise that guides you through each step of creating a sample messaging system in the retail/ecommerce domain. By the end of the tutorial, you’ll have 4 messaging endpoints all exchanging messages with each other, and also learn how to use the tools in the Particular Service Platform to replay messages when the system suffers from temporary failures.

When complete, the system you build will look something like this:

Sound interesting? If so, give the Introduction to NServiceBus tutorial a try, and then ping me on Twitter @DavidBoike to let me know what you think.

The first tutorial is introductory, and introduces the concepts of messaging, sending messages, and using Publish/Subscribe. I would love to build additional tutorials in the future that will extend on this, showing how to organize NServiceBus projects, centralize repeated configuration, and show how to get the most out of NServiceBus Sagas. Let me know if you’d be interested in that as well!

The code isn't everything

2017-02-25T00:00:00.000Z

Since I work from home, I make it a point to go out for lunch at least once per week. It helps to keep me from going a bit nuts staying in the house all day, especially during cold, snowy Minnesota winters.

This week I visited a local fast food joint that happens to have a new loyalty program via a mobile app. You visit eight times and you earn a free combo meal. To log a visit, you tap a button in the app, and it displays a QR code that they scan at the register, or if you forget, the app can scan a barcode on your receipt. I’m … not exactly a fan of QR codes, but this seems like a pretty good use for them. I had never seen their system in action so I decided to check it out.

So after placing my order, I ask the employee at the register if he can scan the QR code in the app, and he looked at me like I was born on Mars.

I looked around the restaurant. On every table, there’s a table tent advertising the new app. There’s a giant, backlit poster right next to the ordering counter advertising the new app. This is clearly a pretty big push for them.

But he had no idea what I was talking about.

So the poor employee at the register shouts to his manager, who shouts back to do it manually. I didn’t quite hear the exchange, but as I’m watching the screen, I see that he’s trying to give me a free meal. So I protested, as I wasn’t about to accept a free meal I hadn’t earned. I was also starting to grow a little embarrassed as it was lunchtime and there were hungry people waiting to order behind me. I asked him to reverse the discount, and I would just pay for the meal normally and scan the receipt barcode later.

Well that didn’t work either. Even though the barcode was printed on the receipt very clearly, the app refused to scan it. I had to enter the numeric code below the barcode manually.

This got me thinking about the things people – especially software developers – completely forget about in software projects.

It’s far too easy to fall into the trap of writing code for code’s sake, and forgetting that for the code to be successful in solving a business problem, so much more is usually needed. You need usability testing, good documentation, marketing, training, or a hundred other potential other things for software to really be successful.

In the case of my lunch visit, a little bit of employee training would have gone a long way. Hopefully this was just an isolated incident of a new employee just learning the ropes and some miscommunication. Otherwise, a lot of money spent on app development was a wasted investment.

So just remember, the code isn’t everything.

Inbox Few - How I wrangle my email

2017-01-31T00:00:00.000Z

I used to be a practitioner of Inbox Infinite. The incoming messages would just land at the top of my inbox, and as I scrolled down, the date of the email would slowly get closer and closer to the date when I originally opened my Gmail account.

Let’s just say I could scroll down a long damn way.

My primary method of determining if something was handled was whether or not it was read. If I needed to look at it later, mark it as unread. It worked out alright, as a software developer I didn’t get that much email to start with, and work generally happened only during work hours.

When I joined Particular Software it became quickly apparent that how I managed my email was going to have to change.

Particular Software is a 100% dispersed organization with no “home office.” We have people in North America, Europe, and Australia. The sun never goes down on us, and so throughout most of the week, during every hour of the day, someone somewhere is working.

We also do all our work in GitHub, from work on code in our public repositories, right down to discussing changes to our parental leave policy.

This results in lots of notification emails. The problem isn’t just dealing with them in my inbox. I also need to curtail the sheer amount of them that are archived so that I have a remote chance of finding anything else in my email.

So here’s the system I’ve come up with to wrangle my email. I’m not necessarily saying it will work for you, but I’ve found it works for me.

Basic setup and filters

I like to use a native Mail client, not the Gmail web interface, especially since I prefer to blend my personal and work email together into one experience. I actually really like Apple’s built-in Mail app when I’m on my MacBook Pro.

So the first thing I had to turn off is Gmail’s tabbed email system that separates Social/Promotions/Updates/Forums content into different tabs in the Gmail web interface, because it doesn’t do any good from a native client.

Instead, I set up filters to map this kind of stuff to labels/folders in Gmail. You can take advantage of Gmail’s intelligence to identify these messages and direct them where you want.

Using the searches category:social and category:promotions, you can create Gmail filters to move these to separate labels, skip the inbox, and never mark it as important. I direct these to labels called Notifications and Notify, and this cleans up an amazing amount of what would otherwise land in my Inbox.

I frequently skim these folders once per day, and mark them read en masse.

I also have another Gmail filter for advertising that somehow makes it through Gmail’s algorithms. The search is along the lines of from:(A OR B OR C OR ...) and I add to it as necessary. Honestly though, I can’t remember the last time I amended that filter. Gmail seems to be really good at that.

Lastly, I have a filter for from:notifications@github.com that applies a GitHub label, but here I do not skip the Inbox. More on this later.

Inbox Few

It didn’t take long after joining Particular to realize that using Read/Unread status on a message to keep track of what I needed to do with it would not work. I just get too many GitHub notifications.

Particular Software creates a messaging product called NServiceBus, so it made sense to treat my Inbox as a persistent queue. If it’s in my inbox, that means there’s something there for me to do. If there isn’t anything for me to do, it shouldn’t be in my inbox.

I tried Inbox Zero, but found it impossible to get to zero. I’m not sure if that would even be a good thing. If I didn’t have anything to do in my persistent queue, that might be bad for job security?

In any case, I’ve settled on Inbox Few. When I seriously triage my email, I try to tackle small, achievable tasks that I can quickly dispatch, in any order that happens to suit my fancy. Sometimes, this is just typing a quick response on a GitHub issue to voice my opinion. Other times it involves a little more work, but never an hour-long task. If something is too big of a time suck, I leave it alone. It will probably need its own period of undivided attention anyway, so I leave it until I can devote that time to it.

I keep going at this roughly until the scrollbar on my message list disappears and I can see all the remaining messages in one view. This is a highly subjective measure of course. I have my email client set up in the Folders On Left + Message List On Top + Message Preview On Bottom configuration, and I don’t maximize my email client to the whole screen. At this moment, I have 11 items in my Inbox and I’m feeling very good about that.

Once I get down to roughly this scale, what’s left can fit in my head and I can start to make better value judgments about what Big Ticket task is truly worth my time and then start on that. Usually I’ll also take a scan of the GitHub issues assigned to me and take that into consideration.

I also try, whenever possible, to eliminate items where I’m waiting for something. In most GitHub issues, I trust there will be some further activity from one of my teammates that will “wake me up” with a new notification. However, in some situations that isn’t likely to happen, so in those cases I’m quick to use Slack’s reminder feature to point me back to a specific issue URL at some time in the future.

I feel the biggest hole in this process is currently when I expect that someone else will reply to an issue, giving me my wake-up call, but it never happens. This is when I run the risk of forgetting about an issue for an extended period of time, especially if I’m not assigned to it. I’m currently not sure how to address that.

Cleaning up cruft

Sometimes I’ll want to find something useful in my email, and for that I have to rely upon search. It really doesn’t help if there are hundreds of GitHub notifications, ads, or other cruft gumming up everything, turning my signal-to-noise ratio to utter crap.

But on the other hand, I don’t want to get rid of these emails immediately, especially the GitHub notifications. After I archive a message, if someone later responds on that issue, it raises the entire thread back into my Inbox. This way I can see most of the context right there in my email client, and I don’t have to waste half of my day launching into a GitHub browser window only to find out which issue that was anyway.

To deal with this, I use a Google Apps Script to delete email conversations out of Gmail after a certain period.

These scripts are really easy to make. They are stored in Google Drive, so you just go to the link above and click the Start Scripting button. Save it in a Drive folder and you can set it up to run once per night without having to hassle with a crontab or anything.

Here’s my script that archives messages with the Notify and Advertising labels after 30 days, which is plenty long enough to retrieve a coupon the day my wife wants to go shopping. It also archives messages in the GitHub label (remember that from earlier?) after 180 days.

function archive(label, days, trash) {
  var query = 'label:' + label + ' is:read older_than:' + days + 'd';
  var emails = GmailApp.search(query, 0, 500);
  Logger.log('Folder "' + label + '": retrieved ' + parseInt(emails.length, 0) + ' emails to ' + (trash ? 'delete.' : 'archive.'));
  for(var i=0; i
    var batch = emails.slice(i, i + 100);
    Logger.log('...Processed ' + parseInt(batch.length, 0) + ' threads.');
    if(trash) {
      GmailApp.moveThreadsToTrash(batch);
    } else {
      GmailApp.moveThreadsToArchive(batch);
    }
  }
}

function main() {
  archive('notify', 30, true);
  archive('advertising', 30, true);
  archive('github', 180, true);
}

Through the clock-ish icon in the script editor, you can schedule this script to run the main function on a scheduled basis. Mine runs daily between midnight and 1am in order to make sure the batch size doesn’t exceed the 500 allowed by the script. (If you try to make that larger, the script can fail.) When I originally created it, I had to run it manually many many times to get through all the junk in my previous Inbox Infinite.

You may notice the script has an option for archiving messages that I’m not currently using. If you, like me, are currently at Inbox Infinite, you can apply a label to all the messages in your Inbox, and then use that label to archive messages until you get from Inbox Infinite down to at least Inbox Semi-Recent.

Summary

As I said at the outset, this may not work for everybody, but it seems to be working pretty well for me. I always have a persistent task list available to me, which I can manage from my laptop or from my phone in a similar fashion. It also allows me to do high-level triage of “Yep, got it” type things easily from my phone during downtime when I would be bored anyway.

Some have suggested I use Google Inbox instead of all this, especially because the “Snooze” feature might help with the problem of getting a message out of the way but coming back to it later, whether or not anyone has replied.

My problem with that is that I like email and how it’s set up, I just want it to work for me. I don’t see the need to totally reimagine it, nor do I want to give up my native client for a webapp, or have to try out a bunch of native Inbox clients like Boxy.

Basically, what I’ve got now seems to be working out fine. Why would I mess with that?

How to build Gmail's "Undo Send" feature

2015-06-30T00:00:00.000Z

The other day I read this article about how Gmail will finally let you “Undo Send” emails you wish you didn’t send.

Really Google? Really? What took you so long? I mean, I know you’ve been very busy shuttering Google Reader and all that, but offering the ability to undo sending an email within 30 seconds is actually pretty easy to build.

At least, it is with NServiceBus, and specifically, using an NServiceBus Saga. I’ll show you how.

“Undo Send” is really just a specific case of a much more general pattern I’ll call the buyer’s remorse pattern.

Buyer’s remorse pattern

In real life, we might get buyer’s remorse when we buy an expensive car and then realize just how long we’re going to have to make expensive payments on it. In software, we’re not talking about a purchase - we’re really referring to any action that cannot be easily undone. Sending emails fall squarely within this category of problems, along with charging credit cards, which can cause real buyer’s remorse.

Using the buyer’s remorse pattern simply means that instead of immediately sending the email, the software will wait a certain amount of time first, in case the user thinks better about what they just did and wants to back out.

The time delay is the tricky part of implementing buyer’s remorse, but with NServiceBus, it becomes easy.

Step-by-step

I won’t cover how to do buyer’s remorse in your UI. I would assume your user would click the Send button, and then you’d display an alert bar saying “Your message has been sent. (Click to undo)”. For the sake of this example, let’s assume this is a single-page application, and both the “Send” button and “Click to undo” would fire a request to a REST API that could send NServiceBus commands to a back-end service.

Message definitions

Here’s what the commands would look like:

public class SendEmail : ICommand{    public Guid MessageId { get; set; }    public EmailDetails MessageDetails { get; set; }}public class UndoSendEmail : ICommand{    public Guid MessageId { get; set; }}

The MessageId is just a Guid that serves as a unique identifier for the message being sent. That way when we “send” the message, and then subsequently “undo send”, we know we’re talking about the same message.

I’ll leave the EmailDetails class up to you. Actually, that’s the real hard part about sending an email. Classes like System.Net.MailMessage and System.Net.MailAddress are chock full of get-only properties and other gross stuff that don’t make them good candidates to include in messages, plus they probably contain way more information than you really need for your use case anyway. So just create your own containing only get/set properties for the details you need.

We’re also going to need a class to represent the timeout message. This is far from complex:

public class UndoSendTimeout { }

Basic saga structure

We start with the scaffolding of an NServiceBus Saga that handles these messages. An NServiceBus Saga is really just a collection of message handling methods that store some shared state in a database between messages.

public class UndoSendPolicy : Saga,
    IAmStartedByMessages,
    IAmStartedByMessages,
    IHandleTimeouts
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper mapper)
    {
        mapper.ConfigureMapping(msg => msg.MessageId)
            .ToSaga(data => data.MessageId);
        mapper.ConfigureMapping(msg => msg.MessageId)
            .ToSaga(data => data.MessageId);
    }
    
    public void Handle(SendEmail message)
    {
        // TODO
    }

    public void Handle(UndoSendEmail message)
    {
        // TODO
    }
    
    public void Timeout(UndoSendTimeout state)
    {
        // TODO
    }

For the moment, let’s ignore UndoSendPolicyData, the class that represents the state stored in the database between messages.

You may find it odd that both the SendEmail command and the UndoSendEmail command are both implemented as IAmStartedByMessages rather than the alternative IHandleMessages, when clearly if the UndoSendEmail occurs, it will happen later in time. It’s important to remember that in an eventually consistent, asynchronous system, it’s possible that SendEmail could be delayed for some reason and UndoSendEmail might actually arrive first!

Because of this possibility, it’s best to use IHandleMessages only for messages sent by the Saga itself!

The last thing to look at is the ConfigureHowToFindSaga method, which teaches the persistence how to look for saga data for each incoming message. For both message types, this is saying “Find a property in the message called MessageId, and try to match that up to some saga data in the database with a matching MessageId.” If none is found, and the message is an IAmStartedByMessages then the Saga will create new data for us.

Now let’s get back to the UndoSendPolicyData class that will store our saga data for us while we’re waiting for the timeout period.

public class UndoSendPolicyData : ContainSagaData
{
    public Guid MessageId { get; set; }
    public EmailDetails MessageDetails { get; set; }
    public bool UndoSend { get; set; }
}

We store the MessageId, so that our saga-finding mapper can match it up later. We also store the message details received from the SendEmail command. Lastly, an UndoSend indicator lets us know if an UndoSendEmail command has been received.

Message handlers

Now let’s start to implement our message handlers, starting with the handler for the SendEmail command:

public void Handle(SendEmail message){    this.Data.MessageId = message.MessageId;    this.Data.MessageDetails = message.MessageDetails;    this.RequestTimeout(TimeSpan.FromSeconds(30));}

When the Saga receives its very first message, the saga data (in this.Data) will be uninitialized, so it’s very important to fill it with information from the incoming command. We don’t actually want to send the email yet, so we request a timeout from the Saga infrastructure so that we can get a UndoSendTimeout reminder in 30 seconds.

Next, let’s handle the UndoSendEmail command:

public void Handle(UndoSendEmail message){    this.Data.MessageId = message.MessageId;    this.Data.UndoSend = true;}

We don’t actually do much of anything here either. We still initialize the MessageId property, because remember that it’s possible for UndoSendEmail to arrive first, meaning the saga data could be uninitialized for this handler as well.

Lastly, we implement the handler for the UndoSendTimeout message:

public void Timeout(UndoSendTimeout state){    if (this.Data.UndoSend == false)    {        SendEmail(this.Data.MessageDetails);    }    this.MarkAsComplete();}

If we haven’t cancelled the send, then we call a method that will build the MailMessage from our message details and dispatch it to the SMTP server. Then, in either case, we call MarkAsComplete(), which will remove the saga data from the database. If the Saga did receive an UndoSendEmail command then the message just goes away, no harm, no foul.

So remember that the messages could arrive in any order. This means that one of the following will happen:

SendEmail arrives, and the user does not undo, so the email is sent 30 seconds later.
SendEmail arrives, and the UndoSendEmail command arrives a bit later, so when the timeout fires, the email is not sent.
SendEmail is delayed and UndoSendEmail arrives first. The saga data is created with UndoSend == true. When SendEmail arrives, it dutifully requests the timeout. 30 seconds later, the timeout is received, and because UndoSend is set, the email is not sent.
SendEmail arrives, and the user stares at their screen for 29 seconds, then finally sends UndoSendEmail too late. The timeout fires, sends the email, and removes the saga data. Finally, the UndoSendEmail arrives, and because it is also an IAmStartedBy message, it recreates the saga data! Oops!

Scenario #4 isn’t really what we had in mind, but it illustrates that we always need to consider what will happen if messages arrive out of their expected order. There are a lot of ways to handle this, including an additional timeout to delay cleaning up the saga for a much longer period (perhaps 24 hours), or a timestamp added to the UndoSendEmail command so that it could be effectively ignored if old enough. I’ll leave implementing one of those as an exercise for the reader.

Full code

Here is the full code for the UndoSendPolicy saga, for those who prefer to see everything at once:

public class UndoSendPolicy : Saga,
    IAmStartedByMessages,
    IAmStartedByMessages,
    IHandleTimeouts
{
    protected override void ConfigureHowToFindSaga(SagaPropertyMapper mapper)
    {
        mapper.ConfigureMapping(msg => msg.MessageId)
            .ToSaga(data => data.MessageId);
        mapper.ConfigureMapping(msg => msg.MessageId)
            .ToSaga(data => data.MessageId);
    }

    public void Handle(SendEmail message)
    {
        this.Data.MessageId = message.MessageId;
        this.Data.MessageDetails = message.MessageDetails;

        this.RequestTimeout(TimeSpan.FromSeconds(30));
    }

    public void Handle(UndoSendEmail message)
    {
        this.Data.MessageId = message.MessageId;
        this.Data.UndoSend = true;
    }

    public void Timeout(UndoSendTimeout state)
    {
        if (this.Data.UndoSend == false)
        {
            SendEmail(this.Data.MessageDetails);
        }
        this.MarkAsComplete();
    }

    private void SendEmail(EmailDetails email)
    {
        // Dispatch the message to the SMTP server
    }
}

public class UndoSendPolicyData : ContainSagaData
{
    public Guid MessageId { get; set; }
    public EmailDetails MessageDetails { get; set; }
    public bool UndoSend { get; set; }
}

public class SendEmail : ICommand
{
    public Guid MessageId { get; set; }
    public EmailDetails MessageDetails { get; set; }
}

public class UndoSendEmail : ICommand
{
    public Guid MessageId { get; set; }
}

public class EmailDetails
{
    // Up to you - create what you need for your use case
}

public class UndoSendTimeout { }

Summary

Buyer’s remorse is a pattern to deal with operations that otherwise cannot be undone by delaying them for a short time while waiting for cancellation. With it we can implement Google’s Undo Send feature, but there are many more uses for it.

Credit card charges are technically reversible, although doing so is messy, as it requires a charge reversal that would appear on the customer’s statement. It’s a lot cleaner to prevent the credit card from having ever been charged by introducing the time delay with the buyer’s remorse pattern.

In fact, with e-commerce use cases, the buyer’s remorse pattern can get a little more interesting. It should always be possible to cancel an order. Just after the order, we can use the buyer’s remorse pattern to prevent accidental orders. After the credit card is charged and before products are shipped, we should be able to cancel the order and refund the payment. Even after products are shipped, we should be able to cancel the order, and provide a refund (perhaps partial) provided that the items are returned.

All of these are great applications for sagas, which give you the ability to model business requirements together with the passage of time. Buyer’s remorse is just a start.

Goodbye WordPress, Hello Jekyll

2015-05-21T00:00:00.000Z

WordPress and I had an altercation the other day.

I logged in to the management interface for my blog, and it was begging me to update to WordPress 4.2.2, and given the number of security vulnerabilities that have been in the news lately, I figured that was probably a good idea.

But this time, the automated, one-click update process failed me for the first time. I don’t know precisely why, but it blew up mid-stream. Luckily the public portion of the site was still serving, but the admin was completely roasted. So I had to go through the painful process of doing a manual upgrade over FTP.

Luckily, I got everything working on Version 4.2.2, but I resolved at that point that 4.2.2 would be my last.

It’s not that I’m a huge WordPress hater. When it works it works well enough. But I absolutely disdain PHP, kind of like this guy, so I can’t really go hack on it very well because to do so would give me the itchies all over. The freedom I want to have with my blog makes WordPress.com hosting impossible, but I don’t want to go through the hassle of self-hosting either. Plus I really really want to the particular webhost I’m using because reasons, and moving my blog is a great first step.

I recently joined Particular Software and we do everything on GitHub. No really, I mean everything. (Well OK, GitHub and Slack.) And while I’m no slouch at HTML when I really want to write, nothing beats Markdown. So it makes sense to take advantage of that.

So, if you’re reading this post, my blog is now run by Jekyll on GitHub Pages. How does that work? Glad you asked.

Create a new GitHub repository called your-github-username.github.io - mine is davidboike.github.io.
Create a Jekyll repository. (This is the tricky part.)
Write your posts as Markdown files.
Push your changes to GitHub.
GitHub compiles your posts from your master branch into HTML pages serves it up as static content.

Rather than do a whole bunch of work on #2, I decided to stand on the shoulders of Phil Haack whose blog post on converting his own blog had originally informed me Jekyll, and ~~take inspiration from~~ outright steal his Jekyll repository as a starting point. Luckily, he’s OK with that. I did make some changes to make it my own.

The trickiest part turned out to be porting my content. There is a WordPress to Jekyll converter but it goes pretty crazy on you, and doesn’t convert the WordPress HTML to Markdown. So I had to do a lot of work on my own to convert the HTML to Markdown with Pandoc and then clean up a lot of the mess afterwards.

But it’s definitely worth it. Now I don’t have to worry if my blog is down. That’s GitHub’s problem. I can edit my posts with Markdown using the same GitHub workflow I use every day. And it means I can accept pull requests on my blog! So if I make a mistake, please speak up and correct me!

Hopefully this will make it even easier for me to blog in the future.

Joining Particular Software

2015-04-30T00:00:00.000Z

For the last few years, I have tried to arrange my career around the following two principles:

Surround myself with the smartest people I can find.
Be prepared to do the thing that scares me a little bit.

And it’s been working great. This led me to leave a full-time SaaS company to join ILM Professional Services a few years ago. As a software consultant, I was able to work on multiple projects for different companies, learn new technologies and different methodologies, and have a lot of fun doing it. But the best part about ILM is its people. Even though I would go out to work at client sites, ILM fosters a real sense of community and family, so I was never on an island.

I really can’t say enough good things about ILM. If you’re a software developer in the Minneapolis/St. Paul area, you love to code, and you have a thirst for knowledge, you should check them out. Tell them David sent you.

These principles of mine also led me to write my book, Learning NServiceBus, which is now in its second edition. Compared to the effort that goes into writing a book, you really don’t get paid that much in terms of raw dollars, but it has been amazing for my career growth. Pretty soon I was speaking at conferences and teaching the official NServiceBus training course.

And that has all led to this moment.

It’s sad to have to leave ILM, because they have become a family to me, but I am very excited to be joining Particular Software full-time on May 4. (That’s right, May the Fourth!) It’s pretty difficult to find someone smarter than Udi Dahan, and it will be a privilege to work for him. Add in all the other extremely smart and talented people in the company, and I just hope that I’ll be able to keep up.

When I first moved away from home to go to college, my father told me, “Well son, this is going to be quite the adventure.” Well, the adventure continues, and I’m excited to get started.

Learning NServiceBus Second Edition

2015-02-02T00:00:00.000Z

I’m excited to say that the second edition of my book, Learning NServiceBus, has now been published!

The second edition of the book includes the following improvements over the first edition:

Completely updated to cover NServiceBus 5.0
All-new chapter on the Service Platform (ServiceControl, ServiceInsight, ServicePulse, and ServiceMatrix)
More diagrams (these were unfortunately sparse in the first edition)
Coverage of V5-specific features (Pipeline, Outbox)
Revised and expanded…everything
All told, there are roughly 44 additional pages (over the first edition) of just raw new content.

And perhaps best of all, the new edition includes a foreward from Udi Dahan himself, which tells the story of how NServiceBus got its start in the first place, tracing the history from his early days as a programmer to the point where this book has been published in its second edition. It’s very humbling for me personally to have his endorsement on my work, and I am very thankful.

Also, as always, so many thanks to everyone at Particular Software who were very helpful during the development of the book, and to my tech reviewers Daniel Marbach, Hadi Eskandari, Roy Cornelissen, and Prashant Brall, who made sure that you have the best content in your hands possible.

The book is available for purchase right now from the publisher in physical and eBook forms, and will be available via other channels (Amazon, Barnes & Noble, Safari Books Online, etc.) shortly. I hope of course that you buy it, but more importantly, that you find it useful.

My Next Endeavor: Teaching

2014-12-16T00:00:00.000Z

Uncle Bob Martin is one of the true learned elders of our industry, one of those who signed the Agile Manifesto when I was still taking college courses. Recently, he wrote about (and has talked about) something that absolutely blew me away.

Uncle Bob correctly identifies that many in our industry are young (even too young) and that there is a relative lack of older (and one would hope, more experienced) software developers. Not because they are going away, but because of the exponential growth in the number of total software developers. Indeed, he estimates that the number of software developers doubles every five years.

This was not altogether unsurprising to me, until he pointed out that this means if the number of developers is doubling every five years, then at any given point in time half of all software developers on the planet have less than five years of experience.

Woah.

Half of ALL software developers are fairly junior developers without a lot of experience under their belt. So to really help advance the field of software development, we should be finding better ways to train those scores of junior developers joining our ranks each and every year.

Luckily, I’ve been given the opportunity to do something about that.

This January, I won’t be going to code up some website for a new client. As part of a partnership between ILM, The Learning House, the Software Craftsmanship Guild, and Concordia University, I will be serving as an Adjunct Professor at Concordia teaching the .NET track of the Coding Bootcamp, an intensive 12-week course designed to take a student with an aptitude for computers and turn them into a well-trained junior developer ready to write code in the real world.

I can’t speak for Computer Science curriculum around the country, but in my own college experience I felt that while I was taught the basics of a computer language (C++ at the time) I was not effectively prepared to be a software developer in the real world. Hopefully this has changed in the intervening years, but I found I had to learn a lot of those other skills on the job.

This is why I’m so excited about the Coding Bootcamp curriculum. (Look at the very bottom of the linked page.) It’s not just about learning C#. It doesn’t cover what was current five years ago. Students will be learning ASP.NET MVC 5, Web API, and SQL Server 2012. It’s not just Microsoft-centered; there’s some Dapper in there in addition to Entity Framework. Beyond just HTML and CSS, students will be introduced to jQuery and AngularJS. And they’ll learn about effective source control with Git–a skill I consider just as important as writing the code itself.

All told, what the students will learn has less in common with what I learned in college, and much more in common with the core areas on which we evaluate all potential ILM employees. Teaching students the exact skills they’ll actually need in a real-life job seems like such a crazy idea that it just might work.

I don’t remember a lot about my early days in elementary school, but I do remember that my fourth grade teacher Mrs. Dickerson gave me a book about programming in AppleSoft BASIC ¹ that I read cover to cover. (The book pictured may be that book or perhaps a later edition of it. I was unable to find it in my parents’ attic.) While other events certainly had an impact on my life’s direction, you could argue that she set me upon a path that led to my eventual career and where I am today. For that I am forever grateful to her, and I hope that I can do her proud.

I’m really looking forward to helping people the way Mrs. Dickerson helped me by sharing what I love to do. If the number of software developers really is doubling every five years, hopefully I can ensure that 10-15 of them at a time are at the very least well prepared to get started.

If you would like to attend the Coding Bootcamp you can apply at bootcamp.csp.edu.

Or, if you are looking for talented new developers, you can join the employer network.

In yet another example of Atwood’s Law, AppleSoft BASIC now runs in JavaScript.

Wrapping a jQuery plugin in an AngularJS directive (Screencast)

2014-06-16T00:00:00.000Z

Recently I saw Evil Trout’s screencast Wrapping a jQuery plugin in an Ember.js component, and thought that it would be really valuable to show the exact same plugin implemented instead as an AngularJS directive.

I want to be clear that I don’t intend to start some sort of Angular vs. Ember flame war. I happen to believe, as Ben Lesh concludes in his excellent 6-part blog series comparing the two frameworks, that Angular and Ember are two paths up the same mountain, and that together they are pushing the state of web development forward.

It’s clear that I need better audio equipment if I intend to keep doing screencasts, but I think it’s passable. I hope you enjoy it!

NServiceBus and the Mystery of IWantCustomInitialization

2014-05-27T00:00:00.000Z

Recently Romiko Derbynew was reading my book, Learning NServiceBus, and noticed a contradiction between the manuscript and the included source code:

In the Sample for DependecyInjection in the book, the code is:
public class ConfigureDependencyInjection : NServiceBus.IWantCustomInitialization
However in the book, is says
IWantCustomInitialization should be implemented only on the class that implements IConfigureThisEndpoint and allows you to perform customizations that are unique to that endpoint.

Of course I could wax philosophic about the tight deadline of book publishing, or how difficult it is to keep the sample code in sync with the manuscript, or how I probably wrote that part of the code and that part of the manuscript on different days on different pre-betas of NServiceBus 4.0, but what it comes down to at the end of the day is #FAIL!

So here’s the real scoop, or at least, updated information as I see it and would recommend now, circa NServiceBus 4.6.1.

The part of the book referenced comes from Chapter 5: Advanced Messaging, where I am discussing the various general extension points in the NServiceBus framework that you can hook into by implementing certain marker interfaces, and then at startup, NServiceBus finds all these classes via assembly scanning and executes them at the proper time.

The interfaces are nominally described in the order that they are executed, and so I described IWantCustomInitialization third, after IWantCustomLogging and IWantToRunBeforeConfiguration, and described it as shown above. (The quoted passage is the entire bullet point.)

Unfortunately, it isn’t that easy.

(The following bit of explanation references history that exists mainly in my mind and, therefore, may not be entirely accurate as I’m not willing to dig through years of Git history to prove it. I might get a detail or two wrong, but stay with me.)

IWantCustomInitialization and IWantCustomLogging are somewhat unique in the list because they have been around forever (I gauge “forever” as since I started with NServiceBus at Version 2.0) and in the meantime, all of the other interfaces were added on (at least as far as my memory serves) through the development of V3 and V4.

So in this before-time of long-long-ago, these two interfaces only worked when applied to the EndpointConfig (the class that implements IConfigureThisEndpoint) but the new ones can be on any class, and there can be multiple ones.

Except as it turns out, IWantCustomInitialization pulls double-duty. It will execute either on the EndpointConfig OR as a standalone class, but with one critical difference: Whether or not it exists on the EndpointConfig changes the order of execution with respect to the other extension point interfaces!

When implemented on a random class, an IWantCustomInitialization will run third, where I described in the book (after IWantToRunBeforeConfiguration, but before INeedInitialization) but if implemented on EndpointConfig, it will run second only to IWantCustomLogging, which always runs first because otherwise, you don’t have logging.

Confused yet? Here’s the definitive updated order:

IWantCustomLogging (only executes on EndpointConfig)
IWantCustomInitialization, implemented on EndpointConfig
IWantToRunBeforeConfiguration
IWantCustomInitialization, implemented on its own class
INeedInitialization
IWantToRunBeforeConfigurationIsFinalized
IWantToRunWhenConfigurationIsComplete *
IWantToRunWhenBusStartsAndStops
\ One could argue that IWantToRunWhenConfigurationIsComplete should not be listed as a “general extension point” because it alone is located in the NServiceBus.Config namespace, not in the root NServiceBus namespace with all the others. This may have been an oversight that the NServiceBus developers weren’t willing to break SemVer for (which would require bumping to 5.0) or may be intentional, but I personally see the value in having a near-the-end extension point with full access to the DI container.*

So what would I recommend about IWantCustomInitialization now?

I view the duality of “runs at different times depending upon which class it’s implemented on” to be dangerous and I would rather avoid that, especially since INeedInitialization provides basically the exact same behavior at the exact same time. So I would respect history (and the text in the book) and say that IWantCustomInitialization should only be used on the EndpointConfig for endpoint-specific behaviors, or in a BaseEndpointConfig class you inherit real EndpointConfigs from.

So that would make the code sample from the book wrong, even though it technically works. I would use INeedInitialization for that class instead.

By the way, this is a great time to mention a previously reported issue with the same section of the book. Even though the text says that “INeedInitialization is the best place to perform common customizations
such as setting a custom dependency injection container…” but as it turns out, the only place you can set up a custom DI container is from IWantCustomInitialization on the EndpointConfig.

And on a closing note, I would suggest that if you don’t get enough evidence that you are human and make mistakes by being a software developer, you should perhaps consider writing a book. ;-)

Failed Message Notification with ServiceControl

2014-05-07T00:00:00.000Z

In my post Distributed System Monitoring Done Right, I mentioned in passing how ServicePulse doesn’t ship with any built-in notification system for failed messages, but that you could easily build a system to send an email (or SMS, or carrier pigeon) to do so.

In this post I’ll show you how.

First, create a new Class Project called ErrorNotify, and turn it into an endpoint by including the NServiceBus.Host NuGet package.

Next, you need to reference the messages assembly that ServiceControl uses for its externally published events. It’s called ServiceControl.Contracts and you can find it in your ServiceControl installation directory. For me that’s located at:

C:\Program Files (x86)\Particular Software\ServiceControl\ServiceControl.Contracts.dll

Note that ServiceControl uses the JSON serializer internally, so if you subscribe to the failed message notifications, your endpoint will need to use the JSON serializer too. Even if you use a different serializer (like the default XML one) in the rest of your system, it doesn’t matter because this error notifier endpoint is completely separate and decoupled from the rest of your system.

To set the serializer to JSON, modify your EndpointConfig.cs given to you by NuGet so that it implements IWantCustomInitialization:

Next we need to write the actual code to subscribe to the MessageFailed event published by ServiceControl. I’m not going to show you how to build and send an email. That would be boring and silly and I’m sure you can do it yourself. But it is important to point out that you can extract the FailedMessageId from the failed message details and craft a URL using ServiceInsight’s URL scheme that will launch ServiceInsightand show you the offending message directly!

Lastly, we need to modify the App.config file to subscribe to messages from the Particular.ServiceControl service.

That’s it! Once we deploy this code, we will get email notifications of failures complete with links to ServiceInsight so we can go figure out exactly what went wrong.

RavenConf 2014 Slides and Video

2014-04-30T00:00:00.000Z

Here is the RavenConf 2014 video and slides for my presentation “Modeling Tricks My Relational Database Never Taught Me” in which I compare SQL Server to Big Brother from Apple’s 1984 ad and RavenDB to the Dude from The Big Lebowski.

Distributed System Monitoring Done Right

2014-04-24T00:00:00.000Z

When I first started writing Learning NServiceBus, I was targeting Version 4.0 which, at that time, was still several months away from release. Writing about something that’s still very much in flux is definitely a challenge, and to some extent I was definitely learning as I went.

What really struck me during the writing process was how much easier people learning NServiceBus 4.0 were going to have it than I did when I learned NServiceBus 2.0. The developers at Particular Software (a name change from NServiceBus Ltd – a lot of people seem to think they were bought and this is not the case) are really obsessive about making a powerful framework as easy to use as possible, and I salute them for that.

I remember creating endpoints by hand. Create a new Class Library project. Reference the NServiceBus DLLs and NServiceBus.Host.exe. Build so that the EXE is copied to the bin directory. Go to Project Properties. Set the debugger to run the Host. Create an EndpointConfig class. Add an App.config. Enter a bunch of required XML configuration. OK that’s a lie. As I was once quoted during a live coding demo, “Don’t worry I have been doing this for years. You never write this yourself; you always copy it from somewhere else.” Not exactly a glowing recommendation right?

Then you start debugging and hope you didn’t screw anything up.

NServiceBus 3.x and 4.x changed all that. Now you just reference a the NServiceBus.Host NuGet package and it sets all that stuff up for you. And if you need some bit of config, you can run a helpful PowerShell cmdlet from the Package Manager Console to generate it for you along with XML comments describing what every knob and lever does.

NServiceBus 4.x is a fantastic platform to build distributed systems, but as of the release of NServiceBus 4.0 in July 2013, the big thing still missing was the ability to effectively debug a messaging system (let’s face it, gargantuan log files don’t count) and monitor a distributed system in production to make sure everything isn’t running off the rails.

Well that’s all about to change.

Don’t Build Your Own Monitoring Tools

For the first system I ever built on NServiceBus 2.x, I built my own monitoring and management tools because I had no other choice. I didn’t want to remote desktop into a server and launch Computer Management to view the Message Queues. Let’s face it, that tool is heinous enough when run locally. And I certainly didn’t want to remote desktop into the server to run ReturnToSourceQueue.exe, and have to potentially copy and paste a message id into a console window over remote desktop. No thank you!

So I built a tool called MsmqRemote that had a daemon process that I installed on every single server that hosted any NServiceBus queues. It was responsible for interacting with MSMQ and NServiceBus on each server. It had the capability to list queues, and get details about the messages in each queue, and return all of this information to a client application via a WCF service hosted over TCP. It could move and delete messages, all based on MSMQ code I had to write myself. It contained a copy of the relevant ReturnToSourceQueue code so that it could do that operation as well.

The client application was a WinForms monstrosity with four panes. First you selected a server which was populated from a config file, that told the application which WCF service URL to try to connect to. Then it would ask the server for a list of queues, which would appear in the second pane. After selecting a queue, it would ask for a list of messages, which would appear in the third pane, and finally, selecting a message would again go to the server to ask for message details and contents, and the XML representation of the message would appear in the fourth and final pane.

The tool suffered from the same problem that plagues many internal tools. It wasn’t refined or nice or even very usable. It was always the minimum necessary to get the job done which meant that it was always pretty shitty. It didn’t always work quite right either, especially when a queue would fill up with a significant amount of messages, everything would slow to a crawl. And sometimes the daemon process would just completely crap out, because as you’re probably already aware WCF is such a joy to work with.(Sarcasm intentional.)

I don’t have any idea how many hours I ended up pouring into that tool, but what I do know for sure is that I wasn’t solving any business problems during that time. Meanwhile it was never the tool I wanted or really needed it to be, and addressing its shortcomings was always my lowest priority.

And MsmqRemote even begin to cover everything we needed to effectively monitor a production system. Endpoint health was a big concern. It wasn’t unheard of for an endpoint to appear to be healthy as far as the Process Manager was concerned, but for some reason to have stopped processing messages for whatever reason. I can think of one instance where in retrospect I’m sure my crappy code was to blame – a “command and control” sort of component implemented in an IWantToRunAtStartup that should have been a bunch of never-ending sagas instead. So my IT Manager would create a bunch of monitors in Microsoft SCOM (may have been MOM at the time) based on queue sizes and performance counters and all that sort of stuff. That was really his deal, not mine. But every once in awhile we’d forget to register a new endpoint when it got deployed for the first time, so then the first time it acted up or stalled we’d have to deal with problems like a few million messages backed up in a queue with no warning.

What a pain! If only there was a company out there that understood how distributed systems worked that could make tools to address these issues!

The Service Platform

The whole reason that NServiceBus Ltd. changed its name to Particular Software is that they were developing products to meet these needs, making NServiceBus itself only part of the story.

NServiceBus is now joined by a bunch of friends:

ServiceControl

ServiceControl is a specialized endpoint that lives on the same server as your centralized audit and error queues. It processes every message that arrives in the audit queue (in other words, a copy of every message that flows through your entire system) and stores the details in an embedded RavenDB database. It then discards those audit messages because otherwise you’d be running out of disk space in a hurry. It also reads the messages off the error queue and similarly stores these in Raven, but keeps these message around in a new queue called error.log because you’ll more than likely want to send those messages back to their source queues after you fix the underlying problem.

All this information stored in the embedded RavenDB database is made available via a REST API. (Suck it WCF.) With this API you can build your own reports and tools if you like, but this provides the foundation from which the other Service Platform tools are built.

ServiceInsight

ServiceInsight is a WPF application that makes my little MsmqRemote look like it was written by a 3rd grader, but it extends much deeper than just showing message details and retrying errors. Because it feeds off the ServiceControl API, which is processing the audit messages from ALL your endpoints, it shows a holistic view of your entire distributed system.

When NServiceBus sends or publishes a message in the scope of a message handler, enough headers are added that by the time it gets to ServiceInsight, complete conversations can be stitched together and represented as graphical flows, where sending commands are represented as solid lines and published events are represented as dashed lines.

Check out the flow diagram in this screenshot.

Notice how some of those messages have “policies” mentioned under the timestamp. Those are sagas, and show how the message flow integrates with sagas you write. This is because I’ve included the ServiceControl.Plugin.SagaAudit NuGet Package in my endpoints, which inserts itself into the pipeline to send saga auditing information to ServiceControl.

If you click on one of those, or on the Saga tab near the bottom, you’ll get this amazing visualization showing the saga’s state changes in vivid detail, like this screenshot zoomed to show only the saga flow:

This is pure awesome, and something you’ll only ever have time to build on your own if either 1) you work for Particular, or 2) you work for a company that somehow isn’t concerned with making money. You’re also not going to get this level of tooling from MassTransit. You do, after all, get what you pay for.

ServicePulse

Where ServiceInsight is the tool for a developer to debug a system, ServicePulse is the tool for my IT Manager and our other Ops friends to monitor our systems in production and make sure that everything is healthy.

All you need to do is deploy the ServiceControl.Plugin.Heartbeat NuGet package with your endpoint, and it will begin periodically sending heartbeat messages to ServiceControl. ServicePulse is a web application that will use this information, along with information about failed messages, and serve up a dashboard giving you near real-time updates on system health with all sorts of SignalR-powered goodness.

In addition, you can program your own custom checks to be tracked in ServicePulse. For instance, let’s say you needed to be sure a certain FTP server was up. You could program a custom check for that by including the ServiceControl.Plugin.CustomChecks NuGet package and creating a class that inherits PeriodicCheck.

This is what ServicePulse looks like moments after I stopped debugging in Visual Studio, causing the heartbeat messages to stop.

Yes, the Endpoints box bounces when there’s an issue. I guess it’s mad at me! I would show you more screenshots, but they’re full of a recent client’s name and I don’t love image editing that much, plus you should go try for yourself!

The one thing that is missing from ServicePulse, by necessity really, is a direct notification feature. You aren’t going to want your Ops people constantly staring at the ServicePulse website; you need some way for them to be notified when there’s an issue. Every company is going to want to do that differently, of course. Some will want a simple email notification, some will want an SMS, some will want integration with a HipChat bot, and of course some will want all of the above!

It’s convenient that ServiceControl is really just another endpoint. It has an events assembly ServiceControl.Contracts that contains events that you can subscribe to. Check out this sample MessageFailedHandler that shows how you could subscribe to the MessageFailed event and send a notification email.

In the future there will be additional tooling to connect ServicePulse with Microsoft SCOM and perhaps other monitoring suites as well.

ServiceMatrix

This article is mostly about system monitoring, and ServiceMatrix is really not a monitoring tool, but it deserves a mention because it is also a part of the new Service Platform suite of tools.

ServiceMatrix is a Visual Studio plugin that makes it possible to build an NServiceBus system with graphical design tools, dragging and dropping to send messages from one endpoint to another, and that sort of thing. It really deserves an article all to itself.

I’ve been doing NServiceBus the hard way for quite some time, so it’s hard for me to wrap my head around doing it graphically. But the hard truth is that the NServiceBus code-by-hand demo I frequently give that takes about an hour to create manually can be done in about 5 minutes with ServiceMatrix. Five Minutes. Udi himself has stated that now that he’s gotten used to ServiceMatrix, he can’t envision creating NServiceBus solutions any other way.

Aside from creation speed and baking in NServiceBus design best practices, ServiceMatrix contains two features I really feel are game-changers.

First, whenever you debug your solution with ServiceMatrix, it will generate a debug session id that is shared with all your endpoints, and reported to ServiceControl via the ServiceControl.Plugin.DebugSession NuGet Package. It will then navigate to a URL starting with the si:// scheme, which is registered to ServiceInsight, so ServiceInsight will open up and show you details for just the messages volleyed around during the current debug session. This means, in many cases, you won’t need to painstakingly arrange all of your endpoint console windows just right so that you can see what’s going on, you’ll just look at the results in ServiceInsight.

Second, when you create an MVC website with ServiceMatrix, it will auto-scaffold a UI to test sending messages with fields to enter message property values. What a big time saver over creating temporary controllers just to test things out, and having to interact with them only from the query string!

Conclusion

When I think about the old about the crummy tools I built in the past for NServiceBus monitoring in comparison to the new tools in the Service Platform, it reminds me of the difference between my garage and my grandfather’s woodshop. My garage contains a bunch of the basics. Sure I have a couple saws and screwdrivers and a hammer or two, but my grandfather has been retired for several years and in that time has been pursuing woodworking seriously as more than a hobby, so he’s got a dozen saws and the central vacuum system and all the little toys and jigs you need to really get some serious work done. Every time I need to use the table saw I have to back a car out and drag the saw out from the corner, but he doesn’t waste time with that because his whole workshop is set up and ready to go.

Just as I could accomplish so much more in my grandpa’s workshop than in my garage, I will be able to accomplish so much more with NServiceBus using the tools in the Service Platform. They’re exactly the tools I would have built myself (or better) if only I’d had the time.

But I didn’t have to.

David Boike's Blog

Using a SmartThings sensor in "garage door" mode with Home Assistant

The SmartThings sensor

zha_event

Measuring the door

What does “open” mean?

Open helper

A finishing touch

Wait, I live in Minnesota

Summary

Distance learning alerts with Google Calendar, Alexa, and Home Assistant

Google Calendar

Alexa

Home Assistant

HACS & Alexa

Automation

Caveats

Summary

Path.Combine() isn't as cross-platform as you think it is

Background

Problem 1: Multi-segment paths

Problem 2: Windows is too forgiving

Problem 3: Root paths

Summary

Overriding the NServiceBus ConversationId

What ConversationId is for

The problem

Solution

Summary

Creating a scheduler with NServiceBus

The saga

Clock drift

Reprogramming is limited

Auditing

Date and time is just hard

Scheduling alternatives

Summary

Creating a RavenDB cluster in Docker

Docker networking is fun

Docker compose

Setting up the cluster

Summary

Sure, you can just use RabbitMQ

Messaging

Transactions

Workflow

Documentation

Introduction to NServiceBus

What’s wrong with books?

The new tutorial

The code isn't everything

Inbox Few - How I wrangle my email

Basic setup and filters

Inbox Few

Cleaning up cruft

Summary

How to build Gmail's "Undo Send" feature

Buyer’s remorse pattern

Step-by-step

Message definitions

Basic saga structure

Message handlers

Full code

Summary

Goodbye WordPress, Hello Jekyll

Joining Particular Software

Learning NServiceBus Second Edition

My Next Endeavor: Teaching

Wrapping a jQuery plugin in an AngularJS directive (Screencast)

NServiceBus and the Mystery of IWantCustomInitialization

Failed Message Notification with ServiceControl

RavenConf 2014 Slides and Video

Distributed System Monitoring Done Right

Don’t Build Your Own Monitoring Tools

The Service Platform

ServiceControl

ServiceInsight

ServicePulse

ServiceMatrix

Conclusion

What `ConversationId` is for