Some Thoughts on Aligning Event Handlers with Domain Boundaries in Event-Driven Architecture

9 min read · Jun 7, 2020

Recently, while refactoring the order system (of a B2B e-commerce platform) to an event-driven architecture (EDA), we ran into a situation where we needed to decide whether or not an event handler should be part of the Order service domain. Even though it did not cause us too much trouble to decide what to do (given the constraints we were facing), I found it a really interesting problem that arises frequently and is closely related to DDD and microservices concepts.

In this essay I will try to summarise the thought process we went through when deciding how to align the event handlers with domains, and hope to demonstrate how the DDD and/or microservices guidelines can be applied in this process.

Note that this essay is not about how to map a bounded context (domain) to microservices; there are plenty of excellent writings on that topic, e.g. here, here, and here.

Background

I’ve been reading about Domain-Driven Design (DDD), on and off. Strictly speaking I’m not a DDD practitioner, but I do find some of the DDD concepts very relevant and useful when designing a microservice architecture, especially the following ones:

High cohesion within a service and loose coupling between services
A bounded context is a domain model that is often realised by one or more microservices
An aggregate is a good candidate for a microservice

However, for any specific system we still need to decide the exact boundary of each bounded context (aka domain). Only with well-defined boundaries can the services achieve high cohesion and loose coupling.

Note that I will use the terms event handler and worker almost interchangeably. However, when there is a need to distinguish, I use handler as the more generic term, as it can be implemented as an independent process or as a function in a process, whereas I refer to a worker as an independent process.

The Problem

The system in question is a B2B e-commerce platform that involves multiple services. There is also a separate ERP system that requires orders to be synchronised from the e-commerce platform in real time. For the purpose of illustration, it suffices to consider three services in the e-commerce platform, namely the Order, Inventory and Notification services. A conceptual overview of these services and their interactions is shown in the following diagram:

Conceptual overview of the services involved and their interactions

The legacy system was entirely RPC-based, and the motivations to move to an event-driven architecture include (1) better performance and (2) more importantly, better handling of the order lifecycle when failures occur. Long story short, the new EDA design is shown below.

Event-driven architecture (simplified for clarity), domain boundary not considered

Specifically, the handling of event_reserve_stock depends on the Inventory service (so we may call it a command rather than an event). The handling of event_order_created is open-ended, because the Order service does not know exactly which services are interested in this event, and new handlers may be added in the future. In our case, the Notification service and the ERP system are interested in event_order_created.
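To make the difference between the two events concrete, here is a minimal sketch. The InMemoryBus class is a toy stand-in for the message queue and is purely my assumption (our real setup used an actual MQ, and delivery was not synchronous); only the two event names come from the design above.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy stand-in for the message queue; delivers synchronously for clarity."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, payload: dict) -> int:
        """Deliver the payload to every subscriber and return how many were called."""
        handlers = list(self._handlers.get(event_name, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = InMemoryBus()
received: List[str] = []

# Command-like event: exactly one consumer (Inventory) is expected.
bus.subscribe("event_reserve_stock", lambda p: received.append(f"inventory:{p['order_id']}"))

# Open-ended event: the publisher does not know or care how many consumers exist.
bus.subscribe("event_order_created", lambda p: received.append(f"notification:{p['order_id']}"))
bus.subscribe("event_order_created", lambda p: received.append(f"erp:{p['order_id']}"))

bus.publish("event_reserve_stock", {"order_id": "o-1"})
bus.publish("event_order_created", {"order_id": "o-1"})
```

Adding a new consumer of event_order_created is just one more subscribe call, and the publishing side stays untouched.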

Note that the event handlers in grey colour are at the centre of this discussion: Which domain do they belong to? The decision has practical implications, and we’ll see why next.

Implications of Aligning the Event Handlers with Domains

Before we talk about possible assignments, it is important to understand the decision’s implications, both technically and team-wise.

Resource allocation: Assigning a handler to one service domain means the team responsible for that service will implement the handler, hence it has an impact on resource allocation.
Domain expertise and efficiency: Centralizing the handlers in one domain means the pieces of event handling logic are grouped closer to one another, and it’s relatively easier for the team (responsible for the domain) to build up domain knowledge. It also means that the team can work more efficiently (less communication overhead, familiar codebase, etc.).
Intrusion into existing services: If a service already exists with an established API, putting the handler into the service domain means we need to retrofit the message subscription mechanism into the service.
Technology choice: Each domain may be implemented with a different technology stack, which in turn affects how we implement the handler. This is obvious as that’s what the microservices architecture promises to enable in the first place.

What Options Do We Have?

Okay, enough background info. In our case, we identified three options for assigning each handler. Let’s call them external handler, internal handler and integration layer handler, respectively. As an example, the different options for the event_reserve_stock handler are illustrated below:

Options of assigning handlers. Dotted arrows indicate messaging, and solid arrows indicate RPC

Now let’s look at each of the options in a bit more detail.

External Handler

In this design, the handlers are implemented within the depended services, except for those in their own layer (i.e. the integration layer, more on this in a bit). The system diagram would look like this:

A few salient points can be observed from this diagram:

The message queue (MQ) is either a common component that sits between the Order and other services, or owned by the Order service but exposed to external event subscribers.
The handler for the event_reserve_stock becomes part of the Inventory service.
The Notification service implements one handler for event_order_created.
Another handler for event_order_created stands in the integration layer, dealing with the ERP system.

In practice, if the depended services are already developed, we would need to consider if it’s easy or even possible to retrofit the handlers. When it’s possible to retrofit, we will need to update a few services at the same time and the communication and coordination effort is a function of the number of teams involved.
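As a rough illustration of the external handler option, the sketch below puts the event_reserve_stock handler inside a hypothetical InventoryService class. The class, its reserve method and the payload fields (items, sku, quantity) are made up for this example and are not our actual API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InventoryService:
    """Hypothetical Inventory service that owns its own event handler."""

    stock: Dict[str, int] = field(default_factory=dict)

    def reserve(self, sku: str, quantity: int) -> None:
        if self.stock.get(sku, 0) < quantity:
            raise ValueError(f"insufficient stock for {sku}")
        self.stock[sku] -= quantity

    def handle_event_reserve_stock(self, payload: dict) -> None:
        # The handler lives in the Inventory domain, so everything it needs
        # (here the line items) has to arrive in the message payload.
        # Note: this sketch has no rollback if a later item fails.
        for item in payload["items"]:
            self.reserve(item["sku"], item["quantity"])


inventory = InventoryService(stock={"sku-1": 5})
inventory.handle_event_reserve_stock(
    {"order_id": "o-1", "items": [{"sku": "sku-1", "quantity": 2}]}
)
```

The subscription code itself is what would have to be retrofitted into an existing service, which is where the coordination cost comes from.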

Internal Handler

Another alternative is to have the handlers in the Order service domain, and the handlers will invoke the Inventory and Notification services’ API to get their job done, respectively, as illustrated below.

A few important points to highlight:

The MQ is now owned by the Order service for the most part, but it needs to allow access by the worker (i.e. handler) in the integration layer.
The handlers can be implemented as standalone processes (may be grouped) independent of the core Order service, hence they are labelled workers. These workers actually form an anti-corruption layer (see DDD), as they will shield the core Order service from any changes in the depended services (Inventory and Notification services in this case).

Using internal handlers allows more flexibility in the message payload. If the message payload contains only the order_id, the handler needs to query the Order database to retrieve the order details before calling the Inventory service. This is only possible if the handler belongs to the Order domain. In contrast, if the handler belongs to the Inventory service (i.e. an external handler), we need to provide the necessary order details in the message payload so that the handler does not have to invoke the Order service API to fetch those details (which would lead to a circular dependency).
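A minimal sketch of an internal handler working from a thin payload might look like the following. ORDER_DB, FakeInventoryClient and reserve_stock are hypothetical stand-ins for the Order database and the Inventory service's RPC API; only order_id and the event name come from the discussion above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical Order database: order_id -> line items.
ORDER_DB: Dict[str, List[dict]] = {
    "o-1": [{"sku": "sku-1", "quantity": 2}, {"sku": "sku-2", "quantity": 1}],
}


class FakeInventoryClient:
    """Stand-in for an RPC client of the Inventory service API."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, int]] = []

    def reserve_stock(self, sku: str, quantity: int) -> None:
        self.calls.append((sku, quantity))


def make_reserve_stock_worker(
    order_db: Dict[str, List[dict]], inventory_client: FakeInventoryClient
) -> Callable[[dict], None]:
    """Build the event_reserve_stock handler that lives in the Order domain."""

    def handle(payload: dict) -> None:
        # Thin payload: only the order_id travels on the queue. The worker may
        # read the Order database because it belongs to the Order domain.
        items = order_db[payload["order_id"]]
        for item in items:
            inventory_client.reserve_stock(item["sku"], item["quantity"])

    return handle


client = FakeInventoryClient()
worker = make_reserve_stock_worker(ORDER_DB, client)
worker({"order_id": "o-1"})
```

Here the dependency points one way only: the Order domain calls the Inventory API, and the Inventory service never needs to call back.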

Integration Layer Worker

As the e-commerce platform and the ERP are two separate systems, the order synchronisation should be treated as an integration concern. We cannot change the ERP as it is a purchased solution, but the Order domain relies on the ERP system to provide order fulfilment data (e.g. order shipping status). Typically, we create a worker to synchronise order data from the e-commerce platform to the ERP and vice versa.

Clearly, the workers in the integration layer have to know both systems. Having an integration layer helps keep the Order domain (1) focused on its core responsibilities (e.g. managing the order life cycle) and (2) isolated from the specific details of the ERP system (especially if we need to replace the ERP system).
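The sketch below shows what "knowing both systems" can mean in code: the worker translates an order event into the ERP's record layout. All names here (OrderCreated, the ERP field names, FakeErpClient, upsert_order) are invented for illustration; a real ERP dictates its own format and client.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OrderCreated:
    """Hypothetical shape of the event_order_created payload."""

    order_id: str
    customer_id: str
    total_cents: int  # assumed non-negative


def to_erp_record(event: OrderCreated) -> Dict[str, str]:
    """Translate the e-commerce event into a made-up ERP record layout."""
    return {
        "ExternalRef": event.order_id,
        "CustomerNo": event.customer_id,
        "Amount": f"{event.total_cents // 100}.{event.total_cents % 100:02d}",
    }


class FakeErpClient:
    """Stand-in for whatever interface the purchased ERP exposes."""

    def __init__(self) -> None:
        self.records: List[Dict[str, str]] = []

    def upsert_order(self, record: Dict[str, str]) -> None:
        self.records.append(record)


def sync_order_to_erp(event: OrderCreated, erp: FakeErpClient) -> None:
    # The only place that knows both the event shape and the ERP layout.
    erp.upsert_order(to_erp_record(event))


erp = FakeErpClient()
sync_order_to_erp(OrderCreated("o-1", "c-9", 12345), erp)
```

If the ERP is ever replaced, only to_erp_record and the client would need to change, which is the isolation described in point (2).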

Making a Decision

We considered the following factors when deciding which option to go with.

For request-response style inter-service communication, prefer synchronous RPC, as it makes the system behaviour easier to understand and reason about. On the other hand, asynchronous messaging should be contained within a single domain. This makes each domain easier to understand and debug. In addition, a synchronous API helps keep cross-team communication to a minimum.
Isolate changes. When the depended service changes its API, there will be a ripple effect on the dependent service. Having the event handler inside the Order service means the handler can act as an anti-corruption layer that shields the changes in the depended services (e.g. Inventory service). This is actually a very nice property as it helps keep the core Order service stable.
If an event handler carries out a task that is non-essential to the core domain, the worker is better kept outside the core domain. Doing so adheres better to the single responsibility principle and also increases the cohesion of the core domain. What is a non-essential handler? It is usually one whose work does not affect the core domain. For example, an email notification for an order status change is purely informational and does not affect the order lifecycle, therefore the corresponding handler is non-essential. Whether the worker should go to the integration layer or become part of the other domain is a case-by-case decision.
Try to avoid the integration layer when possible, as it can quickly become a dumping ground for event handlers. Maintenance may eventually become a headache.
Keep the teams aligned with their domains (bounded contexts), so that it’s easier to build up domain expertise.
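The "isolate changes" factor above can be sketched as a small adapter inside the Order domain. The response shape ({"status": "OK"}) and the names InventoryAdapter and ReservationResult are assumptions for illustration, not the real Inventory API.

```python
from enum import Enum
from typing import Callable


class ReservationResult(Enum):
    """Order-domain vocabulary, independent of the Inventory API."""

    RESERVED = "reserved"
    REJECTED = "rejected"


class InventoryAdapter:
    """The only place in the Order domain that knows the Inventory response shape."""

    def __init__(self, call_inventory: Callable[[str], dict]) -> None:
        # call_inventory stands in for the real RPC call (hypothetical).
        self._call_inventory = call_inventory

    def reserve(self, order_id: str) -> ReservationResult:
        response = self._call_inventory(order_id)
        # If the Inventory API changes, only this mapping needs to change.
        if response.get("status") == "OK":
            return ReservationResult.RESERVED
        return ReservationResult.REJECTED


ok_adapter = InventoryAdapter(lambda order_id: {"status": "OK"})
rejecting_adapter = InventoryAdapter(lambda order_id: {"status": "OUT_OF_STOCK"})

result_ok = ok_adapter.reserve("o-1")
result_rejected = rejecting_adapter.reserve("o-1")
```

The core Order service only ever sees ReservationResult, so a change in the Inventory response format stays confined to the worker.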

The Verdict

With the above considerations and taking into account our team’s realities, we decided to go with the internal handlers for the most part, and move the worker in the (conceptual) integration layer into the Order domain. There are a few reasons for this decision:

The other services are already developed and we want to minimise retrofitting effort.
It keeps the changes within one domain, so it’s more efficient in terms of testing, debugging and communication. It also helps build up domain knowledge.
It allows for a more flexible message payload, as we may not be able to foresee future changes. If worst comes to worst, the internal handlers can query the database to get the data needed (but missing in the payload).
In the long term, the workers will function as an anti-corruption layer so that the core Order service will be more stable.

Wrap Up

The above is the thought process we went through while refactoring the B2B e-commerce system to an event-driven architecture. It certainly has specifics that only apply to our scenario. However, I think there are some principles that can guide us when it comes to determining which domain an event handler should belong to.

Minimise the cross-domain dependencies in the form of asynchronous messaging (very subjective, just my preference).
If the result of an API call or of processing an event is not essential to the domain’s core logic, it can be done through asynchronous messaging in a fire-and-forget manner, and the event handler should not be part of the domain.
If the core domain logic needs to handle failure scenarios and/or has scalability requirements, it can be implemented using asynchronous messaging. But I think it’s better to contain the asynchronous messaging within the domain so that it’s easier for the team to understand the system behaviour.
Keep an eye on the domain and team alignment.
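To illustrate how asynchronous messaging inside a domain can help with failure scenarios, here is a sketch of a worker loop with bounded retries. The retry policy (MAX_ATTEMPTS, requeue on any exception, a dead-letter list) is entirely my assumption for this example; I have not described our actual policy in this essay, and a real MQ would provide its own retry and dead-letter features.

```python
import queue
from typing import Callable, List

MAX_ATTEMPTS = 3  # assumed limit, for illustration only


def drain(q: queue.Queue, handler: Callable[[dict], None], dead_letters: List[dict]) -> None:
    """Process messages until the queue is empty, requeueing failed ones."""
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return
        try:
            handler(message["payload"])
        except Exception:
            attempts = message["attempts"] + 1
            if attempts < MAX_ATTEMPTS:
                # Put the message back so it is retried later.
                q.put({"payload": message["payload"], "attempts": attempts})
            else:
                # Give up and park the message for manual inspection.
                dead_letters.append(message["payload"])


calls = {"count": 0}


def flaky_reserve_stock(payload: dict) -> None:
    """Hypothetical handler that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Inventory service unavailable")


q: queue.Queue = queue.Queue()
q.put({"payload": {"order_id": "o-1"}, "attempts": 0})
dead: List[dict] = []
drain(q, flaky_reserve_stock, dead)
```

Because the queue, the worker and the retry rules all sit in the Order domain, one team can reason about the whole failure path without coordinating with anyone else.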

PS: This is a tough topic to write about, and I struggled to decide what details to include and what to leave out so that there is enough context and yet the main points are still clear. I hope I’ve achieved my goal. Please leave a message if you have any comments, thanks!

References

Scott Millett, Nick Tune. Patterns, Principles, and Practices of Domain-Driven Design
Jake Lumetta. Follow these practical principles to get well-designed microservices boundaries. www.freecodecamp.org
From monolith to distributed monolith. medium.com
You're Not Actually Building Microservices. Simple Thread, www.simplethread.com
Jimmy Bogard. The False Dichotomy of Monoliths and Microservices. jimmybogard.com
Microservices and The Bounded Context: Part 1