COROS IIa: A Series of Tubes

March 9, 2022

When I last wrote about COROS, I explored its EVQ component, focusing on the API and some of its underlying construction. In this post I expand on that construction, giving reasons for some of the design decisions and providing some example use cases for EVQ.

The event queue itself.

Stripped of runtime irrelevancies, this is what an event queue looks like:

typedef struct _evq_queue_t
{
    evq_queue_t down, up; // <1>

    evq_event_t head;     // <2>

    // ... elided content ...
} evq_queue_t_impl;

At <1> we establish where in the tree of queues this particular queue sits: down points to the child of this queue, up points to the parent. The parent value is only ever used when managing queue relationships; it is not used in navigating queue processing. The child value is used both in managing queue relationships as well as when processing queued events.

At <2> we point to the first event in this queue. Events are chained as a singly-linked list, though each event also has a pointer back to the queue that owns it.

Queue insertion and deletion

When evq_attach_queue(parent, child) is called, the meat of the operation is as follows:

child->down = parent->down;
parent->down = child;
child->up = parent;

if (child->down)
{
    child->down->up = child;
}

As can be seen, the list of children is a push-down stack, notionally, in that a new child is placed in front of previous children. This has implications for fancier uses of EVQ in priority management, but is simple enough to follow.

Calling evq_detach_queue(q) effectively reverses the above operations, using knowledge of parents and children to snap the queue out of wherever it finds itself. Note that queues cannot be deleted, only removed from the tree. Since they occupy so little actual memory, the added complexity of code for dealing with deleted queues (including all of their attached events) is likely to cost more in code space than is saved in reclaiming such a paltry amount of memory.
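A minimal sketch of what that reversal might look like, under stated assumptions: the handle typedef is inferred from the struct excerpt, sketch_attach() simply repeats the attach steps shown earlier, and sketch_detach() is a hypothetical name, not the real evq_detach_queue(). It also assumes that only q itself is spliced out, with whatever sat below it re-linked to whatever sat above it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed handle typedef: the excerpts imply evq_queue_t is a pointer. */
typedef struct _evq_queue_t *evq_queue_t;

typedef struct _evq_queue_t
{
    evq_queue_t down, up;
} evq_queue_t_impl;

/* Same steps as the attach excerpt earlier in this section. */
static void sketch_attach(evq_queue_t parent, evq_queue_t child)
{
    child->down = parent->down;
    parent->down = child;
    child->up = parent;

    if (child->down)
    {
        child->down->up = child;
    }
}

/* Hypothetical reversal: splice q out of the chain, then clear its links. */
static void sketch_detach(evq_queue_t q)
{
    assert(q != NULL);

    if (q->up)
    {
        q->up->down = q->down;   /* parent now points past q */
    }
    if (q->down)
    {
        q->down->up = q->up;     /* child now points back past q */
    }
    q->up = NULL;
    q->down = NULL;
}
```

Because the queue object itself survives with its event chain untouched, re-attaching it later resumes its events where they left off.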

Implications of this design

There are some key implications to this design. First, as will be seen when we discuss event processing below, the tree-of-lists structure of EVQ lends itself to establishing, at need, both a priority system and a means to circumvent the priority scheme. In addition, the ability to detach and attach queues at need allows entire sets of events to be paused in-flight. When the third component of COROS is discussed, the importance of this will be clearer, but in effect it permits a system to turn pieces of itself off and on at need. This is especially valuable if, for example, the system does power management and needs to be able to disable and enable component processing quickly.

Events

Stripped to focus on important constructs, an event looks like this:

typedef struct _evq_event_t
{
    tick_ms  trigger_time; // <1>
    delay_ms period;

    bool is_coroutine; // <2>
    void *data;
    union
    {
        evq_callback cb;
        co_t         co;
    };

    evq_event_t chain;     // <3>
    evq_queue_t container;
} evq_event_t_impl;

As shown at <1>, events have two time-related pieces of information: the time at which the event is triggered, and, for repeated events, the period of repetition. Note that although the delay and period are relative to the current time in the API, the trigger_time is absolute: it is calculated when the event is created. This is important for reasons which will be explained later.

<2> deals with how events are fired. There are two kinds of events: they can be simple callback functions or they can be coroutines. Each kind is activated differently, so a flag is provided to identify which kind and a union holds either a callback function or a coroutine handle. In either case the event data is stored as a void *.

<3> marks where the event management occurs. Each event contains a link to the next event in its chain (singly-linked list) as well as a pointer to its owning event queue. The latter is used for the deletion or rescheduling of events. The former plays a key role in how events are inserted and how they are selected for firing.

Event queue processing

Now the pieces are in place to understand both how event processing works and the implications of some of the design decisions.

Whenever evq_queue_now/in/every() or evq_resume_now/in/every() is called, behind the scenes it boils down to the manual void evq_post_event(evq_queue_t q, evq_event_t e, void *data); function. This function inserts event e into queue q, associating the provided data with the event (data which will be passed into the callback or coroutine when it is called or resumed, as appropriate). The container of e is set to q for ease of navigation and queue management.

Event insertion

To insert the event, the logic is simple: (1) the chain is searched for the first event whose trigger_time is greater than e's trigger_time, and e is placed ahead of that event in the chain; but (2) if no such event is found by the end of the chain, e is placed at the end of the chain.

The trigger_time is a calculated value based on a current tick counter plus the delay value of the event when created. This is done to prevent an event from starving the queue by being inserted at the head perpetually. By calculating the trigger_time based on an offset from the current tick, and by placing the inserted event after all events with a lower or equal trigger_time, the system ensures that all events will eventually be fired as the tick counter inexorably counts upward. (Note that events can't starve their own queue. They can, however, starve queues lower in the chain, so care still needs to be taken.)
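The insertion rule can be sketched as follows. This is an illustration, not the code in evq.c: the handle typedefs and the 32-bit tick_ms are assumptions, sketch_insert() is a hypothetical name, only the fields needed for ordering are kept, and the plain comparison ignores tick counter wrap-around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t tick_ms;                   /* assumed width */
typedef struct _evq_event_t *evq_event_t;   /* assumed handle typedefs */
typedef struct _evq_queue_t *evq_queue_t;

typedef struct _evq_event_t
{
    tick_ms     trigger_time;
    evq_event_t chain;
    evq_queue_t container;
} evq_event_t_impl;

typedef struct _evq_queue_t
{
    evq_event_t head;
} evq_queue_t_impl;

/* Place e ahead of the first event that fires strictly later than it. */
static void sketch_insert(evq_queue_t q, evq_event_t e)
{
    assert(q != NULL && e != NULL);

    evq_event_t *link = &q->head;

    /* Skip every event that fires at or before e, so equal times stay FIFO. */
    while (*link && (*link)->trigger_time <= e->trigger_time)
    {
        link = &(*link)->chain;
    }

    e->chain = *link;   /* NULL when e lands at the end of the chain */
    *link = e;
    e->container = q;
}
```

Using <= rather than < in the search is what keeps a newly posted event behind older events with the same trigger_time, which is the anti-starvation property described above.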

Another, more subtle facet of the design, however, requires an understanding of how the assorted evq_process*() functions process their queue.

Event extraction

evq_process() and its siblings all take an evq_queue_t as an argument. While their processing is active (permanently, for a given duration, or for a single pass, depending on which function of the family is called), they follow this logic, given q as the passed-in evq_queue_t:

The chain of events in q->head is visited one by one.
Each event in that chain has its callback or coroutine called or resumed iff the stored trigger_time is less than or equal to the current timer tick value.
As soon as an event with a trigger_time that is greater than the current timer tick is encountered the chain is abandoned.
At this point q->down is processed recursively.
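The four steps above can be sketched as a single pass. Several simplifications are assumed here and are not taken from evq.c: events are callback-only with an assumed evq_callback signature, a fired event is treated as one-shot and unlinked before its callback runs, the current tick is passed in as a parameter rather than read from a timer, and sketch_process_once() and sketch_record() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t tick_ms;                   /* assumed width */
typedef void (*evq_callback)(void *data);   /* assumed signature */
typedef struct _evq_event_t *evq_event_t;
typedef struct _evq_queue_t *evq_queue_t;

typedef struct _evq_event_t
{
    tick_ms      trigger_time;
    void        *data;
    evq_callback cb;
    evq_event_t  chain;
} evq_event_t_impl;

typedef struct _evq_queue_t
{
    evq_queue_t down;
    evq_event_t head;
} evq_queue_t_impl;

/* One pass: fire due events in q, stop at the first future one, then descend. */
static void sketch_process_once(evq_queue_t q, tick_ms now)
{
    if (q == NULL)
    {
        return;
    }

    while (q->head && q->head->trigger_time <= now)
    {
        evq_event_t e = q->head;
        q->head = e->chain;   /* unlink first so the callback may post again */
        e->chain = NULL;
        assert(e->cb != NULL);
        e->cb(e->data);
    }

    sketch_process_once(q->down, now);
}

/* Example callback: appends the event's one-character tag to a trace. */
static char sketch_trace[16];

static void sketch_record(void *data)
{
    size_t n = strlen(sketch_trace);
    if (n + 1 < sizeof sketch_trace)
    {
        sketch_trace[n] = *(const char *)data;
        sketch_trace[n + 1] = '\0';
    }
}
```

Because the chain is sorted by trigger_time, abandoning it at the first future event never skips a due one, and the parent's due events always fire before anything in the queue below it.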

This is where the utility of having chained, tree-structured event queues shows itself. Usually the evq_process*() family will be invoked on a “master queue” that is system-wide, and thus all events of all components in the system will be processed in a strictly defined order. Since, however, there's no requirement for the “master queue” to be the one processed at any given point, it is easy to limit event processing to a subset of the overall system at need. Indeed, since not every event queue even has to be attached to the overall event tree, it's possible to put critical services on their own detached queue and service all of them before systemwide services are processed.

Implications

EVQ is not only a good tool for decoupling subsystems and governing their interactions, it is a powerful tool for establishing priorities and processing critical code independently of the rest of the system. And there is one final feature that is hinted at in the design, but not stated openly.

Note that an event is associated with a void *data. The nature of void * means that any kind of data can be passed into an event handler.

Including an event queue. Or another event.

Using EVQ gives the ability to do Pi-calculus-like operations since the communications channel can accept a communications channel as an argument. Exactly how this would work will be explored in the next two articles in the COROS series, but the gist of it is that subsystems can flexibly address other subsystems without knowing they exist.

For a practical example, consider an embedded system that logs events. Where it logs will depend on which hardware modules are present and/or active. If, for example, the Ethernet hardware is present and it has an active connection, the logger can send things there. If it's not present, it may be able to instead log to a serial port. If that isn't available, maybe in a pinch it can record its logs to built-in or peripheral Flash.

The key is that the component that's sending out the log messages should not be required to know of all these. It should just know that when a loggable event happens in any subsystem, that subsystem knows to talk to it (via a message queue), and that it just has to add its decoration (severity, time stamp, etc.) to that information and push it out...

...to the message queue that it was passed in when initialized.

This keeps coupling in the overall system to a minimum. The other subsystems only have to know that a logger is available that accepts messages in a very limited number of formats. The logger only has to know that it is connected to something that accepts a particular set of messages. None of the subsystems needs to know that there are possibly a dozen different places to log messages, and the logger doesn't need to know it either. Only the portion of the code that configures the relationships needs to care about how each piece is connected.
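As a toy illustration of the wiring only (not of how COROS does it, which is the subject of later articles), the logger below is handed its output queue as an opaque void * and never learns what sits behind it. Everything here is hypothetical: sketch_queue is a stand-in that merely remembers the last message posted, and none of these names exist in EVQ.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for an event queue: it only remembers the last message. */
typedef struct
{
    char last[64];
    int  posted;
} sketch_queue;

static void sketch_post(sketch_queue *q, const char *msg)
{
    snprintf(q->last, sizeof q->last, "%s", msg);
    q->posted++;
}

/* The logger knows only that it was given *some* output queue. */
typedef struct
{
    sketch_queue *out;
} sketch_logger;

/* The queue arrives as the opaque payload, just like event data would. */
static void sketch_logger_init(sketch_logger *lg, void *data)
{
    lg->out = data;
}

/* Add the logger's decoration, then push the result downstream. */
static void sketch_logger_log(sketch_logger *lg, const char *severity,
                              const char *text)
{
    char line[64];

    assert(lg->out != NULL);
    snprintf(line, sizeof line, "[%s] %s", severity, text);
    sketch_post(lg->out, line);
}
```

Only the configuring code decides whether the queue passed to sketch_logger_init() belongs to an Ethernet, serial, or Flash backend; swapping backends means calling it again with a different queue.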

One could argue that this could be handled using a global variable containing a collection of function pointers, and indeed this would be the usual way to do this. (It could very well be the way that gets used in your system even if using COROS!) The strength, however, of doing it this way is that all configuration is kept in one place, and all involved components are decoupled. Log clients only have to know the logging message format (or API, as will be seen when the COMP component is discussed). Loggers will only have to know the logging message format. Configurators only need to know of the existence of logging backends and a single logger message format. And all the subsystems communicate in exactly the same way using exactly the same communications medium.

And, most importantly from my perspective as a builder and maintainer of dozens of embedded systems, the code for each component can be easily ripped out of an existing project and copied wholesale to another project, especially if the COMP component discipline is followed. (This is foreshadowing, a well-known sign of quality writing!)

Caveats

Events in EVQ are usually dynamically created and destroyed. This can be murder in embedded systems, as it tends to fragment the heap, leading to ridiculous situations where there's plenty of free SRAM but not enough contiguous memory to allocate an event object. To mitigate this, an object pool allocator is used that has proved to be useful in real-world applications. In this case it employs the default doubling strategy and individual object allocation with a starting size of five events. All of these can be adjusted in a single line of code in evq.c to fine-tune memory costs and fragmentation issues.
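For readers unfamiliar with the technique, here is a rough sketch of a free-list object pool that starts at five objects and doubles when it runs dry. It is one plausible reading of "doubling strategy and individual object allocation", not the allocator COROS actually uses: all names are hypothetical, the 32-byte payload is a stand-in for an event object, and objects are never handed back to the heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_POOL_START 5   /* starting size quoted in the text */

typedef union sketch_slot
{
    union sketch_slot *next_free;    /* valid only while the slot is free */
    unsigned char      payload[32];  /* stand-in for an event object */
} sketch_slot;

typedef struct
{
    sketch_slot *free_list;
    size_t       capacity;   /* slots created so far */
} sketch_pool;

/* Add as many slots as already exist (or the starting size), one malloc
   per slot. Returns how many slots were actually added. */
static size_t sketch_pool_grow(sketch_pool *p)
{
    size_t want = p->capacity ? p->capacity : SKETCH_POOL_START;
    size_t added = 0;

    while (added < want)
    {
        sketch_slot *s = malloc(sizeof *s);
        if (s == NULL)
        {
            break;   /* keep whatever was obtained so far */
        }
        s->next_free = p->free_list;
        p->free_list = s;
        added++;
    }
    p->capacity += added;
    return added;
}

static void *sketch_pool_get(sketch_pool *p)
{
    assert(p != NULL);
    if (p->free_list == NULL && sketch_pool_grow(p) == 0)
    {
        return NULL;   /* heap exhausted */
    }
    sketch_slot *s = p->free_list;
    p->free_list = s->next_free;
    return s;
}

/* Returned objects go back on the free list, never back to the heap. */
static void sketch_pool_put(sketch_pool *p, void *obj)
{
    sketch_slot *s = obj;
    s->next_free = p->free_list;
    p->free_list = s;
}
```

Since released objects are recycled rather than freed, steady-state event traffic stops touching the heap once the pool has grown to the system's peak demand.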

EVQ supports “classic” event style, in the form of callback events, but these should contain only very simple logic with little to no persistent state between calls. A perfect use case for callback events would be something that just turns an LED on or off (or toggles it). If there is any significant logic, the backwards-inside-out nature of callback-based systems (“callback Hell”) suggests that using coroutine event handlers is the better approach.

In either case, disciplined use of event queues to decouple distinct subsystems that work together but should not necessarily know anything about each other beyond “existence” is the way to approach overall system design. Use of COROS' CO and EVQ components can lead to very robust overall systems. Which segues nicely into the final component of COROS.

Parts is parts

COROS' final component, subject of the next article, is the COMP component. (COMP being short for “component”. So the component component.) COROS is, at its heart, an opinionated system, albeit one whose opinions are weakly stated so that people with different opinions can use different parts of it in isolation. The final opinion of COROS (the first two being that coroutines are superior to threading and that loose coupling of subsystems is a must) is that object-orientation is king.

But not object-orientation the way that it is traditionally viewed. You will not see any classes and instances and inheritance. You will not see prototypes and cloning, even. COROS is a nod to the originator of OOP whose words people mouth without ever actually using. COROS components are Kay-style OOP: independent software entities that communicate via messages.

For a brief taste of what a COROS COMP is like, a stereotypical COMP will have:

A configuration function.
Its own private coroutine for processing messages.
Its own private message queue for accepting messages.
An API that it presents to the outside world to conceal precisely how the messages are implemented and handled.

The details of how this plays out will be the topic of the next article.