qqmrichter

An embedded systems programmer living the Shenzhen I/O life in another city.

I carry around a little toolkit with me on a thumb drive. It contains useful libraries, little snippets of code, and a few of my own practice projects (the ones I use to keep myself in the game outside of my realm of employment). The gnome sorts I mentioned in an earlier blog entry came from my little toolkit, as do a few Lua extension libraries, a lot of small Rexx tools that I've had use for over the years, and so on.

One of the tools I carry around with me is a simple, architecture-quasi-independent coroutine system. Another is a simple, architecture-independent timed event queue system (that supports my coroutine system as well as callback-style events). On Friday, it turned out the hardware prototype I was supposed to be writing drivers for and testing was broken and could not be used. Faced with a bit of spare time I decided to throw them into our suite of libraries so I could finally program something with decent readability and reliability. Come Monday, my boss comes in and we talk about the hardware issue, and I say:

Yeah, I didn't lose much time. And there was a bonus. I wrote an RTOS for us.

I love watching my boss's face when I do things like that. He's an electrical engineer so someone writing an RTOS at all is, to him, miraculous. It's doubly-miraculous to have it done in a day. (I didn't tell him, naturally, that it was ¾ done years ago.)

I have an opinionated stance on software (to put it politely!). I think most software written is crap, and I think this is largely because programmers focus on the wrong aspects of software to the detriment of the whole. As an example of what I mean, I'm going to look at sorting algorithms.

There are a whole lot of them, and they all have O-notation properties, k-factors, and a bunch of other mathematical dross surrounding them that a particular kind of programmer just loves to go on and on about when selecting them. There's another particular kind of programmer that doesn't care what sort is being used as long as it's in the standard library so they don't have to think about icky things like “algorithms” and “choice”.

Both are bad engineers, in my books.

Introducing the gnome sort

Both groups stare at me like I'm growing a third eyeball in the middle of my head, Twilight Zone-style, when I tell them that the sort I most commonly use (to the point I usually didn't even bother keeping a version around and just coded it off the top of my head) is the infamous Gnome Sort (a.k.a. Stupid Sort). I'll drop the C version that I finally put into my formal toolkit here just as an example.

First the header:

/* vim: ft=c */
#ifndef GNOME_SORT_INCLUDED
#define GNOME_SORT_INCLUDED
/******************************************************************************
* (c)2020 Michael T. Richter
*
* This software is distributed under the terms of WTFPLv2.  The full terms and
* text of the license can be found at http://www.wtfpl.net/txt/copying
******************************************************************************/
/** @file
 *  @brief Implementation of the Gnome sort for integers and generic types.
 *
 *  Gnome sorts are grossly inefficient at scale, but if you have a small
 *  number of items to sort and need to do it in a small amount of code space
 *  (like would be the case in most bare-metal embedded systems), they are a
 *  god-send.
 *
 *  This library implements two kinds of Gnome sorts: specialized for the most
 *  common case (integers), and customizable by data type.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/**** CONFIGURATION - user-adjustable parameters go here *********************/
/* Note: -DDEFAULT_GSORT_TYPE=<type> in the command line works too.          */
#ifdef DEFAULT_GSORT_TYPE
typedef DEFAULT_GSORT_TYPE gsort_t;
#else
typedef int32_t gsort_t;
#endif
/*****************************************************************************/

/*! Macro for conveniently counting the elements in a statically-allocated array. */
#define ELEM_COUNT(A) (sizeof(A) / sizeof(A[0]))

/*! Function pointer type for custom "greater than" comparisons of user types. */
typedef bool (*gsort_gt_t)(void *, void *);
/*! Function pointer type for custom value swaps of user types. */
typedef void (*gsort_swap_t)(void *, void *);

/*! Sort an array of integers (gsort_t) in-place with a gnome sort.
 *
 *  @param[in,out]  array      The array to sort.
 *  @param[in]      array_size The number of **elements** in the array.
 */
void gnome_sort_i(gsort_t array[], size_t array_size);

/*! Sort an array of arbitrary type in-place with a gnome sort.
 *
 *  @param[in,out]  array      The array to sort.
 *  @param[in]      array_size The number of **elements** in the array.
 *  @param[in]      gt_op      Function pointer: returns true if first operand is greater.
 *  @param[in]      swap_op    Function pointer: swaps its operands in place.
 */
void gnome_sort(void *array[], size_t array_size, gsort_gt_t gt_op, gsort_swap_t swap_op);

#endif //GNOME_SORT_INCLUDED

Now the implementation:

/* vim: ft=c */
/******************************************************************************
* (c)2020 Michael T. Richter
*
* This software is distributed under the terms of WTFPLv2.  The full terms and
* text of the license can be found at http://www.wtfpl.net/txt/copying
******************************************************************************/
#include "gnome_sort.h"

#include <stdbool.h>

#ifdef DEBUG
#include <stdio.h>
#define DEBUG_OUT(F, ...) fprintf(stderr, F, __VA_ARGS__)
#else
#define DEBUG_OUT(F, ...)
#endif

static inline void swap_i(gsort_t array[], size_t i, size_t j)
{
    gsort_t t = array[i];
    array[i] = array[j];
    array[j] = t;
}
static inline bool gt_i(gsort_t i, gsort_t j)
{
    return i > j;
}
void gnome_sort_i(gsort_t array[], size_t array_size)
{
    for (size_t i = 1, j = 2 ; i < array_size ; )
    {
        if (gt_i(array[i - 1], array[i]))
        {
            swap_i(array, i - 1, i);
            if (--i > 0)
            {
                continue;
            }
        }
        i = j++;
    }
}

void gnome_sort(void *array[], size_t array_size, gsort_gt_t gt, gsort_swap_t swap)
{
    for (size_t i = 1, j = 2 ; i < array_size ; )
    {
        if (gt(array[i - 1], array[i]))
        {
            swap(&array[i - 1], &array[i]);
            if (--i > 0)
            {
                continue;
            }
        }
        i = j++;
    }
}

Engineering vs. maths

In under 120 lines of code including all documentation comments, I have not one, but two optimized implementations of a gnome sort: a generic one that will sort any items (though without type safety and some inefficiency in function pointer overhead) and an integer-specific version that is type-safe and faster for the most common use case.

Why does this matter when it's O(n^2) average and worst-case performance? Shouldn't I favour an introsort with its O(n log n) performance?

Hell no! This is where engineering instead of pure theoretical maths shines.

On a 64-bit Intel machine using 32-bit integers for the sort the total compiled code for both functions put together is 214 bytes. On 32-bit or 8-bit microcontrollers the code is even smaller. Potentially much smaller, depending on the MCU in question and the ISA of its core.

The integer variant (the one I use most commonly) is only 80 bytes of text on a 64-bit Intel!

What's so important about size?

Since I do most of my work in embedded space, my data space will be limited, so n in the O-notation will also be limited. When I'm sorting typically two-digit n counts, the O-notation is meaningless. I have to get well into 5 digits before it starts to matter.

You know what I typically don't have in an MCU? 5 digits of available SRAM to sort things in…

So for small amounts of data, the O-notation is meaningless. There would be no noticeable impact on performance from iteration counts in that space. But … cache usage will have a huge impact, especially with the cache-starved cores I tend to work with. Introsort has impressive performance potential if you have loads of memory and infinite cache. On real machines, though, it can stall in the cache and start causing performance bottlenecks as the code for it gets moved in and out of cache lines. The algorithmic superiority can lose to the caching equivalent of the Schlemiel the Painter Algorithm. Your super-duper O(optimized) algorithm may choke your cache and bring your entire system to a screaming halt.

Whereas my little 80-byte (!) gnome sort fits into any cache I've ever seen in a core that has caches of any kind. Sure I may use, say, three or even ten times as many iterations as that super-duper introsort implementation, but I'll be running it (and my data) all in-cache with no stalls. I'll get done first.

So for small amounts of data, and especially on tightly constrained machines, a gnome sort isn't just “good enough” it's frequently superior to the “better” sorting algorithms.

And is that it? Just size?

No. There's another factor in software engineering. Your software should be as simple as possible and no simpler. Every time you use a complex building block in your code where a simpler would do, you are adding technical debt. Introsort is a good sort, don't get me wrong, but it's complex. It's easy to get wrong. If it suits your needs and you have a good, battle-tested implementation, go ahead and use it! Alternatively, if you have to use it in one part of your program, it's actually simpler to just use it in all parts instead of having multiple, competing sorts vying for your attention.

If, however, you're coding a quick sort for small amounts of data where, for whatever reason, you lack access to such a library, keep in mind that you can memorize the pseudo-code for a gnome sort in a few minutes. It's stupidly easy to understand and stupidly easy to code from that understanding in any imperative language. (It would be a bit harder to code in a language like Prolog or SML, and hard to justify when merge sorts are a thing.)

TL;DR LOL!

Software engineering isn't about universal “best” anythings. Not “best” programming language, not “best” algorithm, not even “best” operating system or target platform. Software engineering is the twin terror of choosing “best for purpose” in all of these as well as “making do” with the tools and techniques you have at hand. In the case of sort algorithms, that means you sometimes reach for something as seemingly idiotic as a gnome sort over the fancier ones with the “better”-looking numbers.

Bonus: Rexx!

Here's a gnome sort in Rexx:

/* ft=rexx */
/*******************************************************************************
* GSORT.rx
*
* Perform a gnome sort on a stem variable.
*
* Usage: call gsort <stem> [<field>]
*
* Parameters:
*   <stem>      - The name of the stem variable to be sorted in place.
*                 .0 must contain the count.
*   <field>     - The optional field (as per "words()") to sort on.
*
* Return:
*
* Exit codes:
*   0   Normal exit.
*   1   Error in external process.
*   2   Failure in system service.
*   4   Loss of significant digits in calculation.
*   5   I/O error.
*   6   Variable used before assignment.
*   7   Syntax error.
*--- user codes ---
*   -1  General error.
*******************************************************************************/

/***************************************
* VARIABLE INITIALIZATION
***************************************/
trace. = off

E_OK      = 0
E_GENERAL = -1
E_BADCALL = -2

/***************************************
* MAINLINE CODE
***************************************/
main:
  parse source . type .

    if type = 'COMMAND' then
    do
        say 'Cannot be invoked as a command.'
        exit E_BADCALL
    end

    use arg stem., field
    if \ var('field') then field = 0

    lc = 2
    do r = 3 while lc <= stem.0
        lp = lc - 1
        if field > 0 then
            do
                v1 = word(stem.lp, field)
                v2 = word(stem.lc, field)
            end
        else
            do
                v1 = stem.lp
                v2 = stem.lc
            end
        if v1 <= v2 then
            lc = r   -- lp and lc are in order, so move lc up to r
        else
            do
                -- swap lp and lc
                t = stem.lp
                stem.lp = stem.lc
                stem.lc = t
                -- move lc back by one
                lc = lc - 1
                if lc <= 1 then lc = r      -- lc has hit far left, so we're sorted to r
                           else r = r - 1   -- back the r pointer up a bit
            end
    end

    exit E_OK

There are more than a few conspiracy theories surrounding the origins of COVID-19. This is understandable. People have this innate (albeit irrational) desire to blame someone when something goes wrong. The notion that “shit happens” is not a popular one because it leaves people feeling like they're not in control.

Of course they're not, which makes it even worse.

Still, while most conspiracy theories are harmless, some are very harmful and the COVID-19 origin stories are some of the deadliest of the lot. For purposes of instruction, I'm going to pick two of the largest: one from right wing lunatics, and the other from left wing nuts. These are:

  1. COVID-19 leaked from the Wuhan Institute of Virology.

  2. COVID-19 was endemic in the world and the strain that broke out in Wuhan was just a deadly mutation of a disease that was already everywhere.

Wuhan Institute of Virology

The narrative of this story is that the virus leaked, somehow, through improper handling (or even malice, in extreme variants) from the Wuhan Institute of Virology to the nearby Huanan Seafood Fishery Wholesale Market. In this story some virus samples obtained from bats somehow leaked through irresponsibility and some poor saps in Huanan Market caught the disease and the rest is history.

Let's take a closer look at this – let's be generous and call it a “hypothesis”, shall we? It has a singular flaw: it's a load of fetid dingo's kidneys. Let's take a look at a map: Directions from the Wuhan Institute of Virology and the Huanan Seafood Fishery Wholesale Market

The Wuhan Institute of Virology is in Wuchang (and a medium stroll away from my home), one of the three cities that were merged to form Wuhan. (The other two are Hankou and Hanyang.) The Huanan Seafood Fishery Wholesale Market is across the Yangtze river in Hankou. It's almost 12km away as the crow flies, and it's a 24km drive through some of the densest traffic of Wuhan. This is not “nearby” by any meaningful sense. These two are separated about as far as are central Harlem and central Queens in New York City, with more natural barriers and fewer roads available between them.

But do you know what is nearby? Here's another map: Map showing markets near to the Wuhan Institute of Virology and on the same side of the Yangtze

Google Maps has a paucity of information about Chinese cities (for obvious reasons), so only the largest and/or most interesting markets are marked. They're out of date, with so many being flagged as closed, but all of those flagged markets—far closer than Huanan by far!—were open and bustling in December of 2019. So how, precisely, do you think that a disease spread from the Institute to Huanan without popping up in the literally dozens of markets, malls, shopping streets, bars, restaurants, and other such entities that service Wuhan University?

Oh, did I fail to mention that? The Wuhan Institute of Virology is in the middle of the single largest, most important university in Wuhan. Wuhan University is a powerful economic force and is ringed by markets, shops, restaurants, bars, and a whole host of other venues that are ripe for viral transmission.

Yet it supposedly popped up in practically another city. Hmmmm…

Let's just stop embarrassing the right wing lunatics with facts, shall we?

Already endemic

This one is even easier to debunk. Here's another map: Map of directions between the Huanan Seafood Fishery Wholesale Market and Wuhan Tianhe International Airport

Picture the narrative involved. In the absolute least unlikely scenario, someone with COVID-19 had to fly in to Tienhe (somehow not infecting the hundreds of people also in the plane) and then make a beeline for Huanan in some form of conveyance that wasn't near anybody else. Then they walked into the market, coughed, and unleashed the viral bomb that subsequently ravaged the world.

Does this strike anybody with a brain cell as being even remotely plausible? Really? How would you fly for hours in a tin can with recycled and then-largely-unfiltered air without infecting people in it first? Why would someone go straight from the airport to the market? Would they not go home first, or to a hotel? Maybe get dinner? There's literally dozens of trivial holes to poke in this theory even if you don't have the map above. With the map above it's laughable.

No, the left wing nuts need their safe space from the mocking laughter that they so richly deserve for foisting this on the public.

So where did it come from?

Because this is so badly politicised now, there's literally never going to be a full story known. What is known, however, is that in November there was a string of odd fevers leading to death in a straight line leading from Chongqing and environs to Wuhan in the month of November. Poor medical attention in rural China combined with unwieldy bureaucracy in the Chinese CDC led to the importance of this being missed. Quick cremation also led to lost tissue samples and subsequent opportunities.

The prevailing narrative that this provides is that hunters in Sichuan who hunted or trapped wild animals were directly or indirectly infected by the bats mentioned in the first conspiracy theory. They then carried this to Wuhan and its lucrative wild game market: the infamous “wet market” (termed “farmer's market” by non-racists) where the viral bomb blew up.

The rest is current events.

But …

My narrative is still speculation. It's just speculation that doesn't directly contradict trivially verifiable facts. As more facts come to light, if they ever do, the narrative will of necessity shift. That's the strength of not being a conspiracy theorist, though. I'm not invested in my story. I don't mind changing it in the face of fact.