Engineering and software

I have an opinionated stance on software (to put it politely!). I think most software written is crap, and I think this is largely because programmers focus on the wrong aspects of software to the detriment of the whole. As an example of what I mean, I'm going to look at sorting algorithms.

There are a whole lot of them, and they all have O-notation properties, k-factors, and a bunch of other mathematical dross surrounding them that a particular kind of programmer just loves to go on and on about when selecting them. There's another particular kind of programmer that doesn't care what sort is being used as long as it's in the standard library so they don't have to think about icky things like “algorithms” and “choice”.

Both are bad engineers, in my books.

Introducing the gnome sort

Both groups stare at me like I'm growing a third eyeball in the middle of my head, Twilight Zone-style, when I tell them that the sort I most commonly use (to the point I usually didn't even bother keeping a version around and just coded it off the top of my head) is the infamous Gnome Sort (a.k.a. Stupid Sort). I'll drop the C version that I finally put into my formal toolkit here just as an example.

First the header:

/* vim: ft=c */
#ifndef GNOME_SORT_INCLUDED
#define GNOME_SORT_INCLUDED
/******************************************************************************
* (c)2020 Michael T. Richter
*
* This software is distributed under the terms of WTFPLv2.  The full terms and
* text of the license can be found at http://www.wtfpl.net/txt/copying
******************************************************************************/
/** @file
 *  @brief Implementation of the Gnome sort for integers and generic types.
 *
 *  Gnome sorts are grossly inefficient at scale, but if you have a small
 *  number of items to sort and need to do it in a small amount of code space
 *  (like would be the case in most bare-metal embedded systems), they are a
 *  god-send.
 *
 *  This library implements two kinds of Gnome sorts: specialized for the most
 *  common case (integers), and customizable by data type.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/**** CONFIGURATION - user-adjustable parameters go here *********************/
/* Note: -DDEFAULT_GSORT_TYPE=<type> in the command line works too.          */
#ifdef DEFAULT_GSORT_TYPE
typedef DEFAULT_GSORT_TYPE gsort_t;
#else
typedef int32_t gsort_t;
#endif
/*****************************************************************************/

/*! Macro for conveniently counting the elements in a statically-allocated array. */
#define ELEM_COUNT(A) (sizeof(A) / sizeof(A[0]))

/*! Function pointer type for custom "greater than" comparisons of user types. */
typedef bool (*gsort_gt_t)(void *, void *);
/*! Function pointer type for custom value swaps of user types. */
typedef void (*gsort_swap_t)(void *, void *);

/*! Sort an array of integers (gsort_t) in-place with a gnome sort.
 *
 *  @param[in,out]  array      The array to sort.
 *  @param[in]      array_size The number of **elements** in the array.
 */
void gnome_sort_i(gsort_t array[], size_t array_size);

/*! Sort an array of arbitrary type in-place with a gnome sort.
 *
 *  @param[in,out]  array      The array to sort.
 *  @param[in]      array_size The number of **elements** in the array.
 *  @param[in]      gt_op      Function pointer: returns true if first operand is greater.
 *  @param[in]      swap_op    Function pointer: swaps its operands in place.
 */
void gnome_sort(void *array[], size_t array_size, gsort_gt_t gt_op, gsort_swap_t swap_op);

#endif //GNOME_SORT_INCLUDED

Now the implementation:

/* vim: ft=c */
/******************************************************************************
* (c)2020 Michael T. Richter
*
* This software is distributed under the terms of WTFPLv2.  The full terms and
* text of the license can be found at http://www.wtfpl.net/txt/copying
******************************************************************************/
#include "gnome_sort.h"

#include <stdbool.h>

#ifdef DEBUG
#include <stdio.h>
#define DEBUG_OUT(F, ...) fprintf(stderr, F, __VA_ARGS__)
#else
#define DEBUG_OUT(F, ...)
#endif

static inline void swap_i(gsort_t array[], size_t i, size_t j)
{
    gsort_t t = array[i];
    array[i] = array[j];
    array[j] = t;
}
static inline bool gt_i(gsort_t i, gsort_t j)
{
    return i > j;
}
void gnome_sort_i(gsort_t array[], size_t array_size)
{
    for (size_t i = 1, j = 2 ; i < array_size ; )
    {
        if (gt_i(array[i - 1], array[i]))
        {
            swap_i(array, i - 1, i);
            if (--i > 0)
            {
                continue;
            }
        }
        i = j++;
    }
}

void gnome_sort(void *array[], size_t array_size, gsort_gt_t gt, gsort_swap_t swap)
{
    for (size_t i = 1, j = 2 ; i < array_size ; )
    {
        if (gt(array[i - 1], array[i]))
        {
            swap(&array[i - 1], &array[i]);
            if (--i > 0)
            {
                continue;
            }
        }
        i = j++;
    }
}

Engineering vs. maths

In under 120 lines of code including all documentation comments, I have not one, but two optimized implementations of a gnome sort: a generic one that will sort any items (though without type safety and some inefficiency in function pointer overhead) and an integer-specific version that is type-safe and faster for the most common use case.

Why does this matter when it's O(n^2) average and worst-case performance? Shouldn't I favour an introsort with its O(n log n) performance?

Hell no! This is where engineering instead of pure theoretical maths shines.

On a 64-bit Intel machine using 32-bit integers for the sort the total compiled code for both functions put together is 214 bytes. On 32-bit or 8-bit microcontrollers the code is even smaller. Potentially much smaller, depending on the MCU in question and the ISA of its core.

The integer variant (the one I use most commonly) is only 80 bytes of text on a 64-bit Intel!

What's so important about size?

Since I do most of my work in embedded space, my data space will be limited, so n in the O-notation will also be limited. When I'm sorting typically two-digit n counts, the O-notation is meaningless. I have to get well into 5 digits before it starts to matter.

You know what I typically don't have in an MCU? 5 digits of available SRAM to sort things in…

So for small amounts of data, the O-notation is meaningless. There would be no noticeable impact on performance from iteration counts in that space. But … cache usage will have a huge impact, especially with the cache-starved cores I tend to work with. Introsort has impressive performance potential if you have loads of memory and infinite cache. On real machines, though, it can stall in the cache and start causing performance bottlenecks as the code for it gets moved in and out of cache lines. The algorithmic superiority can lose to the caching equivalent of the Schlemiel the Painter Algorithm. Your super-duper O(optimized) algorithm may choke your cache and bring your entire system to a screaming halt.

Whereas my little 80-byte (!) gnome sort fits into any cache I've ever seen in a core that has caches of any kind. Sure I may use, say, three or even ten times as many iterations as that super-duper introsort implementation, but I'll be running it (and my data) all in-cache with no stalls. I'll get done first.

So for small amounts of data, and especially on tightly constrained machines, a gnome sort isn't just “good enough” it's frequently superior to the “better” sorting algorithms.

And is that it? Just size?

No. There's another factor in software engineering. Your software should be as simple as possible and no simpler. Every time you use a complex building block in your code where a simpler would do, you are adding technical debt. Introsort is a good sort, don't get me wrong, but it's complex. It's easy to get wrong. If it suits your needs and you have a good, battle-tested implementation, go ahead and use it! Alternatively, if you have to use it in one part of your program, it's actually simpler to just use it in all parts instead of having multiple, competing sorts vying for your attention.

If, however, you're coding a quick sort for small amounts of data where, for whatever reason, you lack access to such a library, keep in mind that you can memorize the pseudo-code for a gnome sort in a few minutes. It's stupidly easy to understand and stupidly easy to code from that understanding in any imperative language. (It would be a bit harder to code in a language like Prolog or SML, and hard to justify when merge sorts are a thing.)

TL;DR LOL!

Software engineering isn't about universal “best” anythings. Not “best” programming language, not “best” algorithm, not even “best” operating system or target platform. Software engineering is the twin terror of choosing “best for purpose” in all of these as well as “making do” with the tools and techniques you have at hand. In the case of sort algorithms, that means you sometimes reach for something as seemingly idiotic as a gnome sort over the fancier ones with the “better”-looking numbers.

Bonus: Rexx!

Here's a gnome sort in Rexx:

/* ft=rexx */
/*******************************************************************************
* GSORT.rx
*
* Perform a gnome sort on a stem variable.
*
* Usage: call gsort <stem> [<field>]
*
* Parameters:
*   <stem>      - The name of the stem variable to be sorted in place.
*                 .0 must contain the count.
*   <field>     - The optional field (as per "words()") to sort on.
*
* Return:
*
* Exit codes:
*   0   Normal exit.
*   1   Error in external process.
*   2   Failure in system service.
*   4   Loss of significant digits in calculation.
*   5   I/O error.
*   6   Variable used before assignment.
*   7   Syntax error.
*--- user codes ---
*   -1  General error.
*******************************************************************************/

/***************************************
* VARIABLE INITIALIZATION
***************************************/
trace. = off

E_OK      = 0
E_GENERAL = -1
E_BADCALL = -2

/***************************************
* MAINLINE CODE
***************************************/
main:
  parse source . type .

    if type = 'COMMAND' then
    do
        say 'Cannot be invoked as a command.'
        exit E_BADCALL
    end

    use arg stem., field
    if \ var('field') then field = 0

    lc = 2
    do r = 3 while lc <= stem.0
        lp = lc - 1
        if field > 0 then
            do
                v1 = word(stem.lp, field)
                v2 = word(stem.lc, field)
            end
        else
            do
                v1 = stem.lp
                v2 = stem.lc
            end
        if v1 <= v2 then
            lc = r   -- lp and lc are in order, so move lc up to r
        else
            do
                -- swap lp and lc
                t = stem.lp
                stem.lp = stem.lc
                stem.lc = t
                -- move lc back by one
                lc = lc - 1
                if lc <= 1 then lc = r      -- lc has hit far left, so we're sorted to r
                           else r = r - 1   -- back the r pointer up a bit
            end
    end

    exit E_OK