qqmrichter

An embedded systems programmer living the Shenzhen I/O life in another city.

I make no secret of my love for weird programming languages. Whether spending time on Rosetta Code adding bizarre sorting algorithms in bizarre programming languages, or writing tutorials for how to use modules in a Prolog dialect, or helping a friend build a new programming language from scratch, programming languages are the thing that has fascinated me from my early days with Commodore and TRS-80 BASIC, through assorted assemblers, and then an explosion of other (often very obscure) languages over the decades. The very idea that we could use something that looks, at least at a trivial glance, like actual words to tell computers what to do is just a deep joy in my life of programming.

But...

I generally hate software and the process of creating it. This is largely because most people who make software are either:

  1. unable to make good, reliable software;
  2. unwilling to make good, reliable software; and/or
  3. forced to make shoddy, unreliable software by outside forces.

One of the major problems in the making of software, I feel, is a very bad choice of programming language, most especially the choice to use C or one of its descendant languages. So I'd like to first explain why I think C is a bad base language, why I view it as a blight on computing, and why even the descendants that improve on it miss the mark. After that I'll introduce some languages I think do the job properly.

The case against C

C was a powerhouse of a language for its original, intended target: porting the then-nascent Unix operating system from a DEC PDP-7 to a DEC PDP-11.

For reference, the PDP-7 had 4K (18-bit) words of memory standard and could be expanded (at tremendous cost) to 64K. The PDP-11 started at 8KB (bytes, this time) of memory and over the years could go as high as 4MB. If you have a USB charger it has more computing horsepower in it than the PDP-7. If you have a USB 2.0 hub it has about as much computing horsepower (possibly more) as a higher-end PDP-11.

This was the environment in which C was created. The compiler had to be small and simple because the machines were cramped. This was OK, however, because the amount of complexity you could have in your programs was similarly limited. For writing small kernels and simple utilities—which all original Unix utilities were—C was acceptable. Not great. Acceptable. It's when you start programming at scale that the C 'philosophy' (read: 'attitude') fails.

So where does C fail us?

Modularity

The problem is that C doesn't scale. At all. The larger your program gets, the more obvious this becomes. You start relying on ad hoc rules of thumb and weird boilerplate to get around its limitations. A prime example of this is the eternal …

#ifndef FAUX_MODULE_HEADER_INCLUDED
#define FAUX_MODULE_HEADER_INCLUDED

.
.
.

#endif /*!FAUX_MODULE_HEADER_INCLUDED*/

… you will find in every piece of C code of substance in some form or another.

This is C's answer to modularity. It's three lines of boilerplate added to practically every header file in existence, and it exists only because C doesn't do actual modularity. The closest thing C gets to a declaration/implementation divide is a header file (.h) and a corresponding source file (.c). And that divide is purely one of convention. Convention that starts getting messy very quickly.

But it goes beyond weird-looking boilerplate. Modularity in C simply doesn't exist unless you force it to by convention. Convention that's actually hard to pull off in many cases.

In theory, for example, you're going to want to have only public API declarations in your .h files and only private API declarations and implementations in your .c files. In practice, however, there's nothing enforcing this. Header files are brought in through blind textual substitution via the pre-processor's #include directive. The compiler never even sees header files; it sees only the expanded text that the pre-processor hands it.

Inevitably people start putting things into headers that are actually code, not just declarations. Sometimes they're even almost forced into it. Any time, for example, that you work with inline functions (or their elder macro predecessors), it's sufficiently painful that a lot of the time (all of the time with macros) it's just easier to put the implementation in a header file.
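
A typical example of what I mean (the names here are invented, but every sizeable C code base has a header that ends up looking roughly like this):

/* ring_buffer.h -- supposedly just the public interface */
#ifndef RING_BUFFER_INCLUDED
#define RING_BUFFER_INCLUDED

#include <stddef.h>

typedef struct ring_buffer ring_buffer_t;

ring_buffer_t *rb_create(size_t capacity);
void rb_destroy(ring_buffer_t *rb);
int rb_push(ring_buffer_t *rb, int value);

/* ...and then implementation starts creeping in, because inline functions
 * and macros have to be visible at every call site: */
#define RB_IS_POWER_OF_TWO(N) (((N) & ((N) - 1)) == 0)

static inline size_t rb_wrap(size_t index, size_t capacity)
{
    return index & (capacity - 1);   /* an implementation detail, now public */
}

#endif /* RING_BUFFER_INCLUDED */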

And now you've polluted the declaration/implementation divide.

Modularity and encapsulation are the foundation of sane programming. While C doesn't directly prevent modularity, it does little to support it either. And this continues through many of its children. C++ isn't much better (and indeed, because of how clumsy the class construct is, it often has even more code than C in its headers). Objective-C (and later Swift), along with D, share much of this flawed approach. (Other C-descended languages like Java and C# make attempts to fix this, but are constrained by what C programmers are used to. This makes them better, but not great.)

Type safety

C has, at bottom, one data type: the integer. There are some mild differences in interpretation: sometimes by width (though it took an astonishingly long time to even have standard ways of specifying type widths!), sometimes by signedness, and sometimes by interpretation as a pointer.

On top of that one data type it has ways of grouping integers (of various sizes) into a namespace (i.e. struct and union declarations).

And finally it has a few bits of syntax sugar around pointer-flavoured integers to make them act somewhat like strings (without being actual strings) or to make them act somewhat like arrays (without being actual arrays).

Frustratingly, the one tool C has always had for providing integers of varying widths—bitfields—is so lackadaisically defined that it is in effect utterly useless. Even though bit fields sound like the perfect way to specify, well, bit fields in hardware registers, in practice they cannot be used reliably for that, so ugly macros or enums get used instead, with manual (and thus error-prone) shifting and masking to manipulate the values.
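
Here's a sketch of what I mean, using a made-up control register: the bitfield version reads beautifully, but because the standard doesn't pin down bit order, padding, or storage-unit straddling, what actually ships is the macro version.

/* What we'd like to write (layout is implementation-defined, so unusable
 * for describing real hardware registers): */
struct ctrl_reg {
    unsigned int enable    : 1;
    unsigned int clock_div : 3;
    unsigned int mode      : 2;
    unsigned int           : 26;   /* padding to 32 bits */
};

/* What we end up writing instead (CTRL is a hypothetical register): */
#define CTRL_ENABLE            (1u << 0)
#define CTRL_CLOCK_DIV_SHIFT   1u
#define CTRL_CLOCK_DIV_MASK    (0x7u << CTRL_CLOCK_DIV_SHIFT)
#define CTRL_MODE_SHIFT        4u
#define CTRL_MODE_MASK         (0x3u << CTRL_MODE_SHIFT)

#define CTRL_SET_CLOCK_DIV(reg, div) \
    ((reg) = ((reg) & ~CTRL_CLOCK_DIV_MASK) | \
             (((div) << CTRL_CLOCK_DIV_SHIFT) & CTRL_CLOCK_DIV_MASK))

/* CTRL_SET_CLOCK_DIV(CTRL, 4u);  -- CTRL being whatever volatile register
 * access your vendor header happens to give you */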

Now, there is some attempt to prevent willy-nilly conversion from one flavour of integer to another, but it's trivially overridden and, because of C's limitations, it's often necessary to override it: lacking any form of genericity pretty much forces casting as a regular tool, for example. The result is that it's easy for an invalid cast to slip past and into production code.
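
The classic example is qsort(), which launders everything through void *. The struct and comparison functions below are invented for illustration, but the shape of the bug is real: the compiler accepts both versions without a murmur.

#include <stdlib.h>

struct reading { long timestamp; int value; };

/* Compiles cleanly, "works" on some inputs, and is completely wrong:
 * each argument points at a whole struct reading, not at an int, so on a
 * typical little-endian machine this compares the low bits of timestamp. */
static int cmp_wrong(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

/* The correct version.  Nothing but programmer discipline tells the two
 * apart, and qsort(readings, n, sizeof(readings[0]), cmp_wrong) draws no
 * complaint from the compiler. */
static int cmp_right(const void *a, const void *b)
{
    const struct reading *ra = a;
    const struct reading *rb = b;
    return (ra->value > rb->value) - (ra->value < rb->value);
}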

Arrays and strings

And then there's the problem of the faux-arrays and faux-strings. I'll use array sizes as an example of what's wrong with C's approach to types in general.

#include <stdio.h>

int array1[5] = { 1, 2, 3, 4, 5 };
int *array2 = (int[]){ 1, 2, 3, 4, 5 };  /* compound literal: an unnamed array, reached through a pointer */

/* stuff goes here */

void func1(int input[])
{
    for (int i = 0; i <= 5; i++) { printf("%d ", input[i]); }
    printf("\n");
}

What happens if you call func1(array1)? What happens with func1(array2)? What happens is that both do exactly the same thing, because by the time func1 sees them, array1 and array2 are the same thing: pointer-flavoured integers pointing at the head of a block of memory.

Oh, and both will have unexpected outputs because we walked one step past the size of the “array”. If the test had been i < 1000000 the function would have merrily continued printing 999995 invalid values, because C's faux-arrays have no intrinsic size. The int input[] parameter is fooling you. It's really just a bare int *input (and even if you slap a const on it, const in C isn't. Or might be. Sometimes. It depends on your hardware. No, really!).
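
If you doubt that the size information really is gone, try to recover it from inside the function (func3 is a made-up illustration; the number assumes a typical 64-bit target):

void func3(int input[])
{
    /* input has already decayed to a plain int *, so sizeof sees the
     * pointer, not the caller's array: this prints 8, no matter how
     * large the array passed in actually was. */
    printf("%zu\n", sizeof(input));
}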

There is no way for a function that receives an “array” in C to know the size of the array. What we really needed was this:

void func2(const int *input, int input_size)
{
    for (int i = 0; i < input_size; i++) { printf("%d ", input[i]); }
    printf("\n");
}

And here we face another problem: getting the size of the faux-array. The wrong way for array1 is func2(array1, sizeof(array1)), because the sizeof operator returns the number of bytes array1 occupies, and an int isn't one byte. So instead we need this mouthful: func2(array1, sizeof(array1) / sizeof(array1[0]));

All this ugly crud needed because C doesn't have actual arrays!
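
The usual mitigation is to hide that mouthful behind a macro (the same ELEM_COUNT trick shows up in the toolkit header further down), though it only helps while the compiler can still see a real array:

#define ELEM_COUNT(A) (sizeof(A) / sizeof(A[0]))

func2(array1, ELEM_COUNT(array1));   /* fine: array1 really is an array here */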

And it gets worse for array2. func2(array2, sizeof(array2)); won't do what you want either: sizeof(array2) is the size of a pointer, which has nothing to do with how many elements it points at. And there's no quick, easy programmatic way to calculate the element count as there is with array1. Indeed, short of just magically knowing the size and manually inserting it into the call to func2(), it's not possible.

So it gets even uglier, with spooky action at a distance rearing its head whenever you want to use the second faux-array.

Strings are no better. (I leave it as an exercise for the student to explain why “an array of bytes terminated by a NUL character” is a maladaptive solution to string representation. There is more than one reason. Enjoy digging them all up!)

And a cast of thousands

These flaws are just a taste of C's failures as a language for serious software engineering. Entire books could be written going into deep detail on its flaws (its lack of memory safety, its utter mismatch with real-world hardware, its perennial security flaws, …). And really, it's not the main point of this essay. The main point is languages that are good for software engineering. It was just necessary to point out why C (and most of its offshoots) are a bad choice.

The imperative world: Ada

If I had absolute control over a low-level project from beginning to end, I would choose Ada as the implementation language. Oft-derided as “too complicated” when it was first introduced (the first published standard came in 1983), it has aged well against its competitors. Yes, Ada-83 was more complicated than the C of the time, but … it did more for software engineering.

First and foremost, Ada built on the legacy of Niklaus Wirth's languages in the Modula family to provide an actual module system instead of ad-hoc conventions around textual substitution. The “specification” file of a “package” (roughly equivalent to a header file) is rigorously limited to declarations, and it also determines which features are “visible” outside of the package and which aren't. Only declarations (types, constants, exceptions, and functions/procedures, which are distinct things in Ada) can go into a specification; attempts to put in anything else result in a compile-time error. The “body” file of a package contains the private declarations and the definitions of all functions and procedures, public or otherwise. As a result there is no choice but to use only the public API: there's no 'clever' way for a package user to circumvent the public-facing API to poke at implementation details, which frees the package to change its implementation without changing its front end.

Oddly, this constrained approach to programming turns out to be very liberating.

Type safety, too, is vastly superior in Ada. Integer types are constrained by value ranges, representation sizes, and other such restrictions. If a given integer must fall in the range of -2 to +5672, it can be constrained to those values and those values only. Enumerations are their own type and not just a fancy way of naming integer values. Subtypes can be defined at need (of both enumerations and integers). Pointers have their own semantics, and Ada's accessibility rules make it a compile-time error (unlike in C) to let a pointer to a local object escape its stack frame. Arrays and strings carry with them the information needed to use them: their sizes, for example. Ada also has genericity mechanisms, so casting (although available for the rare times it's genuinely needed) can usually be avoided and type safety maintained.

Is Ada perfect? Naturally not. It is a human product and, even worse, a product of bureaucracy (the source of much of its initially-derided “complexity”). It cannot be perfect. But it is certainly a language that supports actual software engineering instead of mere coding. (Note: supports, not permits. There are no languages which don't permit software engineering.) In addition, time has shown that its initial “complexity” was well thought out. Ada-83 was significantly more complicated than the later C89, but it did more out of the box; it covered more ground. Over the years C's standard has expanded and expanded, making C an ever-more-complicated language without actually adding a whole lot of new capability. The modern C17 standard is only a little bit smaller than the modern Ada 2012 standard. From 1983 to 2012 Ada added, primarily, object orientation and contracts, with very few changes to the rest of the language. From 1983 to 2017 C made a lot more (sometimes-breaking) changes (function prototypes, void pointers, inline functions, variadic macros, threading primitives, etc.) that still amount to less capability than Ada … while having a standard that is roughly the same size.

“Modern” C is better, yes, than pre-ANSI C, but it's still a mess (and most compilers don't really support all of the standard anyway—even C11 support isn't guaranteed across all live compilers!). In all respects, Ada is the superior choice for engineering solid software. Similar criticisms can be levied against the C++ suite of standards, except that the C++ standard is roughly an order of magnitude larger than Ada's … while being, again, strictly less capable.

The declarative world: Logtalk

The declarative world has likely been nodding along with my critiques of C and smugly patting their favoured language while doing so. And if that favoured language is, say, SML or OCaml or the like, they even have reason to be smug. But Prolog programmers don't have a reason to be smug. Is Prolog better than C?

Let's take a look.

Prolog is, without a doubt, a far higher-level programming language than C, one that can operate on abstractions C users could only dream of working with. It's a powerhouse of a language, and a particularly amazing one given its age.

But it didn't age well.

I won't be getting into the virtues and flaws of dynamic vs. static type systems. Suffice it to say that I see reasons for both approaches and use both approaches at need. That being said, I lean toward static in my preferences: as static as possible, but no more. Prolog, however, is militantly dynamic. Indeed, making a static Prolog (as Mercury kinda/sorta is) tends to drastically change the semantics of Prolog and reduce its power. It's almost as if great power comes with great responsibility …

What I can critique, however, is the module system used by most Prolog implementations. (Not all Prologs even have a module system. Those that do, including the popular SWI-Prolog package, conform largely, with minor variations, to the Quintus Prolog one.) Look in particular at the stupid module tricks I outlined in SWI-Prolog's module system. Count the number of times the whole point of modules is missed in the way they're handled.

This is why I've largely dropped pure Prolog from my toolkit. Aside from quick tests of things, I rarely program in plain Prolog. No, I instead use Logtalk where Prolog once featured, and I use this chiefly because Logtalk does modularity right.

In many ways even better than some of my all-time favourite languages: Ada, Dylan, Modula-3, Eiffel, etc.

At its easiest, Logtalk can be used as succ(Prolog, Logtalk). (What!? If C++ can get away with a syntax gag, so can I!) Specifically, instead of messing around with Prolog's inconsistent, incoherent, and fundamentally broken module system, wrap what would have been a Prolog module in an object declaration; you'll get better, more portable results.

Over and above using object as a module replacement, however, Logtalk gives us a veritable, tasty stew of modular abstractions including (naturally) object, but also protocol, and category. These can be mixed and matched in relationships shown by keywords like implements or imports or extends or instantiates or specializes. Further, objects (module stand-ins) can be parameterized at need while all coding constructs (object, protocol, and category) can be created statically at compile time or programmatically at run time. And, naturally, given that this is a Prolog superset, this is all open to introspection at run-time.

Logtalk has taken all the concepts of object-oriented and modular programming and turned all the dials to 11. While still being, at its core, relatively simple and elegant. It's far simpler for a Prolog user to learn Logtalk, for example, than it is for a C user to learn C++.

On this front alone, modularity, Logtalk leaves even Ada in the dust by making modules first-class constructs in the language, much as ML does. Its engineering focus doesn't stop there, though; it also delivers one of my favourite things: built-in documentation.

Any language can have a third-party documentation tool. (Most languages, even oddball “dead” ones like SNOBOL4, have them, in fact.) Very few languages, however, make documentation part of the language the way Logtalk does. The impact of doing so is that documentation is far more tightly tied to the code than it can be with third-party solutions like Doxygen, so the resulting documentation is more accurately paired with the language constructs it describes.

And it accomplishes all of this while keeping the powerful aspects of Prolog front and centre. Logtalk doesn't diminish Prolog in the same way that, say, Mercury tends to. It manages to elevate the power of Prolog and turns Prolog into a tool for serious software engineering.

Choice and consequences

In conclusion, this is not an attempt to tell you to use Ada or Logtalk. Ada and Logtalk happen to fit my needs for engineering-focused languages, but they may not fit yours. What is almost guaranteed, however, is that most common languages also don't fit yours, even when you think they do. (There are a whole lot of 'my cold dead fingers' attitudes in language choices out in the programming world, largely the result of ignorance of other languages.)

What I'm saying, however, is that your choice of language will constrain your ability to program. Some languages (the ones focused more on engineering software) will constrain your ability to do stupid things, dangerous things, or just plain make silly mistakes. Others (the ones focused more on making your immediate, in-the-moment coding simpler at the expense of safety, security, or correctness) will constrain your ability to make quality software that works.

Choose the right one, whichever it may be.

I carry around a little toolkit with me on a thumb drive. It contains useful libraries, little snippets of code, and a few of my own practice projects (the ones I use to keep myself in the game outside of my realm of employment). The gnome sorts I mentioned in an earlier blog entry came from my little toolkit, as do a few Lua extension libraries, a lot of small Rexx tools that I've had use for over the years, and so on.

One of the tools I carry around with me is a simple, architecture-quasi-independent coroutine system. Another is a simple, architecture-independent timed event queue system (that supports my coroutine system as well as callback-style events). On Friday, it turned out the hardware prototype I was supposed to be writing drivers for and testing was broken and could not be used. Faced with a bit of spare time I decided to throw them into our suite of libraries so I could finally program something with decent readability and reliability. Come Monday, my boss comes in and we talk about the hardware issue, and I say:

Yeah, I didn't lose much time. And there was a bonus. I wrote an RTOS for us.

I love watching my boss's face when I do things like that. He's an electrical engineer so someone writing an RTOS at all is, to him, miraculous. It's doubly-miraculous to have it done in a day. (I didn't tell him, naturally, that it was ¾ done years ago.)

I have an opinionated stance on software (to put it politely!). I think most software written is crap, and I think this is largely because programmers focus on the wrong aspects of software to the detriment of the whole. As an example of what I mean, I'm going to look at sorting algorithms.

There are a whole lot of them, and they all have O-notation properties, k-factors, and a bunch of other mathematical dross surrounding them that a particular kind of programmer just loves to go on and on about when selecting them. There's another particular kind of programmer that doesn't care what sort is being used as long as it's in the standard library so they don't have to think about icky things like “algorithms” and “choice”.

Both are bad engineers, in my books.

Introducing the gnome sort

Both groups stare at me like I'm growing a third eyeball in the middle of my head, Twilight Zone-style, when I tell them that the sort I most commonly use (to the point I usually didn't even bother keeping a version around and just coded it off the top of my head) is the infamous Gnome Sort (a.k.a. Stupid Sort). I'll drop the C version that I finally put into my formal toolkit here just as an example.

First the header:

/* vim: ft=c */
#ifndef GNOME_SORT_INCLUDED
#define GNOME_SORT_INCLUDED
/******************************************************************************
* (c)2020 Michael T. Richter
*
* This software is distributed under the terms of WTFPLv2.  The full terms and
* text of the license can be found at http://www.wtfpl.net/txt/copying
******************************************************************************/
/** @file
 *  @brief Implementation of the Gnome sort for integers and generic types.
 *
 *  Gnome sorts are grossly inefficient at scale, but if you have a small
 *  number of items to sort and need to do it in a small amount of code space
 *  (like would be the case in most bare-metal embedded systems), they are a
 *  god-send.
 *
 *  This library implements two kinds of Gnome sorts: specialized for the most
 *  common case (integers), and customizable by data type.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/**** CONFIGURATION - user-adjustable parameters go here *********************/
/* Note: -DDEFAULT_GSORT_TYPE=<type> in the command line works too.          */
#ifdef DEFAULT_GSORT_TYPE
typedef DEFAULT_GSORT_TYPE gsort_t;
#else
typedef int32_t gsort_t;
#endif
/*****************************************************************************/

/*! Macro for conveniently counting the elements in a statically-allocated array. */
#define ELEM_COUNT(A) (sizeof(A) / sizeof(A[0]))

/*! Function pointer type for custom "greater than" comparisons of user types. */
typedef bool (*gsort_gt_t)(void *, void *);
/*! Function pointer type for custom value swaps of user types. */
typedef void (*gsort_swap_t)(void *, void *);

/*! Sort an array of integers (gsort_t) in-place with a gnome sort.
 *
 *  @param[in,out]  array      The array to sort.
 *  @param[in]      array_size The number of **elements** in the array.
 */
void gnome_sort_i(gsort_t array[], size_t array_size);

/*! Sort an array of arbitrary type in-place with a gnome sort.
 *
 *  @param[in,out]  array      The array to sort.
 *  @param[in]      array_size The number of **elements** in the array.
 *  @param[in]      gt_op      Function pointer: returns true if first operand is greater.
 *  @param[in]      swap_op    Function pointer: swaps its operands in place.
 */
void gnome_sort(void *array[], size_t array_size, gsort_gt_t gt_op, gsort_swap_t swap_op);

#endif //GNOME_SORT_INCLUDED

Now the implementation:

/* vim: ft=c */
/******************************************************************************
* (c)2020 Michael T. Richter
*
* This software is distributed under the terms of WTFPLv2.  The full terms and
* text of the license can be found at http://www.wtfpl.net/txt/copying
******************************************************************************/
#include "gnome_sort.h"

#include <stdbool.h>

#ifdef DEBUG
#include <stdio.h>
#define DEBUG_OUT(F, ...) fprintf(stderr, F, __VA_ARGS__)
#else
#define DEBUG_OUT(F, ...)
#endif

static inline void swap_i(gsort_t array[], size_t i, size_t j)
{
    gsort_t t = array[i];
    array[i] = array[j];
    array[j] = t;
}
static inline bool gt_i(gsort_t i, gsort_t j)
{
    return i > j;
}
void gnome_sort_i(gsort_t array[], size_t array_size)
{
    for (size_t i = 1, j = 2 ; i < array_size ; )
    {
        if (gt_i(array[i - 1], array[i]))
        {
            swap_i(array, i - 1, i);
            if (--i > 0)
            {
                continue;
            }
        }
        i = j++;
    }
}

void gnome_sort(void *array[], size_t array_size, gsort_gt_t gt, gsort_swap_t swap)
{
    for (size_t i = 1, j = 2 ; i < array_size ; )
    {
        if (gt(array[i - 1], array[i]))
        {
            swap(&array[i - 1], &array[i]);
            if (--i > 0)
            {
                continue;
            }
        }
        i = j++;
    }
}
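
And, for completeness, a quick usage sketch. This isn't part of the toolkit; the sample data and the string comparison/swap helpers are invented for illustration, written against the header above.

#include <stdio.h>
#include <string.h>

#include "gnome_sort.h"

/* "Greater than" for C strings passed as void pointers. */
static bool str_gt(void *a, void *b)
{
    return strcmp((const char *)a, (const char *)b) > 0;
}

/* Swap two slots in the pointer array.  The library hands us the addresses
 * of the slots, so we swap the pointers they contain. */
static void str_swap(void *a, void *b)
{
    void **slot_a = (void **)a;
    void **slot_b = (void **)b;
    void *t = *slot_a;
    *slot_a = *slot_b;
    *slot_b = t;
}

int main(void)
{
    gsort_t numbers[] = { 42, 7, 19, 3, 23 };
    void *names[] = { "mallory", "alice", "bob" };

    gnome_sort_i(numbers, ELEM_COUNT(numbers));
    gnome_sort(names, ELEM_COUNT(names), str_gt, str_swap);

    for (size_t i = 0; i < ELEM_COUNT(numbers); i++) { printf("%d ", (int)numbers[i]); }
    printf("\n");
    for (size_t i = 0; i < ELEM_COUNT(names); i++) { printf("%s ", (const char *)names[i]); }
    printf("\n");
    return 0;
}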

Engineering vs. maths

In under 120 lines of code, including all documentation comments, I have not one but two optimized implementations of a gnome sort: a generic one that will sort items of any type (though without type safety, and with some function-pointer call overhead), and an integer-specific version that is type-safe and faster for the most common use case.

Why does this matter when a gnome sort has O(n^2) average and worst-case performance? Shouldn't I favour an introsort with its O(n log n) performance?

Hell no! This is where engineering instead of pure theoretical maths shines.

On a 64-bit Intel machine using 32-bit integers for the sort the total compiled code for both functions put together is 214 bytes. On 32-bit or 8-bit microcontrollers the code is even smaller. Potentially much smaller, depending on the MCU in question and the ISA of its core.

The integer variant (the one I use most commonly) is only 80 bytes of text on a 64-bit Intel!

What's so important about size?

Since I do most of my work in embedded space, my data space is limited, so the n in the O-notation is also limited. When I'm typically sorting two-digit n counts, the O-notation is meaningless (at n = 100, even n^2 is only 10,000 operations). I have to get well into 5 digits before it starts to matter.

You know what I typically don't have in an MCU? 5 digits of available SRAM to sort things in…

So for small amounts of data, the O-notation is meaningless. There would be no noticeable impact on performance from iteration counts in that space. But … cache usage will have a huge impact, especially with the cache-starved cores I tend to work with. Introsort has impressive performance potential if you have loads of memory and infinite cache. On real machines, though, it can stall in the cache and start causing performance bottlenecks as the code for it gets moved in and out of cache lines. The algorithmic superiority can lose to the caching equivalent of the Schlemiel the Painter Algorithm. Your super-duper O(optimized) algorithm may choke your cache and bring your entire system to a screaming halt.

Whereas my little 80-byte (!) gnome sort fits into any cache I've ever seen in a core that has caches of any kind. Sure I may use, say, three or even ten times as many iterations as that super-duper introsort implementation, but I'll be running it (and my data) all in-cache with no stalls. I'll get done first.

So for small amounts of data, and especially on tightly constrained machines, a gnome sort isn't just “good enough”; it's frequently superior to the “better” sorting algorithms.

And is that it? Just size?

No. There's another factor in software engineering: your software should be as simple as possible and no simpler. Every time you use a complex building block where a simpler one would do, you are adding technical debt. Introsort is a good sort, don't get me wrong, but it's complex and it's easy to get wrong. If it suits your needs and you have a good, battle-tested implementation, go ahead and use it! And if you already have to use it in one part of your program, it's actually simpler to use it everywhere than to have multiple, competing sorts vying for your attention.

If, however, you need to knock out a sort for small amounts of data and, for whatever reason, you lack access to such a library, keep in mind that you can memorize the pseudo-code for a gnome sort in a few minutes. It's stupidly easy to understand and stupidly easy to code from that understanding in any imperative language. (It would be a bit harder to code in a language like Prolog or SML, and hard to justify there when merge sorts are a thing.)
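
For the record, the from-memory version really is that small. This is a minimal sketch, without the teleporting optimization the toolkit version above uses, and the function name is mine:

void gnome_sort_basic(int a[], size_t n)
{
    size_t i = 0;
    while (i < n)
    {
        /* In order (or at the far left)?  Step right.  Otherwise swap the
         * offending pair and step left to re-check. */
        if (i == 0 || a[i - 1] <= a[i])
        {
            i++;
        }
        else
        {
            int t = a[i];
            a[i] = a[i - 1];
            a[i - 1] = t;
            i--;
        }
    }
}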

TL;DR LOL!

Software engineering isn't about universal “best” anythings. Not “best” programming language, not “best” algorithm, not even “best” operating system or target platform. Software engineering is the twin terror of choosing “best for purpose” in all of these as well as “making do” with the tools and techniques you have at hand. In the case of sort algorithms, that means you sometimes reach for something as seemingly idiotic as a gnome sort over the fancier ones with the “better”-looking numbers.

Bonus: Rexx!

Here's a gnome sort in Rexx:

/* ft=rexx */
/*******************************************************************************
* GSORT.rx
*
* Perform a gnome sort on a stem variable.
*
* Usage: call gsort <stem> [<field>]
*
* Parameters:
*   <stem>      - The name of the stem variable to be sorted in place.
*                 .0 must contain the count.
*   <field>     - The optional field (as per "words()") to sort on.
*
* Return:
*
* Exit codes:
*   0   Normal exit.
*   1   Error in external process.
*   2   Failure in system service.
*   4   Loss of significant digits in calculation.
*   5   I/O error.
*   6   Variable used before assignment.
*   7   Syntax error.
*--- user codes ---
*   -1  General error.
*   -2  Invoked as a command rather than called as a routine.
*******************************************************************************/

/***************************************
* VARIABLE INITIALIZATION
***************************************/
trace. = off

E_OK      = 0
E_GENERAL = -1
E_BADCALL = -2

/***************************************
* MAINLINE CODE
***************************************/
main:
  parse source . type .

    if type = 'COMMAND' then
    do
        say 'Cannot be invoked as a command.'
        exit E_BADCALL
    end

    use arg stem., field
    if \ var('field') then field = 0

    lc = 2
    do r = 3 while lc <= stem.0
        lp = lc - 1
        if field > 0 then
            do
                v1 = word(stem.lp, field)
                v2 = word(stem.lc, field)
            end
        else
            do
                v1 = stem.lp
                v2 = stem.lc
            end
        if v1 <= v2 then
            lc = r   -- lp and lc are in order, so move lc up to r
        else
            do
                -- swap lp and lc
                t = stem.lp
                stem.lp = stem.lc
                stem.lc = t
                -- move lc back by one
                lc = lc - 1
                if lc <= 1 then lc = r      -- lc has hit far left, so we're sorted to r
                           else r = r - 1   -- back the r pointer up a bit
            end
    end

    exit E_OK

There are more than a few conspiracy theories surrounding the origins of COVID-19. This is understandable. People have this innate (albeit irrational) desire to blame someone when something goes wrong. The notion that “shit happens” is not a popular one because it leaves people feeling like they're not in control.

Of course they're not, which makes it even worse.

Still, while most conspiracy theories are harmless, some are very harmful and the COVID-19 origin stories are some of the deadliest of the lot. For purposes of instruction, I'm going to pick two of the largest: one from right wing lunatics, and the other from left wing nuts. These are:

  1. COVID-19 leaked from the Wuhan Institute of Virology.

  2. COVID-19 was endemic in the world and the strain that broke out in Wuhan was just a deadly mutation of a disease that was already everywhere.

Wuhan Institute of Virology

The narrative of this story is that the virus leaked, somehow, through improper handling (or even malice, in the extreme variants) from the Wuhan Institute of Virology to the nearby Huanan Seafood Fishery Wholesale Market. In this story, virus samples obtained from bats leaked through irresponsibility, some poor saps in Huanan Market caught the disease, and the rest is history.

Let's take a closer look at this – let's be generous and call it a “hypothesis”, shall we? It has a singular flaw: it's a load of fetid dingo's kidneys. Let's take a look at a map: Directions from the Wuhan Institute of Virology and the Huanan Seafood Fishery Wholesale Market

The Wuhan Institute of Virology is in Wuchang (and a medium stroll away from my home), one of the three cities that were merged to form Wuhan. (The other two are Hankou and Hanyang.) The Huanan Seafood Fishery Wholesale Market is across the Yangtze river in Hankou. It's almost 12km away as the crow flies, and it's a 24km drive through some of the densest traffic in Wuhan. This is not “nearby” in any meaningful sense. The two are separated by about the same distance as central Harlem and central Queens in New York City, with more natural barriers and fewer roads between them.

But do you know what is nearby? Here's another map: Map showing markets near to the Wuhan Institute of Virology and on the same side of the Yangtze

Google Maps has a paucity of information about Chinese cities (for obvious reasons), so only the largest and/or most interesting markets are marked. The markings are out of date, with many flagged as closed, but all of those flagged markets—far, far closer to the Institute than Huanan!—were open and bustling in December of 2019. So how, precisely, do you think a disease spread from the Institute to Huanan without popping up in the literally dozens of markets, malls, shopping streets, bars, restaurants, and other such venues that service Wuhan University?

Oh, did I fail to mention that? The Wuhan Institute of Virology is in the middle of the single largest, most important university in Wuhan. Wuhan University is a powerful economic force and is ringed by markets, shops, restaurants, bars, and a whole host of other venues that are ripe for viral transmission.

Yet it supposedly popped up in practically another city. Hmmmm…

Let's just stop embarrassing the right wing lunatics with facts, shall we?

Already endemic

This one is even easier to debunk. Here's another map: Map of directions between the Huanan Seafood Fishery Wholesale Market and Wuhan Tianhe International Airport

Picture the narrative involved. In the absolute least unlikely scenario, someone with COVID-19 had to fly in to Tianhe (somehow not infecting the hundreds of people also in the plane) and then make a beeline for Huanan in some form of conveyance that wasn't near anybody else. Then they walked into the market, coughed, and unleashed the viral bomb that subsequently ravaged the world.

Does this strike anybody with a brain cell as being even remotely plausible? Really? How would you fly for hours in a tin can with recycled and (at the time) largely unfiltered air without infecting the people in it first? Why would someone go straight from the airport to the market? Would they not go home first, or to a hotel? Maybe get dinner? There are literally dozens of trivial holes to poke in this theory even if you don't have the map above. With the map above, it's laughable.

No, the left wing nuts need their safe space from the mocking laughter that they so richly deserve for foisting this on the public.

So where did it come from?

Because this is so badly politicised now, the full story will likely never be known. What is known, however, is that in November 2019 there was a string of odd, fatal fevers in a rough line from Chongqing and environs to Wuhan. Poor medical attention in rural China, combined with unwieldy bureaucracy in the Chinese CDC, led to the significance of this being missed. Quick cremation also meant that tissue samples, and the opportunities to study them, were lost.

The prevailing narrative that this provides is that hunters in Sichuan who hunted or trapped wild animals were directly or indirectly infected by the bats mentioned in the first conspiracy theory. They then carried this to Wuhan and its lucrative wild game market: the infamous “wet market” (termed “farmer's market” by non-racists) where the viral bomb blew up.

The rest is current events.

But …

My narrative is still speculation. It's just speculation that doesn't directly contradict trivially verifiable facts. As more facts come to light, if they ever do, the narrative will of necessity shift. That's the strength of not being a conspiracy theorist, though. I'm not invested in my story. I don't mind changing it in the face of fact.