Counting
TLDR: Utopia counts in little-endian base-six.
Prerequisites: None
All over the world, when people count they tend to do so using a radix (a.k.a. base) of ten. Starting with 0, they use ten digits to count: …7, 8, 9, 10… It’s somewhat dumb to write this as “base-10,” if you think about it, because no matter what radix is used (e.g. binary), if we’re writing numbers in that radix then the first number represented with two digits will be written “10” (in binary, that’s two). But regardless, humans (currently) count using radix ten all over Earth.
And we should be proud of this! It used to be that some people (e.g. the Romans) counted using obtuse systems that couldn’t scale to large numbers. Arabic numerals with a place-value system (“tens place”, “hundreds place”, etc.) are genuinely pretty good. And they’re universally accepted! Which is an accomplishment in itself. If only other standards (*grumblemetricgrumble*) could be so widespread.
It’s too bad that ten is the wrong radix!
This isn’t even close to the hottest take in this post. Most people who have a clue will note that it’s useful to choose a radix that has good factors. Ten has (non-trivial) factors of {two, five}, while a number like twelve has factors of {two, three, four, six}. Even if we restrict ourselves to prime factors, three comes up much more often than five, and we would have been better off using a radix of twelve, a.k.a. dozenal.
Here’s the hilarious jan Misali to explain:
I’ll summarize and expand on the content, for those who don’t want to watch an 18-minute video filled with jokes (or the 17-minute follow-up video). There are two properties that determine the quality of a radix: size and factors.
Misali says that smaller radices (e.g. binary and ternary) are better, as they have a better radix economy and require fewer rules to be memorized to do arithmetic (i.e. smaller multiplication tables). Radix economy doesn’t seem to me like the right metric for human minds, but I agree that smaller bases have easier arithmetic. It seems to me that the more objects a human has to track, the more working memory they need, and lower radices require more objects (digits) to represent numbers. Similarly, it helps when those objects are distinct, which pushes toward a larger base. For example, fifty-one is two things for most people (5 and 1), but in binary it’s six things: 110011. To me, the binary version feels simply three times as big (six digits instead of two), not 3/5ths as costly, which is what radix economy would suggest (it scores binary 12 and decimal 20 for this number).
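For the curious, here’s a quick Python sketch (mine, not Misali’s) that counts digits and computes the radix-economy score, radix × digit-count, for fifty-one in a few radices:

# How many digits does fifty-one need in each radix, and what does
# radix economy (radix times number of digits) say about the cost?
def to_digits(n: int, base: int) -> str:
    out = ""
    while n:
        n, r = divmod(n, base)
        out = "0123456789AB"[r] + out   # big-endian, as we conventionally write
    return out or "0"

for base in (2, 6, 8, 10, 12):
    rep = to_digits(51, base)
    print(f"base {base:>2}: {rep:>6} ({len(rep)} digits, economy {base * len(rep)})")
# base  2: 110011 (6 digits, economy 12)
# base  6:    123 (3 digits, economy 18)
# base 10:     51 (2 digits, economy 20)   <- binary vs decimal is the 12/20 = 3/5 ratio above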
Where do the pressures of easy arithmetic versus compact representation balance out? I personally think it’s somewhere around radix eight. You may disagree. I, unfortunately, don’t have a formal way of working this out.
So let’s turn to the second desideratum: having a radix with good factors. The primary value of factors is divisibility. If I split something into equal parts, how cleanly can the size of each part be written in various radices?
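To make the comparison concrete, here’s a small Python sketch (my own, not from the video) that expands 1/n in a few candidate radices, wrapping any repeating tail in parentheses:

def expand(num: int, den: int, base: int, max_digits: int = 12) -> str:
    """Digits of num/den after the radix point; a repeating tail is wrapped in (...)."""
    seen, digits = {}, []
    rem = num % den
    while rem and rem not in seen and len(digits) < max_digits:
        seen[rem] = len(digits)
        rem *= base
        digits.append("0123456789AB"[rem // den])
        rem %= den
    if rem in seen:  # the remainder recurred, so the digits from that point repeat forever
        i = seen[rem]
        return "".join(digits[:i]) + "(" + "".join(digits[i:]) + ")"
    return "".join(digits) or "0"

for den in (2, 3, 4, 5, 6, 8, 9):
    print(f"1/{den}:", {base: expand(1, den, base) for base in (4, 6, 10, 12)})
# 1/3 is a single digit in heximal (0.2) and dozenal (0.4) but repeats in quaternary
# (0.(1)) and decimal (0.(3)); 1/5 is 0.(1) in heximal but 0.(2497) in dozenal.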
Quaternary is actually pretty good! If it weren’t for the fact that it falls down a little when representing 1/3 (and that it’s too small), I’d suggest using it. Decimal fails approximately as hard, and we use decimal. But 1/3 is a pretty important fraction. Let’s try to do better.
Dozenal is pretty good. It has clean representations for most of the major fractions. But when it doesn’t have a clean representation it falls down extremely hard. The fact it can’t do 1/5 without four repeating digits is a real downside. In fact, if you’re not allergic to repeating digits, decimal is perhaps better?? But again, 1/3 is a pretty important fraction.
Heximal (or seximal, as jan Misali prefers) is tied with dozenal for the highest number of clean representations, but doesn’t fall on its face until one-eleventh, which, as we can surely agree, is a terrible number that should be cast into Tartarus. This is thanks to a number-theoretic property: the reciprocal of any number one away from the radix has a short repeating representation (in heximal, 1/5 is 0.111… and 1/7 is 0.0505…).
For more on heximal, check out jan Misali’s seximal resources page.
Endianness
In the 1726 novel Gulliver’s Travels, the main character comes across an island where there is a great controversy. When cracking an egg, which end should be broken? The common folk prefer the big end of the egg, while the king has decreed by law that it must be the small end that is broken. This, of course, has resulted in many rebellions.
It is computed, that eleven thousand persons have, at several times, suffered death, rather than submit to break their eggs at the smaller end. Many hundred large volumes have been published upon this controversy: but the books of the Big-Endians have been long forbidden, and the whole party rendered incapable by law of holding employments.
There is a similar battle in computer science. When numbers are stored as a series of digits on a machine, should one start with the biggest (i.e. most significant) digits first (big-endian), or the least significant first (little-endian)? Unfortunately machines and systems of both kinds exist, so the battle rages on.
One of the most common arguments for big-endian numbers in computers is that we use big-endian numbers on paper. We start from the left, after all, which is the big end of the number.
Or do we?
It depends on who you are, after all. Those who speak Arabic read right-to-left… And guess who invented Arabic numerals? The Hindus.
But it’s complicated! As far as I can tell, the primary origin of the number system we use today is the Brahmi numeral system, which emerged at least a couple thousand years ago. Brahmi numerals are non-positional, however, and we don’t have any documented cases of positional numbers until an inscription in Khmer that reads “The Caka era reached year 605 on the fifth day of the waning moon.”
Khmer is written left-to-right, and I have to assume that the historians translated the number correctly. But regardless, Indian numerals caught on in the Middle East thanks to mathematicians like Al-Khwarizmi and Al-Kindi, and then later these numerals were brought to western Europe via Arab merchants. If one reads right-to-left then these numbers are little-endian. Perhaps the western world ought to have reversed them when they were adopted…
Starting with the least-significant digits is the most useful way to do addition, multiplication, and subtraction. We left-justify most text, but with numbers it’s conventional to right-justify, so that matching place values line up and magnitudes are easy to compare (just ask your favorite spreadsheet software). And divisibility tests (such as even/odd) are done by examining the least-significant digit.
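As a toy illustration (my own sketch, nothing Utopia-specific yet): when digits are stored least-significant-first, grade-school addition is a single left-to-right pass with a carry, with no right-alignment or reversing needed.

def add_little_endian(a: list[int], b: list[int], base: int = 10) -> list[int]:
    """Add two numbers given as little-endian digit lists (ones digit first)."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        carry += (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        carry, digit = divmod(carry, base)
        out.append(digit)
    if carry:
        out.append(carry)
    return out

# 365 + 9 = 374, digits stored ones-first; no padding or alignment required,
# and checking even/odd only ever needs to look at a[0].
assert add_little_endian([5, 6, 3], [9]) == [4, 7, 3]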
Reading a large number like 4389710239 out loud is a pain when it’s written big-endian. Sure, it starts with a 4, but 4 whats? One must count the digits before anything useful can be said about the most significant digit. When using little-endian, on the other hand, one can simply read “nine-thirty-hundred-two, ten-hundred-seven-thousand, nine-eighty-hundred-three-million, four-billion.” That last “four billion” is the best summary, and it naturally comes together at the end, where it’s memorable.
(I’ll also note that little-endian is my preferred byte order when writing software. I don’t think this is a very compelling argument about numbers more broadly, as the needs of computers are different from those of humans, but it compels me somewhat.)
Utopian Numbers
Utopia uses little-endian heximal. For instance, Utopia writes the number of days in the year as “⌊5041”, unless it’s a leap year, in which case it has “⌊0141” days.
The “⌊” symbol indicates that a number is written in simplified form. Most numbers in Utopia are written in compressed form, unless one is (for example) doing math by hand. Compressed-form numbers use thirty-six symbols as digits, which here we’ll write as the standard ten digits followed by the twenty-six letters of the Latin alphabet. Each of these digit-symbols stands in for a pair of simplified digits (the less significant digit of the pair coming first, as you’d expect from little-endian writing). Whenever a compressed number is written, it is preceded by either a plus or minus sign.
+0 = ⌊00 = zero
+1 = ⌊10 = one
+6 = ⌊01 = six
+9 = ⌊31 = nine
+A = ⌊41 = ten
+C = ⌊02 = twelve
+U = ⌊05 = thirty
+Z = ⌊55 = thirty-five
But of course we don’t have to stop there. Larger numbers, like ⌊5041, are written by compressing each pair of digits. So since ⌊50 = +5 and ⌊41 = +A, ⌊5041 = +5A. Numbers like this are read left-to-right, with a mnemonic word for each symbol and a place-indicator word between pairs of symbols. We can use the NATO alphabet here to read the symbols, with place-indicator words of “uno” and “duo”. (Utopia has a better language, with more specialized words/sounds than these.)
365 = ⌊5041 = +5A = “five alpha”
186,282 = ⌊0324553 = +IQZ3 = “india quebec uno zulu three”
8,675,309 = ⌊542535505 = +TWX55 = “tango whiskey uno xray five duo five”
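For the programmer types, here’s a minimal Python sketch of the integer notations above (the function names are mine, and the reading function only knows the two place words given here; the digit conventions are exactly the ones just described):

SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def simplified(n: int) -> str:
    """Little-endian heximal: ones digit first."""
    digits = []
    while n:
        n, r = divmod(n, 6)
        digits.append(str(r))
    return "⌊" + ("".join(digits) or "0")

def compressed(n: int) -> str:
    """Little-endian base thirty-six; each symbol covers two heximal places."""
    syms = []
    while n:
        n, r = divmod(n, 36)
        syms.append(SYMBOLS[r])
    return "+" + ("".join(syms) or "0")

WORDS = ("zero one two three four five six seven eight nine alpha bravo charlie delta "
         "echo foxtrot golf hotel india juliett kilo lima mike november oscar papa "
         "quebec romeo sierra tango uniform victor whiskey xray yankee zulu").split()
PLACES = ["uno", "duo"]  # only the first two place words are named in the post

def spoken(n: int) -> str:
    """Read the compressed form aloud: one word per symbol, a place word after each pair.
    Handles up to six symbols, since only two place words are defined here."""
    words = []
    for i, sym in enumerate(compressed(n)[1:]):  # skip the leading sign
        if i and i % 2 == 0:
            words.append(PLACES[i // 2 - 1])
        words.append(WORDS[SYMBOLS.index(sym)])
    return " ".join(words)

assert simplified(365) == "⌊5041" and compressed(365) == "+5A"
assert compressed(186_282) == "+IQZ3"
assert spoken(8_675_309) == "tango whiskey uno xray five duo five"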
Non-integer numbers are written by placing a dot after the integer portion and then switching to big-endian for the fractional part. This is because the fractional part of the number has the reverse logic of the integer part: it’s easier to read big-endian. Numbers can also be written in scientific notation via a separator (we can use “e”) followed by the exponent. (Be careful when counting orders of magnitude: the exponent counts simplified (heximal) places, and each symbol in compressed form counts for two of them.)
3.14159 ≈ ⌊3.05033 = +3.53I = “three decimal five three india”
3 × 10⁸ = ⌊02521043454 = +CH1MY4 ≈ +5eA = “five pow alpha”
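And here’s the matching sketch for fractional parts (again my own Python, assuming the conventions above; because the fractional digits are big-endian, each compressed symbol is simply a big-endian base-thirty-six digit):

import math

SYMBOLS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # same digit alphabet as before

def simplified_fraction(x: float, places: int = 5) -> str:
    """Big-endian heximal digits of the fractional part of x."""
    frac, digits = x - int(x), []
    for _ in range(places):
        frac *= 6
        digits.append(str(int(frac)))
        frac -= int(frac)
    return "".join(digits)

def compressed_fraction(x: float, symbols: int = 3) -> str:
    """Big-endian base-thirty-six digits of the fractional part of x."""
    frac, out = x - int(x), []
    for _ in range(symbols):
        frac *= 36
        out.append(SYMBOLS[int(frac)])
        frac -= int(frac)
    return "".join(out)

assert simplified_fraction(math.pi) == "05033"   # ⌊3.05033
assert compressed_fraction(math.pi) == "53I"     # +3.53I
# For the scientific-notation example: 5 * 6**10 = 302,330,880 ≈ 3 × 10⁸,
# so "+5eA" is mantissa +5 with exponent +A (ten heximal places).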
And that’s objectively the best way to count!