Futarchy
TLDR: Utopia sets policy and elects governors via conditional prediction markets on aggregated values.
Prerequisites: Opinions, (Recommended: Scott Alexander’s Prediction Market FAQ)
Many forms of Government have been tried, and will be tried in this world of sin and woe. No one pretends that democracy is perfect or all-wise. Indeed it has been said that democracy is the worst form of Government except for all those other forms that have been tried from time to time; but there is the broad feeling in our country that the people should rule, and that public opinion expressed by all constitutional means, should shape, guide, and control the actions of Ministers who are their servants and not their masters.
- Winston Churchill, 1947
I am not a fan of democracy. Churchill spoke of a feeling that people should rule, and that public opinion should shape the actions of public servants. This is a better principle of governance than the tyranny of the few, I will grant, but it’s hardly Utopian. People are, by and large, uninformed fools. A government managed by the people is thus one driven by popularity and bias, rather than truth and principle.
It’s easy to find cases of democratic government making predictably bad choices, and not just in aphorisms like “Democracy is two wolves and a lamb voting on what to have for lunch.”
Democratic governments have repeatedly voted dictators into power, including Adolf Hitler (Germany), Vladimir Putin (Russia), Hugo Chávez (Venezuela), Recep Erdoğan (Turkey), Viktor Orbán (Hungary), Alberto Fujimori (Peru), and arguably Lee Kuan Yew (Singapore). Had he been more successful at concentrating power and overturning the 2020 election, Donald Trump would also have made that list. These authoritarian strongmen gain power through charisma, networking, and playing up their commitment to populism (conservative or progressive), then use that power to strangle the free press, constitutional rights, and other institutions that stand in the way of their autocracy.
The role of the media in the democratic process has often been emphasized as a key safeguard against tyranny. If the people know what their leaders are doing, they’ll vote against corruption, and choose leaders that demonstrate wisdom and good judgment. Or at least, so the story goes. In practice people demonstrate confirmation bias, listening to sources that support their preferred team or candidate, and systematically discount evidence of corruption when it’d be inconvenient to address. To satisfy partisans, the press produces emotionally engaging opinion pieces meant to grab attention, rather than inform. Knowing this, politicians cater to the superficial, speaking in soundbites and slogans that can be clipped and shared by the media instead of emphasizing sound policies that might be too nuanced to be discussed meaningfully in a TV program or short YouTube video.
As I see it, while independent media is a necessary component of a healthy democracy, most of the resilience of good governments today comes from strong institutions such as courts of law, and entrenched bureaucrats and officials that transition invisibly from one administration to the next. If a would-be dictator attempts a coup in a healthy country, they are rebuffed by soldiers, police, judges, accountants, and clerks that are invested in the status-quo. In a way, this resilience is an anti-democratic phenomenon! The checks-and-balances in the US government, for instance, make it difficult for even a sizable majority of citizens to make dramatic changes at a reasonable speed. This is why times of war and emergency are so critical — the normally sclerotic bureaucracy can speed up out of a sense of obvious need, only to deliver power into the hands of tyrants.

Doubling Down with Direct Democracy
Perhaps the solution to good governance is to do away with rulers altogether. Who needs ministers, governors, presidents, senators, or administrators? Let the people decide on policy! With no specific person at the helm, there’s nobody who can seize power or otherwise corrupt the government against the people.
This is the idea behind direct democracy, a system that has been tried in small governments such as ancient Athens, the Crow Nation of Montana, and the town meetings in Vermont. Many other governments, such as Switzerland and many US states, use semi-direct systems, where a representative government is supplemented with popular referenda that are given to the general populace to vote on directly. (Some other countries like the UK have popular referenda that are non-binding, and are used to help the government decide what actions to take.)

Direct democracy seems to work best when the populace is fairly small. Beyond a certain size things can get bogged down by those who care more about being heard and getting their way than in compromising and moving forward. Proponents of direct democracy sometimes respond that in the past it may have been infeasible to include everyone in every government decision past a certain size, but thanks to the internet and other technologies, that’s changing. Imagine if we had a website or other service whereby every citizen could read and weigh in on things as big as declarations of war, or as small as changes to how the post-office budget is allocated. With proper design, things could remain inclusive at scale.
But while technological and scale-based hurdles are real problems for direct democracy, they’re not the essential problems. The core issues are the same as for representative democracy but often magnified in scale, with the exception of the risk of corruption, which is much reduced:
The average voter is extremely ignorant, and can’t afford to spend the time to become educated on complex policy domains.
The average voter is biased, short-sighted, and prone to favor superficial considerations, like whether they’ve seen advertisements about the vote.
The process is extremely inefficient. People must vote in order to have their values incorporated, but almost all votes are wasted; opposing votes cancel each other out and votes in over-decided polls simply change the margin of victory.
The process favors those who are less-busy and more radical. Retirees and activists have disproportionate influence compared to children, laborers, and parents.
I have an easy solution, or at least patch, for these issues. This patch can apply to both direct and representative democracies, but is especially needed in direct democracy.
We should run referenda/elections by first selecting a jury of 216 citizens from the populace entirely at random, then go with however the jurors vote. Juries (sometimes called citizens’ assemblies) should be short-lived (probably existing for only one ballot) and mostly secret, to reduce strain on citizens’ lives and reduce risks of corruption. Jurors should be strongly compensated and legally obligated to participate in order to reduce bias in who is represented. And because the jury is relatively small, experts can be hired to be directly consulted by any juror who wants their opinion.
This change would dramatically ease the issues of ignorance and representation, entirely eliminate the inefficiency, and help with bias and superficiality. It introduces a couple points of possible corruption in both the jury-selection process and how consultants and other informative materials are chosen, but these seem manageable through independent oversight.
Why 216? It’s a nice round number that is simultaneously small enough to allow all the jurors to know each other and fit in the same room, if they decide that the ballot is worth discussing, but it’s also large enough that it’s unlikely to randomly neglect any particular minority. (A group has to make up less than about 0.32% of the population before it has more than a 50% chance of being absent from a jury of that size; a group at 0.5% is absent only about a third of the time. (0.5% is roughly the fraction of Americans that make more than $1m/year, are above 6’4” (193cm) (including women), or who live in Philadelphia.) Of course, there are so many minority groups that it’s nearly guaranteed that some minority will be absent, especially when intersections are concerned.)
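For anyone who wants to check the representation arithmetic, here’s a minimal sketch. It assumes each juror is an independent random draw from a population much larger than the jury; that simplification and the function names are mine, not part of the proposal.

```python
def chance_absent(share: float, jury_size: int = 216) -> float:
    """Probability that no juror belongs to a group with this population share."""
    return (1.0 - share) ** jury_size


def share_for_absence_chance(chance: float, jury_size: int = 216) -> float:
    """Population share at which a group is absent with the given probability."""
    return 1.0 - chance ** (1.0 / jury_size)


if __name__ == "__main__":
    for share in (0.001, 0.005, 0.01, 0.05):
        print(f"{share:.1%} of the population: absent with probability "
              f"{chance_absent(share):.1%}")
    print(f"50/50 threshold: {share_for_absence_chance(0.5):.2%}")
```

Changing jury_size shows the trade-off directly: a bigger jury misses fewer small groups, but is harder to fit in one room.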
If you’re a fan of democracy, and think governments should stick to decisions made directly by citizens, then I strongly recommend randomly selecting voters. But I also think we can do even better — a lot better.
Utopian Governance
In Utopia, large-scale policies and ministers are selected/appointed by a futarchy. Futarchy (“few-tar-key”) is a form of government (à la anarchy, monarchy, democracy) that is often summed up with the slogan: “Vote on values; bet on beliefs.” The system’s inventor, Robin Hanson, has an 8-minute, medium-good explainer on it.
In Utopia, every 163 days or so, one out of every 1296 people from the total population is surveyed, totally at random. This is a lot of people, but is basically as few people as possible while still causing most humans to personally know several people who were surveyed and therefore feel at least somewhat represented. When a surveyed person is a child, non-human, or otherwise has a guardian, their guardian represents them on the survey.
Survey takers are presented with a fairly long list of possible goals for the near future, such as reducing crime, developing new technologies, or increasing the reported happiness of citizens. They are then given 36 points to allocate to various goals, with the instruction to put points on whatever they would focus on maintaining/improving if they were in power, where more points means more attention.
These point allocations are then treated as weights, averaged, and carefully aggregated into a Citizen’s Priorities Metric (CPM) using a somewhat involved process that I’ll explain in a footnote at the end of this post, to spare those who don’t want gritty details. (Robin Hanson calls this metric “National Welfare” instead of CPM, which I find to be a mild misnomer.)
The government then has a clear goal: select policies and ministers that maximize the expected CPM. (Hereafter I’ll treat “appoint X as Minister of Y” as a kind of policy.) But how does the government decide which policies are the best for the expected CPM? Conditional prediction markets!
At any given time there are many policies under consideration by the government. The government has limited attention, so two mechanisms are used to select what the government considers. First, auctions are held to raise tax revenue and let people express what they care about. But to prevent the considered policy space from being filled by those that specifically serve wealthy demographics, there is also a queue of policies ordered by number of signatures. About half the policies under consideration at a time are from auctions and half are from grass-roots campaigns.
There is a state-sponsored platform that holds a set of prediction markets for each policy under consideration (seeded with some tax dollars to make participation slightly positive-sum). For each choice that the government could select, there is a market for contracts that pay out proportional to the current CPM evaluated at some specific point in the future, such as in 16 years. (The CPM can change with time, as the citizens decide on new priorities, but these contracts are evaluated only in terms of the CPM that’s in place when the market opens.) Vitally, these trades are conditional on the government choosing the specific action associated with that market. If the action is not taken, all trades are nullified, and everyone gets their money back.
For example, let’s say that Utopia was considering banning the cultivation of genetically modified organisms (GMOs). Two futures markets then open on the trading platform: one for the CPM if GMOs are banned, and one for the CPM if GMOs are not banned. Alice buys a contract from Bob for $8 that pays out a number of dollars equal to the current CPM evaluated 16 years from now on the market associated with GMOs getting banned. Alice also buys a contract from Bob on the market where they’re not banned, this time for $10. Now let’s say the government decides not to ban GMOs. The first trade is then reversed, meaning Alice has only given Bob $10, and now holds a contract whose value depends on the CPM in 16 years.
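Here’s a small sketch of how those conditional trades settle. The class and function names are hypothetical, I’m assuming the seller is the one who pays out the contract, and the final CPM of 12 is a made-up number for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalTrade:
    buyer: str
    seller: str
    price: float      # dollars the buyer pays the seller
    condition: str    # the government action this trade depends on


def settle_decision(trades, chosen_action):
    """Void trades on actions not taken; return (cash flows, surviving trades)."""
    cash = {}
    live = []
    for trade in trades:
        cash.setdefault(trade.buyer, 0.0)
        cash.setdefault(trade.seller, 0.0)
        if trade.condition == chosen_action:
            cash[trade.buyer] -= trade.price
            cash[trade.seller] += trade.price
            live.append(trade)
        # Otherwise the trade is nullified and nobody's cash changes.
    return cash, live


def pay_out(cash, live, realized_cpm):
    """Each surviving contract pays its buyer the realized CPM, in dollars."""
    final = dict(cash)
    for trade in live:
        final[trade.buyer] += realized_cpm
        final[trade.seller] -= realized_cpm  # assumption: the seller funds the payout
    return final


trades = [
    ConditionalTrade("Alice", "Bob", 8.0, "ban GMOs"),
    ConditionalTrade("Alice", "Bob", 10.0, "no ban"),
]
cash, live = settle_decision(trades, chosen_action="no ban")
print(cash)  # only the $10 "no ban" trade stands

# Hypothetical outcome: suppose the CPM comes in at 12 when evaluated in 16 years.
print(pay_out(cash, live, realized_cpm=12.0))
```

Alice only profits if the CPM ends up above the $10 she paid, which is what pushes prices toward honest forecasts.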
The key is that the market price of CPM contracts reflects experts’ opinions on what the CPM will be in the future. Thus, at a random point after a market has been open for a while, the Utopian futarchy simply selects whichever action is predicted to lead to a better world, as evaluated by the market and the priorities of its citizens.
If the market price for CPM contracts is $8 on the GMO-ban market, and $10 on the no-GMO-ban market, this indicates that the CPM is expected to be lower if GMOs are banned. Because the market contracts involve the real flow of dollars from the ignorant to the knowledgeable, there’s a strong incentive for market participants to be informed experts betting on their true beliefs. The randomness in when policies are decided and the fact that experts can manage large funds on behalf of others make the system robust to manipulation.
Thus in Utopia, policy is decided by the people, but in an extremely meritocratic way that selects for the best and brightest. The values of people from all walks of life are reflected faithfully, however, thanks to the government steering towards the average priorities of the citizenry.
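The decision rule itself is simple enough to sketch. The rule of deciding at a random moment in the second half of the market’s life is my own stand-in for “after a market has been open for a while,” and the price histories are invented.

```python
import random


def decide(price_history, rng):
    """Pick the action whose market price is highest at a randomly chosen time.

    price_history maps each action to a list of prices over time (equal lengths).
    """
    n = len(next(iter(price_history.values())))
    # Hypothetical rule: the decision time is uniform over the second half.
    t = rng.randrange(n // 2, n)
    snapshot = {action: prices[t] for action, prices in price_history.items()}
    return max(snapshot, key=snapshot.get), t


history = {
    "ban GMOs": [7.5, 8.2, 8.0, 8.0],
    "no ban": [9.0, 9.8, 10.0, 10.0],
}
action, t = decide(history, random.Random(0))
print(f"Decided at step {t}: {action}")
```

Because nobody knows t in advance, a manipulator would have to prop up the price for the whole window rather than for one known moment.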
Most citizens of Utopia don’t think about government policy very much; they’re not experts and nobody is trying to persuade them. They are, however, occasionally pressured to weight/prioritize certain things, getting into philosophical conversations about what matters in society. Those with particular interest and aptitude train to become futarchy traders — a noble and respected profession.
Futarchy isn’t a panacea. Opportunities for corruption, deceit, and manipulation still exist. But thanks to a strong legal system and many watchful, self-interested auditors and journalists, these things are rare in Utopia. And at the end of the day, the Utopian futarchy is much more competent, sane, and unified than democracy.
Okay, nitty-gritty time! Here’s how the CPM sausage is made:
There’s a CPM for each survey cycle of ~163 days. The CPM for any given cycle is created by taking a rolling average, where 5/6 of the new CPM is just the previous CPM and 1/6 is calculated using the fresh survey data. The value calculated from the fresh data is called the Instantaneous-CPM (ICPM). Thus if the old CPM was 36 and the fresh ICPM is 72, then the new CPM is 36×5/6 + 72×1/6 = 30 + 12 = 42.
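The rolling average can be written as a one-line update. This is a sketch with names of my own choosing; I use exact fractions so the arithmetic matches the hand calculation.

```python
from fractions import Fraction


def update_cpm(previous_cpm, fresh_icpm, fresh_share=Fraction(1, 6)):
    """Blend the previous CPM with the fresh ICPM (5/6 old, 1/6 new by default)."""
    return (1 - fresh_share) * previous_cpm + fresh_share * fresh_icpm


print(update_cpm(36, 72))  # 42

# If the ICPM stayed at 72 for several cycles, the CPM would creep toward it.
cpm = Fraction(36)
for cycle in range(1, 7):
    cpm = update_cpm(cpm, 72)
    print(cycle, float(cpm))
```

The gap between the CPM and a steady ICPM shrinks by a factor of 5/6 each cycle, so priorities shift gradually rather than lurching with every survey.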
Weights provided by survey participants are averaged. Thus if I put 18 points on “Crime Reduction,” 6 on “Happiness,” and 12 on “Technology Development,” and you put all 36 on “Happiness,” then our average weights would be 9 on crime, 21 on happiness, and 6 on technology.
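As a sketch, the averaging step looks like this. The function name and the check that every survey spends exactly 36 points are my assumptions.

```python
def average_weights(surveys, points_per_survey=36):
    """Average point allocations across surveys; unlisted goals count as zero."""
    for survey in surveys:
        if sum(survey.values()) != points_per_survey:
            raise ValueError("each survey must allocate exactly 36 points")
    goals = sorted({goal for survey in surveys for goal in survey})
    return {
        goal: sum(survey.get(goal, 0) for survey in surveys) / len(surveys)
        for goal in goals
    }


mine = {"Crime Reduction": 18, "Happiness": 6, "Technology Development": 12}
yours = {"Happiness": 36}
print(average_weights([mine, yours]))
```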
The goal categories that citizens assign weights to are a scarce common good, so (like what policies are considered by government) are chosen by a two-track system. Half of the categories are chosen according to a signature priority queue with a small buffer to prevent jostling (i.e. the chosen categories are treated as having 117% of the votes they actually have). The other half of the categories are selected by a tax-donation priority queue, where citizen coalitions and special-interests can pour as much money into their desired categories as they want (with the same buffer). The number of options on the citizen’s priority survey is chosen by the futarchy.
For example, let’s say that there are 4 spots on the survey: 2 from signatures/votes and 2 from donations/bids. (I’ll use “vote” and “bid” because they’re shorter words.) Citizens have voted on “Self-reported Happiness” (5k votes), “Healthy Ecosystems” (3k votes), and a long tail of other priorities like “Less Pollution” (1k votes). If people wanted “Less Pollution” to show up on the survey, it’d need just over 3.5k votes (3k × 117% = 3.51k, so about 2.5k more votes) to beat “Healthy Ecosystems” (assuming “Healthy Ecosystems” didn’t get any additional votes in the meantime). Meanwhile, citizens have bid for “Wealth” ($79k), “Simple Laws” ($55k), “Technology Development” ($60k), and many others, each with smaller total bids. Despite “Technology Development” having a higher total bid than “Simple Laws,” it hasn’t yet jostled out the #2 spot because of the buffer ($55k × 117% = $64.35k). Once “Technology Development” gets an additional $5k it’ll become #2, and a rival would then need just over $76k to oust it.
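The buffer rule can be sketched as a small function. I’m assuming a challenger has to strictly exceed 117% of the weakest incumbent’s total to take its spot; the function name and that tie-breaking choice are mine.

```python
BUFFER = 1.17  # incumbents are treated as having 117% of their actual total


def update_slots(incumbents, totals, buffer=BUFFER):
    """Return the slot holders after letting challengers try to oust incumbents.

    incumbents: categories currently on the survey for this track.
    totals: votes (or dollars bid) for every category, incumbents included.
    """
    slots = list(incumbents)
    challengers = sorted(
        (c for c in totals if c not in slots), key=totals.get, reverse=True
    )
    for challenger in challengers:
        weakest = min(slots, key=totals.get)
        if totals[challenger] > buffer * totals[weakest]:
            slots[slots.index(weakest)] = challenger
        else:
            break  # weaker challengers cannot succeed either
    return slots


bids = {"Wealth": 79_000, "Simple Laws": 55_000, "Technology Development": 60_000}
print(update_slots(["Wealth", "Simple Laws"], bids))  # no change yet

bids["Technology Development"] += 5_000
print(update_slots(["Wealth", "Simple Laws"], bids))  # Technology Development takes a slot
```

The same function works for the signature track by passing vote counts instead of dollar totals.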
Each goal category (and ultimately the ICPM) is measured by an aggregation of Specific Goal Metrics (SGMs), that come from a wide range of institutions. Basically, if an institution (e.g. the World Bank) has been publishing a metric (e.g. GDP/capita) for at least 16 years, it’s fair game to be chosen as an SGM. When a metric is chosen to be an SGM, the institution that produces the metric must agree to continue to measure things in a fair and impartial way, refusing to distort the metric based on political pressure. Corruption of metric evaluation is a very serious crime, resulting in steep prison time for both the corrupters and those in the corrupted institution.
Each goal category can be based on up to 12 SGMs: 6 chosen by signatures and 6 chosen by donations, like goal categories themselves. An SGM may be used for multiple categories, or even be selected by both donations and signatures within a single category, but signatures/donations don’t carry over between categories. For instance, the World Bank’s GDP/capita measure could be an SGM that’s used in the category of “Wealth,” selected both by bids and votes, but also in the category of “Technological Development” as selected by bids. Money donated to the government to bid up its importance to “Technological Development” doesn’t impact its standing in “Wealth.”
Astute readers will notice that even with the weights given by the survey takers, there’s still a huge obstacle to overcome before we get to ICPM: SGMs produce arbitrary real numbers, some of which are high-quality, and some of which suck. In their raw state, these are entirely unsuited to be averaged or whatnot. For example, let’s say I have two categories: Wealth and Happiness. Wealth has two SGMs: GDP/capita and number of movies released to theaters in the last year. Happiness has one SGM: average reported happiness on a 0-6 scale. Furthermore, let’s say that the citizenry collectively weights Happiness as twice as important as Wealth. It would be crazy to have the ICPM be equal to 2×(avg reported happiness) + 0.5×(GDP/capita) + 0.5×(movies in the last year). Even with the 2× multiplier, the happiness score would be immediately drowned out by the wealth metrics, and on top of that the wealth metrics are decidedly not equal in how well they measure wealth (or how robust they are to being artificially manipulated).
To solve this, the government first finds people who care about the issues at hand, either by picking survey takers who assigned weight to the respective categories, or in the case of a new category, simply by randomly selecting citizens and seeing how much they seem to care. These people are paid to confidentially share their financial details and do a few days’ work answering some questions. These chosen citizens are collectively provided with a series of thought experiments wherein they’re assumed to be trying to improve an SGM by spending money, and they’re asked how much money they’d spend to change the SGM by a certain amount. Dollar values are then back-ported to estimated utility using models of how valuable money is to people in a given financial situation and how much those people seem to care about the issue (for example, by checking the weight they gave it in the citizen’s survey). Using this data, a pseudo-utility curve is fit to each SGM to give it a positive, linear value.
For instance, a middle-class person says they’d pay $50 to go from 10 movies in theaters/year to 20 movies in theaters/year, but only $10 to go from 20/year to 30/year, and wouldn’t pay anything beyond that. A wealthy person says they’d pay $100 to go from 10/year to 20/year, and they’d pay $90 to go from 20/year to 30/year. Hundreds of samples are taken and analyzed, and a logarithmic pseudo-utility curve is fit to movies in theaters/year.
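To give a feel for the curve-fitting, here’s a deliberately crude sketch. It skips the wealth-adjustment step entirely and treats raw dollar answers as if they were already utilities, which the real process would not do. The one-parameter model u(x) = b·ln(x) and the least-squares fit through the origin are my assumptions, not a specification.

```python
import math


def fit_log_slope(samples):
    """Fit b in dollars ≈ b * ln(end / start) by least squares through the origin.

    samples: (start, end, dollars) triples, one per thought-experiment answer.
    """
    numerator = sum(pay * math.log(end / start) for start, end, pay in samples)
    denominator = sum(math.log(end / start) ** 2 for start, end, _ in samples)
    return numerator / denominator


def pseudo_utility(x, b):
    """Pseudo-utility of an SGM reading x under the fitted curve (x must be > 0)."""
    return b * math.log(x)


# The four answers from the movies example (middle-class, then wealthy).
answers = [(10, 20, 50), (20, 30, 10), (10, 20, 100), (20, 30, 90)]
b = fit_log_slope(answers)
print(f"fitted slope: {b:.1f}")
print(f"pseudo-utility at 50 movies/year: {pseudo_utility(50, b):.1f}")
```

A logarithmic shape captures the diminishing returns in the middle-class answers; the wealthy person’s nearly flat answers are exactly the kind of thing the wealth adjustment is meant to correct for before fitting.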
Secondarily, people (not necessarily the same as those used to fit the pseudo-utility curve) are asked to select which SGM is the best measure of the goal category. For example, 99.9% of people might say GDP/capita is the right way to measure “Wealth,” while only 0.1% say movies in theaters/year is better.
We can now compute our ICPM. First we take each SGM and feed it into the function that was fit from the data to get a pseudo-utility. A weighted sum of pseudo-utilities is calculated for each goal category using the fraction of people who think it’s the best specific measure. Pseudo-utilities for each category are turned into utilities by multiplying them by the weights citizens give each goal in the survey, and then summed together to get the ICPM.
For instance, let’s say that we find that 50 movies were released in the last year, the GDP/capita was $60k, and the average reported happiness was 4.7 out of 6. And let’s again say Wealth and Happiness are the only two goals, with citizens preferring happiness to wealth at a ratio of 2:1. We plug the three SGM numbers into our curves and get pseudo-utilities of $71 for the movies, $41,990 for the GDP, and $52,202 for the happiness. We then calculate the ICPM as a weighted sum: 0.1%×(1/3)×$71 + 99.9%×(1/3)×$41,990 + (2/3)×$52,202 = $48,784.
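The same weighted sum, as a sketch in code. The data layout and function name are my own; the numbers are the ones from the example.

```python
def icpm(categories, goal_weights):
    """Combine pseudo-utilities into an ICPM.

    categories: {goal: [(pseudo_utility, share_who_call_it_the_best_measure), ...]}
    goal_weights: {goal: average survey weight}; only the ratios matter here.
    """
    total_weight = sum(goal_weights.values())
    total = 0.0
    for goal, sgms in categories.items():
        category_value = sum(utility * share for utility, share in sgms)
        total += goal_weights[goal] / total_weight * category_value
    return total


categories = {
    "Wealth": [(71, 0.001), (41_990, 0.999)],  # movies, GDP/capita
    "Happiness": [(52_202, 1.0)],              # average reported happiness
}
weights = {"Wealth": 1, "Happiness": 2}        # happiness preferred 2:1

print(round(icpm(categories, weights)))  # 48784
```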
As we can see, the system allows for the use of a variety of metrics and naturally discounts bad measures. Curve-fitting is set up to be able to be done rarely, especially when SGMs are stable and aren’t being manipulated. Most of the steering power comes in the form of citizens reweighing their priorities.