Hyperlinks
TLDR: Hyperlinks on the web are inherently unstable, and they suffer from a variety of other problems, such as failing to warn readers what’s on the other end of the link. Utopia uses semantic links, where the reader’s computer does last-mile search, to reduce link rot.
Prerequisites: Customizable Software
Hyperlinks are not the worst part of the web. That dishonor is reserved for pop-up advertisements with autoplaying videos. But still… hyperlinks suck.
For example, here is a link to Google: https://www.google.com/
Nothing about that link warns you that it’s a trap, except perhaps an on-hover URL if you’re on desktop. Most people are not scrupulous about checking the targets of links before clicking, and even those who are have to contend with the ubiquitous mystery meat of shortened URLs.
But the problems with hyperlinks are legion. Another design flaw is that it’s essentially random which links open new tabs and which navigate within the current tab. This issue is compounded by technical vulnerabilities on the back-end. And don’t get me started on web forms and buttons, which blur the distinction between links and non-links even further. The upside of the web being a flexible platform is a diversity of interesting designs; the downside is a lack of clarity and standards about what clicking on anything is supposed to do.
But the worst part of hyperlinks is…
Link Rot
Link rot is a serious problem:
A 2003 study found that on the Web, about one link out of every 200 broke each week, suggesting a half-life of 138 weeks. This rate was largely confirmed by a 2016–2017 study of links in Yahoo! Directory (which had stopped updating in 2014 after 21 years of development) that found the half-life of the directory's links to be two years.
If all links were equal, nearly all links on the web would be dead within a decade with half-lives that short. Some links are that bad — looking up webpages from 2010 is pretty hit-or-miss. I hope it’s obvious that insofar as the internet serves as something like a collective memory for civilization, it’s pretty bad to have that memory’s longevity measured in weeks.
But not all links are equal, of course. Some websites belong to organizations or individuals with a long perspective. The fleeting links become broken while the long-lasting links persist.
An obvious solution to link rot, then, is to place the responsibility for maintaining permanent content on companies that we expect to exist in the future. This essay was originally published on Substack, and perhaps Substack will be around forever! If it is, there’s a chance that it’ll maintain my writing at the original URL in perpetuity.
But this is probably wishful thinking. Big platforms have failed in the past. Content on social media platforms gets removed (by users or administrators) or is simply lost and forgotten. Businesses typically aren’t structured to be museums, and can decide something isn’t worth maintaining. Even those that want to preserve content might go out of business. Disasters happen. Many put their faith in the Internet Archive, linking to cached sites on the Wayback Machine. I guess we’ll all just have to hope that it never fails.
Or… perhaps there’s an alternative?
Blockchain Sites
Blockchains are a cool technology. Unfortunately, they’re poorly understood by most people, and the signal/noise ratio around them is awful. A diligent student can learn about them, but only by sifting through garbage and hype. My favorite short explainer is 3Blue1Brown’s 25-minute video on Bitcoin, and beyond that I suggest just reading the Wikipedia page. I’m not going to unpack all the details or ramifications of blockchains here, but let’s briefly go over the basics.
A blockchain is a distributed database composed of a series of documents called “blocks,” hosted across a network of computers on the Internet. Anyone can set up a computer to join the network, typically by storing a local copy of the blockchain (currently ~450GB for Bitcoin). The first block in the chain is specified by fiat, and each subsequent block is added to the chain by someone in the network. The key insight behind blockchains is that new blocks can be made easy to verify but hard to generate; this makes attacking a blockchain with false information prohibitively expensive, and allows the network to trust itself and reach consensus.
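To make that easy-to-verify, hard-to-generate asymmetry concrete, here is a toy proof-of-work sketch in Python. This is a deliberately simplified illustration, not how Bitcoin actually formats or hashes blocks; the block layout and difficulty target are invented for the example.

```python
import hashlib

DIFFICULTY = 4  # require this many leading zero hex digits in the hash

def block_hash(prev_hash: str, data: str, nonce: int) -> str:
    """Hash a toy block: its predecessor's hash, its data, and a nonce."""
    return hashlib.sha256(f"{prev_hash}|{data}|{nonce}".encode()).hexdigest()

def mine(prev_hash: str, data: str) -> int:
    """Expensive: try nonces until the hash meets the difficulty target."""
    nonce = 0
    while not block_hash(prev_hash, data, nonce).startswith("0" * DIFFICULTY):
        nonce += 1
    return nonce

def verify(prev_hash: str, data: str, nonce: int) -> bool:
    """Cheap: a single hash confirms the block is valid."""
    return block_hash(prev_hash, data, nonce).startswith("0" * DIFFICULTY)

genesis = "0" * 64  # the first block's predecessor is fixed by fiat
nonce = mine(genesis, "Alice pays Bob 5 coins")   # tens of thousands of hashes
assert verify(genesis, "Alice pays Bob 5 coins", nonce)  # one hash
```

Mining takes on the order of 16^4 hash attempts on average here, while verification is a single hash; real networks tune the difficulty far higher, but the asymmetry is the same.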
The first major technology supported by blockchains was cryptocurrency. The Bitcoin blockchain is basically a great big bank ledger that specifies who paid how many coins to whom and when. The nice thing about cryptocurrencies is that they serve as a natural way to incentivize people to support the network and continue to grow the chain — “miners” that add blocks to the bitcoin blockchain get rewarded with some coins for their trouble. I think it’s likely that building/maintaining all future blockchains will be supported in approximately that way.
But here’s the reason we’re talking about blockchains here: the information in the database doesn’t have to (just) be financial records. In theory, any data can be put onto a blockchain, and that data will then be copied out, protected, and hosted by every node in the network. This has deep ramifications for things like intellectual property, censorship, and more, but I’ll leave those for a later essay. The only relevant part is that it’s theoretically possible for a website to be on a blockchain.
And if a website is on a blockchain, permanently etched into that record, then it has (in theory) a fixed address that can be linked to. As long as people keep using that blockchain, that site will be visible, without need for trust in a single person or organization. Because blockchains are such a new technology we don’t know how long they’ll last in practice. But in theory, we could see information stored on blockchains today last thousands of years into the future, thanks to the robustness of decentralized digital copies.
But… this isn’t really a good solution to link rot, in my opinion.
Fetching information from deep in a blockchain is slow. It’ll probably get faster as new tech shows up, but I can’t see how it’ll ever be comparable to the speed of getting data from a dedicated server.
Putting information on blockchains is expensive, especially in large quantities. Again, new tech may make it cheaper, but blockchain space is scarce compared to server space on the broader net.
It’s impossible to update data once it’s embedded in the blockchain. In theory one could publish updates later on in the chain, but then links to the old content won’t include future data.
Know what else is a decentralized digital record? Public domain books. It seems to me that publishing “Greatest Hits of the 2012 Web, Volume 1” and releasing it into the public domain would be comparably good to using blockchains to host websites.
Utopian Hyperlinks
I think that in Utopia hyperlinks come in a variety of flavors, clearly distinguished by appearance and behavior. For instance, some links are for site navigation, and clicking on them will direct the user to a new page on the same site. Other links are for moving to a new site, where a click leads the user to a different part of the web.
Navigation links look different than movement links. How? That’s not for me to say; users can configure their browsers to display things in a way that works for them and the device they’re using. My guess is that navigation links are usually colored differently, while movement links come juxtaposed with an icon representing the destination, like those on Gwern’s website. But again, I think the ideal presentation is customized to the user, rather than trying to look good to everyone.
Nav and move links cannot open new tabs or windows; they can only change which page the user is looking at. The purpose of this restriction is consistent, predictable behavior. In a similar vein, textual URLs cannot be given a custom target; they always link to the destination implied by the text (e.g. no “https://www.google.com/” secretly pointing somewhere else).
To prevent link rot where a URL is repurposed by another company, movement links are annotated with metadata that indicates when the author created the link and who owned the website at the time. Browsers can then automatically detect stale links, warn users that the new site is probably not what the author intended to point to, and search known archives (such as the Internet Archive) for cached versions of the referent.
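As a sketch of how that check might work, here is a hypothetical Python fragment in which a movement link carries its author-supplied creation date and site owner, and the browser compares them against a current-ownership lookup. The field names, the registry, and the returned actions are all invented for illustration; no real browser or web standard works this way.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class MovementLink:
    url: str
    created: date           # when the author wrote the link
    owner_at_creation: str  # who owned the site at that time

# Stand-in for a WHOIS-style lookup of each site's current owner.
OWNER_REGISTRY = {"example.com": "NewCo Holdings"}

def resolve(link: MovementLink) -> str:
    """Return an action for the browser: follow the live link, or warn
    the user and search archives near the link's creation date."""
    if OWNER_REGISTRY.get(link.url) == link.owner_at_creation:
        return f"follow {link.url}"
    return f"warn: stale link; search archives for {link.url} near {link.created}"

old_link = MovementLink("example.com", date(2012, 6, 1), "Example Widgets Inc.")
print(resolve(old_link))  # ownership has changed, so the browser warns
```

The point of the sketch is that staleness is detectable mechanically, so the warning and the archive fallback can happen without the author doing anything.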
This behavior is actually a special case of a deeper difference in how hyperlinks are built: in Utopia, a link is closer to a citation in a standard format, where the user’s browser is in charge of searching for the relevant destination.
For instance, one thing I like to do is provide a link to Wikipedia when I introduce something that may be unfamiliar to a reader, such as a blockchain. In doing so I’m assuming that Wikipedia is, and will continue to be, a good source for laypeople to learn about unfamiliar concepts. But what if, down the line, Wikipedia falls apart, becomes politicized, or is simply out-competed by a better source of knowledge? In Utopia, my “link” would instead be a this-is-a-term-that-readers-might-want-to-investigate tag, where I can put disambiguation metadata. Readers’ browsers would then display new/important terms in some style distinct from navigation and movement links. Clicking on a term might pop up a short definition, with links to whatever encyclopedia the user prefers (probably opening in a new tab). In this way a “link” to a definition remains evergreen, because the job of connecting the text to a target webpage falls to the user’s browser rather than the author.
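A minimal sketch of that last-mile resolution, assuming a hypothetical term tag and a user-configured list of preferred sources (every name and structure here is invented for illustration):

```python
# Hypothetical term tag: the author supplies the term plus disambiguation
# metadata, but no hard-coded destination.
TERM_TAG = {"term": "blockchain", "hint": "distributed ledger technology"}

# Each reader configures their own reference sources, in order of preference.
USER_PREFERENCES = ["my-favorite-encyclopedia", "wikipedia"]

# Stand-in for whatever index the browser uses to find live destinations.
KNOWN_SOURCES = {"wikipedia": "https://en.wikipedia.org/wiki/Blockchain"}

def resolve_term(tag: dict, preferences: list) -> str:
    """Last-mile search: the reader's browser, not the author, picks the
    destination, so the 'link' stays evergreen as sources come and go."""
    for source in preferences:
        if source in KNOWN_SOURCES:
            return KNOWN_SOURCES[source]
    return f"no source found for {tag['term']} ({tag['hint']})"

# The reader's first-choice source is unknown, so the browser falls
# through to the next preference.
print(resolve_term(TERM_TAG, USER_PREFERENCES))
```

If Wikipedia ever disappears from the index, the same tag simply resolves to whichever source the reader prefers next; the author’s text never needs updating.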
Likewise, the sources of quotes, jargon definitions, books, videos, stocks, music, merchandise, and more are rarely linked to directly in Utopia. Instead, authors provide enough information for users’ machines to find what’s being referenced and serve it to the user in the way that they want. As a result, digital writing in Utopia ages much more gracefully, with far less link rot.