Tony Arcieri is a software engineer and cybersecurity expert working on the platform security team at Square and a hobbyist cryptographer.
In this opinion piece, Arcieri sets out to clarify the often generic usage of the term 'blockchain' across the industry and in the media, and seek a definition of what a blockchain really is. The result is an enlightening examination of the workings of bitcoin itself.
At first there was bitcoin: the world’s most successful cryptocurrency to date. But lately there has been more and more talk about "the bitcoin blockchain", "the blockchain", "blockchain" or "blockchain technology".
Bloomberg reports that Nasdaq is seeking to show progress using the much-hyped blockchain. LWN notes The Linux Foundation recently announced a project to "advance blockchain technology". The Washington Post lists bitcoin and the blockchain as one of six inventions of magnitude we haven’t seen since the printing press. VISA, Citi, and Nasdaq have invested $30m into a blockchain company.
VCs have invested $1bn in the bitcoin ecosystem. Bank of America is allegedly trying to load up on "blockchain" patents. The Bank of England says there's "buzz around blockchain" and is curious what you’d use "blockchain" for.
It seems “blockchain” is becoming an increasingly generic term, like "cloud" or "cyber".
A new breed of snake oil purveyors are peddling “blockchain” as the magic sauce that will power all the world’s financial transactions and unlock the great decentralized database in the sky.
But what exactly is a “blockchain”?
Let’s turn to the definitive source, Satoshi Nakamoto’s seminal paper “Bitcoin: A Peer-To-Peer Electronic Cash System” and look for the first reference to “blockchain”. Hmm, there doesn’t seem to be one.
The paper contains multiple references to a "proof-of-work chain", and one reference to a “chain of blocks”, but other than that neither “blockchain” nor “block chain” ever makes an appearance in the bitcoin paper.
So if it’s not defined in the bitcoin paper, what does “blockchain” actually mean?
I’ve asked a lot of people this question, ranging from renowned cryptographers and distributed systems experts to bitcoin enthusiasts to people not particularly versed in bitcoin, distributed systems, or cryptography. No two people have ever given me the same answer.
I can try to take a crack at the question myself.
Here are the interesting properties of the bitcoin “blockchain” as I see them:
- Replicated log: Bitcoin uses a transaction log which is replicated from the winning miner to all of the peers in the network. Log-based replication is an increasingly popular tool for building distributed systems, and is used by many databases and message queues.
- Merkle tree: The bitcoin paper describes incorporating Merkle trees into the interior structure of blocks, but overall I think "Merkle tree" describes the structure of bitcoin’s replicated log / “proof-of-work chain”. While not described in the paper as such, I would argue that the overall structure of the bitcoin “blockchain” is effectively a very flat-looking special case of a Merkle tree. Perhaps the flat, log-like structure (which feels a bit like a fast-forwardable git history) is why “Satoshi” chose to describe it as a "chain" in the bitcoin paper. That said, I think Satoshi’s expertise around Merkle trees is generally questionable: bitcoin’s Merkle trees previously had oddly broken behavior (CVE-2012-2459) and utilize a "naive" construction without type flags for leaf versus interior nodes, leading me to believe Satoshi is not an academic cryptographer (the bitcoin paper is also lacking in details around the structure of the "blockchain" which are typically present in academic papers on cryptographic protocols). If we take the hash-based structure of the “blockchain” independent of the use of a proof-of-work function, I think it largely resembles Merkle log proofs as used by systems like Certificate Transparency.
- Decentralized 'consensus by lottery' using a proof-of-work: The real innovation of bitcoin, in my opinion, is the use of a lottery-like mechanism to decide the next "block" to insert into the "Merkelized" replicated log, specifically the proof-of-work function and difficulty ratcheting mechanisms that increase the amount of work required in response to the number of miners working on the problem. Indeed the paper talks quite a bit about a “proof-of-work chain”. However, in discussing the definition of “blockchain” with several people, whether or not a “blockchain” necessarily includes a proof-of-work was one of the most contentious topics. There are several reasons why "blockchain" advocates may want to distance themselves from being necessarily tied-by-definition to a proof-of-work function, which I’ll cover later in this post.
- 'Transactions' authenticated with public-key cryptography: Bitcoin uses an elliptic curve practically no one else uses called secp256k1 (the rest of the crypto world has largely moved on to Curve25519) to digitally sign all transactions with an algorithm called ECDSA. But really it’s not the elliptic curve or signature algorithm that are important (for what it’s worth I don’t think bitcoin chose particularly good ones), so I think it’s silly to base the definition of a "blockchain" on, for example, the use of elliptic curve cryptography and ECDSA (especially as there’s interest in the bitcoin community in moving to Schnorr signatures). In my opinion, a "blockchain" is defined by the use of public-key cryptography in general for authenticating transactions. What’s a “transaction”? The word “transaction” has a very specific meaning in both databases and finance which I’ll go into below. Bitcoin meets a limited definition of one but not the other.
- Public decentralized transaction ledger: Bitcoin "solves" one particular problem: decentralized public transaction ledgers. "Blockchain technology" as it exists today in bitcoin is effectively a decentralized reconciliation system which maintains a global transaction ledger without a central authority. There are many proposals to use the bitcoin blockchain for other purposes, which I’ll discuss below.
- Broadcast protocol: Bitcoin broadcasts all transactions to all nodes in a peer-to-peer system. This has a lot of interesting properties (and is an idea I explored in my experimental messaging system "Confusion") but has inherent scalability limits.
- Scripting language or 'smart contracts': This is a very cool feature I will acknowledge exists and give a quick hat tip to Ethereum, but I will not be discussing it in this post. I think "smart contracts" can exist outside of blockchains and that not everyone using "blockchain technology" is necessarily interested in them. Apologies if you think these are what make the blockchain the blockchain but I don’t, and may address this subject in depth in a subsequent blog post if there’s enough interest.
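To make the first three properties concrete, here is a minimal sketch of a hash-linked log whose next entry is chosen by a proof-of-work search. It is a toy under stated assumptions, not bitcoin's actual format: the block layout, the JSON encoding, the helper names (`block_hash`, `mine`, `verify_chain`) and the "leading zero hex digits" difficulty rule are all hypothetical simplifications (bitcoin hashes a binary block header and compares it against a numeric target).

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    """Hash a block's canonical JSON encoding with SHA-256 (toy encoding)."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def mine(prev_hash: str, transactions: list, difficulty: int) -> dict:
    """Try nonces until the block hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        block = {"prev": prev_hash, "txs": transactions, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1

def verify_chain(chain: list, difficulty: int) -> bool:
    """Check that each block meets the target and points at its predecessor."""
    prev = "0" * 64  # hypothetical placeholder for "no previous block"
    for block in chain:
        if block["prev"] != prev:
            return False
        digest = block_hash(block)
        if not digest.startswith("0" * difficulty):
            return False
        prev = digest
    return True

DIFFICULTY = 3  # toy value: roughly 4,096 hash attempts per block on average
chain = []
prev = "0" * 64
for txs in (["alice->bob:5"], ["bob->carol:2"], ["carol->dave:1"]):
    block = mine(prev, txs, DIFFICULTY)
    chain.append(block)
    prev = block_hash(block)

print(verify_chain(chain, DIFFICULTY))  # True
```

Because every block commits to the hash of the one before it, rewriting an old entry invalidates everything after it unless the work is redone. That is the "chain" in "proof-of-work chain"; the lottery is simply whoever finds a valid nonce first.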
Blockchain or not
When we look at the list above, what makes bitcoin unique? To me, it’s really about the "proof-of-work chain" approach to creating a replicated transaction ledger.
So as far as I’m concerned, as soon as we remove the "consensus-by-lottery" using proof-of-work part of the "blockchain", it loses its meaning and lapses into a much more general set of ideas which solve a similar class of problems but have been in use for decades, are distinct from bitcoin, and are in no way "blockchain technology".
I would argue the etymology of "blockchain" can be traced to a sort of mutated, colloquial term for Satoshi’s original "proof-of-work chain" concept, and that as soon as you move beyond consensus-by-proof-of-work you are no longer using a "blockchain".
That is to say: I think systems which are not transaction ledgers and do not use bitcoin’s consensus-by-lottery using a proof-of-work function approach are not "blockchains".
I’ll again call out Certificate Transparency as a system which has many of the same properties as the bitcoin blockchain, but which I would not define as a “blockchain” and whose creators would probably not describe it as a “blockchain” either.
The world’s worst database
Would you use a database with these features?
- Uses approximately the same amount of electricity as could power an average American household for a day per transaction.
- Supports 3 transactions per second across a global network with millions of CPUs/purpose-built ASICs.
- Takes over 10 minutes to “commit” a transaction.
- Doesn’t acknowledge accepted writes: requires you read your writes, but at any given time you may be on a blockchain fork, meaning your write might not actually make it into the "winning" fork of the blockchain (and no, just making it into the mempool doesn’t count). In other words: "blockchain technology" cannot by definition tell you if a given write is ever accepted/committed except by reading it out of the blockchain itself (and even then).
- Can only be used as a transaction ledger denominated in a single currency, or to store/timestamp a maximum of 80 bytes per transaction.
But it’s decentralized!
While bitcoin does a reasonable job of modeling financial transactions denominated in the one and only one cryptocurrency that is bitcoin, it generally fails to live up to the ideals of a “transaction” in databases, and what it manages to do comes at an incredible cost in terms of electricity and time.
Bitcoin fails to achieve the properties of byzantine fault tolerance, which is perhaps a bit unreasonable to ask in order for bitcoin to be considered sound, but from a less formal perspective bitcoin has no acknowledgement protocol for accepted "transactions" beyond reading your current view of the "blockchain", and because bitcoin’s "consensus-by-lottery" mechanism is inherently racy by design (who can solve the proof-of-work the fastest? We’ll call that an accepted write. Uh-oh, two people solved it at the same time!), we can never be quite sure that a particular transaction we don’t yet see in the blockchain will eventually be committed (and no, the mempool is not some magical band-aid that can solve this problem).
Compare this to pretty much any database or real-time payment system in the world, where getting a speedy "ack" (or error) of some sort, and having it mean something, is considered a basic feature. Even MongoDB can do better than this.
As a side-effect, bitcoin can also be used as a decentralized “timestamping”/audit log service (as noted in the original paper), however there are more efficient protocols which can solve the decentralized audit log problem.
Yet again I’ll look to Certificate Transparency, which solves the problem of verifiable audit logs without the use of a proof-of-work function, making it much easier and less computationally intensive to append to, query, and audit. For these reasons, I specifically call out bitcoin’s blockchain as being most noteworthy as a decentralized ledger, and nothing else.
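For a sense of what a verifiable log looks like without any proof-of-work, here is a minimal sketch of a Merkle tree hash with separate prefixes for leaves and interior nodes, loosely following the construction described in RFC 6962 (the Certificate Transparency specification). The function names are hypothetical, and the sketch leaves out everything else a real log needs, such as signed tree heads and inclusion or consistency proofs.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    """Hash a log entry with a 0x00 prefix marking it as a leaf."""
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    """Hash two child hashes with a 0x01 prefix marking an interior node."""
    return hashlib.sha256(b"\x01" + left + right).digest()

def merkle_root(entries: list) -> bytes:
    """Compute the tree hash over an ordered list of byte-string entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return leaf_hash(entries[0])
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))

log = [b"cert-1", b"cert-2", b"cert-3"]
root_before = merkle_root(log)
root_after = merkle_root(log + [b"cert-4"])
print(root_before != root_after)  # True: appending changes the root

# The type prefixes stop an interior node from being passed off as a leaf.
forged_leaf = leaf_hash(b"cert-1") + leaf_hash(b"cert-2")
print(merkle_root([b"cert-1", b"cert-2"]) != merkle_root([forged_leaf]))  # True
```

Appending an entry costs a handful of hashes rather than a network-wide mining race, which is the efficiency gap between an audit log and a proof-of-work chain.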
Before bitcoin, the state-of-the-art in decentralized reconciliation over the Internet generally involved SCPing around GPG encrypted batch settlement files and processing them with zSeries mainframes. This is slow moving, not easily auditable, and clearly leaves a lot of room for improvement.
Bitcoin was a great demonstration of what is possible. But as the entire bitcoin ecosystem approaches a gross payment volume nearing that of a single top 10 US retailer (and about 1/10,000th the transaction volume of VISA), the "publish all transactions to everybody" approach bitcoin uses is starting to show its limits.
Bitcoin’s scalability is ultimately limited by the number of transactions that can fit in a block and the rate at which blocks are published to the network, and the fight over a switch to a larger block size has grown increasingly dramatic.
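The ceiling can be estimated with back-of-the-envelope arithmetic. The sketch below assumes the 1 MB block size limit in force at the time of writing and the roughly ten-minute block interval; the 500-byte average transaction size is an assumption chosen purely for illustration, and the function name is hypothetical.

```python
BLOCK_SIZE_BYTES = 1_000_000    # 1 MB block size limit at the time of writing
BLOCK_INTERVAL_SECONDS = 600    # blocks are published roughly every 10 minutes
AVG_TX_SIZE_BYTES = 500         # assumption: illustrative average transaction size

def max_tps(block_size: int, tx_size: int, interval: int) -> float:
    """Upper bound on transactions per second for a given block size and rate."""
    txs_per_block = block_size // tx_size
    return txs_per_block / interval

print(round(max_tps(BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES, BLOCK_INTERVAL_SECONDS), 2))
# 3.33

# Doubling the block size only doubles the ceiling.
print(round(max_tps(2 * BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES, BLOCK_INTERVAL_SECONDS), 2))
# 6.67
```

Under these assumptions the bound grows only linearly with block size, so a larger block buys a constant factor rather than orders of magnitude.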
But even if bitcoin adopts a larger block size, the fact it’s already hitting scalability limits despite its comparatively small transaction volume does not bode well for the "blockchain" approach, especially as "blockchain technology" is being touted as a potential solution for systems which operate at multiple orders of magnitude higher transaction volume than bitcoin.
The central problem (pun intended, sorry) is that, despite claims of being "decentralized", the blockchain represents a single ledger which is global to the entire bitcoin ecosystem. It seems Satoshi’s back-of-the-napkin math doesn’t really work out, and publishing all transactions to everyone is expensive in terms of bandwidth and storage.
There are attempts within the bitcoin ecosystem to address this deficiency, for example blocks could be made larger as proposed in Bitcoin XT, or some transactions could be moved to “sidechains” as proposed in systems like the Bitcoin Lightning Network. But the Lightning Network is useful only for a ledger that is denominated in bitcoin, and we still have to deal with the "central" bitcoin blockchain, whose size is likely to continue to increase despite the addition of various "sidechain" mechanisms.
(Edit: Several people have pointed out the Lightning Network allows for offline transactions and that it's not a “sidechain”, and also that the Lightning Network can support non-bitcoin denominated transactions.)
For solving the general problem of over-the-Internet decentralized reconciliation though, we’ll need “blockchains” denominated in currencies other than bitcoin too. But now we have a new problem: how do we exchange different currencies or other financial instruments between blockchains denominated in different currencies?
While this problem may appear to have a straightforward answer, it becomes a bit more difficult when you take into account that moving money between ledgers actually involves integrating with those ugly legacy systems I was talking about earlier which can already move money denominated in "legacy" fiat currencies.
Turning your bitcoins into cold hard cash denominated in the currency of your choice is perhaps the cryptocurrency’s biggest problem beyond scalability (see Mt Gox and the many thefts related to shady bitcoin exchanges on /r/sorryforyourloss).
The solution to all of these problems requires taking a step back from bitcoin and re-evaluating the actual problem we wish to solve. The "proof-of-work chain" approach used by bitcoin is ultimately trying to solve a distributed consensus problem, where we have many parties who want to reconcile a transaction ledger over the Internet.
Bitcoin uses digital signatures to ensure the integrity of each transaction, and via proof-of-work randomly selects an authority to decide which transactions are included in a particular block.
However, there are far more efficient distributed consensus algorithms than this which don’t involve a proof of work. So perhaps we should consider those.
Decentralized ledger protocols
Next-generation decentralized transaction ledgers are a topic I’ve blogged about before, but as this is a quickly evolving field some of my “picks” have changed.
I would like to call out the following projects as ones that are interesting to me today:
- Interledger: a protocol for making payments across different payment networks developed by Ripple Labs. Interledger uses escrows to handle movement of funds between ledgers which effectively provide the same function as bitcoin exchanges but as first-class citizens within the Interledger network. The Interledger protocol is formally modeled using TLA+, also used by Amazon for building mission-critical systems. Per the paper: “Unlike previous approaches, this protocol requires no global coordinating system or blockchain.”
- Stellar SCP: a formally modeled distributed consensus algorithm designed for Internet-scale operation, which provides global agreement among localized “quorum slices”. SCP provides distributed transaction ledgers denominated in the currency of your choice. Stellar plans to launch a cryptocurrency called “Lumen” using the protocol.
My Death of Bitcoin blog post also touched on the idea that the blockchain could be subject to incremental refinement in the same way the Watt steam engine massively improved on the previous Newcomen steam engine.
A few months later we saw exactly that with Bitcoin-NG (paper), a protocol that inverts the ordering of bitcoin consensus, in which a miner is first elected leader by winning the proof-of-work "lottery" by mining a "key block", and then once elected leader becomes a transaction broker who can mint “micro-blocks” via digital signature until the next leader is elected.
Decoupling leader election from the publishing of transactions allows the overall system to have a much higher throughput, as the rate at which new transactions are published is no longer coupled to the rate at which the proof-of-work problem is solved.
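The effect of that decoupling can be illustrated with a toy timeline. This is only a sketch of the idea, not the Bitcoin-NG protocol: key blocks arrive on a fixed schedule rather than at random, leader election is a random pick standing in for the proof-of-work lottery, micro-blocks are not actually signed, and every parameter value and name is hypothetical.

```python
import random

def simulate_ng(duration_s, key_interval_s, micro_interval_s,
                txs_per_microblock, miners, seed=0):
    """Return a list of (kind, time, leader, tx_count) log entries.

    Assumes key_interval_s is a multiple of micro_interval_s.
    """
    rng = random.Random(seed)
    log = []
    leader = None
    for t in range(0, duration_s, micro_interval_s):
        if t % key_interval_s == 0:
            # Stand-in for a miner winning the proof-of-work lottery.
            leader = rng.choice(miners)
            log.append(("key", t, leader, 0))
        else:
            # The current leader publishes transactions without further mining.
            log.append(("micro", t, leader, txs_per_microblock))
    return log

log = simulate_ng(1800, 600, 10, 50, ["miner-a", "miner-b", "miner-c"])
key_blocks = sum(1 for kind, *_ in log if kind == "key")
micro_blocks = sum(1 for kind, *_ in log if kind == "micro")
total_txs = sum(entry[3] for entry in log)
print(key_blocks, micro_blocks, total_txs)  # 3 177 8850
```

In this toy model, transaction throughput depends on the micro-block interval and size, while the key-block interval only controls how often leadership changes hands.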
I’ve referenced Certificate Transparency several times in this post, but it has a few drawbacks: it’s a point-solution specifically for the purpose of X.509 certificates, and as a system that merely logs and audits what certificates CAs provide to it, it has no consensus protocol and therefore cannot be used for things like finding the canonical certificates for a given domain name.
For what Certificate Transparency is trying to accomplish, this is perfectly fine. However, given the several years they’ve spent working on it, it feels like a bit of a shame that it only serves the purpose of authenticating X.509 certificates when the general idea behind it seems much more powerful. This is perhaps how people feel about “the blockchain” when they see it applied only to bitcoin.
Cothority is a framework for building collective authority systems using a Merkelized log a la CT, a consensus algorithm, and threshold signatures (using Ed25519 for threshold Schnorr signatures).
By combining the ideas of consensus systems with a CT-like witness protocol, it provides a generalized framework for auditable decentralized trust and consensus which can be used for many of the same things people are pitching “blockchain technology” for without the need for a costly proof-of-work-based “consensus by lottery”.
These two images are taken from Philipp Jovanovic’s 32C3 talk on Cothority, where they were juxtaposed as two potential solutions to the same problem. While I think Bitcoin-NG is a brilliant optimization on the original bitcoin design (and one I’d strongly suggest bitcoin adopt some variant thereof), Cothority provides many of the same properties without a proof of work function.
Blockchain! Blockchain! Blockchain!
Lately I’ve seen a lot of systems that I wouldn’t have classified as “blockchains” (because they do not use a proof-of-work chain), and that previously seemed to be distancing themselves from bitcoin and the proof-of-work approach, go FULL BLOCKCHAIN:
Have you accepted THE BLOCKCHAIN into your heart?
This is Tendermint, a protocol I highlighted in my last blog post as being based on a proof-of-stake system and distributed consensus protocol, as opposed to a proof-of-work scheme like bitcoin. Now there is no mention of proof-of-stake anywhere on their web page.
Is Tendermint a blockchain? I guess there’s no question about it now! Whatever Tendermint turns out to be, its creators leave little room to doubt that it is, indeed, a BLOCKCHAIN (blockchain blockchain).
Hyperledger, another protocol I highlighted in my last blog post, has also undergone a blockchain makeover. Their old web site now redirects here (as reported on by LWN), where we no longer see any mention of "Hyperledger", just Enterprise Blockchain 2.0 technology!
With this much ambiguity as to the actual definition coupled with hyperrepetition, "blockchain" is fast on its way to becoming the new “cloud” – one of those words whose actual meaning is nebulous and unspecific, but whatever it is it must be so important people can’t shut up about it!
Now don’t get me wrong: I like Hyperledger and Tendermint, or at least, I thought I did. Per my personal rubric above though, neither of these systems is a "blockchain" because neither uses a proof-of-work-based consensus protocol.
The great database in the sky
The great thing about a nebulous term is that it knows no limits.
What can’t you put in the blockchain? Perhaps we could encode Wikipedia into the blockchain, or store the entire archive of Netflix videos in the blockchain. All of Archive.org could go in the blockchain. We could move the entire World Wide Web into the blockchain so all web pages are permanent and live forever.
The only real question is: What can you put in the blockchain?
Well, the answer is: not much. The bitcoin blockchain’s ability to store data is greatly limited by its "publish everything to everyone everywhere" nature. 80 bytes per transaction is pretty much the limit, and the system is already hitting scalability bottlenecks at a relatively modest scale.
To go beyond that, we need a different protocol. We can’t just throw "blockchain technology" at the problem. The relevant algorithms do not exist in the bitcoin codebase. We need a different protocol.
This is a problem many people have tried working on for a very long time. I’ve blogged about it before. There have been many pretenders to the throne: Xanadu, FreeNet, GNUnet, MojoNation/MNet, Tahoe-LAFS, OneSwarm, BitSpray, MaidSafe, IPFS. I’ll note MojoNation specifically as a system that tried to tie storage service to a cryptocurrency.
So far the leading technology for the decentralized database seems to be BitTorrent, which dominates Internet traffic. But it doesn’t make for much of a database, only a blob store. Perhaps you’re now thinking: TorrentChain! Yeah, that’s been tried. But I don’t think the great database in the sky is going to be unlocked by cobbling together disparate parts into a Rube Goldberg apparatus.
Believe me that I would like to see the craziest fantasies of what people hope to accomplish with decentralized systems realized. But the blockchain is probably not the technology that is going to do it.
I feel “blockchain technology” has not delivered a lot of practical value: compared to most payment systems, the value bitcoin moves, and the transaction rate, are both rather insignificant (and bitcoin is all that matters – all other blockchain-based systems move practically nothing by comparison).
Bitcoin is hitting scalability limits under a relatively modest payment volume.
The only thing I think "blockchain technology" has actually delivered on is hype: a press release with "blockchain" in the title garners media attention. (I direct you back to the opening paragraph of this post if you doubt that).
Old financial institutions recruiting for “blockchain” positions are a lot more likely to find talented engineers than if they have job requirements to maintain decades-old legacy systems. I won’t dispute that "blockchain" is pretty much guaranteed to engender a lot more excitement in your average engineer than "ledger", "reconciliation", "settlement" or "notarization".
In the meantime, "blockchain technology" advocates need a litany of big-name positive endorsements of “blockchain” to lend credibility to the idea, even if it’s little more than expressing interest in the concept.
Thus we wind up with a positive feedback loop of hype without anyone actually delivering on anything valuable.
That’s not to say that the idea of decentralized transaction ledgers and timestamping systems lacks merit, but I don’t think copying and pasting Satoshi-and-friends’ codebase all over the place is the best way to go about solving the problem.
In Blockchainiac terms, I don’t want there to be “on-chain” and “off-chain”. I want “sidechains all the way down”. I want systems that are built from the ground up to support that model. Bitcoin doesn’t scale. Decentralize the blockchain!
I want protocols that are formally proven to come to consensus correctly, not protocols that are formally proven to be broken.
I want each transaction to use less electricity than I do in a day. Much less. I want the entire system to use a lot less electricity than the entire nation of Ireland.
I want more than 3 transactions per second.
I want consensus faster than every 10 minutes. Ten seconds is a lot better.
The most interesting ideas I’m seeing are coming from people who describe their protocols as requiring no blockchain.
I worry the media are giving undue attention to questionable ideas simply because there’s a lot of "buzz around blockchain".
I worry that the hype surrounding the “blockchain” might lead those who award research budgets to favor blockchain-based solutions over those that are blockchain-free.
I worry financial institutions might pick a "blockchain"-based solution where a blockchain-free solution might be, by all quantitative metrics, better in every regard, simply because they’ve heard what a big deal “blockchain” is.
But perhaps my concerns are overblown, and this is just a giant semantic argument. Maybe "blockchain technology" is just becoming a meaningless all-encompassing umbrella term for decentralized protocols.
Can it do ledgers? Sure! Data? Why not? Computation? Smart contracts baby!
Perhaps "post-blockchain" protocols will start branding themselves as “blockchain technology” just to stay relevant.
"Cyber" is starting to grow on me, so why not "blockchain" too? Who needs a metaverse? I’ll see you on the blockchain.
This article originally appeared on TonyArcieri.com and has been republished here with the author's permission.