How Anonymous is Bitcoin? A Backgrounder for Policymakers
Adam Ludwin co-founded Chain.com, a bitcoin developer platform. Prior to Chain, Adam was a venture investor in companies including Vine, Slack, Kik, and Paperless Post. In this article, he addresses the distinction between privacy and anonymity in bitcoin.
Bitcoin is often described as a way to transact anonymously. But just how anonymous is it?
First off, it is useful to draw a basic distinction between anonymity and privacy in the context of financial transactions. We will call a transaction “anonymous” if no one knows who you are. We will call a transaction “private” if what you purchased, and for what amount, are unknown.
Let’s draw a simple matrix and locate different kinds of financial transactions within it:
Cash or barter are the most intrinsically private and anonymous means of transacting.
In the opposite corner are transactions which are neither anonymous nor private. This includes, say, campaign contributions over a certain amount. We may also include in this quadrant credit card transactions: although not public knowledge like a campaign contribution, your identity is nevertheless connected to every purchase you make, and this information is available to the merchant, credit card network, issuing bank, and — if subpoenaed — law enforcement.
Certain financial transactions are private but not anonymous; for example, the donor wall at the local art museum, which identifies the names of donors but not the amounts donated.
Bitcoin, by contrast, is anonymous but not private: identities are nowhere recorded in the bitcoin protocol itself, but every transaction performed with bitcoin is visible on the distributed electronic public ledger known as the block chain.
The anonymity provided by bitcoin is at once a point of attraction and a challenge for financial regulation. As the pace of adoption of the currency grows and as it comes under scrutiny by the legal and financial systems, particularly with regard to compliance with applicable anti-money laundering (AML) statutes and know-your-customer (KYC) controls, its true level of anonymity will become an increasingly closely studied subject.
Many users of bitcoin access the currency through one of the popular online wallet or exchange services, and their participation entails linking their personal identity to their bitcoin holdings from the outset. Bitcoin for these users is effectively no more anonymous than a bank account, although this loss of anonymity takes place at the point of entry into the currency and is not a feature of the bitcoin protocol itself.
Those who wish to take advantage of bitcoin’s intrinsic anonymity must find an alternative entry point, such as acquiring bitcoin in a private transaction, as compensation for goods or services rendered, or as a reward for mining. Subsequent bitcoin transactions can then be anonymous, since real-world identities are not recorded on the block chain ledger: the only identifying information recorded there is the set of bitcoin addresses involved, whose corresponding private keys are held by the owners as proof of ownership.
Maintaining one’s anonymity from this point forward, however, is in no way guaranteed: even supposing one manages to acquire bitcoins without giving up personal information, one’s real-world identity can still be discovered in the course of transacting bitcoin within the network. Let’s look at how this can happen.
Broadly speaking, deanonymization techniques pursue one of two complementary approaches, having to do with the public nature of the transaction ledger and with the possibility of exposing the IP addresses of the computers originating the transactions.
Anonymity and the transaction ledger
There is no upper limit to the number of addresses a bitcoin holder can control. All one’s bitcoins can be stored in a single address, or they can be dispersed into dozens or even thousands of addresses. Meanwhile, good practice recommends (though does not enforce) that every address be used only once: any amount left over in change from a transaction should not be kept in the old address but moved to a new one. This proliferation of addresses designedly obscures which ones are controlled by a single individual at a single point in time, and makes it difficult to track the flow of funds controlled by that individual over time.
It is possible, however, to leverage the perfect transparency of the transaction ledger to reveal spending patterns in the block chain that allow bitcoin addresses to be bundled by user. This is the domain of transaction graph analysis.
Transaction graph analysis
Transaction graph analysis applies a few tricks and some educated guesswork to link the approximately 57 million transactions taking place between 62 million addresses to a subset of the unique holders of bitcoin. It then allows transactional relationships between these bitcoin holders to be mapped.
One basic technique in transaction graph analysis involves transactions with more than one input address. Because every input must be signed with its own private key, these inputs are in general controlled by the same person, and if any of these addresses appears elsewhere in the block chain then the associated transactions can also be linked to the same person.
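The multi-input technique can be sketched as a grouping problem. The following is a minimal illustration, not a description of any particular analysis tool: the transaction format (a dict with an `inputs` list) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def cluster_by_common_inputs(transactions):
    """Group addresses that appear together as inputs of one transaction.

    `transactions` is a hypothetical, simplified format: a list of dicts,
    each with an "inputs" list of address strings.
    """
    parent = {}

    def find(addr):
        # Follow parent links to the representative address of the group.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # shorten the path as we go
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)  # register every input address, even if it is alone
        for other in inputs[1:]:
            union(inputs[0], other)  # all inputs of one transaction share an owner

    clusters = defaultdict(set)
    for addr in list(parent):
        clusters[find(addr)].add(addr)
    return sorted(sorted(group) for group in clusters.values())


example_transactions = [
    {"inputs": ["A", "B"]},  # A and B are spent together
    {"inputs": ["B", "C"]},  # B reappears, so C joins the same group
    {"inputs": ["D"]},       # D is never spent alongside another address
]
print(cluster_by_common_inputs(example_transactions))
```

Because address B appears in both of the first two transactions, A, B, and C end up in one group even though A and C never appear together.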
A second technique takes advantage of the “good practice” mentioned above: if exactly one of the output addresses in a transaction has never appeared in the block chain before, then it is a good bet that the new address is the change address.
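As a rough sketch of this second technique, assuming the analyst already holds a set of every address seen earlier in the block chain (the function and variable names here are illustrative only):

```python
def guess_change_by_freshness(output_addresses, seen_addresses):
    """Return the likely change address, or None when the guess is ambiguous.

    `seen_addresses` is assumed to be a set of all addresses that appeared
    in the ledger before this transaction.
    """
    fresh = [addr for addr in output_addresses if addr not in seen_addresses]
    # Only commit to a guess when exactly one output is brand new.
    return fresh[0] if len(fresh) == 1 else None


previously_seen = {"addr_shop"}
print(guess_change_by_freshness(["addr_shop", "addr_new"], previously_seen))
```

When both outputs are new, or neither is, the sketch declines to guess, which reflects the fact that this is a bet rather than a certainty.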
A third technique looks at the numerical precision of the amounts involved in a transaction. For example, if a transaction generates two outputs corresponding to two new bitcoin addresses, where one of the outputs is, say, 3 BTC and the other is 2.12791 BTC, it is a very good bet that the first number corresponds to the recipient and the second number to the change. What is the chance, after all, that the change should happen to end up in such a neat figure? The address originating the transaction can thus be linked to the change address with a high degree of confidence. The same analysis can be repeated after converting to major currencies such as USD to find “whole numbers” that might otherwise be hidden in bitcoin-denominated transactions and that enable the sender to be distinguished from the receiver.
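The precision technique might look something like the sketch below. The threshold `min_gap` (how many more decimal places one amount must have than the other) is an arbitrary assumption for illustration, as are the function names and the two-output input format.

```python
from decimal import Decimal


def decimal_places(amount):
    """Digits after the decimal point once trailing zeros are dropped."""
    exponent = Decimal(amount).normalize().as_tuple().exponent
    return max(0, -exponent)


def guess_change_by_roundness(outputs, min_gap=3):
    """Return the address holding the 'untidy' amount, or None if unclear.

    `outputs` is a hypothetical dict of address -> amount (as a string)
    with exactly two entries.
    """
    if len(outputs) != 2:
        return None
    (addr_a, amount_a), (addr_b, amount_b) = outputs.items()
    places_a, places_b = decimal_places(amount_a), decimal_places(amount_b)
    if abs(places_a - places_b) < min_gap:
        return None  # both amounts look similarly round: no confident guess
    return addr_a if places_a > places_b else addr_b


print(guess_change_by_roundness({"addr_1": "3", "addr_2": "2.12791"}))
```

Amounts are passed as strings so that `Decimal` sees exactly the digits recorded, without binary floating-point noise.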
Address deanonymization using these methods can be thwarted by sending bitcoins through so-called mixers or tumblers, which take a set of bitcoins and return another set of the same value (minus a processing fee) with different addresses and transaction histories, thus effectively “laundering” the coins. But these services come with serious caveats. Users must hand over control of their bitcoins and trust the service to return them. Transaction graph analysis can identify use of a mixing service and flag the user as potentially suspicious. Mixers do not work well for very large sums, unless others with similarly large sums happen to be mixing their bitcoins at the same time. Some mixing services do not work as advertised and can be reverse-engineered. Services that operate legally must keep detailed records of how the coins were mixed, which could later be hacked or subpoenaed. And the new bitcoins received might themselves be tainted by illegal activity.
Seeding the transaction graph
Transaction graph analysis by itself only reveals the imprint of individual agency in the block chain; it does not reveal any real-world identities. For this it is necessary to refer to information not contained in the block chain.
A great deal of information linking bitcoin addresses to their owners’ identities is available publicly. Businesses accepting bitcoin may place a QR code near a cash register or on a website. Others may announce their bitcoin address through services such as blockchain.info, which identifies the owners of thousands of addresses. Thousands more addresses can be harvested from public email forums when individuals include personal bitcoin addresses in signature lines to posts. This partial knowledge of identities can be combined with the transaction graph to deanonymize a swath of the transaction ledger.
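To illustrate how a handful of publicly known addresses can spread across the graph, here is a small sketch. It assumes the addresses have already been grouped by presumed owner (each group a list of addresses) and that `known_labels` maps a few addresses to names gathered from public sources; all names and data are made up for the example.

```python
def label_clusters(clusters, known_labels):
    """Attach every publicly known name in a group to all of its addresses."""
    labelled = {}
    for cluster in clusters:
        # Collect every name known for any address in this group.
        names = sorted({known_labels[a] for a in cluster if a in known_labels})
        for addr in cluster:
            labelled[addr] = names
    return labelled


address_groups = [["A", "B", "C"], ["D"]]
public_labels = {"B": "Example Cafe"}  # e.g. read from a QR code at the till
print(label_clusters(address_groups, public_labels))
```

A single known address is enough to name its whole group, and a group that collects more than one name is a hint that the grouping itself may be wrong.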
Retroactive geolocation is one potential consequence of this deanonymization. Suppose a café accepts bitcoin and uses a fixed address for its over-the-counter transactions. If you are a patron of that establishment, and your bitcoin addresses become associated with your identity, then someone can easily call forth from the block chain a partial record of your personal whereabouts over time.
Conversely, suppose someone wanted to link your identity to your bitcoin address, and you happen to mention that you visited the same café for lunch that day. Someone can look up the address used by the café, find the subset of transactions on that address taking place over the lunch hour, and filter the results by price to exclude transactions involving just a hot drink. Perhaps a bit more information on what you had for lunch, and a look at the café’s menu, and the chances of making a successful match are high.
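The café lookup amounts to a simple filter over the payments received by one address. The sketch below assumes the payments have already been extracted into a list of dicts with a timestamp and an amount in satoshis (1 BTC is 100,000,000 satoshis); the field names, transaction IDs, and figures are hypothetical.

```python
from datetime import datetime


def candidate_payments(payments, start, end, min_satoshis):
    """Keep payments inside the time window that are at least `min_satoshis`."""
    return [
        p for p in payments
        if start <= p["time"] <= end and p["satoshis"] >= min_satoshis
    ]


cafe_payments = [
    {"txid": "tx1", "time": datetime(2015, 1, 20, 12, 15), "satoshis": 1_500_000},
    {"txid": "tx2", "time": datetime(2015, 1, 20, 12, 40), "satoshis": 4_200_000},
    {"txid": "tx3", "time": datetime(2015, 1, 20, 15, 5), "satoshis": 4_000_000},
]

lunch_matches = candidate_payments(
    cafe_payments,
    start=datetime(2015, 1, 20, 12, 0),
    end=datetime(2015, 1, 20, 14, 0),
    min_satoshis=3_000_000,  # assumed cut-off that excludes a lone hot drink
)
print([p["txid"] for p in lunch_matches])
```

Each extra detail, such as a narrower time window or a known menu price, shrinks the list of candidates further.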
Perfect knowledge of the transaction ledger also means that any additional information discovered at a later date can be retroactively applied, allowing further pieces of the identity puzzle to be dropped into place at any time. A single disclosure of identity, even years in the future, and every transaction on that address and those connected to it is compromised.
IP address anonymity
A complementary source of potentially deanonymizing information is available to every computer that participates in the decentralized transaction network by hosting a bitcoin node. This information is the set of IP addresses of the computers that announce new bitcoin transactions.
At the time of writing there are around 6,500 nodes accepting inbound connections from other nodes, and perhaps ten times that number which don’t accept requests for connections. The former maintain connections to several dozen peers on average, while the latter typically have eight peers. Both kinds of nodes generate transactions. Transaction propagation through the node network begins with the computer that first broadcasts the event to its peers, which then forward the event to their peers in an information cascade that usually reaches every node in the network within a few seconds.
The simple observation that can be exploited is that, provided one can find a way to connect to a majority of nodes, perhaps by controlling a coordinated sub-network of nodes spread over many devices, the very first node to relay a transaction is most likely the originator of that transaction. The risk increases if multiple transactions are relayed from the same IP address. While a small random delay is baked into the transaction propagation protocol to help preserve the anonymity of the original sender, with the proper techniques enough signal is available through the noise to make a positive identification in many cases. And while routing traffic through Tor offers some measure of protection against IP address discovery, it exposes the user to other potential attacks.
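A simplified sketch of the first-relay observation follows. It assumes an observer has logged, for each transaction, which peer IP announced it and when; the tuple format, function names, and the `min_count` threshold are illustrative assumptions, and the IP addresses are drawn from ranges reserved for documentation. Real techniques must also cope with the random relay delay, which this sketch ignores.

```python
from collections import Counter


def first_relayers(observations):
    """Map each transaction ID to the IP that announced it earliest.

    `observations` is a list of (tx_id, ip, seen_at_seconds) tuples.
    """
    earliest = {}
    for tx_id, ip, seen_at in observations:
        if tx_id not in earliest or seen_at < earliest[tx_id][1]:
            earliest[tx_id] = (ip, seen_at)
    return {tx_id: ip for tx_id, (ip, _) in earliest.items()}


def repeat_first_relayers(observations, min_count=2):
    """IPs that were first to announce at least `min_count` transactions."""
    counts = Counter(first_relayers(observations).values())
    return {ip: n for ip, n in counts.items() if n >= min_count}


observed = [
    ("tx1", "192.0.2.10", 0.00),
    ("tx1", "198.51.100.7", 0.35),
    ("tx2", "198.51.100.7", 5.10),
    ("tx2", "192.0.2.10", 5.02),
    ("tx3", "203.0.113.5", 9.00),
]
print(first_relayers(observed))
print(repeat_first_relayers(observed))
```

An IP that is repeatedly first to announce transactions is a stronger candidate for the originator than one that was first only once.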
A public example of this kind of IP address deanonymization is blockchain.info, which discloses the IP address of the first node to report a transaction to its servers. The information is only as reliable as the web site’s node connectivity: with a declared 800–900 connected nodes at the time of writing, it is probably not enough to reliably pinpoint the originating IP in all cases.
How anonymous is bitcoin today? Average users should be aware that it is certainly less anonymous than cash. Meanwhile, dedicated users willing to go to extraordinary lengths can find ways to acquire and use bitcoin anonymously, but the open nature of the transaction ledger and other unknowns leave open the possibility that identities and activities once considered perfectly secure may be revealed at some point down the road.
What about the future? As bitcoin adoption continues to increase, it is not out of the question that a technology arms race could arise between anonymizers and deanonymizers: on the one hand, increasingly sophisticated data mining schemes will be developed, possibly combining transaction graph analysis with IP address discovery, to trace the movement of funds in the block chain between individuals and across borders. On the other, improved techniques will be devised to better conceal individual identity and activity.
Here there are many unknowns. Will the core bitcoin code be modified to further protect anonymity or to facilitate regulation? Will bitcoin mixing services become pervasive and secure? Will transaction graph analysis reach a degree of sophistication where most user activities can be easily traced? Will an alternative digital currency or side chain arise which tilts the balance for or against anonymity? All we can say with certainty is that bitcoin is still in its infancy and that existing thinking and tools in the area of anonymity are still primitive. We have seen only the opening moves; the endgame has yet to be played.
This backgrounder was originally published by Coin Center, a non-profit research and advocacy center focused on the public policy issues facing cryptocurrency technologies such as bitcoin. More of their plain-language backgrounders can be found here.