Bitcoin and Blockchain Technology
- Commodities are fungible, transferrable, and somewhat scarce.
- Digital commodities can be transferred electronically.
- decentralized means no single party controls creation, allocation, or transfer of the commodity. Unlike dollars and WoW gold, there is no Board or company which can decide to create more.
- Every node or miner has agent -
bitcoind
ers send payments to each other by giving address and value to agent Addresses are cryptographic identifer of a public key, like 1K2CcfWYW5sBL2xSeQWXpcmjPCgoXdi36R - Agent sends signed messages to other agents, transferring coins.
- Receiving user’s agent informs them upon receipt of payment.
- What’s wrong with a centralized currency?
- From a security point of view, we’d like the usual properties (like unforgeability, theft-proof) to be enforced by code and math, rather than depending upon the goodwill of people
- From an economic point of view, some folks would like the total money supply to be predictable, as opposed to allowing a central bank to cause inflation by creating new money whenever they like.
- Putting a lot of power in one place is an invitation to abuse. Spreading that power out among multiple users can fix that.
- The most important thing a currency can do is to inspire confidence among its users.
- Bitcoin is testing a theory that more users will be willing to adopt a decentralized currency than one in which they have to trust a single person, company, or government.
- To specify a new money system, we need to describe
- how it is represented
- how it gets transferred between parties
- and how it’s created in the first place
- Scarcity is one of the important properties, but there must be more than zero of it, so we must explain how it’s created, and at what rate the money supply expands.
- Expansion is a policy decision: it helps whoever gets the new money, it hurts everyone else a little bit, it’s necessary (at least initially). So it must be managed.
- Digital money systems have traditionally fallen into roughly one of two categories
- Discrete Coin: bank issues lots of coins for a few denominations, blinded, can recognize valid coins later
- Ledger: bank tracks balance for each user, increments/decrements with each transfer
- Despite having “coin” in the name, Bitcoin is more like a ledger system.
- I’m going to explain how Bitcoin works by starting with a simple ledger system and making small incremental changes until we wind up with the actual Bitcoin protocol.
- Imagine you’re a programmer, tasked with creating the money system for the latest online game.
- You’d start with a database, since everyone loves SQL. Each user would get an account, with a balance.
- The system must handle transactions that
- transfer money between accounts, create money (perhaps by buying it with dollars, or by raiding a dungeon), and
- destroy money (selling it for dollars, or buying a Vorpal Sword)
- Your system maintains some invariants,
- like how transfers cannot create new money, and no account balance can go below zero.
- From an authority-analysis point of view, each user gets control over a facet which allows them to send credits to someone else.
- Alice can control her money
- Cannot make new money
- Cannot move anyone else’s money
- Bank gets the authority to create new money.
- What’s wrong with the ledger system? Specifically, what centralization are we trying to fix?
- Since the bank has full control over the ledger, the bank (or an attacker who compromises it) can spend any user’s money.
- The bank can also create new money any time it likes, which makes it hard to predict inflation.
- And since the only source of truth is in the bank’s ledger
- users cannot validate payments or even learn their current balance without talking to the bank.
- It is a fully online system, and bank downtime prevents anyone from using their money.
- We’re kind of used to these restrictions, but we can do better.
- So our starting point uses a centralized ledger, and the bank has complete control.
- Replace accounts with pubkeys:
- each column of the ledger is associated with a specific public verifying key
- now Alice authorizes a transfer to Bob with a signed message
- Signs with private key
- Sends to bank
- bank publishes both the transfer messages and the ledger publically
- everyone can verify the signature on each transfer message
- everyone can audit the ledger check the outputs and complain about violations
- bank’s job is to sequence transfers, reject underflows, publish log, manage expansion
- at the same time, let’s drop the ability to remove money from the system: total is monotonically increasing
- Next, allow each user to have multiple keys, created on demand.
- Bitcoin user agents routinely manage hundreds of keys.
- These are small ECDSA keys, so the effort involved is trivial.
- Carol’s current balance equals the sum of all her keys
- Her client can tell her the total and by default
- Balances can be split into different keys, or merged together.
- Pubkeys are pseudonyms: nobody needs to know A is for Alice
- This starts to help with anonymity: less obvious that A1 and A2 are related.
- Having the bank publish both the transfer messages and the resulting balance sheet is redundant.
- So we’ll make it only publish the transfers. We’ll call them transactions.
- Everyone will derive balances when necessary
- Bank’s job is to remember correct order of transactions, only permit valid ones, and manage expansion.
- Rest of the world can validate signatures and transactions, prevent underflow, and determine balances.
- Our network model has users who are sending signed transfer messages to the bank
- The bank does some checking,
- establishes the ordering,
- then broadcasts the message stream to all users (including the original sender).
- the bank also decides to give new money to certain people whenever it wants: we still don’t have control over expansion.
- Next, we enhance our transaction model.
- Each transaction consumes some inputs, creates some outputs
- each input points to some previous output
- these outputs are first-order objects:
- we’re not transferring credits from one column to another,
- we’re consuming some outputs and creating new ones for different keys
- The rule is: total outputs can’t be larger than total inputs
- Each output is “sent” to a pubkey: it’s labeled with the pubkey that must be used to claim it
- txn must be signed by keys for all inputs
- produces DAG of txns, and set of unspent outputs
- The transaction graph evolves over time
-
Let’s look at a growing transaction graph, already in progress.
-
At any given time, there are a set of unspent outputs, each with a value denominated in BTC
-
I’m eliding the previous history to keep this simple: pretend there’s a bunch of transactions above this line which results in just these three outputs.
-
Each transaction consumes some of those outputs, and produces new ones.
-
The transaction must be signed by the key which owned the input.
-
The outputs are labeled with the pubkey that will be needed to consume it.
-
This transaction transfers 12 credits from key B2 to key C1.
-
The old output is marked as being spent, and removed from the set of unspent outputs.
-
“Double-spends” are transactions which attempt to consume a previously-spent output.
-
These are invalid.
-
The first transaction to consume a given output wins, and all other competing ones are ignored.
-
Expansion (creation of new money) is caused by transactions with outputs greater than the inputs.
- We’ll explain this later.
-
Since outputs must be consumed completely, to send less than the full amount, the “change” has to go back to the sender.
-
In this example, Alice is sending 10 credits to Bob.
- Since she only has one unspent output, and it’s too big (20BTC)
- her client makes a transaction that splits it into two pieces
- returning the remainder to her in a new key.
-
Most Bitcoin transactions create two outputs
- the first transfers credit to the recipient
- the second sends the remainder back to a new key owned by the original sender.
- Does all of this transparently
-
There can be multiple unspent outputs for a given key.
-
Each transaction references specific outputs, not just keys.
-
A user’s balance is the sum of the values of all unspent outputs for all keys they control.
-
Transactions can create an arbitrary number of outputs.
-
And consume an arbitrary number of inputs
- either for consolidation
- or to make a large payment by combining several smaller ones.
-
At each stage, the transaction log contains a set of unspent outputs
-
This set grows and shrinks over time.
- So, how far does this get us?
- We rely upon the bank to provide all users with an append-only ordered transaction log
- We need it to be ordered so we can resolve double-spends (define which one came first).
- If everyone sees the same log, then double-spends and other validation failures are prevented.
- READ SLIDE
- The bank could still publish different logs to different people, so our next step is to decentralize the transaction log and remove our reliance on the central bank
- Now we need to decentralize this transaction log
- First we need to be concrete about the transaction format.
- Each transaction is serialized to a string and hashed to get a txnid.
- Txns reference each other by txnid and output number.
- This forms a DAG of transactions.
- Our goal is to have all users agree on the current txn graph. In particular, disagreements about which outputs have been consumed would allow double-spends.
- Aggregate txns into blocks.
- Every 10 minutes, take all the current valid transactions and bundle their serialized forms together into a block.
- Building on top of each other
- The rule is that each block may only contain valid transactions that reference an output in some previous block.
- Each block is like a page of the ledger book.
- You can imagine loose transactions like individual post-it notes with the transaction details
- every once in a while you paste them all into a new page of the book.
- Each block is just a bunch of serialized transactions and a header.
- Again, we can serialize it to a string and hash it to get a blockid.
- The header includes a copy of the previous block’s id. Since these are cryptographic hashes, this forms the blocks into a strong chain.
- Now if we can agree on the current block ID, we also agree on the full contents of the transaction log.
- It’s computationally impossible to insert or remove a single previous transaction, since that would change the hash dramatically
- This mostly reduces the agreement problem down to agreeing on a single blockid.
- So what do we have now: ..
- READ SLIDE
- So why do we need to agree on the current block?
- To discover double spend attack
- To double his spending power, the attacker needs each recipient to see a different block chain:
- the conflicting transactions each appear in only one chain.
- Nobody (including Alice and Bob) would accept both transactions in the same chain.
- The attacker needs to take delivery of whatever they’re buying before the victim learns about the other history chain.
- Keep them in two separate worlds for long enough
- So we need some means of decide which chain is “correct”, which both Alice and Bob can use to converge on a single history.
- To agree on which block is current, imagine a voting protocol
- Everybody performs the same computation, they accept each others results if they match.
- Here’s where it gets clever.
- Each block header also contains a large counter, included in the hash.
- We require the Block ID (hash of data) must be close to zero: a partial hash collision.
- Trivial to verify, requires predictable amount of work to generate.
- Difficulty Factor controls how close it must be, which controls how much work is needed (on average) to find a block.
- 83 petahashes takes (thousands of GPUs/FPGAs/ASICs) an average of 10 minutes.
- This process is called “mining”: keep trying, maybe you get lucky.
- Miners accept the valid block chain with the most work.
- That means every transaction is valid
- there are no double-spends
- all the rules of the economy are being followed.
- Bitcoin miners spend all day long incrementing counters and computing hashes
- trying to find that partial collision
- trying to extend the longest chain by one more block
- The current block chain is about 243k blocks long, and contains nearly 1.5 zettahashes (10^21) of work.
-
Race happens when two miners find a block at the same time,
-
or at least outside of each others light cones:
- they find a block before hearing about the other’s discovery
-
Since block discovery is a Poisson process parameterized by the number of hashes-per-second being generated,
- the chances of an accidental race are fairly low, and linear in the diameter of the network.
- The faster we can get information out, the less chance there is of a fork
-
Some folks hear about one block first, the rest hear about the other block first.
-
This results in two cliques: some working on top of block 3a, the rest working on top of 3b.
-
Eventually, somebody will find the next block. The larger clique has a better chance, but it doesn’t actually matter. The chances of hitting simultaneous discovery twice in a row are very low.
-
They broadcast their block to the network
-
Even the miners who were working on the competing block4a will give that up to work on block5.
-
Now you have all (or at least most) of the miners working on the same thing.
-
The vestigal chain is abandoned and forgotten.
-
So much work accumulates on the “real” chain that the alternatives can never catch up.
- Remember that to pull off a double-spend attack, the attacker needs to present multiple distinct valid block chain to his different victims.
- This becomes impossibly hard when the majority of miners are working on the same chain.
- So to maintain the integrity of the transaction log, we need lots of people to mine.
- But that costs real-world electricity
- So we need to give them something as an incentive.
- Also, we still haven’t explained how to create BTC in the first place, and how to choose some fair way to distribute it.
- (a side note: distribution doesn’t need to be fair, by the Coase Theorem, but few people will use a system that feels unfair)
- Since the only thing we have to offer is BTC, let’s pay miners with bitcoins.
- Bitcoin’s rules state that each block is allowed to have one “coinbase” transaction: zero inputs, one output, for a specific quantity of BTC.
- This is a “reward” for finding the block, currently 25BTC per block.
- In exchange for doing f(DF) hashes, miners get (on average) 25BTC.
- Rational participants compare expected future value of those BTC against costs and mine, or don’t.
- So miners are incentivized to enforce the rules of the economy
- Their reward is contingent upon other miners agreeing with their decisions
- If they do it wrong, other miners will ignore their block
- so the reward transaction won’t be in the block chain that everyone else recognizes
- so they won’t be able to spend it in the future.
- That means they’re going to be very careful about the transactions they put in their blocks, and following the rules themselves
- any violations will cost them the reward (and means they’ve wasted a lot of work)
- Miners are effectively betting their CPU cycles that other miners will accept their decisions, and receiving BTC as a payoff
- They maximize their chances by following the rules.
- There’s a lot of game theory going on here
- Trying to break the rules will get you ignored
- Following the same rules as everyone else will help you get blocks in the chain that everyone else follows.
- Clever incentive alignment here. Do integrity-protection work, get paid in BTC.
- Do it wrong (either accidentally or trying to cheat), and you’ve wasted CPU cycles.
- Potential attackers are incentivized to follow the rules too.
- A rational attacker deciding whether to spend their CPU time trying to build an alternate chain, or mining normally, will probably get a better average return by cooperating.
- The block chain causes an interesting side-effect that’s fairly unique.
- Bitcoin transactions have a “maturity” that improves with age, like wine, cheese, and teenagers
- Since the attacker’s task of producing a longer block-chain with their alternate history gets linearly harder as more blocks are added to the real chain
- the recipient can choose between attack resistance and settlement time.
- if someone buys a sandwich from me, I could give it to them quickly
- if they buy a car from me, I’m going to wait an hour or two for chain to grow
- just like waiting for a check to clear
- parallels the real world
- Initial reward of 50BTC per block drops in half every four years (210000 blocks)
- Total amount of BTC asymptotically approaches 21M over time.
- Interesting from an economic/money-supply POV: fixed, predictable expansion, eventually 0%.
- This removes one variable that drives inflation.
- Raises the question: what incentives do miners have when the reward goes away?
- Open to discussion after the talk
- Transaction rules said that the outputs can’t be greater than the inputs. It doesn’t actually require them to be equal.
- A txn is said to pay a fee when the outputs sum to less than the inputs
- leftover goes to miner who finds the block
- specifically: the rules say it’s valid to have a reward transaction which includes the sum of the fees
- in the long run, clients will need to provide a fee to get their txns published
- choose tradeoff between small fee and quick service
- market should stabilize upon an average fee that covers miner’s expenses plus reasonable profit
- Coke makes a fraction of a penny off of each can of soda
- economic velocity
- So that’s the basics of bitcoin.
- clients broadcast signed transactions
- miners collect valid transactions into blocks, perform proof-of-work hashing on blocks to obtain BTC rewards.
- the PoW on the block chain makes it expensive to create lengthy alternate history, so double-spends are not economical
- What properties do we get from this?
- READ SLIDE
- There are a lot of clever tricks lurking in the code.. here are some highlights.
- valid-block-rule adjusts DF upon how quickly blocks are found, every 2016 blocks (2weeks), to bring back to 1/10min
- In particular, miners only accept blocks that meet their current expectation of the DF, which is reset every 2016 blocks according to the timestamps inside the blocks.
- There’s an NTP-like protocol to deal with inaccurate local clocks.
- READ SLIDE
- Graph is difficulty vs time of blocks
- DF/mining marker: eventually, an equilibrium will be established, between how efficiently miners can operate and how much reward they expect to receive
- More miners, faster/cheaper hardware, software improvements, all get factored into the equilibrium point.
- Bitcoin relies upon the unforgeability of 256-bit ECDSA signatures and the lack of full preimages in SHA256
- It uses the collision-free properties of SHA256 as a proof-of-work function, but could tolerate partial breaks
- if miners could find partial collisions more easily, the difficulty factor would rise to compensate.
- Internally numbers are represented with a 64-bit integer, and the UI divides by 10^8 before displaying anything.
- READ SLIDE
- Zooko says if whole global economy were in BTC, 10 nBTC=$0.01
- Bitcoin eschews central control of expansion as a policy tool to control inflation
- While the money supply is eventually fixed, inflation could still be caused by expansion of the economy
- And deflation could be caused by money being lost over time (as private keys get forgotten).
- GPUs are about 10x more efficient than CPUs, in hashes-per-joule
- FPGA 10x GPU, ASIC 5x-30x FPGA
- current ASICs only yield about one block per month, GPUs one per year.
- Pools make the return more uniform.
- 4 largest pools provide about 75% of total hash power.
- The largest exchange moves about $1M/day.
- Paste 1K2CcfWYW5sBL2xSeQWXpcmjPCgoXdi36R into the blockexplorer to see a sample transaction.
- Five chars
- Relatively Scarce
- Easily Recognizable
- Divisible
- Fungible (substitutable)
- Portable
- All transactions are public, but the relationship between keys and people is not always easy to determine
- txns are flooded, so IP address is not too useful. Client could run over Tor.
- if ATMs recorded serial numbers on bills emitted, and merchants recorded serial numbers and customer names, cash would be traceable too. They don’t, so cash is generally anonymous.
- paste this address into the block explorer to see a transaction in detail: 1K2CcfWYW5sBL2xSeQWXpcmjPCgoXdi36R
The cryptocurrency that recently broke $6000. Bitcoin is a sophisticated system in that it’s complex in its simplicity, and today I’ll try to explain how blockchain technology works and how it is the logical next step in the computing world.
This talk is taken straight from Brian Warner when he gave his talk Bitcoin: A Technical Introduction. You can follow along with the slides that are posted on his personal site: lothar.com.