# Blackjack Strategy – A Monte Carlo Method

Beat the Dealer by Ed Thorp, published in 1962 (1966 rev. ed) laid out a winning strategy for playing Blackjack through card counting.  While an earlier book Playing Blackjack to Win, by Roger Baldwin et al (1957) exists, and Thorp gives them due credit, Beat the Dealer is certainly the book that popularized the concept and sold over 1,000,000 copies.

When Professor Thorp did his calculations, computers were a little bit slower.  I’d like to think he used  the 1960’s version of a desktop computer: the Elliot Brothers 903 Elliot 903 description .

This machine had up to 64K of memory (core, not modern-day RAM), 18-bit words and a cycle time of 6 microseconds.  The desktop computer I will use to recompute the basic Blackjack strategies has a 4 GHz clock rate, or 0.25 nanosecond cycle time (that’s about 24,000 times faster then the Elliot 903), not to mention (a) 64-bit words, (b) a million times more memory, (c) many more instructions than the handful the Elliot had,  and (d) 8 cores running in parallel. Plus I can program in Python instead of FORTRAN or Assembly.

Unfortunately, Thorp did not use the Elliott.  It was probably too slow.  In his book’s acknowledgment, Thorp says he is grateful for the use of MIT’s IBM 704 computer.  That’s this one:

What I would like to do is recalculate the Basic Strategy that Thorp outlined in his book.  I will use a technique from Reinforcement Learning, by Sutton & Barlow.  Rich Sutton is the father of reinforcement learning and I highly recommend his book as a well-written, clear and easy to understand introduction to the subject.

The technique we will implement is the one labeled “Monte Carlo ES (Exploring Starts)” in Chapter 5 of the second edition of Reinforcement Learning.

First, a little introduction to what we are doing.  For the detailed description see Reinforcement Learning.

A strategy $$\pi$$ is a function that maps the current state (what we can observe) into an action that we can take.  There is a very simple (at least conceptually) way to compute how good any particular strategy is.  We can simulate many games using this strategy and compute the value of the game.  The value of the game is simply the average result that we can expect.  In our game, we will say each game can result in a -1 (loss), +1 (win), or +1.5 (win with a Blackjack, or a natural 21).  Of course, the other way to calculate the value of a strategy would be to calculate the probability of each outcome, and multiply those probabilities times the corresponding outcome – the sum of these would be the expected value of the outcome, what we call the “value” of the game.  If I give you a strategy for Blackjack, which would be a table of what to do based on what you can see at each deal of the cards, and asked you to calculate the probability of winning, you can immediately see that this would be fairly complicated.  For each deal, you would have to compute the probability of getting that deal, then the probability of getting each additional card (if you hit), and so on, each time following the strategy I gave you.  On the other hand, the Monte Carlo method is much simpler.  I just have the computer play lots of games using that strategy, and just add up the results.  I am much less likely to make a mistake in computing things, because I am just counting, not computing probabilities.  Of course, the answer I get is not exact, and I may have to do a lot of simulations to get reasonable precision, but why not let the computer do all the work, instead of me?

So we can see how to compute the value of a given strategy $$\pi$$, but how would we go about finding the optimal strategy, usually written $$\pi^*$$?  The idea is one of “policy improvement.”  If we have a policy $$\pi$$, can we improve it a little bit?  The answer is yes.  The idea is that when we calculated the value of a particular strategy, we computed a function $$q_\pi(s,a)$$, which was the expected value (or return) when starting in state s and taking action a.  The idea of policy improvement is that we make a slight change to $$\pi$$ by making it greedy with respect to the $$q()$$ function – that is

\begin{align} \pi(s) = \arg \max_{a} q(s,a) \end{align}

This says that we take the action $$a$$ at a state $$s$$ that maximizes the value, according to the current $$q$$ function.  So we could see a way to keep improving our strategy:  (1) compute the q for our best strategy so far, then (2) improve that strategy with a greedy choice of actions, and then keep repeating steps 1 and 2.  The problem with this is that the step 1 takes a long time (forever if we want an exact answer), and thus each iteration is quite exhausting.

It turns out, that we can make each step shorter:  we do, instead, the two steps: (1) move q towards the proper q for our current strategy $$\pi$$, and (2) improve $$\pi$$ slightly with respect to the new q.

This is the Monte Carlo ES algorithm and it can be shown to converge to the optimal strategy.  We start with Reinforcement Learning Section 5.3 and make some slight modifications.

We use $$Q_T$$ as the total return for a state,  action pair (over all our games) and $$Q_C$$ as the count, meaning the number of times we have visited that state, action pair.  Thus we can compute the average return as $$Q_T/Q_C$$ for any state, action pair.

In the following, $$S$$ is the set of all states, and $$A(s)$$ is the set of all actions you can take for a particular state $$s$$.

 Initialize, for all $$s \in S, a \in A(s)$$: set $$Q_T(s,a)=0$$ and $$Q_C(s,a)=0$$ for all $$s,a$$ set $$\pi(s)$$ = some random action for all $$s$$ Repeat Forever: Choose $$S_0 \in S$$ and $$A_0 \in A(S_0)$$ such that all pairs have probability > 0 Generate a game starting from $$S_0, A_0$$, following $$\pi$$, with terminal reward $$R$$ For each pair $$s$$, $$a$$ appearing in the game: $$Q_T(s,a) \mathrel{+}=R$$ $$Q_C(s,a) \mathrel{+}=1$$ Update the policy:              $$\pi(s)=\arg \max_{a} \frac{Q_T(s,a)}{Q_C(s,a)}$$

 

Our implemention looks like this in Python.  I have set it up so that it computes the strategy two times: (1) for decks=None (meaning infinite decks) and decks=1 (one deck).

#!/usr/bin/env python"""Use Reinforcement Learning to compute Blackjack strategyFollowing Reinforcement Learning: An Introduction, by Sutton & BartoCopyright (C) 2022 Solverworld    This program is free software: you can redistribute it and/or modify    it under the terms of the GNU General Public License as published by    the Free Software Foundation, either version 3 of the License, or    (at your option) any later version.    This program is distributed in the hope that it will be useful,    but WITHOUT ANY WARRANTY; without even the implied warranty of    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the    GNU General Public License for more details.    You should have received a copy of the GNU General Public License    along with this program.  If not, see <https://www.gnu.org/licenses/>.Classes Defined:State is the state of the gameDeck is the deckGamecards are 1-10, 10=facecards, 1=aceEnd Game values:  -1  # lost (-2 if doubled)  +1  # win  +1.5 #win with black jackSome issues:* If we are looking for states with zero count, they will not be in our q[][] dict  because they are only added when we count.* We are also including state counts for impossible to reach states, e.g. soft 4. This  is because we add the dictionary {'hit':0, 'stand':0} whenever we need to add just one  of these.* How do know if we are done?  What are some convergence metrics?  Could use visit counts for estimation of variance,  Look at lowest visits count states, depends on probability of visiting that state as well.  Compute a smoothed  game value?* Have not added double, splitting of pairs.  Will need to worry about handling permitted actions at each stateThis program will use the Monte Carlo ES (Exploring Starts) strategy from Sutton & Barlo,every 2 million games played, it will: output the current strategy and game value to the screen, and the* output the q-values to a pickle file* output the current strategy and game value"""import collectionsimport pickleimport randomimport argparseimport tqdmimport numpy as npimport timeimport statisticsimport copyimport sysfrom dataclasses import dataclassfrom typing import List# random.seed(1)# control thorp with global variable sutton - changes to hit on 18 vs Acesutton = Falseparser = argparse.ArgumentParser(  description="Computer learns to play blackjack",  epilog='''''')#The eval option evaluates the Thorp strategy, otherwise we calculate the optimal strategyparser.add_argument('--eval', action='store_true', help='do a policy eval')options = parser.parse_args()# A policy is a mapping State->action.  We will use State.hash() value as the key to avoid issues with# immutability, etc.  Actions are: hit, stand, double, splitactions = ['hit', 'stand', 'double', 'split']class GameOver(StopIteration):  passdef print_policy(policy, tag='', prev_policy=None):  # print out the policy, compare to prev_policy if provided  for soft in [False, True]:    print(f'TABLE soft={soft} {tag}')    for dealer in list(range(2, 11)) + [1]:      print(f'{dealer:6}  ', end='')    print()    for your_tot in reversed(range(12, 21)):      print(f'{your_tot}: ', end='')      for dealer in list(range(2, 11)) + [1]:        state = State(my_total=your_tot, dealer_shows=dealer, soft=soft)        ch = '*' if prev_policy and policy[state] != prev_policy[state] else ' '        print(f'{policy[state]+ch:7} ', end='')      print()class Deck:  # The set of 13 cards  the_cards = list(range(1, 11)) + [10, 10, 10]  def __init__(self, decks=None, shuffle_limit=12):    # Create a new deck with decks the number of decks.  None means infinite i.e. with replacement    # For efficiency, keep the deck object around and draw repeatedly from it.  It will be reshuffled    # when less than shuffle_limit cards are left.  Call start_new_game for this check to be done at the    # start of a game if you do not want a reshuffle to happen mid-game; otherwise no need to call start_new_game ever    self.decks = decks    self.cards = None    self.shuffle_limit = shuffle_limit    self.shuffle()     # Assign self.cards  def shuffle(self):    if self.decks is not None:      deck: List[int]      deck = Deck.the_cards * 4 * self.decks      # create the list of cards once so we can just pop off one for each card dealt      random.shuffle(deck)      self.cards = deck    else:      # the deck with replacement, i.e. infinite deck size.  Let's put 52 for consistency      self.cards = random.choices(Deck.the_cards, k=52)  def start_new_game(self):    if len(self.cards) < self.shuffle_limit:      self.shuffle()  def deal(self):    try:      # EAFP      return self.cards.pop(0)    except IndexError:      self.shuffle()      return self.cards.pop(0)class Game:  def __init__(self, deck=None):    # Must pass in a Deck object    assert deck    self.deck = deck    self.dealer = [self.deck.deal(), self.deck.deal()]  # only first card is showing    self.player = [self.deck.deal(), self.deck.deal()]    self.reward = 0  # will be filled in when game is over    self.player_total = 0    self.soft = False    self.update()  def start(self):    # Need to call this before playing in order to figure out if anyone wins before a play    # and we want the object to exist, so can't raise exception in __init__    assert len(self.player) == 2    dealer_total = self.compute_dealer_total()    if self.player_total == 21:      if dealer_total != 21:        self.reward = 1.5        raise GameOver      else:        # tie        self.reward = 0        raise GameOver    else:      if dealer_total == 21:        self.reward = -1        raise GameOver  def play(self, action):    # player plays action    if action == 'hit':      self.player.append(self.deck.deal())      self.update()      if self.player_total > 21:        self.reward = -1        raise GameOver      if self.player_total < 21:        return      # total is 21, so we are done (could we double down on soft 21?)    # assume stand is other, game is over    dealer_total = self.compute_dealer_total()    while dealer_total <= 16:      self.dealer.append(self.deck.deal())      dealer_total = self.compute_dealer_total()    if dealer_total > 21:      # dealer bust      self.reward = 1      raise GameOver    if dealer_total > self.player_total:      self.reward = -1      raise GameOver    elif dealer_total < self.player_total:      self.reward = 1      raise GameOver    else:      self.reward = 0      raise GameOver  def __repr__(self):    return f'dealer {self.dealer} player {self.player_total} {self.soft}'  @property  def state(self):    # state=State(my_total=self.player_total,dealer_shows=self.dealer[0],soft=self.soft,num_cards=len(self.player))    state = State(my_total=self.player_total, dealer_shows=self.dealer[0], soft=self.soft)    return state  @staticmethod  def all_states():    # return a list of all possible states    ret = []    for player in range(4, 21):    # 4-20, 21 means game is over (could we double down?)      for dealer in range(1, 11):  # 1-10        ret.append(State(my_total=player, dealer_shows=dealer, soft=False))        if player >= 12:          ret.append(State(my_total=player, dealer_shows=dealer, soft=True))    return ret  def update(self):    # update the player total and soft attributes    s = sum(self.player)    self.soft = False    if 1 in self.player and s <= 11:      s += 10      self.soft = True    self.player_total = s  def compute_dealer_total(self):    s = sum(self.dealer)    if 1 in self.dealer and s <= 11:      s += 10    return s  def result(self):    return self.reward'''Note: splitting is tricky, maybe create 2 games?  But results of those 2 games have to be combinedto give the value for the current state '''# Contain the state of a blackjack game, we want it hashable so it can be used as a dict key@dataclass(eq=True, frozen=True)class State:  my_total: int  # my total so far  dealer_shows: int  # dealer upcard  soft: bool  # whether my total is soft  # num_cards: int     #number of cards I have (important for splitting)  # def hash(self):  #  return (self.my_total, self.dealer_shows,self.soft)def print_statistics(q, tag=''):  # print some stats on the q_counts[state][action] function  values = [y for x in q.values() for y in x.values()]  hist, bin_edges = np.histogram(values)  print(f'histogram of state visit counts {tag}')  for c, edge in zip(hist, bin_edges):    print(f'{edge=:8.1f} {c:8d}')def print_some_counts(q, tag=''):  # sort the states by counts and print from each end of the list  # we have to create a dict with a single key from the q[][] configuration  # note that we will have some zero counts because we add both hit and stand  # when we create a new entry.  d = {(k1, k2): v2 for k1, v1 in q.items() for k2, v2 in v1.items()}  keys = list(d)  keys.sort(key=lambda x: q[x[0]][x[1]])  print(f'high and low counts {tag}')  for i in list(range(10))+list(range(-5, 0)):    s1 = f'{keys[i][0]}'    s2 = f'{keys[i][1]}'    print(f'{s1:50} {s2:5}  {d[keys[i]]:10}')def evaluate_progress(returns, num_games=0):  # compute mean return and some statistics for it from list of returns  # We divide returns into 10 groups of equal length, and compute the mean and stddev of the groups  n = len(returns)//10  # This little bit of zip python magic makes groups of length N  means = []  for group in zip(*(iter(returns),)*n):    means.append(statistics.mean(group))  m = statistics.mean(means)  d = statistics.pstdev(means)  print(f'GAMEVALUE {num_games} {len(returns)} {m:8.3f} {d:8.3f}', flush=True)def monte_carlo_es(count=100.0, decks=None):  # return the optimal policy pi[], smoothed game value, use exploring starts (ES)  count = int(count)  permitted_actions = ['hit', 'stand']  # because of exploring starts, we cannot compute the value of a game this easily  # we can turn off ES for 200K games every so often and use the current strategy to evaluate it  # we will also calculate the variance of the returns of 10 groups of 1/10 the games  num_games_eval = int(2e5)  eval_returns = []   # store them for eval  eval_mode = False   # in this mode we will store the returns and not do ES  # This returns the default dict to initialize a new q[state]  def default_q():    return {'hit': 0, 'stand': 0}  def update_policy(state_list, policy):    # update policy p at states using q_total, q_counts    qmax = -100    amax = None    for state in state_list:      q2 = q_total[state]      c2 = q_counts[state]      for a in permitted_actions:        if c2[a] > 0:          q = q2[a] / c2[a]        else:          q = 0        if q > qmax:          qmax = q          amax = a      # set the policy to the maximum q value      policy[state] = amax  # exploring starts to estimate optimal policy  policy = collections.defaultdict(lambda: 'stand')  # maps state->action, start with stand  prev_policy = None  # keep track of a previous one for print out comparisons  # Instead of keeping a list of the returns for each state, we will keep the totals and the count  # We do not use a defaultdict because we want to know how many states have zero counts and there are only about 360  # states anyway (18*10*2)  # q_total = collections.defaultdict(default_q)  # q_total[state][action] to value (total)  q_total = {k: default_q() for k in Game.all_states()}  # q_counts = collections.defaultdict(default_q)  # q_counts[state][action] to count  q_counts = {k: default_q() for k in Game.all_states()}  game_total = 0  game_count = 0  deck = Deck(decks=decks)  for i in tqdm.tqdm(range(count)):    # Generate a game, will raise exception when done    # Need to save the states    if i > 0 and i % 2000000 == 0:      # do some intermediate reporting      print_policy(policy, tag=f'games={i//1000000}M', prev_policy=prev_policy)      prev_policy = copy.deepcopy(policy)      print_some_counts(q_counts, tag=f'at games {i//1000000}M')      print(flush=True)      # save q to a file so can evaluate later.  Do not need policy because it can be generated from q      with open(f'q-{i//1000000}-{decks}.pickle', 'wb') as fp:        pickle.dump([q_total, q_counts], fp)      # start eval period      eval_mode = True    if len(eval_returns) >= num_games_eval:      eval_mode = False      evaluate_progress(eval_returns, num_games=i)      eval_returns.clear()    states = []    actions = []    game = Game(deck=deck)  # create a new Game, use existing deck for computation efficiency    deck.start_new_game()   # make sure enough cards to finish this game    try:      game.start()  # need to determine if anyone won before a move      while True:        if eval_mode:          action = policy[game.state]  # pick an action by the current policy        elif len(game.player) == 2:          # we pick a random action for the first play (Exploring Starts)          action = random.choice(permitted_actions)        else:          action = policy[game.state]  # pick an action by the current policy        states.append(game.state)        actions.append(action)        game.play(action)  # apply action to game    except GameOver:      terminal_reward = game.result()      game_total += terminal_reward      if eval_mode:        eval_returns.append(terminal_reward)      game_count += 1      for state, action in zip(states, actions):        q_total[state][action] += terminal_reward        q_counts[state][action] += 1      # set the policy to the max q value      update_policy(states, policy)  return policydef evaluate_policy_mcts(policy, count=100):  # First-visit MC (cannot revisit a state anyway)  # Following First-visit MC prediction for estimating V for a policy  # policy is a function that maps state to action  # Instead of keeping a list of the returns for each state, we will keep the totals and the count  permitted_actions = ['hit', 'stand']  returns_total = collections.defaultdict(float)  returns_count = collections.defaultdict(int)  game_total = 0  game_count = 0  for _ in tqdm.tqdm(range(count)):    # Generate a game, will raise exception when done    # Need to save the states    states = []    deck = Deck(decks=None)    game = Game(deck=deck)  # create a new Game    try:      game.start()  # need to determine if anyone won before a move      while True:        states.append(game.state)        action = policy(game.state)  # pick an action by the current policy        game.play(action)  # apply action to game    except GameOver:      terminal_reward = game.result()      # G=terminal_reward  # no intermediate rewards and no discount factor      game_total += terminal_reward      game_count += 1      for state in states:        returns_total[state] += terminal_reward        returns_count[state] += 1  return returns_total, returns_count, game_total, game_countdef mimic_dealer(state):  if state.my_total <= 16:    return 'hit'  else:    return 'stand'def thorp(state):  #compute the Thorp strategy  if state.soft:    return thorp_soft(state)  if state.dealer_shows == 1 or state.dealer_shows >= 7:    stand = state.my_total >= 17  elif state.dealer_shows in [4, 5, 6]:    stand = state.my_total >= 12  else:    stand = state.my_total >= 13  return 'stand' if stand else 'hit'def thorp_soft(state):  #compute the Thorp strategy for soft totals  stand_on_19 = [1, 9, 10] if sutton else [9, 10]  if state.dealer_shows in stand_on_19:    stand = state.my_total >= 19  else:    stand = state.my_total >= 18  return 'stand' if stand else 'hit'def evaluate(use_policy, set_sutton=False):  # compute game value for a policy  global sutton  sutton = set_sutton  num = int(5e6)  returns_total, returns_count, game_total, game_count = evaluate_policy_mcts(use_policy, count=num)  # returns_total,returns_count,game_total,game_count=evaluate_policy(mimic_dealer,count=num)  print(f'after running {num}')  print(f'mean game value {game_total / game_count:8.5f}')  return game_total / game_count"""  keys = list(returns_total.keys())  keys.sort(key=lambda x: x.dealer_shows * 1000 + x.soft * 50 + x.my_total)  for state in keys:    st = repr(state)    print(f'{st:64} {returns_count[state]:4d} {returns_total[state] / returns_count[state]:8.3f}')"""if options.eval:  print(time.strftime('%c'))  print('Thorp, infinite deck')  ret = []  for i in range(10):    val = evaluate(thorp, set_sutton=False)    ret.append(val)  m = statistics.mean(ret)  p = statistics.pstdev(ret)  print(f'mean={m} pstddev={p}')  print('Sutton, infinite deck')  ret = []  for i in range(10):    val = evaluate(thorp, set_sutton=True)    ret.append(val)  m = statistics.mean(ret)  p = statistics.pstdev(ret)  print(f'mean={m} pstddev={p}')  print(time.strftime('%c'))  sys.exit(0)for use_decks in [None, 1]:  print(time.strftime('%c'))  games = 250e6  print(f'decks={use_decks} games to be played={games}')  opt_policy = monte_carlo_es(count=games, decks=use_decks)  print_policy(opt_policy, tag='at end')print(time.strftime('%c'))

This is the result after running 248,200,000 games for decks=None (infinite)

GAMEVALUE 248200000 200000   -0.024    0.007TABLE soft=False at end     2       3       4       5       6       7       8       9      10       1  20: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   19: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   18: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   17: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   16: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     15: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     14: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     13: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     12: hit     hit     stand   stand   stand   hit     hit     hit     hit     hit     TABLE soft=True at end     2       3       4       5       6       7       8       9      10       1  20: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   19: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   18: stand   stand   stand   stand   stand   stand   stand   hit     hit     hit     17: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     16: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     15: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     14: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     13: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     12: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit

This is the decks=1 result.

GAMEVALUE 248200000 200000   -0.020    0.004TABLE soft=False at end     2       3       4       5       6       7       8       9      10       1  20: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   19: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   18: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   17: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   16: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     15: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     14: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     13: stand   stand   stand   stand   stand   hit     hit     hit     hit     hit     12: hit     hit     stand   stand   stand   hit     hit     hit     hit     hit     TABLE soft=True at end     2       3       4       5       6       7       8       9      10       1  20: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   19: stand   stand   stand   stand   stand   stand   stand   stand   stand   stand   18: stand   stand   stand   stand   stand   stand   stand   hit     hit     stand   17: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     16: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     15: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     14: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     13: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit     12: hit     hit     hit     hit     hit     hit     hit     hit     hit     hit

You can see that the only difference between the 2 results is the Soft 18 vs. a dealer Ace (1). For 1 deck, you should stand on soft 18 vs A, which matched Thorp’s 1966 book. For infinite decks, you should hit soft 18 vs A, which is the result that Sutton & Barto get in Reinforcement Learning (for an infinite deck).

As a quick check on our results, Thorp says that if you follow the “Basic Strategy” with no doubling or splitting, the “casino edge will be only about 2 percent.” We get a game value of -0.020, which seems like a good match.

Note that we have not included the ability to split pairs or double down, we will leave that as an exercise for the reader.

Just to see how close these two results are, here is a graph of the q value (average) for both deck types as we play the 250,000,000 games.

# DIY Hand Sanitizer – Methanol is Deadly

With the current COVID-19 crisis, most retail and online stores have been unable to keep hand sanitizer in stock. This has created a crisis for people that need to clean their hands and are unable to access soap and water. The CDC recommends hand washing for 20 seconds frequently to avoid contamination 1. If this is not possible, they recommend using a hand sanitizer with at least 60% alcohol.

Because of the shortage of commercial hand sanitizer, recipes have been circulating on social media for a homemade or DIY (Do It Yourself) version. While there is rarely any scientific standing behind these recipes, there is also a long standing recipe published by the World Health Organization (WHO). Putting aside any possible problems people may have following these recipes precisely enough to make an effective hand sanitizer, I want to address one deadly consequence that could occur from faulty or improper manufacturing practices.

It is possible that someone attempting to make their own hand sanitizer in this crisis situation makes one simple, deadly, mistake. They use an alcohol containing a small amount of methanol.

First, some background. The alcohol that is used in what are commonly called alcoholic drinks is ethyl alcohol (sometimes called ethanol). It is taxed by the Federal government (in the US) at a rate of $13.50 per proof gallon 2. A proof gallon is 1 gallon at 100 proof, or 1.25 gallons of 80 proof, and so on. Two other kinds of alcohol are isopropyl alcohol and methanol (methyl alcohol). Isopropyl alcohol is what people commonly buy as rubbing alcohol, with percentages of alcohol ranging from 70% to 95%. Methanol is sold for use as fuel in camping stoves (alcohol stoves) and as a thinner for various paints, such as shellac. The problem with methanol is that it is deadly toxic to humans. It only takes 10 mL (about 1/3 of an ounce) to cause blindness in an adult, or 30mL to cause death 3. It is the impurity in moonshine (or any improperly distilled spirit) that can make it deadly. The other property of methanol of note is that it is absorbed through the skin. In fact, it is absorbed at the rate of 2mg/cm2/hr [See update below]. That may not seem like much, except that the hand surface area is about 2% of your total body surface area, which amounts to roughly 1000 cm2 for two hands 4. It is left as an exercise for the reader to determine just how long you can rub methanol on your hands before suffering disastrous consequences, including neural toxicity, blindness, and death. Hint: not very long and not something anyone should try. There is one more important fact before we come to the punch line. In order to sell ethyl alcohol for non-drinking purposes and avoid the payment of the excise tax, the Federal government allows it to be adulterated with certain substances that make it undrinkable. This alcohol is then called “denatured alcohol” and allowed to be sold without the payment of the excise tax, which would be$6.75 per quart of pure (100%) ethyl alcohol otherwise. What are these things added, called denaturants? It depends on the purpose of the alcohol, and there are whole set of recipes pre-approved by the Federal government, under CFR (code of Federal Regulations) Title 27 Part 21 5. For example, S.D.A (Specially Denatured Alcohol) 23-H is made by adding acetone and methyl isobutyl ketone (MIK). S.D.A. 30 is made by adding methanol (approximately 9%). I hope you can now see the problem – if someone makes hand sanitizer, and uses a denatured alcohol containing methanol, the most common type sold in hardware stores, either without knowing or without caring, that is going to have deadly consequences when people rub it on their hands.

California banned the sale of denatured alcohol last year.

In case you think this is just so ridiculous that it could not happen, I have bad news. There have been cases reported in the past 6. And with the current pressure on supplies of hand sanitizer and the reported price gouging by some retailers, it is bound to become more common.

The FDA has relaxed its rules on the production of hand sanitizer, allowing pharmacies and others to produce hand sanitizer in the current situation.

Please make sure you trust your supplier when you buy hand sanitizer. Anybody can print a label and put it on a bottle – that’s no guarantee of what’s inside.

Update 13 Feb 2022:

Thanks to the astute reader who pointed out my previously incorrect estimate of hand surface area. It has been corrected. I apologize.

Also, there are variations in absorption rates reported. Two references to 0.192 mg/cm^2/min are 7 and 8. The one I used (for 2.0 mg/cm^2/hr) was 9. Please note the different time units in the two estimates.

Also, there is a wide range in the reported amount of methanol that can cause blindness. This reference gives 3.16-11.85 g/person 10. I don’t look at that and say, I guess I will ingest 3.06g because it’s under the limit! So please be safe.

# An Investor’s View of ICOs

ICO? Initial Coin Offering.  Remind you of an IPO (Initial Public Offering)?  It is meant to.  Bancor raised $147 million in a few hours. Block.one sold$185 million of EOS tokens in 5 days.  Tezos has raised over $100 million in 2 days, with 10 more days to go and no cap. There are currently almost 20 ICOs planned for the month of July. See the sites at the bottom of this post for lists of ICOs. There are plenty of analyses to be found online about the pros and cons of the various ICOs (or token sale, or whatever name they are actually called). What I want to do is give you the perspective of a private investor on these deals – and how they compare to more traditional private investments. I have made dozens of private investments in startup companies, been involved in dozens more as a member of an investment committee, and seen probably hundreds, if not thousands, of other deals cross my monitor. There are a number of dimensions along which a deal gets evaluated. Some are: • Business plan Is this a good business to be in? • Team Are these the right people to execute the plan? • Market Is the market for this product large enough or growing fast enough to be interesting? • Price What price am I paying to get in on this opportunity? • Terms What are the fine print details on the deal? As I said above, there are plenty of other places debating the business plans and teams. Today, I want to discuss the terms. Generally, these do not get much attention in a private deal. Not because they are not important, but because they are so important that standard terms have evolved over time, and people assume that these standard terms will be included. Each one has a reason for being included. ### A Private Investment Term Sheet Let’s go through a standard private investment term sheet and see what kinds of things are in there. We will call our company the imaginative name NewCo. We will call our new coin NewCoin. I am leaving out a lot of the gory technical language here; if you want to see a full model term sheet, check out the National Venture Capital Association.  Price Example: Investors will invest$2 million in NewCo, at a $10 million fully-diluted post-money valuation, including a employee pool of 20% of the post-money capitalization. They will receive Series A Preferred Stock. Explanation: Investors will get 20% of NewCo after the financing, including all other convertible securities, such as options (this is the fully-diluted part). NewCo is setting aside 20% its shares for issuance to employees in the future. The Series A Preferred Stock is really just a label – its properties are defined by the rest of the term sheet. Dividends How and when dividends might be paid. Important to prevent the company from paying out all it’s cash in dividends to the common stockholders and not giving any to the investors. Capitalization The company’s capital structure before and after Closing is set forth in exhibit A. So everyone knows who owns what. Liquidation Preference Describes how liquidation or acquisition proceeds are split with investors. Generally there is some preference to investors. In our example, if the company were sold a month after the investment, for$5 million, the investors would receive 50% of their investment (losing $1m), but the founders/employees would walk away with$4 million, without doing anything.  To prevent this, various preferences for the investors have evolved called non-participating or particpating Preferred Stock.  Generally, the first proceeds of a liquidation go back to the investors until they receive their investment back. Conversion This allows investors to convert to Common Stock when it become advantageous. This is how they can exit the deal in an IPO, for example. Antidilution This specifies how the investors purchase price can change if the company issues shares later at a much cheaper price.  Without this, there would be no way to stop a company from issuing lots of cheap shares after an investment and diluting the investor. Information Rights Details how information will be provided to investors over time, for example, monthly financial reports, annual audits, etc.  This forces the company to keep providing information in the future. Voting Rights Specifies how various votes will be taken. Registration Rights Very technical details on how shares should be registered, relative to public offerings. Representations and Warranties Statements about the company that the founders declare are true, E.g. here are all our debts, we have no pending lawsuits, etc. Board of Directors How many members will be on the board, who will they be, how can they be changed in the future.  They are generally elected by the stockholders. Future Financing How can future financings be conducted?  Will the current investors have the right to participate in a pro rata share?  Do they have the right to approve the financing? Vesting If a Founder leaves the company 2 days after financing, should they keep all their shares?  Since the investors presumably invested because of the specific founders, they would not want to to see them leave.  Typical terms might be 25% of the stock vests after 12 months, with the rest vesting over another 3 years.  Also Founder’s rights to sell shares is restricted (also note that investors generally have restrictions on their sales as well). Employee Matters Each Founder should sign a non-compete agreement, and non-disclosure agreement (NDA), and a non-solicit agreement (that they will not hire away current employees if they leave).  Also, they agree that intellectual property developed by them for the company belongs to the company (IP Assignment).

In a private investment, money goes into NewCo, which is managed by a Board of Directors, who hires the CEO.  If all the employees walk out, they don’t take the funds of the company with them, and they lose their shares according to the vesting schedule.  While obviously not what was intended, the money would still be there, and the BOD (elected by the shareholders) could hire a new CEO and team.  They could even give the funds back to the investors and call it a day.  The BOD has a fiduciary responsibility to the investors to act in their best interests.  They could not, for instance, divide the money up among themselves.

### Compare versus ICO

Let’s now look at a typical ICO and see how these items relate.

Generally ICOs specify the price you pay for each NewCoin, generally in bitcoin (BTC) or ethereum (ETH).  But each NewCoin does not get you any ownership in a business.  The business that NewCoin is in could do well, NewCoin could become very popular and fashionable, but NewCoin has to increase in value for you to see a return.  You also do not own any future IP or technology developments that the founders/developers create [compare with IP Assignment].   Since code is almost always open-source, there is nothing proprietary about it.

In an ICO, investors have no oversight or control over what the Founder team is doing. In the case of a foundation being created, this might be up to the foundation’s directors, but again, they might not be elected or answerable to the investors.  There should be something comparable to an elected BOD.

What prevents founders and developers from walking away?  Where is the vesting?.  The distribution of NewCoin to founders is generally opaque (no cap table is generally provided) so you don’t know who owns what.  If they developers want to develop a better, competing coin based on what they learned at NewCoin, can they? [compare to non-compete agreement].  To the extent founders hold NewCoin, there would be an incentive for them to help out NewCoin, but the distribution is unknown, and without vesting they might just sit back and see others do the work.

Coins kept by founders running NewCoin could be considered like the stock held by founders in a PI.  They have incentive to make the coin increase in value.  The issue is there is no lockup to prevent them from selling, no vesting terms, and no public knowledge of their ownership.  They could sell their coins and no one knows.

ICOs are somewhat more akin to a Limited Partnership where the investors generally do not have control over the operations and details, except that in an Limited Partnership (1) there are provisions for removing General Partners (GPs) in extreme cases, and (2) the GP has a fiduciary responsibility to the LPs.  An ICO should have a mechanism for providing such a responsibility.

What if the NewCoin needs more money to finish development? In a private investment, there are ways to raise additional funds, but it is not as obvious how NewCoin does this.  They could sell some of the reserved coins (if there are any and they have value), but often the supply is fixed.  In an ICO you might be stuck.

Private companies generally stage their investment.  At each stage, money is raised to reduce risk and bring the company to a milestone that greatly changes its value.  This is more capital efficient and less risky for all concerned.  Rather than raise 100M on an idea, companies might raise $1-5M in a series A to prove out an idea or concept, then raise 10-20M in Series B, and so on. Each phase should be reducing some risk. ICOs are generally a one-shot deal. There is no first proof-of-concept stage, with later money coming with results. Lists of ICOs: # Lending Rates for Bitcoin and USD on Cryptoexchanges We are going to take a look at the current interest rates in bitcoin lending. Why is this important? In addition to being a way to make extra income from your fiat (“real” currency) or bitcoin holdings without selling them, these interest rates indicate traders sentiment about the relative values of these currencies in the future. We will look at bitcoin in this post, because it is the largest cryptocurrency, both in terms of market cap and trading volume. As of the date of this writing here are the top 4 cryptocurrencies by total market cap. Information from http://coinmarketmap.com.  Crypto-Currency Market Cap ($USD, billion) 24 Hr Trading volume ($USD, billion) Bitcoin (BTC) 45 1.0 Ethereum (ETH) 32 0.78 Ripple 11 0.18 Litecoin (LTC) 2.5 0.37 The Bitfinex Exchange is one of the exchanges that allows margin trading, and also customer lending of fiat and crytpocurrencies. They call the lending exchange funding and it works with an order book just like a regular trading exchange. Customers submit funding offers and requests, and the exchange matches the orders. So, for example, I could offer$USD800 for 4 days at .0329% per day (12% annually).  This get placed on the order book.  Someone else could accept that offer, and then the loan happens.  As the lender, I get paid interest daily at the contract rate, paid by the borrower.  Now, what does the borrower do with the proceeds?  They can’t withdraw the money from Bitfinex, this is not a general personal loan.  They could use this money to buy bitcoins on margin.  There are specific rules on how much they can borrow at purchase, and how much margin they must maintain in the future.  We won’t go into the specifics of those rules in this post, but just be aware that the exchange can liquidate, or sell, a position to maintain margin requirements.  In fact, this is what caused the recent meltdown in the Ethereum market at one exchange (GDAX) Ethereum Flash Crash, as traders had their positions liquidated automatically for margin calls.

So, show me the data!

In this chart, fUSD is the funding rate as an APR for \$USD, and fBTC is the funding rate for bitcoin (BTC). These are based on a small sample of 2 day loans actually traded on that day. The first thing to note is that the rates are quite volatile, reaching highs of over 100% (USD) and 50% (BTC) during the last year. The current rates (June 2017) are around 40% (USD) and 4% (BTC).

One more thing we might want to look at.  We might wonder about the difference between the two rates and what that means.  In economics, there is something called uncovered interest rate parity, which normally looks at the difference in interest rates between 2 currencies, say the USD and the Euro (EUR).  This difference is related to the relative inflation expected between the 2 currencies in the future.  In fact, futures contracts for the currencies should have a price related to this difference in such a way as to make arbitrage not possible.  Here is the difference graph.

Note that it switches around both sides of zero.  Between approximately 15 Mar 2017 and 1 May 2017, it was negative, meaning people were demanding higher rates to lend BTC than to lend USD.  Perhaps this is an indication that people wanted to short BTC at that time – although we don’t know which fiat currency they were shorting it against.

Another thing that we can pull from the data is the yield curve.  In the bond market, this indicates the interest rates for bonds at different maturities, for example, a 30 day, 60 day, 1 year, etc.  In the case of Bitfinex, the loans can only be made for 2-30 days, so we have a more limited set of possibilities.

You can see the curves are relatively flat, but they only go out to 30 days, so we can’t say much about 1 year rates. Note that most of the volume is at the extremes, that is 2 days or 30 days, so the numbers in between are not that meaningful.

In post to come, we will look at Bitcoin futures, to see how their pricing might be related to the uncovered interest rate parity. Stay tuned.