How does that limit for e work?

You may have seen the following equation:
\begin{align}
\lim_{n \rightarrow \infty} \left( 1 +\frac{1}{n} \right)^n = \mathrm{e}
\end{align}

which expresses the fundamental constant \(\mathrm{e}\) as a limit. Wondering how it can be proven?

Consider the graph of \(y = \frac{1}{t}\) for \(t \geq 1\):

One key fact we will use is

\begin{align}
\int_1^{x} \, \frac{1}{t} \mathrm{d}t = \ln x
\end{align}

Looking at the graph, you can see that the area under the \(\frac{1}{t}\) curve on the interval \((1,1+\frac{1}{n})\) is bounded by

\begin{align}
\frac{1}{n}\left( \frac{1}{1+\frac{1}{n}} \right) \leq \int_1^{1+\frac{1}{n}} \, \frac{1}{t} \mathrm{d}t \leq \frac{1}{n} \left( 1 \right)
\end{align}

Using the definition of \(\ln x\) above, and noting that \(\frac{1}{n} \cdot \frac{1}{1+\frac{1}{n}} = \frac{1}{n+1}\), we have

\begin{align}
\frac{1}{1+n} \leq \ln \left( 1+\frac{1}{n} \right) \leq \frac{1}{n}
\end{align}

Raising \(\mathrm{e}\) to the power of each term (which preserves the inequalities, since \(\mathrm{e}^x\) is increasing),

\begin{align}
\mathrm{e}^{(\frac{1}{1+n})} \leq 1+\frac{1}{n} \leq \mathrm{e}^{(\frac{1}{n})}
\end{align}

Raising each term to the power \(n\) (again preserving the inequalities, since all terms are positive),
\begin{align}
\mathrm{e}^{(\frac{n}{1+n})} \leq \left( 1+\frac{1}{n} \right)^n \leq \mathrm{e}
\end{align}

As \(n\) approaches infinity, \(\frac{n}{1+n} \rightarrow 1\), so the left-hand term approaches \(\mathrm{e}\). By the squeeze theorem, therefore,
\begin{align}
\lim_{n \rightarrow \infty} \left( 1 +\frac{1}{n} \right)^n = \mathrm{e}.
\end{align}

QED.
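As a quick numerical sanity check, here is a minimal Octave sketch (the loop bounds are arbitrary):

  % Evaluate (1 + 1/n)^n for growing n; it should creep toward e.
  for k = 1:6
    n = 10^k;
    printf("n = %8d   (1 + 1/n)^n = %.10f\n", n, (1 + 1/n)^n);
  end
  printf("e            = %.10f\n", e);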

Linear Solve, Part 1

Note: This article assumes you know about matrix and vector math. If not, you might want to peruse an introduction to the subject [note to self: write one].

A set of linear equations in 2 unknowns \(x_1, x_2\)

\begin{align}
b_1& =a_{11} x_1 + a_{12} x_2 \\
b_2& =a_{21} x_1 + a_{22} x_2 \\
b_3& =a_{31} x_1 + a_{32} x_2
\end{align}
can be written as
\begin{align}
Ax=b
\end{align}
where
\begin{equation}
A =\begin{bmatrix}
a_{11}& a_{12}\\
a_{21}& a_{22}\\
a_{31}& a_{32}\\
\end{bmatrix}
\end{equation}
\begin{equation}
x =\begin{bmatrix}
x_1\\
x_2
\end{bmatrix}
\end{equation}

\begin{equation}
b =\begin{bmatrix}
b_1\\
b_2\\
b_3
\end{bmatrix}
\end{equation}

In general, let \(A\) be an \(m \times n\) matrix (that means \(m\) rows and \(n\) columns). We are given \(A\) and \(b\), and we ask what values of \(x\) “work” here. What counts as working depends on the sizes of \(A\), \(b\), and \(x\):

  • There might be no solution, i.e., no value of \(x\) makes \(Ax=b\) hold. We might then want to know what value of \(x\) comes as close as possible. This is the famous least squares problem, or linear regression.
  • There might be many solutions; we would be interested in characterizing those solutions.
  • There might be exactly one solution; we would be interested in finding it.
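
One way to tell these cases apart is to compare ranks (the Rouché–Capelli criterion). A minimal Octave sketch, with invented numbers:

  % Classify the system Ax = b by comparing ranks.
  A = [1 0; 0 1; 1 1];
  b = [1; 2; 3];                  % consistent here: x = [1; 2] satisfies every row
  if rank([A b]) > rank(A)
    disp("no exact solution -- least squares territory")
  elseif rank(A) < size(A, 2)
    disp("infinitely many solutions")
  else
    disp("exactly one solution")
  end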

Note: In MATLAB or Octave, this is called solving for x, and is done with the command: x=A\b
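
For example, a minimal Octave sketch of the 3-by-2 system above, with made-up numbers:

  % An overdetermined 3x2 system; the numbers are invented.
  A = [1 2;
       3 4;
       5 6];
  b = [5; 11; 17];   % chosen so that x = [1; 2] fits exactly
  x = A \ b          % backslash solves it (least squares if no exact fit)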

The linear equation \(Ax=b\) has been called the “most important equation in the world” 1. Strang’s book is an excellent reference for learning linear algebra – no need for another book. The free online course, MIT Introduction to Linear Algebra, taught by Gilbert Strang, uses this book.

Want to use linear equations for approximations? See Using Linear Equations for Approximations.

 

Notes:

  1. Strang, Gilbert. Introduction to Linear Algebra, 5th ed. Wellesley, MA: Wellesley-Cambridge Press, 2016.

Using Linear Equations For Approximations

Suppose you forget how to do linear curve fitting and want to derive the formula from scratch. Or say you want to find the best parameters to describe some data.

First you make a model, with unknowns \(x_i\). Assume that your measured parameters are \(a\) and \(b\). You might postulate a model of the form

[Note: put more realistic example in here]

\begin{align}
y = a x_1 + a^2 x_2 + b x_3 + a b x_4
\end{align}
Note that the equation does not need to be linear in the parameters \(a\) and \(b\), only in the unknowns \(x_i\).

Now suppose that we have \(m = 100\) data points, each with corresponding values of \(y\), \(a\), and \(b\). We can write
\begin{align}
y=Ax
\end{align}
where
\begin{align}
\begin{bmatrix}
y_1\\
y_2\\
\vdots \\
y_m
\end{bmatrix}
=
\begin{bmatrix}
a_1 & a_1^2 & b_1 & a_1 b_1 \\
a_2 & a_2^2 & b_2 & a_2 b_2 \\
\vdots & \vdots & \vdots & \vdots\\
a_m & a_m^2 & b_m & a_m b_m \\
\end{bmatrix}
\begin{bmatrix}
x_1\\
x_2\\
x_3\\
x_4
\end{bmatrix}
\end{align}

Now we ask the question: what values of \(x_i\) match our data most closely? We can state this as:

Problem:

Find the values of \(x_i\) that minimize the norm of the error \(e=y-Ax\):

\begin{align}
\DeclareMathOperator*{\argminA}{arg\,min}
x_{opt} = \argminA_x \| e \| = \argminA_x \left\| y - Ax \right\|
\end{align}
As we know, the solution to this problem is our good old linear solve. You can get the answer various ways:

  1. In MATLAB or Octave, use the linear solve x = A\y
  2. For a closed-form expression, use the normal equations: \(x_{opt} = (A^T A)^{-1} A^T y\)
  3. Any other way you want to solve \(y=Ax\)
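
To make this concrete, here is a minimal Octave sketch using synthetic data for the model above (the coefficients and noise level are invented):

  % Synthetic data: m samples of inputs a, b and a noisy response y.
  m = 100;
  a = rand(m, 1);
  b = rand(m, 1);
  x_true = [1; -2; 3; 0.5];               % made-up "true" coefficients
  A = [a, a.^2, b, a.*b];                 % one row per data point
  y = A * x_true + 0.01 * randn(m, 1);    % small additive noise

  x_ls = A \ y                            % least-squares fit via backslash
  x_ne = (A' * A) \ (A' * y)              % normal equations give the same answer

In practice the backslash form is preferred over explicitly forming \((A^T A)^{-1}\), since it is more numerically stable.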

Review of How Not to Be Wrong: The Power of Mathematical Thinking

How Not to Be Wrong: The Power of Mathematical Thinking, by Jordan Ellenberg 1

I know what you are thinking. You don’t need this book. You are already never wrong. I know I am. Math is funny that way.

But you are wrong. You do need this book. It has plenty of examples of people who were wrong simply because they did the math wrong.

This book tries to explain why you should learn mathematics. It gives an introduction to some basic principles and demonstrates what awaits those ignorant of those concepts. The basic principles are taught not with a series of eye-blurring equations, but rather through broader concepts.

The book entices with

  1. The myth that if a little of something is good, more is always better
  2. The Baltimore Stockbroker, or which mutual funds are good?
  3. How to win the lottery
  4. Does cancer cause smoking?

The mathematical terms surrounding these ideas are linear vs. nonlinear relationships, inference, expected values, and regression. See, that wasn’t so bad, was it?

The book is similar to Freakonomics, with more emphasis on the math and how results are obtained. Ellenberg likes to point out basic flaws in reasoning that caused people to be wrong, and how math presented the correct way out.

One example early in the book is of Abraham Wald, a brilliant researcher at a classified program during World War II in New York City called SRG (the Statistical Research Group). By way of illustrating how much brain power was situated at SRG, the author says that Milton Friedman (a future Nobel Prize winner in Economics) was often the fourth-smartest person in the room.

One problem that military leaders brought to Wald was “where should they put the armor on planes?” If you put no armor on the planes, they would be shot down easily. On the other hand, if you put too much armor on the planes, they couldn’t fly very far. Thus (an example of a nonlinear relationship) there must be a happy medium between the extremes that was the “best” (what mathematicians call the optimum), and they wanted to know what that was.

What they had were the data from planes that made it back from Europe. Obviously, some were shot down and did not return, so they analyzed the ones that returned and found things like 2

Section of plane    Bullet holes per square foot
Engine              1.11
Rest of plane       1.8

The generals’ reaction was: obviously you want to put more armor on the rest of the plane (look how many more bullet holes it got), but how much more?

Wald’s conclusion, after his analysis, was actually the reverse: you should “put the armor where the bullets are not.” A simple way to understand this is to imagine this scenario. Suppose a single bullet in the engine takes a plane down, but large numbers of bullets elsewhere are not harmful. (Mathematicians get to imagine these scenarios when solving problems.) In this case, ALL the planes coming back from Europe would have ZERO holes in the engine (because otherwise they would have been shot down) and only holes elsewhere. In that case, you would want to put armor only on the engine, and NONE anywhere else (because the rest was not critical).

This simple type of reasoning, explored in detail (Ellenberg provides a glimpse at the complex equations Wald used in his report), provided the answer the generals were looking for, albeit in a surprising direction.

This reasoning is an example of survivorship bias, which is what happens when you look at samples that are not representative but instead have a bias. In this case, they were looking at the data for surviving planes only, not all planes. This effect makes evaluating mutual funds hard, as the “bad” ones may have a tendency to go away.

The book is amusing and entertaining, and well-written. Interspersed with examples are notes about historical figures that bring the examples to life.

Some other examples from the book that stood out in my mind are:

How three teams (including one of M.I.T. students) beat the lottery in Massachusetts. Using math.

How A could be correlated with B, and B correlated with C, and yet A is NOT correlated with C. Yes, you need to read the book. It has to do with the geometric interpretation of correlation.

The story of the Baltimore Stockbroker. I like this story, but I will retell it here as my father told it to me in the 1970s (so it predates the 2008 BBC show The System, which popularized this idea). A tout (tipster) gives our friend three tips as to the winning horse at a racetrack, one each on Monday, Tuesday, and Wednesday. Each one of those turns out to be correct. So on Thursday, when the tout offers to sell the man the name of the winning horse that day for $10,000, he jumps at the chance. What happened? Well, unknown to our friend, on Monday the tout gave 1000 people tips on a race with 10 horses entered. To 100 of them he gave horse #1, to a second 100 he gave horse #2, and so on. Then, on Tuesday, he ignored the 900 people he had given the wrong Monday tip and focused on the 100 that got the correct tip. Of those 100, he gave each a second tip: 10 got horse #1, another 10 got horse #2, and so on. Well, you see what happened. Our friend is the unlucky person who got 3 correct tips in a row. He doesn’t know about the 999 people who got bad advice. And he is going to pay for it.
When Ellenberg tells this story, he uses a stockbroker giving tips, but it’s the same idea.

Sometimes we take for granted discoveries and inventions from the past, such as what a bit of information is (without which phrases like ‘10 megabit download speed’ would be so much gibberish), or what the correct way to compute probabilities is. Ellenberg highlights some of the historic struggles to address these problems correctly.

One of the major themes of the book is how “math extends common sense.” Perhaps you will find your common sense extended after reading this book.

Notes:

  1. Professor of Mathematics at the University of Wisconsin-Madison. Published by Penguin Books, 2014.
  2. Not to be a spoiler, but look for the reference to ‘The Journal of Things I Totally Made Up to Prove a Point’.