2012-03-16
One of the most common mistakes people make when tackling a problem, is to assume too much. This is especially true when we think about things for which we don’t have a good innate intuition. Infinity is a good example of this, but today I’d like to talk about sets, aka collections of objects. The inspiration for this material comes from the book Basic Set Theory, which I highly recommend you check out. Note that this being a math book with a seemingly innocuous title, is fairly involved and highly sophisticated (much like any other seemingly harmless mathbook).
Those of you familiar with the history of Set Theory as it arose know that it was initially mired with a good deal of contempt by the mathematicians of the time (late 1800s). And for good reason. The concepts purported by the theory involved “different kinds of infinities,” for example, and I myself might have called this stuff witchcraft and wizardry, had I been living at the time. Georg Cantor, was the mathematical genius that started the set theoretic revolution, and it seemed for a while that the more he published, the more opponents he got. His opponents weren’t all dumb either; Leopold Kronecker was a prominent one.
The take-home message from this brief history digression is that set theory is anything but natural. Let’s see why with a few axioms. To start, imagine you are a pioneer in Set Theory and you first need to rigorously define what a set is. You might come up with the following.
The (Unrestricted) Axiom of Comprehension
$$ \exists y \forall x ( x\in y \leftrightarrow \phi(x)) $$
The way you should read this axiom is, given some statement or function \(\phi\), there exists a set \(y\) that contains all the things in the world that satisfy \(\phi\). And you’ve defined a set! Just kidding. The kicker is, this axiom is wrong. A simple counterexample is known as Russel’s Antinomy. Define \(\phi(x) = x\not\in x\) so that \(\phi\) defines the set of all things that do not contain themselves. Try to convince yourself that the existence of such a “set” leads to a contradiction before reading on.
Let \(y\) be the set such that \(\forall x(x\in y \leftrightarrow x\not\in x)\). Now, since \(x\) is a free variable, take the instance where \(x=y\). This produces
$$ y\in y \leftrightarrow y\not\in y $$
so if the hypothetical set \(y\) contains itself, then it doesn’t contain itself. This kills the axiom, and was discovered by Bertrand Russel in 1903. It is not the first antinomy proposed to negate the axiom of comprehension, but it is the simplest. An exercise in Basic Set Theory asks to prove that \(\phi(x) = \lnot \exists u(x\in u \wedge u\in x)\) is another antinomy. I’ll entrust this problem to you for now and come back later with a solution in a later post. In addition, try to come up with your own antinomy.
The failure of the axiom of comprehension leaves us wondering how to go about defining sets in the first place. Modern set theory solves this problem by constructing new sets from previously defined sets using what is called the Axiom schema of specification. I don’t have time to go through this at this point but the wiki article is quite good. For now, let’s go to another unintuitive axiom:
The Axiom of Choice
For every set \(a\), there exists a function \(f\) on \(a\) such that for every non-void \(x\in a\), \(f(x)\in x\).
Layman’s definition
For every set \(a\), we can choose a well-defined element belonging to every non-empty subset of \(a\).
When I first heard this axiom in college, I thought to myself “man, the mathematicians really went bonkers on this one.” I mean, how could an such an innocent statement as this be untrue or useful? It wasn’t until I started trying to break the axiom via example that I realized this axiom isn’t as useless as it initially seemed.
For finite sets, the axiom of choice holds almost trivially. If the set has \(n\) elements, then there are \(2^n\) subsets, and for each one, we can explicitly write out the element that our function \(f\) chooses. So, let’s consider infinite sets now. If we take \(a = \mathbb{Z}\), the set of all integers, we can still manage to come up with a function that selects an element from every subset of \(a\). The easiest thing to do would be to select the least element in the subset (or the greatest). But what happens if we let \(a = \mathbb{R}\)? Our method of choosing the least or greatest element no longer works because (think fast, why?) open intervals are also subsets of \(a\) and these subsets don’t have a least or greatest element. Small modifications like choosing the midpoint of each subset don’t work either because of subsets of \(a\) that look like \((0, x/2)\cup (x/2, x)\). Take a moment to convince yourself of the fact that it is, at least, very difficult to construct a choice function on the reals. “But surely,” you might quip, “every subset of even non-well-ordered sets has something in it. It should be possible to just choose an element out of that something!” This is actually the basis of the acceptance of the axiom of choice. It seems reasonable to think it might be true, despite the choice function not always being specifiable.
This marks the end of the first part of this short series. I hope to tackle more implications of the Axiom of Choice in the next part. Please leave comments!
2012-03-09
I still remember a lot of the “math team” problems I encountered In high school. I go back to them often to get a fresh perspective, and also to keep my problem solving ability sharp. One problem that I have visited numerous time is the problem about a confused bug from an AIME I took in, tenth grade was it? Anyways, the problem is as follows:
A bug sits at vertex \(A\) in the triangle pictured below.
A
/ \
/ \
B------C
(sorry for the ghetto ASCII art!)
After thinking a bit, the bug proceeds to walk to either vertex \(B\) or vertex \( C\) with equal odds. From that vertex, he repeats his random walk either back to vertex \(A\) or to the remaining vertex with equal odds. After 12 such steps, what are the odds that the bug ends up back at vertex \(A\)?
Method 1 – Brute Force
Most people would start a problem like this by drawing probability treed or by counting (somehow) the number of paths with length twelve that start and end on \(A\). In doing so, they would likely try out the problem for a smaller number of steps, say, 4 or something, to get a good feel for the problem. I want to emphasize here that there is nothing wrong with this. Finding and identifying patterns is fundamental part of learning math and problem solving. You are welcome to try this on your own and see if you can get the solution yourself.
On the first step, if the bug is at \(A\), then it moves to \(B\) or \(C\) with fifty-percent certainty a piece on the second step. On the third step, the bug might be at \(A\) with fifty-percent certainty or at one of the other two vertices with twenty-five-percent certainty a piece. To organize one’s thinking, we can do something like the following:
| Iteration | \(A\) | \(B\) | \(C\) |
| 0 | 1 | 0 | 0 |
| 1 | 0 | .5 | .5 |
| 2 | .5 | .25 | .25 |
| 3 | .25 | .375 | .375 |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
| 12 | THE ANSWER | stuff we don’t care about | probably equal to the other stuff we don’t care about |
but of course, this doesn’t feel satisfying at all. Can we do better? The answer is, as usual, of course! Many of you might have jumped directly to the solution already, but feel free to try to find it now.
To find a cleaner (but still brutish) solution, let’s introduce some notation. Let \(P_n(x)\) be the probability that the bug is at vertex \(x\) at iteration \(n\). Then, we can write the following inference.
$$ P_{n+1}(A) = \frac{1}{2}(1-P_n(A)) $$
because the probability of the bug being at \(A\) on the \(n+1\)’th step is one-half the probability that it was at either \(B\) or \(C\) on the step before. We can actually get to this relation by solving for it from the more obvious equations:
$$ P_{n+1}(A) = \frac{P_n(B) + P_n( C)}{2} $$
$$ P_{n+1}(B) + P_{n+1}( C) = P_n(A) + \frac{P_n(B) + P_n( C)}{2} $$
(verify that these two equations are true and that they lead to the first one yourself). At this point, we can expand out the recursive definition thirteen times (starting from zero) and solve.
$$ \frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-\frac{1}{2}\left(1-P_0(A)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right)\right) = \frac{683}{2048} $$
So hardcore, is math.
Method 2 – Matrix Diagonalization
Fast forward a few years, I was taking my first course on Linear Algebra. I randomly stumbled upon this problem again, leafing through some old notes, and to my delight, I realized I could apply a new technique to arrive at the correct answer. Consider the following matrix:
$$ M= \begin{array}{c|ccc}~&A&B&C\\ \hline A & 0 & 1 & 1 \\ B & 1 & 0 & 1 \\ C & 1 & 1 & 0\end{array} $$
NB. I would use the \bordermatrix command but MathJax does not support this command yet so bear with the array notation. To interpret this matrix (called an adjacency matrix), let the row label be the source and the column label be the destination, although having the reverse interpretation is fine too. The elements then, tell you how many paths there are between the source and the destination. Neat-o right? Let’s see what happens if we take powers of the matrix.
$$ M^2 = \begin{bmatrix} 2&1&1\\1&2&1\\1&1&2\end{bmatrix} $$
$$ M^3 = \begin{bmatrix} 2&3&3\\3&2&3\\3&3&2\end{bmatrix} $$
Believe it or not, the entries of the matrix \(M^n\) represent the number of paths between the source row and destination column with length \(n\). Actually, let’s not just believe it. Let’s prove it! Inductively!
The case for \(n=1\) is trivial so we have our base case. Let’s now assume that the above statement holds for \(n=k\). The burden is now on us to show that the statement holds for \(n=k+1\). The number of paths from vertex \(A\) back to vertex \(A\) in \(k+1\) steps is simply the number of paths from \(A\) to \(B\) in \(k\) steps plus the number of paths from \(A\) to \(C\) in \(k\) steps. Similarly, the number of paths from \(A\) to \(B\) in \(k+1\) steps is simply the number of paths from \(A\) to \(A\) plus the number of paths from \(A\) to \(C\) in \(k\) steps. Symmetric arguments apply for all other permutations of source and destination vertices. However, it is clear that the matrix \(M^{k+1}\) satisfies these requirements for:
$$ M^{k+1} = M^k\cdot M = \begin{bmatrix} A\rightarrow_n A & A\rightarrow_n B & A \rightarrow_n C \\
B\rightarrow_n A & B\rightarrow_n B & B \rightarrow_n C \\
C\rightarrow_n A & C\rightarrow_n B & C \rightarrow_n C \\
\end{bmatrix} \cdot \begin{bmatrix} 0&1&1\\1&0&1\\1&1&0\end{bmatrix} = \begin{bmatrix}
A\rightarrow_n B + A\rightarrow_n C & A\rightarrow_n A + A\rightarrow_n C & A\rightarrow_n A + A\rightarrow_n B \\
B\rightarrow_n B + B\rightarrow_n C & B\rightarrow_n A + B\rightarrow_n C & B\rightarrow_n A + B\rightarrow_n B \\
C\rightarrow_n B + C\rightarrow_n C & C\rightarrow_n A + C\rightarrow_n C & C\rightarrow_n A + C\rightarrow_n B \\
\end{bmatrix}
$$
(the notation here is made up and indicates the starting vertex, the ending vertex, and the number of steps taken between them). From this, it should be clear that all the entries are correct as is and the induction proof is finished (a more general proof exists, of course, for arbitrarily ranked adjacency matrices). That means, all we need to do, is take the twelfth power of \(M\) and our answer should be
$$ \frac{M^{12}_{11}}{M^{12}_{11}+M^{12}_{12}+M^{12}_{13}} $$
since we want the fraction of paths that start from \(A\) and end on \(A\) out of the total number of length 12 paths emanating from \(A\). But who wants to exponentiate a matrix to the power of twelve by hand? Cool kids, that’s who! But not by multiplying it out. By diagonalizing the matrix in the form
$$ M = S\cdot J\cdot S^{-1} $$
where \(J\) is diagonal and \(S\) is obviously invertible. I won’t go through the entire diagonalization process here, but you are strongly encouraged to do it by hand for practice or learning if you aren’t completely comfortable with it. In my case, I did it to remind myself that I can do it at all. Then, use the fact that
$$ M^n = S\cdot J^n \cdot S^{-1} $$
(prove it to yourself that this is true) to find that
$$ M^{12} = \begin{bmatrix} 1366 & 1365 & 1365 \\ 1365 & 1366 & 1365 \\ 1365 & 1365 & 1366 \end{bmatrix} $$
and the answer agrees with our previous answer since
$$ \frac{1366}{1366+1365+1365}= \frac{683}{2048} $$
Why is this method awesome you ask? Wasn’t the first way enough? Well, this method can be used to solve this problem for much more complicated networks. Try solving the problem for a bug on a square with a single diagonal (start it from some corner) with the same movement rules. You’ll find that this approach produces an answer fairly quickly. Plus, computers handle matrices like a boss.
Method 3 – A method so cute, it’s downright dastardly
I’ve saved the best for last. Sometimes, you come across a solution so pretty, you remember it for life. I came across this witty solution browsing some random problem solving message boards in high school, so I can’t claim credit for it. Thinking about the bug though, if you had to guess the odds that it’d end up back on \(A\) after a million steps, what would you guess?
A third, right? In fact, the answer in the twelve step case is approximately \(0.333496\dots\) so it’s already pretty darn close. Also, we know that the bug takes twelve steps, so we can expect some sort of \(2^{12}\) power in the denominator. Now, \(2^{12}\) is not divisible by \(3\), but it does break down almost evenly into three parts:
$$ 2^{12} = 4096 = 1366 + 1365 + 1365 $$
Look familiar? These were the entries in the first row of \(M^{12}\)! Furthermore, we can deduce that because the original vertex is “unique”, the answer must be \(1366/4096\) and not \(1365/4096\). Great scott! That’s crazy! Of course, this isn’t a “proof” that the answer is what it is, but it’s a good heuristic and I’ve used this trick whenever possible just to get a sense for what the answer might be.
To wrap up this short (which might have gone too long, sorry), note that this last inference suggests that the telescoping series from method 1 should converge to a third. Let’s check!
$$ x_{n+1} = \frac{1}{2}(1-x_n) $$
Assuming this converges for the seed value we choose (you should check this), for very large \(n\), \(x_n\) is approximately constant.
$$ x = \frac{1}{2}(1-x) \Rightarrow x = \frac{1}{3} $$
And all is right with the world. So in summary, we took a basic math problem from high school, solved it. Learned to solve it another, more robust way. Then, we learned that we could apply this matrix method to more general networks. Finally, we learned a simple dirty trick that produced the correct answer via our intuitions. This intuition is then verified with our initial discovery about the evolution of \(P_n(A)\). So much insight from just one problem! If you take nothing else away from this, know that the world of math is vast and contains more excitement and opportunity for creative thinking than you could ever imagine.
Happy proofing!
edit:
Reddit user megiston has pointed out a combinatorial solution by representing the bug’s movement as a binary string, where \(0\) represents a clockwise rotation and \(1\) represents a counterclockwise solution. If the difference between the number of \(1\)s and \(0\)s is divisible by three, the bug is back at vertex \(A\). The string can then contain \(0\), \(3\), \(6\), or \(12\) \(0\)s to be a valid path ending back at \(A\). The path to the solution from this point is very short. Thanks!
I should also mention that the adjacency matrix is closely related to Markov Chains (see http://en.wikipedia.org/wiki/Markov_chain).
2012-03-08
When I took my first couple theoretical math courses (the ones with proofs and \(\Box\) s and whatnot), the professors would often start a problem, only to immediately begin to prove that a function was “well-defined.”
The problem was, as a student feeble in the ways of understanding mathematical proofs, I had no idea what “well-defined” meant! It’s often the case that professors become so accustomed to an idea (usually an especially simple one), they forget that they have to explain it at all. So, I am authoring the post to explain more-or-less what it means for a function or mapping or what-have-you to be well-defined. In addition, I will give some examples of things that are not well-defined for comparison.
So first, a pseudo-definition.
A function is well-defined if for equivalent inputs to the function, the function produces equivalent results. Also, the results of the function should like in the specified range, and the function should actually handle all possible inputs in its specified domain.
Now many budding mathematicians might be thinking, “how on earth could this definition possibly be useful?” This is probably one of the most commonly asked questions in math in general, and it is usually a good question that demands an explanation. The reason a function might not be well-defined is if elements in the domain or range can be expressed ambiguously. As an example:
Ex 1. – The digit swapping “function”
Let \(f\) be a “function” from \(\mathbb{R} \rightarrow \mathbb{R}\) that takes the decimal representation of a real number and swaps the first and second digits immediately to the right of the units digit. So, our “function” does things like
$$ f(1.01) = 1.10 $$
$$ f(25.0993) = 25.9093 $$
and so on. If you can, give yourself a minute or two to see if you can figure out what makes \(f\) an ill-defined function.
The answer, of course, is that decimal representations of real numbers are ambiguous. Remember that, for example, \(0.\overline{99}\) is mathematically equivalent to the number \(1\) in the reals. Here’s a case where \(f\) does some nasty things!
$$ f(.8\overline{99}) = .98\overline{99} = .99 \neq .09 = f(.90) $$
So we see that because a real number doesn’t have a unique decimal representation, we can imagine all sorts of silly functions that aren’t well-defined at all. Now, how about an example from algebra?
Ex 2. – The “exponent” map from \(G\) to \(\mathbb{R}\)
Let \(f\) be a map from the cyclic-5 group generated by \(x\) (which I will call \(G = \langle x \rangle\)) to the reals. The function \(f\) takes an element, \(x^n\) say, to the number \(n\).
Of course, this function suffers from the obvious issue that elements in G are equivalent mod five but elements in the reals are not. To end this mathematical short, let’s look at a problem where we show that a function is indeed well-defined for practice (also an example from algebra).
Ex. 3 – Cyclic groups of a given order are isomorphic
A common proof that shows up in an introductory course on abstract algebra is the one that proves that all cyclic groups of a given order, say \(n\), are isomorphic. The proof I will give here is constructive (aka, we actually construct the isomorphism). Suppose you have two groups \(G = \langle x \rangle \) and \(H = \langle y \rangle \) with \(|G| = |H|\). Let \(f\) be the map from \(G\) to \(H\) that takes an element \(x^i\) and produces \(y^i\). The claim is that this is an isomorphism.
But wait, we have to show that the map is well-defined first! Because we know now, that the ambiguous representation of elements in the domain and range could pose an issue. In particular, we need to show that
$$ x^i = x^j \Rightarrow f(x^i) = f(x^j) $$
From \(x^i = x^j\), we know that \(x^{i-j} = 1\) so the order of \(x\) which is \(n\) must divide \(i-j\). Thus, we can express \(i\) as \(an + j\) for some constant \(a\). This gives us the following chain of inferences:
$$ f(x^i) = f(x^{an+j}) = y^{an+j} = y^{an}y^j = y^j = f(x^j) $$
and we have shown that the map is well-defined.
From here, you can prove fairly quickly that the map is a homomorphism (easy), and that it is also a surjection, and hence, because the orders of the groups are the same, an isomorphism. N.B. this is the proof for cyclic groups of finite order. Proving this fact for cyclic groups of infinite order is easier because you can find a trivially well-defined isomorphism.
edit:
One of the readers commented that the term “well-defined” is itself not well-defined. This is a valid point and well-defined is almost used as the mathematical version of “good.” When you do your proofs, use some common sense and examine the context to see where the well-defined-ness of the map or function in question might be compromised. Happy proofing!