The FiveThirtyEight blog runs a weekly column called “The Riddler,” which poses a problem each week for readers to solve. The problems are mathy, or nerdy, or both, and they usually have a single correct answer, with the author recognizing just one of the many correct submissions. This week is a little different. Here’s the description of the problem, taken from the website (reproduced here without permission – please don’t be angry, Riddler).

In a distant, war-torn land, there are 10 castles. There are two warlords: you and your archenemy, with whom you’re competing to collect the most victory points. Each castle has its own strategic value for a would-be conqueror. Specifically, the castles are worth 1, 2, 3, …, 9, and 10 victory points. You and your enemy each have 100 soldiers to distribute, any way you like, to fight at any of the 10 castles. Whoever sends more soldiers to a given castle conquers that castle and wins its victory points. If you each send the same number of troops, you split the points. You don’t know what distribution of forces your enemy has chosen until the battles begin. Whoever wins the most points wins the war.

So, each submission will battle against all the others, and whichever wins the most battles wins the game. Barring a tie, there can be only one winner. Although my odds are probably still fairly small, I think this is my best chance to actually win. Oh, and there’s a twist – the puzzle already ran back in February, and everyone’s submissions and rationales were published. ;)
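A single battle between two submissions is easy to score. Here’s a minimal sketch in Python (the function and the example allocations are my own, not from the column):

```python
def score(mine, theirs):
    """Score one Colonel Blotto battle between two allocations.

    Each allocation is a list of 10 troop counts, one per castle.
    Castle i (1-indexed) is worth i points; ties split the points.
    Returns (my_points, their_points).
    """
    my_points = their_points = 0.0
    for castle, (m, t) in enumerate(zip(mine, theirs), start=1):
        if m > t:
            my_points += castle
        elif t > m:
            their_points += castle
        else:  # equal troops: split the castle's points
            my_points += castle / 2
            their_points += castle / 2
    return my_points, their_points

# Example: an even split versus an all-in on castle 10.
even = [10] * 10
all_in = [0] * 9 + [100]
print(score(even, all_in))  # (45.0, 10.0) -- the even split wins
```

Note that only 28 of the 55 total points are needed to win a battle, which is what makes troop placement interesting.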

**Incomplete Information**

Since we don’t actually know what this round of submissions will look like, I started by seeing what it would take to just play every possible submission against every other one. However, there are a little over 6 million (!) partitions of the number 100 (the number of available troops) into at most 10 parts, and since each castle counts differently, each of those can be arranged across the castles in up to 10! ≈ 3.6 million ways, putting the number of possible solutions in the trillions. I would have to play an astronomical number of games to exhaust all the possibilities, and since I don’t have access to a quantum computer yet, I need a different method.
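Both of those counts can be sanity-checked in a few lines of Python (my own quick sketch): a small dynamic program for the partition count, and a stars-and-bars binomial for the exact number of distinct ways to spread 100 troops over 10 castles.

```python
from math import comb

def partitions_at_most_k_parts(n, k):
    """Count partitions of n into at most k parts
    (equivalently, into parts of size at most k)."""
    dp = [1] + [0] * n  # dp[s] = partitions of s using parts seen so far
    for part in range(1, k + 1):
        for s in range(part, n + 1):
            dp[s] += dp[s - part]
    return dp[n]

print(partitions_at_most_k_parts(100, 10))  # a little over 6 million
print(comb(100 + 10 - 1, 10 - 1))           # exact count of ordered allocations
```

The stars-and-bars count (about 4.3 trillion) is smaller than partitions × 10! because permuting a partition with repeated parts double-counts, but either way the space is far too large to enumerate.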

**Genetics – Again!**

I need to search a space that’s just way too big to search. Back in 2012, I solved a similar problem – trying to play through every possible position in a board game with way too many positions to consider – by using a genetic search, so I tried it again. The idea is actually fairly simple:

- Begin by randomly selecting a bunch of candidates
- Play them all against each other
- Let the strong survive and pass on their traits to future generations
- Repeat

That’s it. Since this is a static environment (not like, say, a rainforest), eventually the algorithm will settle on a set of really good solutions. At least, that’s the idea…
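The loop above can be sketched in a few dozen lines of Python. This is a minimal toy version; the selection and mutation details here are my own assumptions, not necessarily what my original code did:

```python
import random

CASTLES = 10
TROOPS = 100

def random_allocation():
    """Random split of TROOPS across CASTLES via sorted random cut points."""
    cuts = sorted(random.randint(0, TROOPS) for _ in range(CASTLES - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [TROOPS])]

def beats(a, b):
    """True if allocation a outscores allocation b in one battle."""
    pts = sum((i + 1) if x > y else -(i + 1) if y > x else 0
              for i, (x, y) in enumerate(zip(a, b)))
    return pts > 0

def mutate(alloc):
    """Move one troop from a nonempty castle to a random castle."""
    alloc = alloc[:]
    src = random.choice([i for i in range(CASTLES) if alloc[i] > 0])
    alloc[src] -= 1
    alloc[random.randrange(CASTLES)] += 1
    return alloc

def evolve(pool_size=50, generations=100):
    pool = [random_allocation() for _ in range(pool_size)]
    for _ in range(generations):
        # Fitness: number of wins against everyone else in the pool.
        wins = [sum(beats(a, b) for b in pool if b is not a) for a in pool]
        ranked = [a for _, a in sorted(zip(wins, pool), key=lambda t: -t[0])]
        survivors = ranked[: pool_size // 2]          # the strong survive...
        children = [mutate(random.choice(survivors))  # ...and pass on traits
                    for _ in range(pool_size - len(survivors))]
        pool = survivors + children
    return pool[0]

best = evolve()
print(best, sum(best))
```

Swapping the fitness function – scoring against a fixed set of human entries instead of against the pool itself – is a one-line change, which is what the next section is about.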

**Measuring “The Strong”**

What does it mean to be “strong” here? I began by playing each candidate in my pool against all of the others and allowing the ones that won the most to survive. After a day of processing, the algorithm had made a clear choice. Just out of curiosity, I ran this candidate against the February results and found that it was, well, less good than I’d hoped – placing around 200th of about 1,300 submissions. From the graphic below, you can see that it casts a pretty wide net for the mid-range castles, but goes hard for castle number 10, devoting fully one third of its available troops to that castle.

But, apparently, humans aren’t all that random – ¯\_(ツ)_/¯ – so, I changed my strength measurement to score against the human entries instead of against the random pool. The algorithm very quickly settled on a set of candidates which all won enough matches to easily coast past the previous first place entry by more than 100 matches. They almost all looked something like the one below, which, I think, was my actual submission:

This result tells me that the humans tended to over-allocate troops to castles 1–4, 8, and 10. My submission happily grants them almost all of those points, focusing instead on castles 5, 6, 7, and 9 – together worth 27 points, just shy of the 28 needed to claim victory outright. Interestingly, this strategy didn’t even find the token allocation of a single troop defending castles 1 and 3 worth trading for one more troop at the heavily fortified castle 9. Again, one third of the troops go to just one castle, essentially guaranteeing a win there. For the fatally curious, I’ve posted all of the Python + SQL code which generates the results above.

**Final Grade**

I’m purposefully posting after the submission deadline (5/28/2017) but before the contest is scored (projected 6/2/2017). Afterwards, I’ll conclude one of the following:

As should have been obvious from the start, computers do a much better job of solving problems in the realm of game theory than humans, so only the computer generated solution could have won. Recognizing this, it’s time to start accepting this approach in other domains (I’m looking at you, healthcare).

OR

Of course, all of this number-crunching failed to consider human groupthink and the variability of results on such a small data set. While this was a fun and interesting exercise, I never expected it to really come close to winning.

Just before posting this, I found at least one other computer-generated modeler who took a similar approach, and got a little bit better result on the sample set. In fact, his rationale and write-up are eerily similar to mine, so it will be fun to see which performs better on the real data.

Shortly after discovering, sometime in the late eighties, that magically, mysteriously, I could use a joystick and button to control a spaceship or a keyboard to command a boy to explore a castle, I found myself wishing that I could make that magic happen myself. In grade school I made some plans. In high school I read some books. In college I took some classes. In grad school I studied algorithms. I even interviewed with video game companies, maintained a stats site for an existing game, built AI that learns to play a board game, and so on, but I never quite turned the corner.

**Scoping Problem**

A number of friends had expressed interest, at various times in my life, in helping me make a game. None of us ever did. I made attempts—built a rudimentary physics engine, contacted my favorite band to ask if I could use their music, and wasted a great deal of effort on other, equally fruitless tasks. I should have recognized this problem for what it was: a disease commonly afflicting software, called scope creep. In short, it happens when we try to do everything at once, and it prevents us from ever finishing anything. Somewhere deep inside, I felt like making a game meant making a blockbuster.

**Extending My Reach**

I decided that I needed to reach outside of my base of friends and family. Fortunately, I wasn’t the first in my town to have this thought. Again, however, I spent a fair amount of time learning new tools, toying with mechanics, thinking, discussing, and now, networking. All of these things were useful, but I still hadn’t quite reached the milestone of having made my own game. Then, my scope tightened.

**Competitive Building**

There is a game development competition. It’s called Ludum Dare. The premise is to build a game in two to three days. That’s it. The prevention of scope creep is baked right into the rules. I participated in Ludum Dare 33, and did in seventy-two hours what I had previously failed to do in twenty-five-plus years. Was it great? Absolutely not. Was it good? No! Was it terrible? Almost certainly. Take a look for yourself. I asked my wife to draw some artwork on a white sheet of paper, scan it, and send it to me. I borrowed some generic assets for a background. I neglected to add music or sound.

Then I asked my five-year-old son to show it off to a small crowd of people collected to celebrate what had been built that weekend. And the amateurish mechanics, unpolished images, and childish puzzles are still magic. I built an unambiguously complete game and learned a breadth of things that would have otherwise taken months.

**A Moral?**

Given the technology available today, and the growing base of people like me, who grew up with gaming as infrastructure, not as new technology, I’m excited to see what happens as it, like books and television before it, becomes more integrated into our daily lives. I think it will do much greater things than either of those predecessors and I have a feeling I’ll have more to say about that in a future post. In the meantime, I’ll certainly be making more games myself, maybe even some good ones.

A few years ago, a co-worker participated in a local cricket league. Each team played all of the others in the league twice, and the top few qualified for a playoff. He wanted to know how many games his team needed to win in order to guarantee an appearance in the playoffs. This type of scheduling goes by the name “round-robin”.

I was interested in the following questions:

- In the worst case, how many wins would a team need in order to advance? This is the case my colleague asked about.
- In the best case, how few wins could a team achieve and still make the playoffs?
- What is the average case? How many wins would be necessary to make the playoffs 50% of the time?

I don’t remember the specifics of my colleague’s tournament, so let’s just solve the problem in general.

**Average Case**

Taking these questions in reverse order, let’s start with number 3. As it turns out, I wasn’t curious enough to figure it out, so it remains unanswered (at least by me). Perhaps I’ll revisit it someday. Problem solved! On to number 2.

**Game Over Nerds Win (Best Case)**

Another way to describe this is something like the luckiest case. How abysmally can your team perform and still mathematically make it to the playoffs? I’ll assume that it’s possible for two teams to have the same record, while one advances and the other does not (this is usually the case when there is no time for another game, e.g., the sixth and final tie-breaking condition for the Big East NCAA basketball conference championship is just a coin flip).

In this scenario, there must be two groups of teams, the jock teams and the nerd teams (and, you guessed it, your team is a nerd team). The jock teams always beat the nerd teams, and the jocks occupy all but one of the qualifying spots, leaving all of the nerds to fight for the last one. Let’s call the total number of competitors C and the number of qualifiers Q. Then there are Q − 1 jocks and C − Q + 1 nerds. If each of the nerds splits their two matches with each of the others, there will be a tie among all of them, resulting in one team advancing on a tie-breaker (coin flip). This means each nerd team wins C − Q games, and this is the lowest win total that can still advance. You can’t give up any more wins to the jocks, so if you went any lower, one of the other nerds would have at least one more win than you, and you wouldn’t qualify.

**Jocks Lose (Worst Case)**

The worst case is the question I was actually asked, and it represents the nightmare scenario – your team performs exceptionally but still fails to advance on the tie-breaker. Let’s call up the jocks and nerds again. This time there are more jocks than qualifying spots, by one: Q + 1 jocks and C − Q − 1 nerds. The jocks still beat all of the nerds twice, for 2(C − Q − 1) wins each, then they split their remaining games against each other, for an additional Q wins each. This gives each jock team a total of 2(C − Q − 1) + Q wins. But even if you are one of the jocks, you could still lose the tie-breaker! So we need one more win – that is, 2(C − Q − 1) + Q + 1 = 2C − Q − 1 wins – to guarantee that we advance.
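Under these assumptions – a double round-robin with C teams and Q playoff spots, ties broken by coin flip – the two bounds are easy to tabulate. This is my own sketch, following the reasoning above:

```python
def luckiest_wins(c, q):
    """Fewest wins that can still advance: a nerd team that splits
    its two games with each of the other c - q nerds."""
    return c - q

def guaranteed_wins(c, q):
    """Fewest wins that guarantee advancement: beat every nerd twice,
    split with the other jocks, plus one win to dodge the coin flip."""
    return 2 * (c - q - 1) + q + 1  # simplifies to 2c - q - 1

# The example from the text: five teams, two qualifying spots.
print(luckiest_wins(5, 2), guaranteed_wins(5, 2))  # 3 7
```

So with five teams and two spots, a team could squeak in with just 3 wins, while 6 wins still leaves it at the mercy of a coin flip – only 7 wins are a guarantee.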

**What Does It Look Like?**

Well, I don’t have any cool artwork for this one, so I thought it might be fun to at least see what these functions look like on a 3D graph. With my old friend Maple 8 (yes, the one from 2002), it’s pretty easy to generate a 3D plot of both functions in Q and C. In the image below, the worst case is represented by the “top” of the structure, and the best (luckiest) case is represented by the strange-looking shaded region.

You can see that as the number of qualifiers shrinks relative to the number of competitors, the required win total really skyrockets. For example, if just two out of five teams qualify, six wins would not be enough to guarantee a playoff spot.

I don’t have much else to say about this, except that I think this is one reason why data nerds really love baseball – there are so many games that there are few surprises, and even ties are broken with yet another game, instead of some random device. Statisticians hate small sample sizes, and a single or double match round-robin, while theoretically fair for all participants, just doesn’t provide enough data for a statistically significant result.

Which image below looks more disorderly? Which seems more chaotic?

Perhaps these seem like trick questions—they’re not. They may seem like leading questions—they most certainly are. Feel free to answer with your gut on this. (Hint: That image on the right looks fairly orderly to me.)

**Entropy**

Entropy is commonly referred to with the terms *disorder*, *randomness*, and *chaos*. In fact, many textbooks use the term *disorder* explicitly to introduce the concept; however, this description tends to lead to misunderstanding of the concept.

Entropy is a real, measurable quantity, just like volume, energy, or momentum. So what does it measure? I find it best to consider it not as a measure of *disorder*, but as a measure of *uniformity* in a system. Thank you to Dr. Leno Pedrotti at the University of Dayton, who first introduced me to this idea. Now, let me explain what I mean.

Let me begin by creating out of nothing a very unscientific term called “clustering” of energy. I’m certainly not going to worry about defining it precisely—just consider the following qualitative description instead.

Systems with a large amount of clustered, usable energy have very low entropy (think dry baking soda and vinegar in separate containers), but as a closed system evolves over time, that usable energy dissipates into useless, “unclustered” heat (think about the uniform mixture of baking soda and vinegar long after the “explosion”).

**Quantifying Entropy**

Now it’s time to do one of my favorite things—stretching an analogy to the point of nearly breaking, and then breaking it. Let’s focus on this make-believe idea of clustering for a moment in hopes of gaining some insight into how entropy works.

It won’t be immediately clear why I’m doing this, and that’s ok, but consider a very simple system of six balls, two of each of the colors red, blue, and green. Furthermore, let’s suppose we can distinguish between balls of the same color, perhaps because one of them has a decoration on it. The balls might look something like this:

Supposing that I randomly order the balls in a straight line, I’d like to know exactly how many “clusters” of balls of the same color I can expect to get, on average. To find out, we just have to count up all the possible configurations and weight them accordingly.

**Fully Clustered**

There could be up to three clusters in a configuration. That is, the two reds, two blues, and two greens are each together. The first cluster could be any of the three colors, the second could be any of the remaining two, and the third must be the final color.

That’s 3 × 2 × 1 = 6 ways. In the image above, the grey balls could be any of the three colors, the white balls could then represent either of the remaining two colors, and the black balls must represent the last color.

Furthermore, each cluster could be formed in two ways—remember that balls of the same color are distinguishable. That means there are 2 × 2 × 2 = 8 ways to rearrange each of the 6 configurations above, totaling 48 possible three-cluster configurations.

Let’s now consider the case of two clusters. The non-clustered balls could of course sit at the ends, but they could also sit at positions 1 and 4, or positions 3 and 6 (see below).

For each of these cases, the colors could again be distributed in 6 ways, and the same-colored balls rearranged in 8 ways, yielding 3 × 6 × 8 = 144 two-cluster configurations.

Next, consider the one-cluster option. This one is slightly tricky because we need to account for a lot of cases, as you’ll see. First, the cluster could be at the far left. Next it could occupy positions 2 and 3. Good so far?

Now, when it occupies positions 3 and 4, we can actually distribute the other balls in two patterns (A, B, cluster, B, A) or (A, B, cluster, A, B).

Finally, the cluster could be positioned at 4 and 5, or at 5 and 6. That’s 6 total cases, again multiplied by 6 color configurations and 8 rearrangements, to get 6 × 6 × 8 = 288 total configurations.

Finally, we’ve found 48 + 144 + 288 = 480 configurations so far, but there are a total of 6! = 720, leaving 240 possible zero-cluster configurations.

So, on average, we have (3 × 48 + 2 × 144 + 1 × 288 + 0 × 240) / 720 = 720/720 = 1 cluster. In other words, if we randomly order the balls, we typically expect to see only one of the three possible clusters.
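Those counts are small enough to check by brute force. Here’s a quick enumeration (my own sketch) over all 720 orderings of the six distinguishable balls, counting adjacent same-color pairs:

```python
from itertools import permutations
from collections import Counter

# Six distinguishable balls: two each of red, blue, green.
balls = ["R1", "R2", "B1", "B2", "G1", "G2"]

def clusters(order):
    """Count colors whose two balls sit next to each other.
    With two balls per color, each color contributes at most one cluster."""
    return sum(1 for i in range(len(order) - 1)
               if order[i][0] == order[i + 1][0])

counts = Counter(clusters(p) for p in permutations(balls))
print(counts)  # orderings with 3, 2, 1, 0 clusters: 48, 144, 288, 240
average = sum(k * v for k, v in counts.items()) / 720
print(average)  # expected number of clusters: 1.0
```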

**I Thought We Were Talking About Entropy**

Absolutely, so let’s get back to it. If we think about the clusters as usable energy (this is the stretchy part of the analogy), we can see that the fewer clusters, the higher the entropy. The interesting thing about the calculation above is that no matter how the balls are initially configured, if they are free to randomly distribute themselves without any constraint, they are likely to settle into a fairly low-clustered (high entropy) state. Now, think about what would happen if we had even more balls, and even more colors, and even more dimensions. All of these things would further reduce the amount of clustering.

This is exactly what happens at the particle level in thermodynamics. Given that all microstates are equally likely, and since there are many, many more states in which the energy is spread out than states in which it is clustered, particles tend to organize themselves in such a way that the usable energy lessens, and thus the entropy rises over time. We might not expect this, and it’s way beyond the scope of this little blog to prove it rigorously, but that’s really what happens. Particles do assort themselves randomly, as if they were selected from a Powerball machine, and it just so happens that there are so many more configurations with dissipated, unusable energy that those are far more likely to occur—so much so that we never, ever, ever observe entropy decreasing.

**What About the Pictures?**

Here is why I prefer the term *uniformity* to *disorder* when describing entropy.

Look again at the pictures at the top of this post. See how the one on the left exhibits much more clustering than the one on the right? Also, remember how you answered the question, “Which one looks more disorderly?” The left image, which is more disorderly (the more highly clustered), actually represents the lower entropy state. The image on the right, the one which is the most uniform, represents the higher-entropy scenario.

So it is in thermodynamics. The more uniformly distributed the energy, the higher the entropy, and despite the textbook definitions, chaos and disorder are really the opposite of entropy.

**Closing Thoughts**

Two final notes.

First, for those curious, I used a tiny bit of C# code to generate the two images. The first is a randomly generated image with some “clustering” constraints. The second is simply the mundane repetition of red, green, and blue pixels in sequential order—as uniform as it gets.

Finally, one interesting property of entropy is that, in a closed system, it always increases. Because of this, all closed systems eventually lose the ability to transfer energy. You may have heard of the heat death of the universe. That’s really what the image on the right represents: the state of highest possible entropy, when all the usable energy has been, well, used. It’s the predicted end state under the assumption that entropy always increases. If you really believe that, and that the universe is a closed system, then everything we ever do will eventually end in a uniformly distributed universe where nothing interesting will ever happen again. Fun!

Images by: Seth Johnson

A short time ago, I watched my hometown college football team come back to tie a game in the fourth quarter, triggering overtime. Because overtime is so much fun to watch, I immediately wondered how long it would last. That particular game ended in two overtimes, but I decided I’d like to know how many overtimes one can usually expect.

As it turns out, the NCAA publishes every overtime game including the number of overtimes. We’ll come back to this later, but I decided to ask a more interesting question instead. I wondered whether regulation time performance matches overtime performance well enough to predict the average number of overtimes.

For reference, let’s pause to describe the rules of overtime. Briefly, they are as follows:

- Each team gets a chance to score from the opposing team’s 25-yard line.
- A tie at the end of a round invokes another round.
- After the second round, any team scoring a touchdown must attempt a two-point conversion.

There are other rules—for instance, if the defense scores on a turnover, the game immediately ends—but this is the basic outline.

Well, it would really be nice if we had a large data set of teams starting a drive from the opposing team’s 25 yard line to see how often each type of score occurs from that starting position. As it turns out, the NCAA publishes this data on a regular basis, and even better, the good folks at cfbstats.com actually compile it all into wonderful little year-by-year file nuggets.

Using the data from 2005–2011, I found 975 such drives. I eliminated twenty of them because they resulted in the end of the half, meaning that the team either didn’t play offense (took a knee) or didn’t have enough time to score, neither of which ever happens in overtime. The breakdown of the remaining 955 drives looks like this:

As you might have expected, there are a lot of touchdowns and field goals, and relatively few scoreless drives, which happen for various reasons.

This gets me most of the way to learning what the score might be after a drive from the 25, but I also need to know the hit rate for extra points and two point conversions. Thankfully, the NCAA provides that as well. Here are those numbers for the same years (2005 – 2011):

There is not much variance in extra point hit rates over the years, and only a moderate amount in two-point conversion rates. Still, it’s best to take the largest sample you can, so I’ll use the cumulative averages as my hit rates.

Now it’s time to cook these data! For each round, we need to estimate the probability of ending in a tie, or put another way, whether the game continues. First, let’s note that any drive can result in a score of 0, 3, 6, or 7 for rounds one and two of overtime.

The data above are enough to calculate the probability of each type of score. Then, I can just square each one and add them all up to get the probability of a tie. In rounds three and higher, I can do the same, but use the two-point conversion rate instead of the extra point rate. The result: a transition probability matrix!

That’s right, we’re going to treat overtime like a Markov chain and use it to find the average number of rounds before the game ends (the absorption time, in Markov parlance). Since I’ve done similar work for my analysis of Monopoly, I’ll spare you the gory details and get straight to the results. If you’re really nerdy and want to see all of the calculations, including the original drive chart analysis, take a peek at my Excel worksheet.
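To illustrate the mechanics, here’s a compact sketch of the computation. Note that the drive-outcome and conversion probabilities below are placeholder numbers I’ve invented for illustration—the real rates come from the drive data discussed above:

```python
# Illustrative sketch of the overtime Markov chain. The probabilities
# here are invented placeholders, not the actual 2005-2011 rates.
P_TD, P_FG, P_NONE = 0.45, 0.35, 0.20   # touchdown, field goal, no score
P_XP, P_2PT = 0.96, 0.43                # extra point, two-point conversion

def tie_probability(p_convert):
    """Probability one round ends tied: both teams post the same score.
    A drive from the 25 can produce 0, 3, 6, or 7 points."""
    p = {
        0: P_NONE,
        3: P_FG,
        6: P_TD * (1 - p_convert),
        7: P_TD * p_convert,
    }
    return sum(v * v for v in p.values())

t12 = tie_probability(P_XP)   # rounds 1 and 2 (kick the extra point)
t3 = tie_probability(P_2PT)   # rounds 3+ (must go for two)

# Expected rounds = sum over k of P(round k is reached):
# 1 + t12 + t12^2 * (1 + t3 + t3^2 + ...)
expected = 1 + t12 + t12 * t12 / (1 - t3)
print(round(expected, 2))
```

With the placeholder numbers above this lands in the neighborhood of one and a half overtimes; the spreadsheet does the same calculation with the real drive-chart rates.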

To the right, you can see the average number of overtimes it takes to reach the end of the game. So, for instance, starting from the second overtime, it takes 1.46 more rounds, on average, to reach a conclusion. By this method, the average number of overtimes is about 1.51. Remember I told you the NCAA actually publishes the overtime data? Well, let’s compare and see how we did. It turns out the average number of overtimes for the same period I used in my calculations was about 1.41, a difference of about 7%. I’ll count that as a win (even though my home team lost their OT game)!

The very critical reader will have noticed that I made a couple of implicit assumptions. I took care not to state them when writing the analysis, but I think they’re worth mentioning.

- It can be argued that teams really do perform differently in overtime because at the end of the game, they are more tired, thus, the outcomes of any given drive would be different. This is a fair point, however, I don’t have a good way to quantify it, so I have to assume that it affects both teams equally, and doesn’t affect whether a given OT session will end in a tie.
- If the first team scores a touchdown, the second team will not go for a field goal. This may indeed account for some or all of the difference between my result and the actual results, but again, I have no really slick way to quantify this information, so I’ve left it alone. Furthermore, this matters in fewer than 25% of all overtimes.

All in all, I was surprised the average was so low. I honestly expected it to be more than 2 overtimes. Still, I was pretty pleased with the result of this analysis. Oh, and if you’re still wondering about my poor hometown team, they actually did pretty well this year.

Here is a very fun and surprising result. It’s not really new, but if you haven’t seen it, you really should. The problem looks like this:

One-hundred men throw their hats into a closet. Each one randomly selects one of the hats until all of them have a hat again. At the end of the process, how many men do we expect to have their own hat?

I had a professor who gave us the answer to this question, and put it on every test he gave, but never actually proved it, so today I’m going to do it. I’m actually going to solve this problem (and the more general one, with n men and n hats) in two different ways, but let’s start with a brute force approach.

*If you don’t feel like reading the details, skip to the end now for the answer.*

First, consider the trivial case of one man and one hat. Well, obviously he gets his own hat back, so we expect—in fact, it is the only possible outcome—that one man will end up with his own hat.

The case of two men is the first interesting one. Here, there are two outcomes:

- Each man gets the other’s hat, and hence no men get their hat back.
- Each man gets his own hat.

Both events are equally likely, so they each occur with probability 1/2, and we now have enough information to calculate the expected value:

If X is the number of men who get their own hat back, E[X] = 0 · (1/2) + 2 · (1/2) = 1.

This is a neat result, but still doesn’t give us enough info to establish a pattern. Let’s give three men a try and see what develops.

This will require a small amount of bookkeeping, but essentially, there are three things that could happen:

- **Everyone gets his hat back**: This happens in only one of the 3! = 6 possible hat permutations.
- **One man gets his hat back**: Here, the other two trade. This can happen three ways, one for each man.
- **No man gets his own hat back**: This can happen two ways: either each man passes his hat to the right, or each man passes his hat to the left.
- Notice that there is no way for **exactly two men to get their own hats**, because this would leave the third man with his own hat as well, which is already covered by case 1 above.

We’re now in a position to calculate the expected value:

E[X] = 3 · (1/6) + 1 · (3/6) + 0 · (2/6) = 1. Interesting! Perhaps a pattern is forming after all, though not the one we might have expected. Let’s brute-force one more case and then try to generalize:

For four men, there are 4! = 24 permutations. We’ll consider five sub-cases corresponding to zero, one, two, three, and four men who get their own hat. Each configuration has two groups of men – those who got their hat back, and those who didn’t. Let’s examine the possible outcomes through this lens. My approach here will be to answer the following two questions for each case:

- How many ways can we select a unique group who get their hats back?
- Once that group is chosen, how many ways can we rearrange the hats in the other group so that **none** of them get their own hat back?

The number of arrangements for each of the five configurations will be the product of the answers to these two questions. And the probability of each configuration is this product divided by twenty-four, the total number of permutations. Let’s look at some hard numbers for clarity:

**All four men get their hat back**: The number of ways this can happen is simply C(4, 4) = 1. Obviously there are no men who don’t get their hat back, so that can only happen in one way. That gives us 1 × 1 = 1 configuration in this category.

**Three men get their hat back**: This can happen C(4, 3) = 4 ways, but this should leave one man in the group of men who don’t get their hat back—which can’t happen (or we could say it can happen in zero ways). That means there are 4 × 0 = 0 configurations in this case.

**Two men get their hats back**: Of course, this can happen in C(4, 2) = 6 ways, but we must also be sure the other two don’t get their hats back, which can only happen one way – if they trade. So, in total, we have 6 × 1 = 6 configurations in this category.

**Only one man gets his hat back**: This can happen in C(4, 1) = 4 ways, but again, we must guarantee that the other three each get someone else’s hat. We’ve already seen that for three men, there are two ways in which none of them get their own hat (passing to the left and to the right). That gives 4 × 2 = 8 total cases in this category.

**None get their hat back**: Since we’ve already accounted for 1 + 0 + 6 + 8 = 15 of the 24 cases, that leaves 9 in this group.

Again, we can calculate the expected value: E[X] = (4 · 1 + 3 · 0 + 2 · 6 + 1 · 8 + 0 · 9) / 24 = 24/24 = 1.

This should give us the strong suspicion that the answer is always going to be one. Think about that. No matter how many people randomly shuffle their hats, can it really be true that, on average, one person gets their hat back? Well, without a general formula, we’ll never know, so let’s build that.
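As an aside, a computer can brute-force a few more cases for us. Here’s a quick sketch of my own, enumerating every way the hats can be handed out:

```python
from itertools import permutations

def average_fixed_points(n):
    """Average number of men who get their own hat back, over all
    n! ways the hats can be handed out (fixed points of a permutation)."""
    perms = list(permutations(range(n)))
    total = sum(sum(1 for i, h in enumerate(p) if i == h) for p in perms)
    return total / len(perms)

for n in range(1, 8):
    print(n, average_fixed_points(n))  # always exactly 1.0
```

The average is exactly one for every n we can feasibly check, which is encouraging—but enumeration isn’t a proof, so on with the formula.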

The groundwork has already been laid, but we’re missing one piece, a function to represent the answer to that second question, how many ways can we rearrange the hats so that no one gets his own back? For one, two, three, and four men, the number is zero, one, two, and nine. Is there a function that follows this pattern? As it turns out, there is! It’s called the subfactorial and it measures the number of something called derangements of a set. How can that fail to be awesome? It can’t.

A derangement of an ordered set is a permutation that leaves no element in the same position in which it started. But that’s exactly what we’re trying to measure! The number of derangements of a set of n elements is represented by the subfactorial function, !n. The formula for this function can be written in many ways, but probably the simplest is:

!n = [n! / e]

where [·] is the rounding function and e is Euler’s number.
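That rounding formula is easy to check against a direct count (a quick sketch of my own):

```python
import math
from itertools import permutations

def derangements_by_counting(n):
    """Count permutations of n elements with no fixed points."""
    return sum(all(i != p[i] for i in range(n))
               for p in permutations(range(n)))

def subfactorial(n):
    """The rounding formula: !n = [n!/e]."""
    return round(math.factorial(n) / math.e)

for n in range(1, 8):
    print(n, derangements_by_counting(n), subfactorial(n))  # the two agree
```

For n = 1, 2, 3, 4 this yields 0, 1, 2, 9—exactly the counts found by hand above.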

Now, we can write a general formula for n men:

E[X] = Σ (k = 0 to n) of k · C(n, k) · !(n − k) / n!

I won’t go about proving that this always equals one because it’s sort of hard to type it all out and I’d much rather play video games than do that. I will give you a place to get started, though. Try running the following query at wolfram alpha:

**sum(subfactorial(n-x)x(binomial(n,x))/n!,x=0..n)**

That’s as far down that path as I’ll walk today, but before ending the conversation let me show you a much easier way to solve the problem, even if it doesn’t include any funny mathematical terms.

For this proof, let’s define A_i as the event that the i-th man gets his own hat back. Let’s also introduce a fairly boring function, called the indicator function, I_{A_i}, which evaluates to 1 if the event is true, and 0 otherwise. Now, let’s find the expected value of I_{A_i}. Assuming that we are dealing with n hats and n men:

E[I_{A_i}] = 1 · P(A_i) + 0 · (1 − P(A_i)) = P(A_i) = 1/n

Now, let’s see why I did all that. Finally, define X = I_{A_1} + I_{A_2} + … + I_{A_n}, the total number of men who get their own hat back, and calculate its expected value:

E[X] = E[I_{A_1}] + E[I_{A_2}] + … + E[I_{A_n}] = n · (1/n) = 1

I've actually done a lot of things here that need justification (linearity of expectation chief among them), but I'm skipping them because they're not fun and I'm not in school any more so I don't have to. At the very least, though, you now have a good idea why I could get away with mindlessly scrawling "1" on the page every time I was asked this question. Have fun quizzing your friends with this one.
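If you'd rather let the law of large numbers do the work, a quick simulation (my own sketch, not anything from the original post) backs this up:

```python
import random

def fixed_points(n):
    # shuffle n "hats" and count how many land back with their owner
    perm = list(range(n))
    random.shuffle(perm)
    return sum(1 for owner, hat in enumerate(perm) if owner == hat)

random.seed(1)
trials = 100_000
avg = sum(fixed_points(20) for _ in range(trials)) / trials
print(avg)  # hovers near 1.0, regardless of the number of men
```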

*For those who skipped*: The answer is one, no matter how many men are involved! If you don’t believe me, I know a great blog you should read…

*Edit – 06/14/2013: A previous version of this post contained an unappealing mess of images created by the author with MS Paint. They have been graciously upgraded by Seth Johnson*

“Is this always true??? I mean the math part… Stephen???”

Knowing that I have an irrational love of rage faces as well as math, he knew I couldn’t resist trying to prove or disprove the algorithm presented in the comic, which I’ve displayed below (without permission) in case you didn’t follow the link.

So, here goes:

In both examples, twenty-five is subtracted from the number to be squared (let’s call it n) and the result is used as the first two digits of the answer. This can be represented by the expression 100(n − 25). The last two digits come from simply subtracting the number to be squared from fifty, then squaring the result, (50 − n)². So the algorithm is simply:

s = 100(n − 25) + (50 − n)²

where s is the square of n.

The first question, and the one posed in the email, is: does this really work? That is, does 100(n − 25) + (50 − n)² = n²? A few steps of algebra get us:

100(n − 25) + (50 − n)² = 100n − 2500 + 2500 − 100n + n² = n²

So, we can say that if the algorithm is performed according to the formula above, yes, this method always works. However, the comic does not directly use any formula, so let’s see where the two methods diverge, if anywhere.

Consider a concrete example by letting n = 4. Certainly, we wouldn’t want to resort to such a complicated algorithm for such a basic operation, but suppose we did use the formula. We’d get 100(4 − 25) + (50 − 4)² = −2100 + 2116 = 16, so the formula version checks out. But how about the mechanical algorithm? Well, 4 − 25 = −21, so our “first two digits” are a minus sign and 21. Of course, this is already problematic, and now consider that (50 − 4)² = 2116 has more than two digits. We might end up with something like −212116.

This places a constraint on n, namely that it must be greater than or equal to twenty-five.

One might also suspect numbers larger than fifty to introduce the same sort of trouble; however, the parity of the square function saves us. For example, consider squaring the number fifty-one in this way. 51 − 25 = 26, 50 − 51 = −1, and (−1)² = 1, so the comic boyfriend would get the correct answer of 2601, provided he padded the single-digit 1 with a leading zero.

The subtraction here does give us a constraint, however, because we need the square of the result to be less than 100; otherwise, we’d have more than two digits. Imposing this limit, we get 40 < n < 60. This is actually a subset of the previous boundary, and thus represents the complete set of possible values of n for which this method will “work”.
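To double-check that boundary, here's a short Python sketch of the mechanical (string-gluing) version of the trick – my own reconstruction, not code from the comic:

```python
def comic_square(n):
    # first chunk: n - 25; last two digits: (50 - n)^2, zero-padded to width 2
    head = n - 25
    tail = (50 - n) ** 2
    return int(f"{head}{tail:02d}")

# the glued-string version agrees with n^2 exactly on 41..59
assert all(comic_square(n) == n * n for n in range(41, 60))
# and falls apart just outside the window
assert comic_square(40) != 40 * 40 and comic_square(60) != 60 * 60
```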

I was curious about the choices of twenty-five and fifty, so I wanted to see what other combinations might work. As it turns out, there are none! Let’s see why. Suppose we could choose any two numbers to perform this trick, call them a and b. Then we’d write n² = 100(n − a) + (b − n)² = n² + (100 − 2b)n + (b² − 100a). Since we require the sum of all terms other than the n² to be zero, it should be apparent that b = 50 and a = b²/100 = 25. That is, the algorithm uniquely determines the choice of the constants a and b.

Although its proof can be reduced to a simple algebra problem, the algorithm itself is simple but non-obvious. Whoever designed it had a very nice insight. Feel free to post similar ones you may have found in your travels. Bonus points if your submission is in the form of a rage face comic. : ).

Back in 1998, I overheard my high school math teacher musing to another teacher about the following problem:

Given a line segment of unit length, select any two points to cut the line into three parts. What is the probability that the resulting three line segments will form a “good” triangle?

By a “good” triangle, he meant one which completely encloses a non-zero area. That is, if the longest side is longer than the sum of the other two sides, it’s impossible to use those three segments to enclose an area. The trivial case exists where the longest side is equal to the sum of the shorter sides, but this triangle encloses no area.

**A Delayed Response**

Several years later, after I had taught myself to program, I wrote a Monte Carlo algorithm to simulate this situation, but much to my dismay, after 1000 trials, I got a probability of a little more than 0.19. To me, this certainly meant that the answer should really be 1/5 – I just needed a few more trials. Unfortunately, as I increased my trial size, the number started converging to 0.193. This really disturbed me, but at the time, I was weak, so I gave up.

**Renewed Strength**

Many years later, in 2007, after graduate school, I decided to approach the problem from a theoretical perspective, and after some fumbling around in an attempt to create a probability distribution function, an insight from a former professor lit my path. He suggested that I think of the cuts as a two-dimensional x-y graph, where any point (x, y) on the graph represents cuts at locations x and y on the line segment. Let’s see what this looks like:

Assuming the line segment is of unit length, the two cuts, x and y, both lie in the interval [0, 1]. As an example, the point (0.25, 0.75) would correspond to cuts at locations 0.25 and 0.75 on the line segment. This happens to result in a middle piece of length 0.5, meaning that it corresponds to the trivial case of a bad triangle noted above.

This was useful because now I could analyze the graph for the regions which create good triangles, and the ones which create bad ones. I started by considering how the cuts might go awry. The simple answer is that any time one of the segments is longer than 1/2, I get a bad triangle. This can happen, of course, in three ways:

Segment a, the leftmost piece, is too large when both cuts are greater than 1/2. This corresponds to the area bounded by the lines x = 1/2 and y = 1/2 in the upper-right quarter of the graph. Segment c, the rightmost piece, is too large when both cuts are less than 1/2, or the area bounded by the same lines in the lower-left quarter. The middle segment, b, was a little more tricky. I am looking for the region where |x − y| > 1/2, which really means two different areas – one bounded by y > x + 1/2 and the other by y < x − 1/2. Take a look at the result of cutting these “bad” sections out of the graph.

The bounds described above are labeled with the corresponding line segment and I provided the value of each bounded area for reference. It was then fairly easy to see that each offending segment reduces the overall “good” area in the graph by 1/4. In total, that leaves only 1/4 of good area left, so this was my result. That is, one fourth of the time, the two random cuts will result in a good triangle. This is so clear from the graph that my Monte Carlo simulation must have been wrong.

**Time To Get Serious**

Recently, and after a few years of experience as a software developer, I decided to rewrite the simulation in C#. Whereas once it took me a full day to write, I was now able to reproduce it in a few minutes. Here is the core of my rewrite:

Random rng = new Random();
int good = 0, bad = 0;

for (int i = 0; i < numberOfCuts; ++i)
{
    List<double> sides = new List<double>();
    double firstCut = rng.NextDouble();
    double secondCut = rng.NextDouble();
    double lowCut = Math.Min(firstCut, secondCut);
    double highCut = Math.Max(firstCut, secondCut);
    sides.Add(lowCut);               // leftmost piece
    sides.Add(highCut - lowCut);     // middle piece
    sides.Add(1.0 - highCut);        // rightmost piece
    sides.Sort();
    if (sides[0] + sides[1] < sides[2]) { bad++; } else { good++; }
}

To my delight, after a large number of trials, I get 0.25! Now I was feeling very confident about both my theoretical approach and my algorithm, so I wanted to introduce a little more complexity.

For efficiency pedants, I will concede that instead of checking the sum in the “if” condition above, I could simply check that the longest side does not exceed 1/2, but over the course of a single run, I doubt it would make any tangible difference.

**A New Challenger Emerges**

To create a slightly more interesting problem, I imposed the restriction that the first cut alone is responsible for defining segment a. Then the second cut can only act on the line segment to the right of the first cut. How would this change the result, if at all?

Well, now that I had a working simulation, I only really needed to change one line of code to account for this new criterion. Instead of generating a random number between zero and one for the y value, I simply generated a random number between the location of the first cut and one. Effectively, I was simply imposing the condition that y > x for the second cut. Here’s the one-line code change:

double secondCut = rng.NextDouble();

becomes

double secondCut = firstCut + (1 - firstCut) * rng2.NextDouble();

After running this slightly modified algorithm over the same number of trials, I got a probability of 0.193. But that’s almost exactly what my old algorithm gave me! I felt pretty sure I had found the bug in my original program. I had somehow implicitly assumed that the second cut must always land to the right of the first. Incidentally, after looking over my old code, I was indeed imposing that restriction by only letting the second cut act on the second piece of string created from the first cut.

To confirm the new solution, I simply calculated it the old fashioned way, primarily because the geometric solution is not nearly as obvious for this problem, which we’ll see later. Let me reproduce that logic for you:

Again, I considered the bad regions first, and ruled them out. Of course, any cut where x > 1/2 is disqualified because segment a is then already longer than 1/2 (and since y > x, this also forces y > 1/2). Segment c would be larger than 1/2 when y < 1/2, so the only “good” values left fall in the range where x < 1/2 and 1/2 < y < x + 1/2, keeping segment b from getting too large.

What is the probability of y falling in this range? Well, given any particular value of x, let’s call it x₀, the random number generator will give me a y which lies in the range (x₀, 1). I am only interested in those values which lie in the range (1/2, x₀ + 1/2), so the probability of this event is simply the quotient of the lengths of the ranges. That is, if I define the event E as y taking on a value such that a “good” triangle is formed, then P(E | x = x₀) = (x₀ + 1/2 − 1/2) / (1 − x₀) = x₀ / (1 − x₀).

But I’m not interested in a single value of x. I need to integrate over all valid values of x to find the solution. Doing so gives:

P = integral from 0 to 1/2 of x / (1 − x) dx

And with the nifty substitution u = 1 − x, I get

P = integral from 1/2 to 1 of (1 − u) / u du = [ln(u) − u] evaluated from 1/2 to 1

It will probably not surprise you that this evaluates to ln 2 − 1/2 ≈ 0.193, giving yet another confirmation that the simulation was a success (as well as the .NET random number generator).
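As one more check on the arithmetic, a few lines of Python (my own sketch, using a plain midpoint Riemann sum rather than anything clever) agree with both the integral and the simulation:

```python
import math
import random

closed_form = math.log(2) - 0.5          # ≈ 0.1931

# numeric integral of x / (1 - x) over [0, 1/2], midpoint rule
steps = 1_000_000
dx = 0.5 / steps
integral = sum(((i + 0.5) * dx) / (1 - (i + 0.5) * dx) * dx for i in range(steps))

# Monte Carlo with the second cut restricted to the right of the first
random.seed(2)
trials = 200_000
good = 0
for _ in range(trials):
    x = random.random()
    y = x + (1 - x) * random.random()
    a, b, c = x, y - x, 1 - y        # the three pieces
    if max(a, b, c) < 0.5:           # good triangle: no side reaches 1/2
        good += 1
print(integral, good / trials)
```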

**One Final Piece To Explain**

Now that I know the equations of the boundaries, I can look at how the graph has changed.

Notice that, since there is now a restriction on y, namely, that it must be greater than x, the area which makes segment c too large is quite big, half of the entire graph in fact: any point with y < 1/2 now leaves the rightmost piece longer than 1/2. Also, note that the area corresponding to segment a being too large hasn’t changed a bit. Further, all of the “good” area lies in a single contiguous region, which looks a little nicer, doesn’t it? But what about that area? It certainly looks like 1/8, doesn’t it? Well, in fact, it is. Does this mean I’ve made another mistake?

As it turns out, the graph above is technically correct in that, given the stated conditions, it represents every possible pair of cuts that will yield a good triangle. The reason the actual probability is higher is simply that the distribution is no longer uniform. That is, certain points in the “good” area are actually more likely to be sampled than points in the “bad” space. This is because an increasing value of x pinches the range of possible values of y, so that larger values of y are much more likely to get sampled than lower ones. While some of these will still form a bad triangle, the ones that form a good triangle outweigh them enough to skew the probability up from 1/8 to almost 1/5.

**What I Learned**

Although it’s somewhat embarrassing that it took a student with a graduate degree in mathematics fourteen years to solve what amounts to a fairly simple probability problem, I enjoy sharing it because it reminds me of the frustration and reward of mathematics. Certainly whatever you love to do has made you feel the same, so I hope you enjoyed my story too.

A college friend and roommate of mine once showed me how he used a GA to solve the two bishops chess endgame scenario from any position. I decided that it would be fun, and within my capabilities, to focus on a simpler game, Mancala.

**What’s a Genetic Algorithm?**

A GA is just a search. If a search space is simply too large to enumerate in a reasonable amount of time, a GA provides a means of finding a “good enough” solution without searching through every possible one. In practice, this usually means waiting quite some time (maybe a few days) for the algorithm to run, but the reward is usually a damn good solution. A GA attempts to model biological evolution by simplifying it to three stages. After starting with a potential pool of solutions, we assign each of them a fitness value, select the ones with the “best” fitness (selection), use them to create “offspring” solutions (crossover), optionally mutate the offspring (mutation), and start again with the newly created pool of solutions. Let’s see how this applies to the Mancala game.
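As a minimal illustration of that loop (a toy sketch of my own, nothing to do with Mancala yet), here's a GA that evolves a pool of numbers toward the peak of a simple fitness function; all of the names and parameters here are my own choices:

```python
import random

def fitness(x):
    # toy fitness function with its peak at x = 3
    return -(x - 3) ** 2

def evolve(pool_size=20, generations=50):
    random.seed(0)
    pool = [random.uniform(-10, 10) for _ in range(pool_size)]
    for _ in range(generations):
        # selection: sample parents proportional to (shifted) fitness
        scores = [fitness(x) for x in pool]
        lo = min(scores)
        weights = [s - lo + 1e-9 for s in scores]
        parents = random.choices(pool, weights=weights, k=pool_size)
        # crossover: average adjacent pairs of parents
        children = [(parents[i] + parents[i + 1]) / 2 for i in range(0, pool_size, 2)]
        children = children * 2   # keep the pool size constant
        # mutation: small zero-mean gaussian nudge on each child
        pool = [c + random.gauss(0, 0.1) for c in children]
    return max(pool, key=fitness)

best = evolve()
```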

**What’s a Mancala?**

Mancala is a very simple two-player, turn-based board game where each player has a row of buckets with some stones in them and a scoring bin. Each player takes a turn by emptying one of the buckets and placing one stone in each consecutive bucket (including the scoring bin) until no stones remain. There are a few other rules, but the object is to finish with the most stones in your scoring bin.

In a somewhat unfortunate circumstance, I found out after implementing my algorithm that Mancala is a solved game, meaning that someone has figured out the perfect move for every position, and given that both players know the perfect move, the outcome is predetermined. The most familiar example of a solved game is probably tic-tac-toe. It’s easy to guarantee a draw, even if you don’t get the first move. Now, forgetting that we ever saw this, let’s go searching.

**How to Teach a Computer to Play:**

We need just a few things to build an algorithm – a way to value the current game board (this is the AI part, which I will henceforth call a “strategy”), a way to select the best strategies, a way to create new strategies from old ones, and a way to mutate strategies. These methods will define a GA which helps us efficiently search the space of all possible strategies.

**Choosing a Valuation Strategy**

I chose to value the current state of the board based upon three properties – the net score, the net number of stones on the player’s side of the board, and the net number of stones which must eventually advance to the opponent’s side of the board (the “overflow”). For example, in the following position, the player controlling the bottom six bins scores in the rightmost bin.

That player would calculate the following values for the parameters described:

The player’s score is . The opponent’s score is , so the net score would be . The player controls stones, while the opponent controls stones, netting . To calculate the overflow parameter, note that choosing bin would result in one score plus one stone on the opponent’s side of the board. Choosing bin would drop one stone in bin , score one, and result in one overflow, while choosing bin would also yield one overflow, bringing the player’s overflow value to . The opponent’s overflow would be , using the same method, so the net overflow for this player in this position is . I reverse the order of subtraction here because overflow is actually bad for the player, who wants to score without giving the opponent a chance to do so. A given strategy () consists of creating a set of weights for these parameters.

I’ve chosen to restrict the weights to a fixed range for no other reason than simplicity. Now, we can give the board a value based on our strategy:

If we just so happened to choose values , for our weights, we’d get a board valuation of:

, leading us to believe that this position is pretty favorable. Since it is, in fact, not favorable at all, we would expect this strategy to lose quite often, and therefore not survive the round of selection, described below.

I should quickly note why this valuation is important. Since Mancala is a two-player, turn-based game, we can employ a minimax algorithm when deciding on a move. This roughly means that on each turn we look for the move that maximizes the board value for us while leaving the opponent the least room to improve their board value. The deeper we search the move tree, the better we can limit our opponent’s moves. I chose to search seven moves deep, mostly because I didn’t want to run the algorithm for any more than two days, : ).
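To make the idea concrete, here's a tiny, game-agnostic sketch of depth-limited minimax (the helper names are hypothetical, not from my actual project, which would also need the Mancala move generator):

```python
def minimax(state, depth, maximizing, value, moves, apply_move):
    # value(state): heuristic valuation from the maximizing player's point of view
    # moves(state): list of legal moves; apply_move(state, m): resulting state
    legal = moves(state)
    if depth == 0 or not legal:
        return value(state)
    scores = [minimax(apply_move(state, m), depth - 1, not maximizing,
                      value, moves, apply_move)
              for m in legal]
    return max(scores) if maximizing else min(scores)

# toy game: states are integers, each move doubles (or doubles-plus-one) the
# state, play stops once the state reaches 4 or more, and bigger is better for us
best = minimax(1, 10, True,
               value=lambda s: s,
               moves=lambda s: [2 * s, 2 * s + 1] if s < 4 else [],
               apply_move=lambda s, m: m)
print(best)  # → 6
```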

The minimax algorithm could be the topic of its own post, but I’ll save that for a later date because we need to get on to the business of selecting strategies for crossover.

**Assigning Fitness**

I chose to have every strategy in the pool play against every other strategy twice, alternating who goes first, and define “fitness” for each strategy as the number of wins it achieved that round. Once fitness is determined for each strategy in the pool (i.e. every match has been decided), I chose two parent strategies at random, but proportional to their fitness, for the crossover phase, which creates the next pool of strategies.

For a simple example, suppose the pool size were four. That means there are twelve matches played in each round. Suppose the first strategy won all six of its games, and the remaining three split their matches. The fitnesses would then be (6, 2, 2, 2), resulting in selection probabilities of (1/2, 1/6, 1/6, 1/6). Now we assign each potential parent a slice of the range [0, 1). Strategy 1 gets the range [0, 1/2), strategy 2 gets the range [1/2, 2/3), strategy 3 gets the range [2/3, 5/6), and strategy 4 gets the final slice, [5/6, 1).

We need four parents, so we sample this distribution four times. Note that we could get the first strategy each time we sample, but we could also get strategy 3 for all four parents, illustrating why a pool size of four is generally a bad idea. Since playing the actual matches is the most time-consuming part of the process, the choice of pool size actually represents a trade-off between the time we’re willing to spend letting the algorithm run and the variety of strategies we’d like to search. I chose a pool size of forty.
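Here's what that roulette-wheel selection looks like in code, a sketch of my own using the (6, 2, 2, 2) example:

```python
import random

def select_parents(pool, wins, k):
    # fitness-proportional ("roulette wheel") selection: each strategy owns a
    # slice of [0, 1) sized by its share of the total wins
    total = sum(wins)
    cumulative, acc = [], 0.0
    for w in wins:
        acc += w / total
        cumulative.append(acc)
    cumulative[-1] = 1.0  # guard against floating-point round-off
    parents = []
    for _ in range(k):
        r = random.random()
        for strategy, edge in zip(pool, cumulative):
            if r < edge:
                parents.append(strategy)
                break
    return parents

random.seed(0)
picks = select_parents(["s1", "s2", "s3", "s4"], [6, 2, 2, 2], 10_000)
# s1 should be chosen about half the time
```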

**Making Babies – Digitally**

I’ll describe the algorithm for creating child strategies from parents shortly, but first, I’m going to name the strategies for ease of reference. I’ll call parent 1 Wes, parent 2 Mara, and the two child strategies will be named Miles and Sophie, who come about in the following way:

Miles gets the average of the weights of Wes and Mara. Sophie gets one parameter from Mara and the other two from Wes. As an example, suppose Wes and Mara have the weights below:

Wes:

Mara:

Then Miles looks like this:

Miles:

And Sophie gets:

Sophie:

This may seem a somewhat odd choice. I would agree, but since I’m not using a binary encoding for the system, I didn’t think a simple one-point crossover (where Miles would simply have gotten the values not assigned to Sophie) would provide enough variety. So, I chose this hybrid crossover/average method for creating children instead.

**Mutation**

Each child then gets a chance for a slight variation in each of the three parameters. In my case, I chose to shift each parameter up or down by a random amount sampled from a normal distribution centered at zero with a small standard deviation. Even though this usually leads to a small mutation, I have imposed an artificial bound on each parameter, so if the mutation goes outside of that, I simply rein it back in.
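Putting crossover and mutation together might look like this sketch (the weight bounds, the mutation size, and exactly which parameter Sophie inherits from Mara are my guesses, since those details didn't survive in the post):

```python
import random

LOW, HIGH = -1.0, 1.0            # assumed bounds on each weight

def crossover(wes, mara):
    # Miles: per-parameter average; Sophie: one weight from Mara, two from Wes
    miles = [(a + b) / 2 for a, b in zip(wes, mara)]
    sophie = [mara[0], wes[1], wes[2]]
    return miles, sophie

def mutate(weights, sigma=0.05):
    # zero-mean gaussian nudge on each weight, reined back inside the bounds
    return [min(HIGH, max(LOW, w + random.gauss(0, sigma))) for w in weights]

random.seed(5)
miles, sophie = crossover([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
child = mutate(miles)
```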

After mutation, we now have a fresh pool of strategies to start all over again, but when will it end? I simply chose to run the algorithm for a static number of generations (75), keeping track of the most fit strategy (the one with the highest number of wins) over the course of the entire run. Each run took about three days, so I only did three of them.

**What Did My Computer Learn?**

The best strategies I found over the course of three runs were as follows:

1:

2:

3:

There are some similarities among all three, and very obviously among the last two. For instance, score is always weighted highest, just a little higher than balance of stones, and roughly twice as high as the overflow parameter. Since I could have appealed to intuition to tell me that score is the most important factor, these results make me reasonably confident that the algorithm found a good solution. While I had no prediction for the relative weights of the balance and overflow parameters, the results are at least not shocking. Controlling more stones is really quite important, though not quite so important as scoring, and overflow, while certainly bad, can’t be expected to match the value of the other two parameters, and it clearly doesn’t. In fact, based on the results, I might try an even simpler board valuation technique that only considers score and balance.

The real test, however, is how competitive the strategy is against a real human. The best strategy gave me a real challenge. It was several matches before I found a weakness, and several more before I was able to beat it, but even then I abused my knowledge of how it played to do so. The really fun part about using a GA is that I never told any of the strategies the object of the game. I merely asked them to value the state of the game in different ways, and then pitted them against each other to see whose valuations were best.

I’ve made the code available, so if you’re inclined, you can try it yourself. The project was built with VS 2008 as a C++ project. It’s fairly shoddy code, but it works and it’s fun! Enjoy.

**Setup**

I created two cells, each of which invokes the random number generator to give a number between one and six. Next, I set up a column to keep a tally of every roll of each die, as well as their sum. This was a little tricky, but was ultimately made possible by using recursion set to go one level deep. You can see this in the worksheet linked above. Finally, I graph the result for each die and the sum of the dice in a histogram. Below are the results for ten thousand rolls:

**The Sum of Uniform Random Variables is Not Uniform Random**

Even though each die has a uniform distribution (the graph is flat), the sum of the dice does not. It’s much more like a normal distribution. The reason for this is just counting. For a single die, there is only one way to roll a five, or a four, or any other number, but with two dice, there are six ways to roll a seven and only three ways to get a four:

This means we should expect to see seven rolled more often than four, and indeed our simulation confirms this.
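The counting argument is easy to verify by brute force; here's a quick sketch:

```python
from collections import Counter
from itertools import product

# tally every ordered pair of faces for two dice
ways = Counter(a + b for a, b in product(range(1, 7), repeat=2))
print(ways[7], ways[4])  # → 6 3
```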

**These Dice Are Not Due**

Another often misunderstood feature of a probability distribution is the catch-up feature, or really the lack thereof. For instance, if one observed two sevens in a row, one might expect that the next roll would be less likely to yield a seven. Since the long term distribution is eventually a bell curve, we can’t just keep getting sevens forever, right?

Perhaps counter-intuitively, this is not the case. In fact, even if we only observe rolls after two sevens in a row, the distribution is the same. Seven gets rolled with probability one-sixth, no matter what was rolled before. In the simulation of ten thousand rolls, I counted the number of times seven was rolled twice in a row (224). For each of these, I made a new graph of the distribution of the roll *after* the double sevens.

Does the seven look like it’s suffering here? It doesn’t, and we could have argued this on the simple basis that we know that dice have no memory! Beware any gambler who mentions dice (or a game board, etc) being “due”. It’s just not true.*
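The same memorylessness is easy to replicate outside Excel; here's a Python sketch of the "roll after double sevens" experiment:

```python
import random

random.seed(3)
rolls = [random.randint(1, 6) + random.randint(1, 6) for _ in range(200_000)]

# collect only the rolls that immediately follow two consecutive sevens
after = [rolls[i + 2] for i in range(len(rolls) - 2)
         if rolls[i] == 7 and rolls[i + 1] == 7]
frac_seven = sum(1 for r in after if r == 7) / len(after)
print(len(after), frac_seven)  # the fraction stays near 1/6
```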

For those curious, the Excel logic for this was a bit tricky as well, but it’s all there in the sample worksheet.

**Infinity Is Relentless**

The previous result is an example of the fact that the long term distribution isn’t affected by short term results, or even purposeful deviations from the distribution. To help illustrate, let’s give the tally for a roll of two a head start of, say fifty. We expect this head start to skew the results for a few rolls, but in the long run, we won’t even know the difference. How long is that? Let’s find out. After ten rolls:

That looks pretty bad. Let’s try 100 rolls:

A normal distribution is forming, but it’s still dwarfed by the head start. After 1000:

Almost all of the other numbers have overtaken the 2 now. We could almost believe that this happened legitimately by chance. Let’s go one order of magnitude higher. Onward to 10,000:

Our normal distribution is back. The 2 does look a little off, but then again, so does the 6. Which one had the head start again? In the limiting case (infinite rolls) we would never be able to tell whether any result had a head start, no matter how large.

I will describe the Excel trick to achieve the head start, because it’s not in the workbook. I simply created a cell with a value of 50, then added that cell to my formula which calculates the tally for a roll of 2. After the first iteration, I changed that 50 to a zero so it no longer had any effect.
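The head-start experiment is equally easy to replicate in code; a sketch:

```python
import random
from collections import Counter

random.seed(4)
tally = Counter({2: 50})                 # the tally for "2" starts at fifty
n = 100_000
for _ in range(n):
    tally[random.randint(1, 6) + random.randint(1, 6)] += 1

# after enough rolls, the head start disappears into the noise
share_of_two = tally[2] / (n + 50)
print(share_of_two)  # close to 1/36
```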

**It Doesn’t End There**

Did you know that if you rolled two dice forever, you would be guaranteed an arbitrarily long string of snake eyes? That’s right: you could guarantee yourself 1000 rolls of double ones in a row, if you just roll long enough. Again, we’re talking infinitely many rolls, but it is guaranteed! I won’t run that simulation here, but proofs of all of these claims can be found in any introductory textbook on probability. It’s just more fun to see them in action.

* It might be the case that a slot machine is “due” depending on how it was programmed. It is conceivable that some slot machines guarantee a hit after every so many spins, perhaps as a marketing tactic, but this breaks the assumptions of a probability distribution.
