Solving IOI 2011 Race

It’s been a while since the last one, but today we’ll tackle a problem from the first day of the 2011 International Olympiad in Informatics called “Race”. If you don’t know the problem and want to read its statement, here’s the official one: http://www.ioi2011.or.th/hsc/tasks/EN/race.pdf . If you want to run your source code against the official test cases, you can download them along with the template here: http://www.ioi2011.or.th/hsc/testdata/race-test.zip and if you want to submit it on an online judge, try this one: http://wcipeg.com/problem/ioi1112 .

Before we start, I’d like to make a few comments. This problem is sort of a special one for me, since I was a participant when it was presented at the IOI. At the time I did poorly on it, which wasn’t unexpected: it was the third most difficult problem of that year (judging from the overall scores) and I was still inexperienced and young. However, I don’t feel this problem is harder than Ideal City, for example, so I’d rank it a 3.5 on a scale from 1 to 5. Getting a partial score (even the largest one before 100) is rather easy, but the full 100 points can be tricky if you don’t know the right techniques.

Small Disclaimer: The analysis I present here is original (I wrote and coded it independently), but the ideas and concepts have been established for a while. This solution is very similar to the official one you can find on the IOI website, but I wanted to do my own. Also, when I was training for the 2012 IOI I remember trying to implement this problem and having some difficulties, both with details the official solution glossed over and with how to simplify my implementation.

Task Summary

We’re given an edge-weighted tree with N nodes and an integer K. The problem asks us to compute the path with the fewest edges such that the sum of the weights of its edges is exactly K.

Solution

As always, I’ll solve the problem incrementally. Even though the solutions for different subtasks are quite different, they give us some insight into the problem.

Subtasks 1 and 2

In the first two subtasks N is less than 100 and 1000, respectively, so a brute force approach is sufficient. The idea is to perform a depth first search starting at each node and, among all the paths found that sum to K, pick the one with the fewest edges.

This simple algorithm can easily be implemented in O(N^2) (since the number of paths in a tree is of that order). The first subtask was made to accommodate solutions that were poorly implemented and ended up being O(N^3), for example if one were to recompute the distance between each pair of nodes from scratch instead of calculating them incrementally.
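A minimal sketch of this brute force (all names are hypothetical; adj[u] is assumed to hold {neighbor, edge cost} pairs):

#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

int N;
long long K;
vector<pair<int, long long>> adj[1005]; // adjacency list: {neighbor, cost}
int best; // fewest edges among the paths that sum to exactly K

// Walk every path that starts at the root of this DFS, pruning once the
// accumulated cost reaches K (edge costs are non-negative).
void dfs(int u, int parent, long long sum, int edges) {
  if (sum == K && edges > 0) best = min(best, edges);
  if (sum >= K) return;
  for (auto& [v, w] : adj[u])
    if (v != parent) dfs(v, u, sum + w, edges + 1);
}

int solve() {
  best = INT_MAX;
  for (int u = 0; u < N; u++) // one O(N) search per starting node
    dfs(u, -1, 0, 0);
  return best == INT_MAX ? -1 : best;
}

Each path is found twice, once from each endpoint, which is harmless since we only take a minimum.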

Subtask 3

The third subtask is still relatively standard. For this subtask N can be as large as 200 000, but K is limited by 100. This means we can play around with K as long as we keep our solution linear in N. The idea is to apply a rather well-known technique: dynamic programming on trees.

We start by rooting the tree at some node u, for example node 0. If you are unfamiliar with this concept, rooting here means setting u as the “start” of the tree (from where the rest of the nodes branch out), which lets us compute the children and the parent of each node (the root has no parent). Now we create a dynamic programming table of N by K + 1 dimensions. The idea is that for each node v we have: DP[v][l] = min(DP[c][l - cost(v, c)] + 1) over all children c of v, where cost(v, c) is the cost of the edge from v to c. In other words, the downward path with the minimum number of edges that starts at node v and sums to l is obtained from the paths starting at the children c that sum to l - cost(v, c), plus one (the edge {v, c} itself). We can easily fill this table in O(N * K) if we traverse the tree in depth first search fashion, filling each node’s values after its children’s. Also, don’t forget the base case: for all nodes v, DP[v][0] = 0 (and DP[v][x] = ∞ for all x different from 0; these get filled by the recurrence above).

After we fill the table, note that a path summing to K either goes straight down from some node v (its edge count is then DP[v][K]) or bends at some node v, descending into two different child subtrees. We catch the second kind by combining, as we merge each child c into DP[v], its values with those of the children merged before it: a candidate answer of the form DP[c][l] + 1 + DP[v][K - l - cost(v, c)]. Our answer is the minimum over all these candidates. The approach remains O(N * K) and is enough for subtask 3 (notice, however, that it doesn’t work for subtask 2, since there K isn’t bounded by 100, so to get full marks on the first three subtasks you’d have to implement both approaches).
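A minimal sketch of this DP, including the combination of two arms at their highest node (names hypothetical; beware that storing a full (K + 1)-sized vector per node is memory-hungry, but it illustrates the idea):

#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

const int INF = INT_MAX / 2;
int N, K, answer = INF;
vector<pair<int, int>> adj[200005]; // {neighbor, edge cost}
vector<int> DP[200005];             // DP[v][l]: fewest edges, downward path of cost l

void dfs(int v, int parent) {
  DP[v].assign(K + 1, INF);
  DP[v][0] = 0; // base case: the empty path
  for (auto& [c, w] : adj[v]) {
    if (c == parent) continue;
    dfs(c, v);
    // Combine: one arm through c, one arm through the children merged so far
    // (or the empty arm, which covers straight downward paths).
    for (int l = 0; l + w <= K; l++)
      if (DP[c][l] < INF && DP[v][K - l - w] < INF)
        answer = min(answer, DP[c][l] + 1 + DP[v][K - l - w]);
    // Merge c's downward paths into v's table.
    for (int l = w; l <= K; l++)
      if (DP[c][l - w] < INF)
        DP[v][l] = min(DP[v][l], DP[c][l - w] + 1);
  }
}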

Subtask 4

The final subtask requires some more creativity. Now K is no longer bounded by 100, so our solution can no longer afford the O(N * K) cost; it will have to be close to linear in N regardless of K. We could try to approach this in a bunch of different ways, but one that works well is called centroid decomposition. Instead of simply describing the technique and then trying to apply it, we’ll get there progressively and more organically throughout the rest of this analysis.

An initial important observation is that our first solution tries a lot of paths multiple times: when starting a depth first search on a certain node we are basically “shifting” our previous search one node further. To use this in our favor, we’re going to first select a node from the tree and consider all of the paths that pass through that node. To calculate these it is not enough to do a simple depth first search like we did for subtasks 1 and 2; we need something slightly more ingenious.

Let’s fix a node v and calculate all paths that cross v. We keep an array A[] of K + 1 elements, where A[l] is the minimum number of edges seen so far to make a path of length l descending from v (initially all entries are at infinity, except A[0], which is 0). If we do a depth first search starting at each neighbor of v (careful not to go back through v), then whenever the current search has accumulated length l using p edges, there is a path that sums to K with p + A[K - l] edges. We have to be careful not to pair up two overlapping arms that descend into the same subtree, so in practice we do two depth first searches from each neighbor of v: one that closes paths against the values of A[] recorded from the previous neighbors, and one that then records the current subtree’s values into A[].
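A minimal sketch of the counting step around a fixed node v (names hypothetical; A[] is assumed to hold infinity everywhere except A[0] = 0 on entry, and how to reset it cheaply is discussed in the implementation details below):

#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

const int INF = INT_MAX / 2;
int K, answer = INF;
vector<pair<int, int>> adj[200005]; // {neighbor, edge cost}
bool processed[200005];             // centroids already removed
int A[1000006];                     // A[l] = fewest edges seen for sum l

void dfs(int u, int parent, long long sum, int edges, bool record) {
  if (sum > K) return; // prune: edge costs are non-negative
  int s = (int)sum;
  if (record)
    A[s] = min(A[s], edges); // second pass: record this subtree's arms
  else if (A[K - s] < INF)   // first pass: close against earlier subtrees
    answer = min(answer, edges + A[K - s]);
  for (auto& [to, c] : adj[u])
    if (to != parent && !processed[to])
      dfs(to, u, sum + c, edges + 1, record);
}

void countPathsThrough(int v) {
  for (auto& [to, c] : adj[v]) {
    if (processed[to]) continue;
    dfs(to, v, c, 1, false); // combine first, so arms never share a subtree
    dfs(to, v, c, 1, true);
  }
  // ...reset A[] to "all INF except A[0] = 0" before recursing.
}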

If we do the above calculation, we can then remove the node v from the graph (along with all adjacent edges), dividing it into separate components, and recursively call our method on each component. We can do so because all paths between components have already been considered in the calculation of the above paragraph (since they have to use node v). So essentially we have found a way of dividing our original problem into smaller problems, in a sort of divide and conquer way. But now a question arises: which node should we select as v in order to get the best efficiency possible? Note that the method of the above paragraph has complexity O(T), where T is the number of nodes in the component we are applying it to (initially this is N), so the better we divide the graph into smaller graphs, the better the complexity. Here is where the idea of centroid decomposition enters. The centroid of a tree is the node whose maximum subtree is minimum, or in other words, the node that, when removed, partitions the tree into components whose maximum size is the minimum possible (note that the centroid is not the same as the center, which minimizes the eccentricity instead of the maximum subtree size). It is sort of the center of mass of the tree and can be generalized to weighted graphs, but here we are interested in the unweighted decomposition, to be able to take advantage of the following result. The image below shows an example (notice that 6 would be the center of the tree, but 4 is the centroid).

Tree and its centroid in red

Why the centroid? Because there is a known result that says that if the original tree has N nodes, the centroid partitions the tree into subtrees each of size at most N/2. It is not very difficult to prove: if a certain node v leaves a subtree of size larger than N/2, choose the node of that subtree that neighbors v as the new candidate centroid; repeating this always moves further into the large subtree, so eventually no subtree is larger than N/2. This helps us because if we always select the centroid as the vertex of the previous paragraph, we always divide the tree into components of size at most N/2 (possibly many of them, with sizes summing to N - 1). Thus we can apply the divide and conquer idea and get an O(N log N) complexity, for the same reasons merge sort has it.

To find the centroid there are a number of different algorithms. The one I feel is simplest is to first root the tree at any node, then pre-compute the size of the subtree of each node (the subtree containing all its descendants) using a standard depth first search. Finally, iterate through all the nodes and select the one where max(T - size[v], size[c1], size[c2], …) is smallest, where T is the number of nodes of the tree and c1, c2, … are the children of v. This expression is the size of the largest component hanging off v.
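A minimal sketch of this selection, restricted to the current component (names hypothetical; processed[] marks centroids already removed, as explained in the implementation details below):

#include <vector>
#include <algorithm>
using namespace std;

vector<pair<int, int>> adj[200005]; // {neighbor, edge cost}
bool processed[200005];
int sz[200005];

// Standard DFS computing subtree sizes and listing the component's nodes.
void computeSizes(int u, int parent, vector<pair<int, int>>& nodes) {
  sz[u] = 1;
  nodes.push_back({u, parent});
  for (auto& [to, c] : adj[u])
    if (to != parent && !processed[to]) {
      computeSizes(to, u, nodes);
      sz[u] += sz[to];
    }
}

// Evaluate max(T - size[v], size[c1], size[c2], ...) for every node v of
// the component and return the node where it is smallest.
int findCentroid(int root) {
  vector<pair<int, int>> nodes; // {node, its parent in the rooted component}
  computeSizes(root, -1, nodes);
  int T = sz[root], best = root, bestVal = T;
  for (auto& [v, parent] : nodes) {
    int worst = T - sz[v]; // the piece containing v's parent
    for (auto& [to, c] : adj[v])
      if (to != parent && !processed[to])
        worst = max(worst, sz[to]);
    if (worst < bestVal) { bestVal = worst; best = v; }
  }
  return best;
}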

So, to summarize everything: the algorithm first computes the centroid of the tree in O(N), then calculates all paths that cross the centroid in O(N), and finally divides the tree into subtrees by removing the centroid, recursively doing the same on each (unless the tree has size 1, which is the base case). The overall complexity of this algorithm is O(N log N), which is enough for the full 100 points.

Implementation Details

I implemented a solution in C++ using the method described for subtask 4. I didn’t implement the methods of the previous subtasks, since they are less interesting (and simpler). You can find the code here. The code isn’t very complicated; it is pretty much divided into the steps described in the explanation of the algorithm. However, there are a few details I think I should highlight.

First of all, in order to simplify and shorten my implementation, I never explicitly rooted any tree (meaning I didn’t calculate children or parents, like in my previous code for Ideal City 2012). What I actually did was keep an adjacency list along with a list of processed nodes. Whenever I did a depth first search, I kept track of the previous node and used it as the parent of the next node to prevent infinite recursion. The processed nodes list was used to “close” the recursed subtree: whenever I recursed after calculating a centroid, I marked the centroid as processed and made sure no later search would go through it, since that node is no longer required. That way, I didn’t have to do expensive operations or pass the graph as an argument to the function.

Another thing I did but didn’t mention in the explanation above is how I dealt with the array of K + 1 elements storing the minimum number of edges needed to make a path of sum l. After each recursion step we need to clean this array up, but we can’t simply memset it or iterate through it, since that would be O(K) per centroid and would ruin our lovely complexity. What I did was use a standard (though clever) technique: keep an auxiliary array, initially filled with 0, and an extra variable, also initially 0. Whenever we need to reset the array, we just increment the extra variable. A value A[l] only counts if A_auxiliary[l] equals the extra variable; otherwise it is treated as infinity. And whenever we update A[l], we set A_auxiliary[l] to the extra variable.
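A minimal sketch of the trick (names hypothetical):

#include <climits>

const int INF = INT_MAX / 2;
int A[1000006];
int A_auxiliary[1000006]; // stamp per entry; zero-initialized
int epoch = 0;            // the "extra variable"

// A value only counts if its stamp matches the current epoch.
inline int get(int l) { return A_auxiliary[l] == epoch ? A[l] : INF; }

inline void update(int l, int value) {
  if (get(l) > value) { A[l] = value; A_auxiliary[l] = epoch; }
}

// Bumping the epoch invalidates every entry at once: O(1) instead of O(K).
inline void resetAll() { epoch++; }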

Obviously, another thing to look out for is overflow of all kinds. If you implement it the right way, you shouldn’t need more than a C++ int, but that means capping the depth first searches once the cost sum exceeds K (not in the centroid-finding one, of course).

Concluding Remark

At last, we have reached the end. This solution is a very specific application of centroid decomposition which, as you might have guessed by now, is the technique of applying divide and conquer on a tree by using the centroid to split it. There are not a lot of other applications, but I think this one is good enough to make it worth learning. The problem is very interesting mainly because it looks much more difficult than it actually is (once you know how to solve it). There were a few 100s in the official contest, so it is a tractable problem, but at the same time it isn’t trivial.

As always, I’m open to comments and corrections you might have.


Solving IOI 2012 Ideal City

Today my goal is to provide an analysis of a problem from the second contest day of the International Olympiad in Informatics 2012 called “Ideal City”. There are multiple sources with the problem statement, but I prefer to use the official one: http://www.ioi2012.org/wp-content/uploads/2011/12/city.pdf. If you also want to run your source code against the official test cases you can either download the contest environment and test cases here: http://www.ioi2012.org/competition/tasks/materiali/ioi2012data.tgz, or use an online judge like the following: http://wcipeg.com/problem/ioi1221.

Before we start, I’d like to make a general comment. If I had to rank this problem on a scale of difficulty from 1 to 5, I’d say it is a 4. Once you know the solution it is very intuitive, but notice that when this problem was presented it was one of the hardest of its year (judging by the number of 100 point solutions). However, I don’t feel it is an impossible problem; its solution is actually very easy to follow and very interesting as well.

Small Disclaimer: The analysis I present here is original (written by me and coded by me), but the ideas and concepts are pretty general. Some parts of the solution are inspired by other sources like the solution hints presented in the official website. I wrote the article as an exercise and also to provide a more thorough and detailed analysis of the solution.

But enough talking, let’s tackle this challenging problem and solve it!

Task Summary

We are given a set of N squares on an infinite grid, each represented by a pair of coordinates. We’re told that the given squares form a set of connected cells (two cells are connected if they share an edge) and also that the empty cells, that is, the cells without squares, form a connected set as well.

The distance between two cells is the minimum number of hops (moves from one cell to an adjacent one) required to go from one cell to the other. The distance between squares is analogous to the cell case, but only considering squares (it is not possible to hop onto a cell that doesn’t contain a square).

The statement asks us to compute the sum of the distances between all distinct pairs of cells that contain squares.

Solution

I’m going to solve this problem incrementally since, like many other IOI problems, solving the first subtasks helps solving the following ones. So we’ll address each subtask one at a time.

Subtask 1

The first thing to notice is that the set of squares obviously forms a connected graph: each square is a node and there is an edge between two nodes if the corresponding squares are adjacent. Thus, to calculate the distance between each pair of nodes it suffices to use the classic Floyd-Warshall algorithm to obtain the matrix of distances. This is an O(N^3) solution and since N is at most 200 for this subtask it’s more than enough.
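A minimal sketch, assuming d[][] was initialized to infinity, with d[i][i] = 0 and d[i][j] = 1 for adjacent squares (names hypothetical):

#include <algorithm>
using namespace std;

const int MAXN = 205;
const int INF = 1 << 29;
int N;
int d[MAXN][MAXN]; // distance matrix, pre-filled as described above

long long sumAllDistances() {
  // Floyd-Warshall: relax every pair through each intermediate node k.
  for (int k = 0; k < N; k++)
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        d[i][j] = min(d[i][j], d[i][k] + d[k][j]);
  long long total = 0;
  for (int i = 0; i < N; i++)
    for (int j = i + 1; j < N; j++) // each pair counted once
      total += d[i][j];
  return total;
}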

Subtask 2

Now the Floyd-Warshall algorithm doesn’t work anymore; it is too slow. However, we haven’t yet used the fact that the graph is unweighted, so we can do better. A simple idea is to do a BFS (Breadth First Search) from every square and accumulate the sums of all distances found by each BFS. In the end the answer is the total sum divided by two, since we find each distance twice: once in the BFS from the first endpoint and once in the BFS from the second.

Each BFS is O(N) and there are N squares, so this solution is O(N^2) overall. Since N is at most 2000, it’s enough.
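A minimal sketch of this approach (names hypothetical; adj[] is the unweighted adjacency graph over the squares):

#include <vector>
#include <queue>
using namespace std;

int N;
vector<int> adj[2005];

// Sum of distances from src to every other square.
long long bfsSum(int src) {
  vector<int> dist(N, -1);
  queue<int> q;
  dist[src] = 0;
  q.push(src);
  long long total = 0;
  while (!q.empty()) {
    int u = q.front(); q.pop();
    total += dist[u];
    for (int v : adj[u])
      if (dist[v] == -1) { dist[v] = dist[u] + 1; q.push(v); }
  }
  return total;
}

long long sumAllDistances() {
  long long total = 0;
  for (int s = 0; s < N; s++) total += bfsSum(s);
  return total / 2; // each pair was counted once from each endpoint
}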

Subtask 3

So far so good, all very simple, but let’s complicate a bit.

For this subtask the BFS approach is too slow. However, we are told that for every pair of squares, if they share the same X coordinate then all the cells in the grid between them contain a square, and likewise if they share the same Y coordinate.

This means that all the possible inputs will have a “triangle” form, like the following images show:

Examples of input for subtask 3

More formally, it means the squares form an orthogonally convex set (convex along every row and every column). Now, how can we use this in our favor? Let’s start by solving a simpler case, like the following image shows:

Simpler example

If we consider the distance between a random pair of squares we can decompose it into two components: one vertical and one horizontal. Let’s first consider only the horizontal distance.

For squares in the same column it is trivial that the horizontal distance is zero. Furthermore, the horizontal distance between any two squares is uniquely determined by the columns they belong to. This means we can group all squares in the same column and create a node-weighted graph where each group of squares is represented by a node whose weight equals the number of squares in the corresponding group, and where two nodes are connected if they represent adjacent columns. The notion of graph here is just a matter of notation, since this is obviously just a linear graph, but it will come in handy in the next section.

Given this, to calculate the sum of all horizontal distances we need to calculate the distance between each two nodes of the graph times both weights, that is: \sum w(v) \times w(u) \times d(v, u) for all nodes v, u. The reason why is simply combinatorics, we’re matching every square from the group represented by a certain node v with every square from the group represented by a certain node u and multiplying by the (horizontal) distance between them.

We need to calculate this in O(N) since O(N^2) is too slow, so simply taking all pairs of nodes and applying the formula isn’t enough. However, we can use the fact that the graph is linear. Start by observing that the contribution made by the first node (let’s label the nodes with integers from 1 to K from left to right in the grid, where K is the total number of nodes, so the first node is node 1) to the sum is w(1) \times (w(2) + 2 \times w(3) + 3 \times w(4) + \ldots) (w(i) is the weight of the i-th node), since the distance between node 1 and node 2 is 1, the distance between node 1 and node 3 is 2, etc. Similarly, the contribution from node 2 is w(2) \times (w(3) + 2 \times w(4) + 3 \times w(5) + \ldots) (we don’t consider node 1 here because we only want to count each distance once). There is a clear pattern we can use: start by calculating w(2) + 2 \times w(3) + 3 \times w(4) + \ldots with a simple for loop and assign this value to a variable \mathrm{half\_counter}. Now loop through each node i in increasing order of label, first add w(i) \times \mathrm{half\_counter} to the total (horizontal) distance, and then subtract w(i + 1) + w(i + 2) + w(i + 3) + \ldots from \mathrm{half\_counter}, which turns it into the corresponding value for node i + 1.

Obviously we can’t recalculate w(i + 1) + w(i + 2) + w(i + 3) + \ldots every time we need it (that would make the previous algorithm O(N^2) again). However, this is just a suffix sum and is easy to pre-calculate: set \mathrm{prefix}(i) = w(i) + w(i + 1) + w(i + 2) + \ldots, so the value to subtract at step i is simply \mathrm{prefix}(i + 1). In fact, if we also define \mathrm{inc\_prefix}(i) = w(i) + 2 \times w(i + 1) + 3 \times w(i + 2) + \ldots, then \mathrm{inc\_prefix}(i) = \mathrm{inc\_prefix}(i + 1) + \mathrm{prefix}(i), and \mathrm{half\_counter} at step i is just \mathrm{inc\_prefix}(i + 1), so we can pre-calculate everything and skip the running subtraction altogether.
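A minimal sketch of this computation under the above definitions (names hypothetical; w is 1-indexed by column label):

#include <vector>
using namespace std;

long long totalHorizontal(const vector<long long>& w) {
  int k = (int)w.size() - 1; // nodes labeled 1..k
  vector<long long> prefix(k + 2, 0), inc_prefix(k + 2, 0);
  for (int i = k; i >= 1; i--) {
    prefix[i] = prefix[i + 1] + w[i];              // w(i) + w(i+1) + ...
    inc_prefix[i] = inc_prefix[i + 1] + prefix[i]; // w(i) + 2w(i+1) + ...
  }
  long long total = 0;
  for (int i = 1; i <= k; i++)
    total += w[i] * inc_prefix[i + 1]; // node i paired with nodes to its right
  return total; // the real task would also reduce this modulo 1e9
}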

With all this we have successfully calculated the total horizontal distance between all pairs of squares. We still have to calculate the total vertical distance. Fortunately, this case is analogous to the horizontal one: if we build the same type of graph by grouping all squares in the same row and considering one node per group, we can reuse the same arguments. Obviously the answer is the sum of both values.

This approach is O(N) (as we strove for) and fits the subtask 3 constraints just fine. However, we assumed the graph is linear, which is not necessarily the case for subtask 4, so it is not enough for full marks. (Note that this technique fails subtasks 1 and 2 for the same reason it fails subtask 4, so to score the points for the first three subtasks we would have to implement this approach together with the one from subtask 2.)

Subtask 4

We’re finally at the end of the quest we set out to complete. As seen in the previous section, that method cannot solve this subtask. However, we can make similar observations.

First of all, notice that we can again divide the distance between a pair of squares into a horizontal component and a vertical one. Note that in the general case each component isn’t necessarily the direct horizontal or vertical distance (as in the Manhattan distance), like it was in the previous section. Let’s again consider only the horizontal distance first; as we’ve seen, the vertical one is analogous.

For the general case (as in this subtask) we can no longer group all squares in the same column like we did previously. We can however consider groups of connected squares with the same X coordinate (connected in the sense there is a path between each other using only squares from the group). With this, we construct a node weighted graph as we did, but this time a node represents a group of squares as described where the weight is equal to the number of squares in the group and there is an edge between two nodes if at least one square from the corresponding group of the first node is adjacent to a square from the corresponding group of the second node. The following image shows an example and the corresponding graph (the numbers in each node correspond to its weight and the red boxes limit the square groups):

Subtask 4 example horizontal graph

For this particular case we can see that the graph is a tree, and it is easy to show that this will always be the case. If a set of squares were represented by a non-tree graph, there would be a cycle somewhere, and that cycle would imply that the set of squares it represents has a “hole” in the middle, that is, a set of isolated empty cells. Since this cannot happen under the constraints of the problem, the graph will always be a tree.

Great! Now all that is left is to calculate the same quantity as above, \sum w(v) \times w(u) \times d(v, u) for all nodes v, u. We can’t use the same strategy as in the previous section because it was based on the fact that the graph is linear and doesn’t apply to a general tree. To avoid doing O(N^2) calculations we can use an interesting observation. For each edge e consider S1(e) to be the sum of the weights of all nodes in one of the connected components of the graph generated by removing the edge e from the tree and S2(e) the sum of the weights of all nodes in the other component (note that the two components are complementary). Then the total horizontal distance is given by: \sum S1(e) \times S2(e) for all edges e. First we will see why this is true and then we will see why this is useful.

Consider a certain pair of nodes v, u. Since the graph is a tree there is only one path between them (which is obviously also the shortest path). Let’s look at all the edges on this path. For each such edge, v is in one of the connected components of the graph obtained by removing that edge and u is in the other one. Hence, for such an edge e, S1(e) includes w(v) and S2(e) includes w(u), so a term w(v) \times w(u) arises once in the product S1(e) \times S2(e). Since the number of edges on the path is d(v, u) by definition, the term w(v) \times w(u) arises d(v, u) times in the total sum. If we consider any edge not on the path between v and u, then both nodes are in the same connected component of the newly disconnected graph, so no such term arises. Thus the coefficient of w(v) \times w(u) is exactly d(v, u), which means the two formulas are equal.

Now to why this is useful. It is easy to pre-calculate S1(e) for every edge in O(N) total time. If we root the tree at an arbitrary node, a simple DFS (Depth First Search) on the rooted tree gives S1(e) as the sum of all weights in the subtree rooted at the lower endpoint of e (the endpoint with the greater depth of the two). Since the two components are complementary, S2(e) = \mathrm{total\_weight} - S1(e), where \mathrm{total\_weight} is the sum of all weights (which in this case is N by definition). Then a simple for loop over the pre-calculated values of S1(e) and S2(e) computes \sum S1(e) \times S2(e) in O(N) total time.
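A minimal sketch of this computation (names hypothetical; the real code would also reduce modulo 1 000 000 000, as noted in the implementation details below):

#include <vector>
using namespace std;

vector<int> adj[100005]; // the tree built over the square groups
long long w[100005];     // node weights: squares per group
long long total_weight, answer;

// Returns the weight sum of u's subtree, i.e. S1 of the edge {parent, u},
// and accumulates that edge's contribution S1 * (total_weight - S1).
long long dfs(int u, int parent) {
  long long s1 = w[u];
  for (int v : adj[u])
    if (v != parent) s1 += dfs(v, u);
  if (parent != -1) // each edge is the "upward" edge of exactly one node
    answer += s1 * (total_weight - s1);
  return s1;
}

// Usage: total_weight = sum of all w[]; answer = 0; dfs(0, -1);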

As I said, this solution is O(N) and works for the general constraints of the problem, thus is enough to obtain the full 100 points. Don’t forget, however, to calculate the vertical distance as well in the same way.

Implementation Details

I’ve implemented a solution in C++ that scores 100 points using the method described in the subtask 4 section. I didn’t implement the method from the subtask 3 section, since it is similar to the subtask 4 one. I tested it against the official test cases to make sure it is correct. You can find the code here. The code seems bigger than it actually is because, besides some comments and a header, it is pretty much the same thing twice: once for the horizontal distance and once for the vertical.

I found the actual implementation a tad more difficult than the solution made it appear. It is a very interesting and educational exercise to implement this in any language (the STL helped a bit, I admit). It is a good way to show that practice and theory are not equivalent (unfortunately, maybe).

I commented the code to give an idea of what is where, but I feel I should point out a couple of things to bear in mind should the reader decide to implement it themselves.

First of all, read the (official) problem statement carefully. Something I forgot while implementing was: “Since the result may be too big to be represented using 32 bits, you should report it modulo 1 000 000 000 (one billion)”. Additionally, it is advisable to use long long integers (64-bit integers) for the intermediate calculations. Overall, what I’m saying is: be careful with overflows.

Another interesting point is that it helps to build the graph using an adjacency list first and only then do a DFS to root the tree (to root the tree I used the struct defined at the beginning of the code: for each node I store its children and its parent).

Moreover, I used the STL map to map each coordinate pair to an index. I used a simple hash to assign a key to a coordinate: X + Y \times \mathrm{max\_value}, where \mathrm{max\_value} is the maximum value any coordinate can take (it is specified in the problem statement). This was used to know which nodes of the built tree are adjacent to each other.
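A minimal sketch of that mapping (names hypothetical; the constant is a placeholder standing in for the statement’s coordinate bound):

#include <map>
using namespace std;

const long long MAX_VALUE = 1000000000LL; // placeholder for the statement's bound
map<long long, int> cellIndex;            // packed coordinate -> square index

long long key(long long x, long long y) { return x + y * MAX_VALUE; }

// Usage: set cellIndex[key(x, y)] = i when reading square i; later,
// cellIndex.count(key(x + 1, y)) tests whether the right-hand neighbor
// cell contains a square.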

Finally, the code can be a little overwhelming with its multiple labelings (one for the original index, one for the graph and weights, one for the rooted tree), but with a careful implementation it is easy to code.

Concluding Remark

I’m fairly certain the coded solution is correct given the official test cases (maybe it fails some edge case not thought of before, but hopefully not). The analysis can obviously contain some errors or inconsistencies, but in theory it is correct. I’m open to comments and corrections, of course.

A note about future posts – IOI Solutions

Hello. I haven’t posted any articles recently, but I hope to slowly change that. I have a couple of “big” articles half written, but I sense they will take some time to be finished. So instead I decided to start a series of posts that describe, analyze and solve problems from past IOIs, International Olympiad in Informatics.

The reason why I’m doing this is twofold. Firstly, IOI problems usually have interesting solutions and are always a nice way to learn new techniques and to train problem solving. Additionally, I feel like the IOI community (a community I consider myself part of) still lacks an organized place that collects all problems and solutions from previous years. Some years provide an analysis, others only a handful of notes and others not even that, and even though there are some analyses spread around online, I hope to make my contribution and at the same time learn some things.

I won’t be following any particular order, just what I feel like I want to solve. The first one is going to be posted in a short time.

Monte Carlo Methods – Part 1

Something I’ve always found pretty interesting is Monte Carlo Methods. I admit that initially it was partially due to the awesome name, but as I delved deeper into the theory I became even more interested. So I’ll attempt to sum up some of my recent findings.

First things first, what is a Monte Carlo Method? I’ve come across multiple definitions, some more formal or restrictive than others, so I’ll just use the one I find most accurate: a Monte Carlo Method is any method that uses stochastic simulation, in other words, that uses an ensemble of random numbers to simulate something. As this definition suggests, a Monte Carlo Method is a very general thing and can be applied in several domains. Hence I’ll give a few examples of popular uses of the method.

Approximate Pi

One simple example, used mainly for illustration, is the approximation of areas. This can be used to estimate the value of Pi (note that this is far from the best method to estimate Pi). The actual method is as follows: consider a square of side r and a quarter circle inscribed in this square (so the circle has radius r as well). The following image shows the setup:

Monte Carlo Pi Setup

Now imagine we calculate the ratio between the area of the quarter circle and the area of the square: \frac{\frac{\pi}{4} r^2}{r^2}, which obviously equals \frac{\pi}{4}. Thus, by calculating this ratio and multiplying it by 4 we get an approximate value of \pi.

To calculate that ratio we’ll consider the following experiment: choose random points on the image (this is usually compared to throwing darts at the picture). The number of points that fall inside the quarter circle approximates the area of the quarter circle (in the limit we’d need infinitely many points, which is obviously impossible), and similarly the total number of points thrown approximates the area of the square. Thus, we can approximate the ratio between the areas as \frac{\# \textup{points inside the quarter circle}}{\# \textup{total points}}. Now it suffices to pick a number of points and place them randomly.
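In code, the experiment fits in a few lines; here is a minimal sketch (radius 1, reporting trimmed; not the exact code linked below):

#include <cstdio>
#include <cstdlib>
#include <ctime>

int main(int argc, char** argv) {
  int n = argc > 1 ? atoi(argv[1]) : 10000;
  srand(time(NULL));
  int inside = 0;
  for (int i = 0; i < n; i++) {
    double x = (double)rand() / RAND_MAX; // random point in the unit square
    double y = (double)rand() / RAND_MAX;
    if (x * x + y * y <= 1.0) inside++;   // it fell inside the quarter circle
  }
  printf("Pi ~= %.7f\n", 4.0 * inside / n);
  return 0;
}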

I implemented this in C++ and tried it out a couple of times (I used a radius of 1 to simplify the calculations). The command “./mcpi <n>” estimates pi with “n” selected points. I show my results below. Also, I implemented it in GNU Octave and generated the image shown above for n = 10,000.

$ time ./mcpi 100
Pi ~=        3.3200000
Actual Value:    3.1415927
Absolute Error:    0.1784073
Relative Error:    5.679%
./mcpi 100  0.00s user 0.00s system 0% cpu 0.002 total

$ time ./mcpi 10000
Pi ~=        3.1404000
Actual Value:    3.1415927
Absolute Error:    0.0011927
Relative Error:    0.038%
./mcpi 10000  0.00s user 0.00s system 0% cpu 0.004 total

$ time ./mcpi 1000000
Pi ~=        3.1374920
Actual Value:    3.1415927
Absolute Error:    0.0041007
Relative Error:    0.131%
./mcpi 1000000  0.07s user 0.00s system 98% cpu 0.069 total

$ time ./mcpi 100000000
Pi ~=        3.1417105
Actual Value:    3.1415927
Absolute Error:    0.0001179
Relative Error:    0.004%
./mcpi 100000000  6.03s user 0.01s system 99% cpu 6.049 total

Monte Carlo Pi Result

The C++ source code can be found here: https://gist.github.com/gangsterveggies/6511104 (I didn’t release the GNU octave code, but I’ll do it upon request).

Note that even though we used random numbers to estimate the area, this could have been done with a regular grid of points inside the square and it would work just fine. We used random numbers for two reasons: to illustrate a Monte Carlo method (we’ll do more interesting things in a while) and because a random distribution of points is uniform enough to work well while being more scalable (if we want more precision we just draw more random points; with a regular grid it is not so easy).

Monte Carlo Integration

The last example was really just a way of showing what Monte Carlo is, so let’s generalize it and do something more interesting.

First of all, numerical integration is an important operation that estimates the value of the integral of a function given two boundaries and a way of evaluating the function between those boundaries. It is useful for many reasons: the function may be only partially known, its antiderivative may not have a simple form, the function may not be defined in an “integrable way” (for example if it is given by a piece of programming code), or you may simply need to check your calculations.

There are several deterministic methods for estimating an integral. One idea is to consider a finite number of rectangular blocks below the function (for a higher-dimensional function think of prisms or hyper-prisms, which is basically the idea behind Riemann sums). We then estimate the area of each block by taking, for example, the value of the function at the middle of the rectangle. This is called the midpoint rule and is the simplest way (as far as I know) to estimate an integral. More interesting ideas use trapezoids instead of rectangles to calculate the area of the “blocks”. Even more interesting ideas fit a polynomial to the function between the two edges of each block (the higher the polynomial degree, the better the accuracy). These are called interpolation methods. There are other methods, but they are more complex and out of the scope of this article.

The problem with all the methods described in the previous paragraph is that their error depends on the dimension of the domain: to get the same error in a higher dimension, exponentially more points are needed. This means the methods don’t scale well, and for high-dimensional functions they are unfeasible even though they work pretty well in low dimensions. So we need a different method for estimating integrals, and that’s where Monte Carlo enters the scene. The idea of Monte Carlo integration is not much different from what we did in the previous example: select N random points of the domain of the integral and use them to estimate the result. We’ll define M_N as the value of the Monte Carlo integral using N points. The formula goes like this:

\int_\Omega {f(X) dX} \approx \frac{V}{N}\sum_{i=1}^N{f(X_i)} = M_N

Here V = \int_\Omega{dX} is the volume of the domain (for example, in the 1D case it is (b - a), if a and b are the boundaries). Each X_i is a random point that is included in \Omega.
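A minimal sketch of this estimator for the 1D case, where V = b - a (not the exact code used for the runs below; f is the test function used there):

#include <cstdio>
#include <cstdlib>
#include <cmath>

double f(double x) { return exp(x) * x; } // the test function from the runs below

double mcIntegrate(double a, double b, int n) {
  double sum = 0;
  for (int i = 0; i < n; i++) {
    double x = a + (b - a) * rand() / RAND_MAX; // uniform sample in [a, b]
    sum += f(x);
  }
  return (b - a) * sum / n; // V/N times the sum of f(X_i)
}

int main() {
  printf("Integral: %f\n", mcIntegrate(2, 5, 100000));
  return 0;
}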

We can easily see how this method is more scalable than the previous ones, but the question is: how good is it? To answer that we’ll estimate the error of the estimator, which is given by its standard deviation. First of all, note that M_N = \frac{V}{N}\sum_{i=1}^N{f(X_i)} = V\left \langle f \right \rangle, where \left \langle f \right \rangle is the sample mean of f. Starting from the formula for the standard deviation, \sigma_f = \sqrt{\left \langle f^2 \right \rangle - \left \langle f \right \rangle^2}, we expand:

\sigma_{M_N} = \sqrt{\left \langle {M_N}^2 \right \rangle - \left \langle M_N \right \rangle^2} = V \sqrt{\left \langle \left \langle f^2 \right \rangle \right \rangle - \left \langle \left \langle f \right \rangle ^2 \right \rangle}

Now we can use the formula \sigma_{\left \langle f \right \rangle} = \frac{\sigma_f}{\sqrt{N}}, which we’ll prove using the variance (the variance Var(f) is the square of the standard deviation: Var(f) = \sigma_f^2). The proof uses the basic properties of the variance and assumes the samples X_i are independent:

Var(\left \langle f \right \rangle) = Var\left(\frac{1}{N} \sum_{i=1}^N{f(X_i)}\right) = \frac{1}{N^2} Var\left(\sum_{i=1}^N{f(X_i)}\right) = \frac{1}{N^2} \sum_{i=1}^N{Var(f(X_i))} = \frac{1}{N} Var(f(X))

So from here we can finalize our analysis:

V \sqrt{\left \langle \left \langle f^2 \right \rangle \right \rangle - \left \langle \left \langle f \right \rangle ^2 \right \rangle} = V \sqrt{\frac{\left \langle f^2 \right \rangle - \left \langle f \right \rangle ^2}{N}} = \frac{V \sigma_f}{\sqrt{N}}

And we conclude that the error is proportional to N^{-\frac{1}{2}}. This means there is no dependency on the number of dimensions (except, obviously, through the volume factor, which is not a “real” dependency on the dimension). An informal explanation is that the interpolation methods need a regular grid of points covering the domain, so we cannot take just any number of points (except in the 1D case): the points need to fill the volume of the domain. Hence the dependency on the number of dimensions.

There are better ways of doing this, for example a method called quasi Monte Carlo integration, which uses low-discrepancy sequences: deterministic point sets that fill the domain more evenly than purely random ones. Other methods adapt the number of random samples depending on the integrand.

Now we get to the practical part. I implemented this method in C++ and did a few test runs. To improve the quality of each calculation, I repeated it a couple of times and averaged the results to prevent random noise from ruining them. I only tested it on 1D functions, so the results aren’t as good as they could have been. The following shows my results using the function f(x) = e^x x. I print the result obtained with the described Monte Carlo method and with a simple midpoint rule method, along with their error percentages relative to the real value:

./mcint
Input number of iterations: 10
Input number of samples: 100
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 572.239578, error: 2.392%
Integral (using the midpoint rule): 575.419, error 1.850%

./mcint
Input number of iterations: 10
Input number of samples: 1000
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 591.545209, error: 0.901%
Integral (using the midpoint rule): 585.173, error 0.186%

./mcint
Input number of iterations: 10
Input number of samples: 10000
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 585.938782, error: 0.055%
Integral (using the midpoint rule): 586.154, error 0.019%

./mcint
Input number of iterations: 10
Input number of samples: 100000
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 586.134884, error: 0.022%
Integral (using the midpoint rule): 586.253, error 0.002%

It is easy to see that the simpler midpoint rule got better results. As said above, this is because deterministic methods work better in low dimensions. For example, in the 1D case the error of the midpoint rule is proportional to N^{-1} (for a general dimension d it would be N^{-\frac{1}{d}}). To showcase the dependency of the Monte Carlo integration error on N^{-\frac{1}{2}}, I implemented it in GNU Octave and plotted the results. The blue line represents a function proportional to N^{-\frac{1}{2}} (namely 0.2 N^{-\frac{1}{2}}) and the red line represents the experimental results obtained:

Monte Carlo Integration Error Result

Even though the red line is slightly noisy, it is possible to notice that it follows the blue line’s overall trend.

The C++ source code can be found here: https://gist.github.com/gangsterveggies/6511133 (Again, GNU Octave code on request only).

To wrap up the discussion about Monte Carlo integration, we’ll talk about a different way of calculating the integral non-deterministically. The idea is similar to the earlier Pi estimation: suppose we have a volume that fully encapsulates our integral’s volume. If we sample a set of random points and check whether each is inside the integral’s volume or not, we can, as before, use the fraction of points inside relative to the total number of points to calculate the integral. That is (V is the total encapsulating volume):

\int_\Omega {f(X) dX} \approx V \frac{\#\text{points inside the integral's volume}}{\#\text{total points}}

It is possible to prove that, similarly to the first method, the error of this method is proportional to N^{-\frac{1}{2}}, although I’ll leave that one to the reader.
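A minimal sketch of this variant for a non-negative 1D function, where the encapsulating volume is the box [a, b] × [0, fMax] (names hypothetical; fMax is found by the crude scan described next):

#include <cstdlib>
#include <cmath>
#include <algorithm>

double f(double x) { return exp(x) * x; }

double hitOrMiss(double a, double b, int n) {
  double fMax = 0; // crude upper bound found by scanning the function
  for (int i = 0; i <= 1000; i++)
    fMax = std::max(fMax, f(a + (b - a) * i / 1000.0));
  int hits = 0;
  for (int i = 0; i < n; i++) {
    double x = a + (b - a) * rand() / RAND_MAX; // random point in the box
    double y = fMax * rand() / RAND_MAX;
    if (y <= f(x)) hits++; // the point fell below the curve: a "hit"
  }
  return (b - a) * fMax * hits / n; // box volume times the hit fraction
}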

I implemented this method as well and tested it out. My implementation is a bit crude, since I computed the encapsulating volume by looping through values of the function to find a maximum (I assumed the minimum was 0; to make this work for all functions I would obviously have to account for functions that take negative values). The results I got weren’t too different from the previous method’s:

./mcint
Input number of iterations: 10
Input number of samples: 100
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 569.071836, error: 2.932%

./mcint
Input number of iterations: 10
Input number of samples: 1000
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 587.156768, error: 0.152%

./mcint
Input number of iterations: 10
Input number of samples: 10000
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 588.172188, error: 0.325%

./mcint
Input number of iterations: 10
Input number of samples: 100000
Input the boundaries (a, b): 2 5
Real Value: 586.264000
Integral: 586.951429, error: 0.117%

The source code for this one is here, but I don’t recommend it very much: https://gist.github.com/gangsterveggies/6511175

I’ll end part one here. In part two I intend to look at more advanced uses, namely something called Monte Carlo Tree Search, which is a method to calculate next moves in games (like Chess or Arimaa). It is a powerful AI tool and is widely used in games like Go, where “traditional” methods (Alpha-Beta Pruning) don’t work so well.

Isometric Engine

I have recently decided to take up a couple of challenges a bit different from what I usually do. So I coded an Isometric Tile Based Engine with C/C++ and SDL.

The source code can be found in my github page: https://github.com/gangsterveggies/isometric-engine

Some of the features it includes:

  • Its own object class, so that you can add objects by simply calling a method called draw;
  • A render queue, which draws all the objects with depth, meaning that objects in the front obscure the ones in the back;
  • A height map, which allows you to define the heights of the tiles and draws the objects in the appropriate height;
  • A map file and parser that can load a file into the engine;
  • A camera that controls the x and y of the tiles (I’m working on zoom as well);

I haven’t documented it or anything, but I’m planning to use it perhaps on a small project and then possibly add more features and improve it. If it works well I might write an API or something. Meanwhile, if someone really wants to use it just comment here and I may try to help.

Here’s a couple of screenshots of the engine in action.

Iso Screenshot

Playing with Game Theory

Today I have a treat for you: Game Theory. I have been playing with a couple of algorithms related to decision games, namely:

  • The Minimax Rule;
  • The Negamax search;
  • Alpha Beta Pruning;

And so I thought about writing an article on the three topics. I’ll try to shape it as a tutorial, with an intro, content and a conclusion/further reading. At the end I’ll present a case study based on the popular game of Tic-Tac-Toe, which, although simple, allows us to play with these algorithms without worrying about the actual game interface (or at least minimizes that pain). So let’s start at the beginning.

Introduction

Game Theory is a very broad field and has seen many developments over the years. According to the great mathematician John von Neumann, Game Theory started when the Minimax theorem was proved, so it is easy to conclude that it is the foundation of the whole field.

But before delving into these interesting algorithms and theorems, I’d like to define a couple of things. First of all, all the algorithms I will describe are, as far as I know and as far as the scope of this article goes, formulated for Two-Player Games: finite games, with a finite set of possible moves from every game state, played between two players, with a discrete outcome (for example an integer). More specifically, I will talk about Zero-Sum Two-Player Games, games where each player’s total losses (or wins) are exactly balanced by the other player’s wins (or losses), that is, they add up to zero. Even more specifically, I will talk a lot about 0-1 games, games with only two possible outcomes: Win or Lose (and in some cases a Draw is also acceptable).

Minimax

In the field of zero-sum games, the minimax rule or strategy states the following:

The best possible play for player 1 in a given state is the one that minimizes the value of the best possible play for player 2 from the resulting state;

This means we can conceive an algorithm that calculates the best possible play from a given state by looking at the best possible plays from all of the possible subsequent moves and picking the one that is worst for the opponent. Of course this does not apply to terminal states, that is, states where the game ends, whose value is simply the outcome of the game in that particular state.

We can then set the following algorithm for game playing:

function minimax(State, Player):
  if (State is terminal):
    Return Result(State);
  if (Player):
    bestValue = -inf;
    for each possible play P from State:
      value = minimax(P, !Player);
      if (value > bestValue):
        bestValue = value;
    Return bestValue;
  else:
    bestValue = +inf;
    for each possible play P from State:
      value = minimax(P, !Player);
      if (value < bestValue):
        bestValue = value;
    Return bestValue;

In the above algorithm we replicate what the Minimax rule gives us in order to find the best play (note that “if (Player)” means the player to move is the one the algorithm is trying to maximize, the else branch handles the opponent, and each recursive call flips Player).

Before advancing we shall check an interesting result this rule allows.

If we apply the Minimax rule to 0-1 games, we can derive the following rules:

If a state is terminal, then its value is a win or a loss according to the result of the game in that state;

If it is not a terminal state, then it is a winning state if there is at least one losing state reachable from it, and a losing state otherwise;

This means we can describe what is called a WL-Algorithm (Win-Lose Algorithm):

function win(State):
  if (State is terminal):
    Return Result(State);
  for each possible play P from State:
    if (!win(P)):
      Return Win;
  Return Lose;

The algorithm above searches the possible plays for one that leaves the opponent in a losing state, exploring the game space until it finds such a branch.

Negamax

Negamax is a variation, or perhaps an extension, of the Minimax rule. It relies on the fact that the maximum of two values is the negation of the minimum of their negatives: max(a, b) = -min(-a, -b). It states the following:

If, from a given state, the minimum value of the best plays of the possible adjacent moves is m, then the value of the state is -m.

So we will describe a Negamax Algorithm as such:

function Negamax(State, Player):
  if (State is terminal):
    Return Player * Result(State);
  bestResult = -inf;
  for each possible play P from State:
    bestResult = max(bestResult, -Negamax(P, -Player));
  Return bestResult;

The algorithm assumes that Player 1 (the one we are trying to maximize) has value +1 and the other player -1. Hence it can convert a score to negative or positive according to the respective player (in the “Player * Result(State)”). Another interesting detail is that we switch the player to move by negating its value.

Alpha-Beta Pruning

All the above algorithms are complete search algorithms: they search the full game space (at least the part reachable from the current state), trying every possible play to find the best one. Alpha-Beta Pruning is a clever branch and bound technique that prunes branches of the game tree that are guaranteed to be worse than the current best.

It works by maintaining two values, \alpha and \beta, which are, respectively: the best value the maximizing player is already guaranteed, and the best (lowest) value the minimizing player is already guaranteed. Through the Minimax rule we can see that if at any moment \alpha \geq \beta we can prune the remaining branches. The reason is that when \alpha \geq \beta no play in the current branch can improve on the best already found, which follows from the Minimax principle, so it is not necessary to keep looking. The art here is to first search the plays that are likely to raise \alpha or lower \beta, forcing early prunes in each branch. For that we can define some heuristics, trying first the plays that are likely to be better (such as taking a piece in Chess, or a move that puts two symbols in a line in Tic-Tac-Toe).

We can now have the following algorithm:

function AlphaBeta(State, Alpha, Beta, Player):
  if (State is terminal):
    Return Result(State);
  if (Player):
    for each possible play P from State:
      Alpha = max(Alpha, AlphaBeta(P, Alpha, Beta, !Player));
      if (Alpha >= Beta):
        Return Alpha;
    Return Alpha;
  else:
    for each possible play P from State:
      Beta = min(Beta, AlphaBeta(P, Alpha, Beta, !Player));
      if (Alpha >= Beta):
        Return Beta;
    Return Beta;

We can see that this algorithm strongly resembles the original Minimax algorithm, but now includes the Alpha and Beta prunes. The prunes are called cut-offs (the Alpha cut-off and the Beta cut-off).

Even though the previous algorithm was based on the Minimax one, we can simplify it by applying the same idea to the Negamax, as below:

function AlphaBeta(State, Alpha, Beta, Player):
  if (State is terminal):
    Return Player * Result(State);
  for each possible play P from State:
    Alpha = max(Alpha, -AlphaBeta(P, -Beta, -Alpha, -Player));
    if (Alpha >= Beta):
      Return Alpha;
  Return Alpha;

Notice the inversion of Alpha and Beta into -Beta and -Alpha in the recursive call. Again this comes from the Negamax rule.

As a final remark, you can see that the WL-Algorithm described earlier is just a special case of this idea (we prune as soon as we find a win/loss, because no better result exists).

Conclusion

All these algorithms can be greatly improved using Dynamic Programming, since a lot of repeated subgames can be stored and then simply looked up later, as the sketch below shows.
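A minimal sketch of that idea applied to the Tic-Tac-Toe negamax (names hypothetical): boards are encoded as base-3 numbers, so identical positions reached through different move orders are solved only once.

#include <unordered_map>
#include <algorithm>
using namespace std;

unordered_map<int, int> memo; // encoded board -> value for the side to move

int winner(const int b[9]) { // returns 1 or 2 for a won board, else 0
  static const int L[8][3] = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                              {1,4,7},{2,5,8},{0,4,8},{2,4,6}};
  for (auto& l : L)
    if (b[l[0]] && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]])
      return b[l[0]];
  return 0;
}

int encode(const int b[9]) {
  int key = 0;
  for (int i = 0; i < 9; i++) key = key * 3 + b[i]; // cells are 0, 1 or 2
  return key;
}

// Value for `player` (1 or 2), the side to move: 1 win, 0 draw, -1 loss.
int negamax(int b[9], int player) {
  if (int w = winner(b)) return w == player ? 1 : -1;
  bool full = true;
  for (int i = 0; i < 9; i++) full = full && b[i];
  if (full) return 0;
  int key = encode(b); // the side to move is implied by the board itself
  auto it = memo.find(key);
  if (it != memo.end()) return it->second; // repeated subgame: just look it up
  int best = -2;
  for (int i = 0; i < 9; i++)
    if (!b[i]) {
      b[i] = player;
      best = max(best, -negamax(b, 3 - player));
      b[i] = 0;
    }
  memo[key] = best;
  return best;
}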

There are many more algorithms related to these. A very interesting one is Aspiration Search, also known as Scout Search, which sets up a search window for the Alpha-Beta Pruning, hoping (“aspiring”) to find the result inside that window. If enough cases fall inside the window, the algorithm can be considerably more efficient (like 10% more), but if not it is less efficient, since it needs to adapt the window and search again. A variation of this algorithm is NegaScout, which, as the name indicates, is adapted to the Negamax. Other algorithms include MTD-f, SSS* and PVS (Principal Variation Search), which use similar reasonings (even though SSS* is slightly different).

There is a lot more to Game Theory. The algorithms I presented are branch and bound algorithms, heavily based on complete search, pruning and heuristics, but there are certain specific games that allow very neat and efficient solutions using the game of Nim and the Sprague-Grundy theorem (using Grundy numbers).

Implementation in C++

I used C++ to implement a console-based game of Tic-Tac-Toe. It asks who plays first and whether you want to clear the screen after each play. The clearing only works in a Linux-like environment (because I used a system call to the Linux command “clear”); if you want to use it in another environment, delete that line or change it to “cls” (for Windows only).

The implementation contains both plain Negamax and Negamax with Alpha-Beta Pruning. The code might require some polishing, but I have tested it a few times and it worked.

Notice that in this particular case I could have used a simple WL-Algorithm, but I purposely did not, in order to try out the other algorithms.

Here is the link to the file: https://gist.github.com/4156450

(I know that tic-tac-toe is perhaps too simple a game for this, but I wanted to save time and try things out. I plan to build a player for Hex in the future [https://en.wikipedia.org/wiki/Hex_%28board_game%29] with a Negamax Alpha-Beta Pruning algorithm and a WL-Algorithm, but I currently have no time.)

References

I did not use a lot of websites, mainly Wikipedia, to search for ideas for the pseudocode, but here is a list of interesting sites:

http://chessprogramming.wikispaces.com/Alpha-Beta ;

http://chess.verhelst.org/1997/03/10/search/ ;

http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=algorithmGames ;

– And of course Wikipedia for all the algorithms I talked about;

The Art Gallery Problem

Recently I’ve come across an interesting and fun problem called the Art Gallery Problem (or the Museum Guard Problem). I plan to talk a little about the problem, introduce some useful notation and lemmas, and finally present an approximate algorithm that solves it.

The problem has many statements, since it fits a couple of different scenarios, but I’ll adopt my favorite one (and one of the most useful, I guess):

Given a simple polygon shaped room, how many guards would be necessary to guard the whole room?

Before attempting to solve the problem, let’s define some notation. A simple polygon is a collection of ordered points (called vertices) connected contiguously by non-intersecting line segments. A triangulation of a polygon with n vertices is a set of non-intersecting diagonals and edges of the polygon which partitions the polygon into n - 2 interior-disjoint triangles. Let’s prove this claim:

Lemma 1: Every polygon with n vertices can be triangulated and each triangulation has n - 2 triangles.

Proof: We’ll first prove existence and then that the number of triangles is n - 2. We proceed by induction: if the polygon is a triangle the claim is trivial; if not, pick a diagonal and divide the polygon in two. Each of the two new polygons has at most n - 1 vertices. Inductively repeat this until only triangles remain.

To prove the second part we also use induction. Let T(P) be the number of triangles in the triangulation of polygon P and N(P) its number of vertices. Call the initial polygon P_{0} and, as above, call the two new polygons P_{1} and P_{2}. The triangulation of P_{0} is the union of the triangulations of P_{1} and P_{2}, so it is obvious that T(P_{0}) = T(P_{1}) + T(P_{2}), which by induction is T(P_{0}) = N(P_{1}) - 2 + N(P_{2}) - 2. Since P_{1} and P_{2} only share a diagonal of P_{0}, it follows that N(P_{1}) + N(P_{2}) = N(P_{0}) + 2; substituting into the above formula gives T(P_{0}) = N(P_{1}) - 2 + N(P_{2}) - 2 = N(P_{0}) - 2.

This finalizes our proof. ■

There is a linear time algorithm to triangulate any polygon by Chazelle which is outside the scope of this article (I may expand on this topic in a further article).

A graph G(V, E) is a network modeled as a mathematical object composed of a set of nodes V and a set of edges E, the connections between nodes. Two nodes v and u are said to be adjacent if there is a pair \{v, u\} \in E. A graph can be either directed or undirected depending, as the name indicates, on whether its edges have a direction. A k-coloring of a graph is an assignment of one of k colors to each node of the graph in such a way that no two adjacent nodes share the same color, and the chromatic number of a graph G, written \gamma(G), is the minimum number of colors needed to color it. (An interesting result that I’m not going to prove, but had a slight idea about before reading it: for perfect graphs the chromatic number is equal to the clique number, cool, huh?)

We’ll define the triangulation graph of P, written \tau(P), as the graph G(V, E) where V is the set of vertices of the polygon and E contains all the diagonals and edges used in the triangulation (taken as connections between vertices). We shall now prove that the chromatic number of any triangulation graph is 3.

Lemma 2: \gamma(\tau(P)) = 3 for every simple polygon P in the plane.

Proof: We proceed by induction on the triangles of \tau(P). As a base case consider any single triangle of \tau(P): to color a triangle we clearly need three colors, one for each vertex. Now we rebuild the polygon P inductively: consider a subset of \tau(P) that is itself the triangulation graph of a polygon and assume the claim holds for it. We extend it by adding a vertex outside the subset that is adjacent to vertices in it. Such a vertex is adjacent to exactly two vertices of the subset: with fewer we would have an isolated vertex or a dangling edge, and with more we would have crossing diagonals or a non-triangular face. Its two neighbors are adjacent to each other, so they carry two distinct colors, and we can assign the new vertex the remaining third color. Continuing this construction eventually recovers the original polygon, which proves the claim. ■

Finally, let’s define what it means to be guarded by a set of vertices. We say that a set of vertices v guards a polygon P if every point of P is visible from at least one v_{i} \in v, where two points are visible from each other if the segment connecting them lies inside the polygon P.
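For illustration, here is a rough C++ visibility test between two vertices under this definition, reusing Pt and cross() from the triangulation sketch above. It is only a sketch: it checks that the segment properly crosses no non-incident edge and that its midpoint lies inside the polygon, and it leaves boundary and collinear cases (for example, the two endpoints of a single edge) unhandled.

// True if segments ab and cd properly cross (one interior point each);
// touching at endpoints does not count as a crossing here.
bool properCross(Pt a, Pt b, Pt c, Pt d) {
    double d1 = cross(a, b, c), d2 = cross(a, b, d);
    double d3 = cross(c, d, a), d4 = cross(c, d, b);
    return ((d1 > 0) != (d2 > 0)) && ((d3 > 0) != (d4 > 0));
}

// Classic ray-casting test: is p strictly inside the polygon?
bool inside(const std::vector<Pt>& poly, Pt p) {
    bool in = false;
    int n = poly.size();
    for (int i = 0, j = n - 1; i < n; j = i++)
        if ((poly[i].y > p.y) != (poly[j].y > p.y) &&
            p.x < (poly[j].x - poly[i].x) * (p.y - poly[i].y) /
                      (poly[j].y - poly[i].y) + poly[i].x)
            in = !in;
    return in;
}

// Vertices u and v are (roughly) visible from each other if segment uv
// crosses no non-incident polygon edge and its midpoint is inside P.
bool visible(const std::vector<Pt>& poly, int u, int v) {
    int n = poly.size();
    for (int i = 0; i < n; i++) {
        int j = (i + 1) % n;
        if (i == u || i == v || j == u || j == v) continue;  // incident edges
        if (properCross(poly[u], poly[v], poly[i], poly[j])) return false;
    }
    Pt mid{(poly[u].x + poly[v].x) / 2, (poly[u].y + poly[v].y) / 2};
    return inside(poly, mid);
}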

It is also important to note that, besides having different statements, this problem also has different versions, such as:

  • The Minimum Vertex Guard Problem, where guards may be placed only on the vertices of the room polygon;
  • The Minimum Point Guard Problem, where guards may be placed anywhere inside the room polygon;
  • The Minimum Edge Guard Problem, where guards may be placed only on the edges of the room polygon;
  • The Minimum Half-Guard Problem, where guards have a fixed 180º field of vision and may be placed anywhere inside the room polygon.

In my analysis of the problem I shall focus on the first version: The Minimum Vertex Guard Problem.

This problem is quite complex, and computing the exact answer is far from obvious in most cases. In fact, D. T. Lee and A. K. Lin proved that the Minimum Vertex Guard Problem is NP-complete. But we can approximate the answer. We start with an upper bound for the problem, given by Chvátal and known as Chvátal’s Art Gallery Theorem. Its original proof is somewhat complex, so I’ll present a simpler one by S. Fisk. The theorem states the following:

Chvátal’s Art Gallery Theorem: To guard a room with n vertices, \left \lfloor \frac{n}{3} \right \rfloor stationary guards are sufficient (and sometimes necessary).

Proof: Let P be the polygon with n vertices that represents the room and T = \tau(P) its triangulation graph. From Lemma 2 we know that \gamma(T) = 3, so we can partition the vertices of P into three classes, one per color of a 3-coloring of T. Call these classes C_{1}, C_{2} and C_{3}. We’ll prove that the set of vertices in any one of these classes effectively guards P.

The three vertices of any triangle of T are pairwise adjacent, so no two of them can share a color; in other words, every triangle contains exactly one vertex from each class. Moreover, a triangle is convex (by definition, in a convex polygon the segment connecting any two points of the polygon stays inside it), so a vertex of a triangle sees every point of that triangle. Putting the two together: the vertices of any one class collectively see every triangle of the triangulation, and since the triangles cover P, each class guards all of P.

We can now conclude the claim: since |C_{1}| + |C_{2}| + |C_{3}| = n, by the pigeonhole principle at least one of the three classes has at most \left \lfloor \frac{n}{3} \right \rfloor elements. ■

We can now conceive a simple algorithm based on this proof:

Input: a polygon P

Function VertexGuard(P)
01: Triangulate P into T
02: 3-color T
03: Print the smallest color class

To derive the runtime we analyze each line of the algorithm. The first line can be done in O(n) with Chazelle’s algorithm (which is, in practice, quite complicated to implement; we can always fall back on the O(n log n) algorithm by Garey, Johnson, Preparata and Tarjan). The second line can also be done in O(n) with a very neat algorithm. It can be shown that every polygon triangulation has a vertex of degree 2 (the number of adjacent vertices). Color it with one color and its two neighbors with the other two; this fully colors one triangle and leaves an adjacent triangle with two of its vertices already colored, which forces the third color on its remaining vertex. Repeating this argument colors the whole graph. The third line is trivially done in O(n).
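To make this concrete, here is a small C++ sketch of the whole VertexGuard routine, reusing Pt, triangulate() and the includes from the ear-clipping sketch above; it illustrates the proof rather than the optimized O(n) pipeline. The 3-coloring replays the clipped ears in reverse order, which is exactly the degree-2 argument: the last remaining triangle gets all three colors, and every ear tip then receives the one color its two already-colored neighbors do not use.

#include <algorithm>

// 3-colors the triangulation returned by triangulate(): the triangles
// come in clipping order, so color the final triangle first and walk
// the ears backwards, giving each ear tip the forced third color.
std::vector<int> threeColor(int n, const std::vector<std::array<int, 3>>& tris) {
    std::vector<int> color(n, -1);
    auto base = tris.back();                  // the last remaining triangle
    color[base[0]] = 0; color[base[1]] = 1; color[base[2]] = 2;
    for (int t = (int)tris.size() - 2; t >= 0; t--) {
        auto [a, b, c] = tris[t];             // b is the clipped ear tip
        color[b] = 3 - color[a] - color[c];   // the color a and c don't use
    }
    return color;
}

// VertexGuard: triangulate, 3-color, return the smallest color class,
// which by the theorem has at most floor(n / 3) vertices.
std::vector<int> vertexGuard(const std::vector<Pt>& poly) {
    auto color = threeColor(poly.size(), triangulate(poly));
    std::array<std::vector<int>, 3> cls;
    for (int v = 0; v < (int)poly.size(); v++) cls[color[v]].push_back(v);
    return *std::min_element(cls.begin(), cls.end(),
        [](const std::vector<int>& x, const std::vector<int>& y) {
            return x.size() < y.size();
        });
}

The coloring and class-selection steps are O(n); the sketch is dominated by the naive triangulation, which the algorithms cited above would bring down to O(n) or O(n log n) overall.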

I’ll end the article here, saying only that this is a very interesting problem, open to much discussion. Personally, I’m now analyzing a different version: the Minimum Point Guard Problem. I find it equally interesting and, again, very open to discussion; I’ll update this topic in the near future (hopefully). I’m sorry for any possible mistakes and I’m open to criticism and ideas.

References:

http://correo.matem.unam.mx/~urrutia/ArtBook.html/Completo.pdf

http://www.math.iit.edu/~kaul/talks/LongArtGalleryTalk.pdf

News of the week

So today I’d like to write something, because quite honestly I’m pretty pissed about two things. Let’s start with the worst:

So apparently Lucasfilm was sold for some billions of bucks to Disney, which by itself is bad. Then I read: “blah blah… plan to film Star Wars 7 until 2015… blah”… I’m speechless. I don’t know what’s worse: that the best series of movies ever made is going to be ruined, or that it is going to be ruined by Disney!! Disney, for god’s sake! Well, I understand George Lucas’s argument that he wanted Star Wars to outlive him and to let someone else continue the great saga. On the other hand, I can’t understand why he’d think this is the time for it, or that Disney will carry it out… And on top of all that, the news seems to assume that it is a good thing and that Star Wars fans will want a Star Wars VII made by Disney… I need to think this through…

The second thing that is annoying me is the NHL. Basically, the NHL season was cancelled through November 30 because of “the absence of a Collective Bargaining Agreement between the NHL Players’ Association and the NHL”. My instincts say there will be no season this year if this goes on, but I really don’t want to believe that. I don’t get why they need to complicate things over these petty disputes and end up screwing up something millions of people enjoy watching… Sigh…

I’m Back

Yes, I’ve been out for quite a long time, but I’m back!

The thing is, I’ve been training and working on tons of stuff, including two International Olympiads (IOI and IPhO), which was very time consuming. I hope to restart the blog by writing some articles about algorithms and problem solving and, of course, about cool stuff that I’ll eventually do.

Stick around and Yay!