Antics with semantics: May 2016

Tuesday, May 17, 2016

Topcoder problem analysis - ChessCover (Inv 2001 Semi A+B 1000 pointer)

Problem statement
https://community.topcoder.com/stat?c=problem_statement&pm=202&rd=53

This turned out simpler than the 500 pointer, especially since I had the 2d iterator class. Piece of cake - didn't take more than 30 minutes to code.

Solution

The algorithm is really simple -

We scan the chess board for pieces
For each piece except knights and pawns we call attack(), passing an array of directions
For knights and pawns we call attackOne()
The attack() function iterates in each of the passed directions starting at that piece and marks the empty squares unsafe. It stops if any piece is encountered
The attackOne() function does the same, but only one step instead of all the way

The fact that we had an iterator means, we don't have to care about the bounds of the board, or do anything differently for any direction.

Note the use of std::initializer_list to allow us to pass literal array constants to a function and treat them as a container (with zero runtime overhead)

This was one of the most satisfying bits of code I wrote so far in this series.

Topcoder problem analysis - Tothello (Inv 2001 Semi A+B 500 pointer)

Problem statement
https://community.topcoder.com/stat?c=problem_statement&pm=203&rd=53

This is quite straightforward - you have to scan every row, column and diagonal range on the board and detect if you can surround a contiguous line of your opponent's pieces on both sides by placing your next piece and capture them. Each set of captured pieces becomes your piece and they can cause further cascading captures until there are no more capture situations on the board

You have to find the move that captures the maximum number of pieces.

The obvious way to do it (as many solutions do) is to scan the board using for() loops - To me this seemed a terribly ugly way of doing this.

We're trying to find a pattern like RBBBBBBR along any straight line of the board. If it were only in the horizontal direction we can do this cleanly with just two calls to adjacent_find().
However we need to handle eight different directions not just the horizontal ones.

So I decided that a 2d iterator would be interesting to write.

2D iterator

Usually, iterators are a subtype of a container, but here we're not going to do that, since we can't extend vector and so on.
Instead we create a class that holds a reference/pointer to its container - in our case a vector<vector<T>>

STL iterators have a lot of properties and are written in a consistent way, since they all work on linear one dimensional sequences. They move in two directions at most.

Our requirements are different:

We want the iterator to be able to move in any of the eight directions
We want to have the bare minimum of operations implemented and a couple of extras

The basic operations we need:

Ability to set a direction
Copy constructible
*, >, ==, != and pre-increment operators (post increment is not useful enough)

Extras:

bool operator to check if an iterator is not at the end.
A function to get a successor to an iterator without mutating it.
a distance() function - I could not get std::distance() to work right with my iterator. Needs some studying

Full implementation

The code is not brilliant and some corners are cut, but it does the job.

Some points to note:

Coordinates and increments are simple pair<int, int>
We predefine the eight directions
We have to store a pointer to the container we refer to, so we can refer to its contents.
The increment operator clamps to the first out of bounds co-ordinate
Whatever extra boilerplate the STL needs is borrowed from std::vector
Our end iterator is created from another iterator, not from the container as is usually done, since our iterator has a direction

The way its used is - you construct an iterator passing in a vector of vectors, a coordinate, and a direction. You construct an end of range iterator in turn from that. Now you can treat the two iterators like any other ones you have seen.

Once we have this 2d iterator class, the original problem becomes almost trivial to solve

Full implementation

The algorithm is as follows:

We place a piece tentatively on the board on an empty position
We look for any captures - on every column, on every row, every primary and secondary diagonal
If anything was captured, repeat the above step because we may have cascading captures
If we had no captures, count how many total pieces we have after
Repeat this entire process for every empty position on the board and see what is the maximum count we can get.

The main work is done in apply() and applyOne()

apply() calls applyOne() eight times, each time passing a co-ordinate and two directions - The two directions are the primary and secondary axis to scan - For example we'd scan each column by starting at each point on the first row and going downwards.

applyOne() is extremely simple since we can use the STL

We need to find a pattern like 100000001 on the board along the secondary axis starting at each point on the primary axis (our representation uses 1 for our piece and 0 for the opponents)
Call adjacent_find to see if there is a 1,0
Call adjacent_find on that result to find an 0,1 - the fun part is that if the previous call had not succeeded, it would have returned the end() iterator, so this call will do nothing. We don't have to do any messy and error prone special casing.
Now we have two iterators one that points to an 1,0 and another to a 0,1
If both are valid, we count how many 0s exist between the two surrounding 1s
If that count is equal to the distance between the two iterators, everything between the 1s was a 0 and thus we have a capture
If we have a capture, just fill that captured range with 1

It looks like a lot of work to write an iterator just for this problem, but we made our code simple and error free - just imagine the nightmare of writing several loops, each which has multiple conditions. Maybe some people are so smart they can juggle all that in their head. Personally, I'd let the compiler handle it.
Some of the solutions in the practice room are horrendously verbose. A typo would be fatal for such code.

It turns out that the 1000 point problem in this room is a similar problem involving a 2d game board. Having this iterator code made that problem trivial to solve - in fact it turned out even simpler than this one!

Use the abstraction Luke!

Monday, May 16, 2016

Topcoder problem analysis - RuleEngine (CC 2002 REG Semi 1000 pointer)

Problem statement
https://community.topcoder.com/stat?c=problem_statement&pm=352&rd=62

I have no idea why this was chosen as a 1000 pointer - the problem is really simple looking. So simple in fact that many coders missed a couple of obvious details, and I spent a while successfully challenging several solutions in the practice room...

The cardinal mistakes people made:

Forget to use a 64 bit integer for computations - 19 to the 14th power is a huge number.
Someone actually implemented a simple BigInteger class in C++, without realizing they could use a long long
Write a nested for loop that checked all possible numbers for each rule! Probably not going to terminate in the lifetime of the universe!

Another thing I noticed is the repetitivity of the code in many solutions. If you have two blocks of code that differ in only a few places, it begs to be converted into a function, or often a macro.

Solution

The correct way to solve the problem is:

Go through each rule, and count how many numbers fulfill that rule
Multiply all the above counts together

Note the usage of the HANDLE_OP() macro to handle repetitive code - Macros allow you to de-duplicate code in ways that you just cannot by any other means. Here we used the fact that the literal symbol "!=" and so on is the actual C++ operator for that rule. Had we used anything but a macro, we'd have had to write a tedious case statement or similar logic, mapping strings to relational operators.

Note the appending of ",0" to the rules, to make sure the parsing of all rules is uniform. You can ensure that a condition is always met in order to eliminate a conditional statement - this is similar in principle to sentinels, a much underrated tool in your programming toolbox.

Topcoder problem analysis - Birds (CC 2002 REG Semi 500 pointer)

Problem statement

https://community.topcoder.com/stat?c=problem_statement&pm=330&rd=62

This is not a very hard problem once you figure out the right algorithm. I spent a lot of time chasing a solution after having misread the problem.
I assumed that a bird is migratory if it spends its time in two regions for 90 days or more each in any 365 day period, rather than just the specified start date to December 31st.

Lesson #1 : Read the problem statement carefully!

Subtask - Find the number of days between two dates in a year

The naive way to do this is to calculate it by summing up the months upto the current and adding the date.
Instead of doing this in a loop every time, we can precalculate this with partial_sum():

The algorithm itself

My solution was as follows:

Put the locations in an array and sort by date.
Find all pairs of locations the bird stays which are at least 1000 KM distant
For every pair of locations say A and B, walk the array starting from A;
Keep walking the array as long as every location is within 1000 KM of A and more than 1000 KM from B. Sum up the time spent along this sequence.
The entries that were walked are all in the same region and we have the total time spent in that region.
Repeat the same process vice versa starting from B
Now we have the total time spent in both the regions, if both are > 90 days were have a migration!

The third step finds the longest time spent in any one region - there is a lot of redundant computation - For e.g. if we have ABCDEFG as locations, and G happens to be 1000 Km away from the rest, while all the others are close to each other, we will be walking from A to G, B to G, C to G and so on - but we don't care, since all that matters is that the time spent in some subset of [A to F] is 90 days and the stay in G itself is 90 days.

There is only one tricky detail - time spent at any location is given by t[d + 1] - t[d] - but obviously the last element of the locations array doesn't have a successor.
It's implicitly assumed that if a bird no longer moves, it stays where it was until December 31st.
So we put in a January 1st (next year or day 366) sentinel at the end of the locations array, so we have a way for the loop to be within bounds.

Perversely, the problem statement explicitly says the same is not the case for the beginning of the year - in other words, if a bird migrated on March 1st, you don't assume it lived at that location before that. I assumed the opposite the first time I read the problem and didn't figure it out until one system test case failed!

Final code

The code seems quite verbose, but the essence of the algorithm is only about 10 lines.
Note the simplicity of using FOR_EVERY_UNIQUE_PAIR()
Also notice the use of stoi() - C++11 onwards has string to numeric conversion functions and vice versa. The only alternative before that was istringstream or -- Dog forbid! -- the sprintf() family

C++ as it is meant to be used

There are two things that are the biggest blessing as well as the biggest curse for C++

A) It is (almost perfectly) backward compatible with C

Perhaps nothing else ensured its success, but it also means that people tend to clump together the two as C/C++ and for most C++ is learned after prior exposure to C. This only leads to folks using C++ as a better C, bringing in all the bad habits :

Globals - In C it's impossible to avoid them, but they can and should be avoided in C++ ( Why? )
Manual memory management - For every flower there's a bee, For every fruit there's a tree, but unfortunately for every malloc() there's not always a free(). The paranoia of the bugs caused by this led the designers of Java to opt for garbage collection instead (Of course they forgot that memory is not the only resource that needs to be managed!!). In C++ we have tools for the job - Deterministic finalization means we can use RAII classes or simply use auto pointers to ensure matched allocation and deallocation. In fact, most of the time, you can simply use a std::string as a buffer. Google does it in their protocol buffer code - If it's good enough for google it's good enough for me. In the past few years I can't recall having used new or delete[] for buffers (or even for objects for that matter)
Using the C functions - using strcpy(), printf(), scanf(), and their brethren is a surefire way to induce vulnerabilities and bugs. C++ gives you a nice string, stream I/O and if you use boost, you can use type safe code like the lexical_cast and the format library.
It is this C legacy which leads to insecure and fragile code.

B) It is labeled "Object Oriented"

Destructors, Encapsulation, Operator overloading and Pure abstract classes are quite useful features - Especially destructors, without with RAII would be impossible.

However, I'm not much of a fan of the OO bandwagon. OO has it's conveniences and good points, but in emphasizing it above all else, there is too much hype. It really does not give you much of a new way of looking at problem solving unless you opt for pure message passing style OO like Smalltalk
Most of day to day OO usage could be very well done in C - Think of the stdio.h functions or most of the Win32 API that take an opaque FILE* or HANDLE respectively. I tend to agree with this gentleman's opinion on the matter.

What does good C++ look like?

In my opinion, minimum reliance on code that you have written, and maximum use of well written libraries ( I refer to the STL and boost ). Even if it makes the code only readable by someone more experienced - No one said C++ is an easy language.

Some rules of thumb that I tend to follow

Use containers instead of C arrays
Use string rather than char* buffers
Avoid new and malloc()
Avoid explicit loops over data when possible, use copy(), transform() etc.
Avoid pointers as much as possible
Avoid functions that take varargs ... arguments
Once again - do not reinvent the wheel, unless you need a square one, it's been done before, there's always a well written library out there with the functionality you need.

Topcoder problems - Tokenizing

One of the most common tasks in topcoder problems is tokenizing a string into individual elements. Almost always the entries are separated by spaces.

The naive and obvious version

Seen surprisingly often in various avatars, it is a C mode of thinking goes something like this:

Very verbose and tedious!
Of course on topcoder they usually write tersely equivalent code on a single line #defined as a macro or an inline function.

Can we do better?

Here is a more C++ like version

Still smells like C

How about

Think in functional terms, get the loop mentality out of the way - The STL is your friend :

Nice and neat!

Can this be a one-liner?

Let's try putting everything together...

Unfortunately this is not correct. The temporary istringstream dies after the istream iterator is constructed, so bad things will happen. The constructor of the istream_iterator takes a istream& rather than a const istream& otherwise it would be workable - See this
Actually should have been obvious...
You cannot read stuff out of a const reference to a stream since it changes state when you do.

What about other delimiters?

C++ streams are tied to locales, so they read stuff based on the definition of isspace() for the current locale. There seems to be no way to manufacture our own locales for splitting strings on things other than whitespace, so we need a different approach:

Neat huh?
If that looks like Greek and Latin to you, don't worry, it looks that way to me too, three days after writing it.

Let's analyze what the heck is going on here...

The intent is as follows - we want to maintain two pointers/iterators - one points to the start of a word, and the other points to a separator (or end of string)

The for loop starts by initializing ps(pointer to start), pe(pointer to end-of-string).
Since we use auto we have to assign something to pd(pointer to delimiter), so might as well assign ps to it. We are going to loop until ps reaches pe.

Next we place the next available token into the result vector - emplace_back() is a function that can push an element onto a vector without copying the value passed into it - it uses move semantics to achieve this. Whenever you push a temporary object onto a vector, use emplace_back()
Remember never to reuse the emplaced object again, or else...

OK, but what the hell is "the next available token" mentioned above?
It is a string consisting of an iterator range - the start of the range is obviously ps. The end of the range is the next delimiter in the string, or the end of the string if no such delimiter exists.
Well that's exactly what find_first_of() does - returns an iterator in a range that has any one of the values you give it (here the characters in the string seps). If not found, it returns the end of the range, which is what we want.

So all we need to do is set ps to the point after the delimiter, so we can grab the next token - this is done in the increment part of the for loop. If it goes past the end, were done. Assigning pd on the fly inside the expression is a neat way to avoid one extra line of code. Remember that everything in C/C++ except control statements return a value (In fact and BTW, in GNU C, there is an extension that lets every statement be a value).

Not too bad at all! I can't think of a more neat version than this. I have some vague idea about using adjacent_find(), any_of() and so on, but I suspect it won't be as neat as this.

One thing you might notice is that I use pointers instead of iterators. The point is that pointers are the best kind of iterator, and if you know that the underlying object is contiguous, one might as well use a pointer than an iterator (the compiler will actually generate almost the same code for an iterator in this case). There is nothing evil about pointers, as long as you use them sparingly in a type safe way.

Here's how it looks in the brief form using the macros from my topcoder template:

Try writing a similar split in your favorite language for kicks!

Topcoder problem analysis - A new hope

Intro

Off late, I've started practising topcoder problems to get better at problem solving. I'm in no state to compete in SRMs right now, but I'll try to practice as often as possible, till I feel up to it.
I'd never ever solved a 1000 pointer before, in any amount of time. But I managed to solve a couple, albeit taking a lot of time.
My intent was not to just get code to work, but write good idiomatic C++, with some amount of terseness.

Setup

The first thing I noticed is that the topcoder beta arena based on HTML really sucks. It doesn't let you browse problems by SRM and as far as I could tell, there is no way to see other peoples solutions. The good old Java applet still works, and I guess it will continue to work because it remains in production, so thank goodness for that.

The first thing to do is to configure a good editor plugin - I used to use FileEdit and TZTester before, but Greed seems like the best thing to use right now. It also downloads the problem descriptions and is highly configurable. So far I haven't been able to get it to generate automated test code, but that's probably because the problems I opened are from very early matches, where the test cases are not specified in a standard form which the plugin can parse.

The next thing is what template to use - as usual, a bunch of macros.
This is what I have right now:

Just a bunch of macros borrowed from many topcoder contestants. all() is probably the most useful, followed by FOR(), and P() is invaluable while debugging. Most of them just reduce verbosity and avoid typos. One has to be careful to use macro names that are not identifiers, otherwise you can get some horrendous errors.

Modus operandi

My goal is not to try to write code as fast as possible. It is to try to write the best (by my taste) C++ for each problem. This means that I will try my best to refine a solution once I get it to work, or try to write it in a refined STL centric manner from the get go.

In future posts I will try to dissect each problem, algorithmically and code wise, and try to show how best to leverage C++.

Ideally I would like to see myself writing code in the very first iteration, that looks like the last iteration I produce now, after much refinement.

In a contest situation, some of the code I write is not ideal, since in a contest, cleanliness of code is irrelevant, as is code duplication. If anything some contestants write code in a purposely obscure manner to thwart others from grokking the code.

For practice purposes, I'd rather write elegant code. Some terseness and one-lining is nice though as code golf.
I do not adhere strictly to my variable naming convention, since it leads to come verbosity. I also break some whitespace rules to make the code look compact.
A supplementary library of code is getting accumulated as I do these problems, since some things are patterns that repeat.
I will discuss those bits of code in future posts.