Misapplied Math

CS

# Testing Collections With Guava-Testlib and JUnit 4

## Testing Custom Java Collections

Over the years I’ve built up a suite of unit tests for various types of Java collections. When I started doing so, I didn’t know of any good off-the-shelf means of generic collections testing. The JDK itself has some unit/regression tests, though not as many as you would think, and those rely on a custom build system and testing framework besides.

I’ve used google-guava in a few projects but recently discovered its not-so-well-advertised test support library: guava-testlib. I tried it out on my most recent project (a custom random access list with $O(1)$ deque operations) and I’m very impressed with the test coverage and flexibility. The documentation is sparse, but once you get it working you’ll have a test suite with hundreds to thousands of unit tests generated for your list, map, or set. Guava-testlib also generates tests for the non-standard collections in google-guava.

The library relies on JUnit, and most of what I found online detailing its use pertains to JUnit 3. There was a substantial API break between JUnit 3 and JUnit 4: JUnit 3 relies on naming conventions and inheritance, whereas JUnit 4 uses less intrusive annotations. I struggled for a bit trying to figure out how to get things working without resorting to the old JUnit 3 APIs. I arrived at something that works, but it’s still a little dirty, as it indirectly relies on a magic static method named suite() from the JUnit 3 days.
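For reference, here’s roughly what that wiring looks like. The class name and feature list are placeholders – swap in your own collection and the features it actually supports (ArrayList stands in for the collection under test here). JUnit 4’s `AllTests` runner is what invokes the legacy static `suite()` method:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import com.google.common.collect.testing.ListTestSuiteBuilder;
import com.google.common.collect.testing.TestStringListGenerator;
import com.google.common.collect.testing.features.CollectionFeature;
import com.google.common.collect.testing.features.CollectionSize;
import com.google.common.collect.testing.features.ListFeature;

import junit.framework.TestSuite;
import org.junit.runner.RunWith;
import org.junit.runners.AllTests;

// JUnit 4 picks this up via the AllTests runner, which falls back to the
// JUnit 3-style static suite() method below.
@RunWith(AllTests.class)
public class MyListTest {
    public static TestSuite suite() {
        return ListTestSuiteBuilder
                .using(new TestStringListGenerator() {
                    @Override
                    protected List<String> create(String[] elements) {
                        // Swap in the collection under test; ArrayList is a stand-in.
                        return new ArrayList<>(Arrays.asList(elements));
                    }
                })
                .named("MyList test suite")
                .withFeatures(
                        ListFeature.GENERAL_PURPOSE,          // add/remove/set supported
                        CollectionFeature.ALLOWS_NULL_VALUES,
                        CollectionSize.ANY)                   // empty through large sizes
                .createTestSuite();
    }
}
```

The feature flags matter: declare only what your collection actually supports, and the builder generates (or skips) the corresponding tests.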


# Twelve Days 2013: The Bellman-Ford Algorithm

## Day Twelve: The Bellman-Ford Algorithm

### TL/DR

If you need to find shortest paths in a graph with negative edge weights, the Bellman-Ford algorithm can do so. Dijkstra’s algorithm has better time complexity, but Bellman-Ford handles negative weights and can even be used to detect negative cycles.

## Detecting Negative Cycles

Today’s post is short and sweet – I might add code later, but the algorithm is quite simple. For graphs with negative edge weights, Dijkstra’s algorithm fails. Negative weights introduce the potential for negative cycles, in which case a shortest path might not exist: a shorter path could always be constructed by traversing the cycle once more. That breaks Dijkstra. The Bellman-Ford algorithm can detect this when it happens. Unfortunately, the generality comes at the cost of time complexity. Dijkstra’s algorithm runs in $O(|E| + |V|\log|V|)$ with a Fibonacci heap. Bellman-Ford takes $O(|E| \cdot |V|)$, where $V$ and $E$ are the vertex and edge sets, respectively. There are modifications to Bellman-Ford with slightly better time complexity, but Bellman-Ford is quite simple to implement.
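A minimal sketch of the algorithm (the edge-list representation and names are my own): relax every edge $|V| - 1$ times, then make one extra pass – if any edge still relaxes, a negative cycle is reachable.

```java
import java.util.Arrays;
import java.util.List;

// A minimal Bellman-Ford sketch: relax every edge |V| - 1 times, then check
// once more; any edge that still relaxes implies a reachable negative cycle.
final class BellmanFord {
    record Edge(int from, int to, double weight) {}

    // Returns shortest-path distances from source, or null if a negative
    // cycle is reachable from the source.
    static double[] shortestPaths(int vertexCount, List<Edge> edges, int source) {
        double[] dist = new double[vertexCount];
        Arrays.fill(dist, Double.POSITIVE_INFINITY);
        dist[source] = 0.0;
        for (int i = 0; i < vertexCount - 1; i++) {
            for (Edge e : edges) {
                if (dist[e.from()] + e.weight() < dist[e.to()]) {
                    dist[e.to()] = dist[e.from()] + e.weight();
                }
            }
        }
        // One extra pass: a further improvement means a negative cycle.
        for (Edge e : edges) {
            if (dist[e.from()] + e.weight() < dist[e.to()]) {
                return null;
            }
        }
        return dist;
    }
}
```

The $|V| - 1$ bound works because any shortest path without cycles uses at most $|V| - 1$ edges, so after that many rounds every true shortest distance has propagated.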

## Finding Negative Cycles

Once you know that there is a negative cycle, you still need to find it. There are fast algorithms to do so, but depth-first search (DFS) is arguably the simplest route. You can run DFS on a subgraph produced as a byproduct of running Bellman-Ford (the shortest-path edge set generated along the way) for slightly better time complexity. There’s a break-even point between just running DFS on the full graph and starting off with Bellman-Ford to check whether there’s a negative cycle in the first place. Note that while a single DFS traversal is linear, enumerating cycles this way is exponential for graphs with a branching factor greater than one, so it’s not cheap. For really small graphs you’re almost certainly better off with a vectorized brute-force search.
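One concrete way to extract a cycle from Bellman-Ford’s own bookkeeping (a common trick, sketched here with my own names) is to track a predecessor array: any vertex still relaxed in the $|V|$-th round can reach a negative cycle, and walking the predecessor chain $|V|$ steps back is guaranteed to land inside it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Recover a negative cycle from the predecessor array. Starting all
// distances at zero acts like a virtual source connected to every vertex,
// so cycles anywhere in the graph are found.
final class NegativeCycleFinder {
    // edges[i] = {from, to}; w[i] = weight of edge i.
    static List<Integer> findCycle(int n, int[][] edges, double[] w) {
        double[] dist = new double[n];
        int[] pred = new int[n];
        Arrays.fill(pred, -1);
        int relaxed = -1;
        for (int round = 0; round < n; round++) {
            relaxed = -1;
            for (int i = 0; i < edges.length; i++) {
                if (dist[edges[i][0]] + w[i] < dist[edges[i][1]]) {
                    dist[edges[i][1]] = dist[edges[i][0]] + w[i];
                    pred[edges[i][1]] = edges[i][0];
                    relaxed = edges[i][1];
                }
            }
            if (relaxed == -1) return null; // converged: no negative cycle
        }
        // Walk back n steps to guarantee we're inside the cycle.
        int v = relaxed;
        for (int i = 0; i < n; i++) v = pred[v];
        // Follow predecessors until v repeats, collecting the cycle.
        List<Integer> cycle = new ArrayList<>();
        for (int u = v; cycle.isEmpty() || u != v; u = pred[u]) cycle.add(u);
        Collections.reverse(cycle);
        return cycle;
    }
}
```

This keeps the whole job inside the Bellman-Ford pass itself, so no separate DFS is needed when all you want is one witness cycle.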

## Why do we Care?

Finding a path via DFS, or the shortest path via Bellman-Ford, Dijkstra’s algorithm, or A*, has lots of applications in optimization and logistics. Here’s one nice example. Detecting currency arbitrage reduces to the negative cycle problem: make currencies vertices, use the negative log of each exchange rate as the edge weight, and a negative cycle corresponds to a sequence of trades that ends with more than you started with. Once upon a time, currency markets were inefficient enough that triangular arbitrage persisted for some time. These days, it’s a purely high frequency game.


# Twelve Days 2013: Shunting-Yard Algorithm

## Day Eleven: The Shunting-Yard Algorithm

### TL/DR

Parsing is hard. Parser generators such as ANTLR make it significantly easier by providing a tool to express complex grammars via Extended Backus–Naur Form (a grammar for defining grammars) and generating parsers automatically. However, that’s a pretty heavyweight solution, especially if you just need to do something quick and dirty like implement an expression evaluator. For simple grammars describing things like mathematical equations, there’s a significantly easier approach: the shunting-yard algorithm.

The shunting-yard algorithm can be extended to handle more complex tasks, but at its core the algorithm parses an expression written in infix notation and applies the rules of operator precedence, with optional parentheses, to rewrite the original expression as something unambiguous. You can use it to rewrite the parsed expression in Reverse Polish Notation (RPN), or to produce an abstract syntax tree (AST). There are lots of examples of the former, but I haven’t seen too many of the latter. The code below implements both.

## Parsing 101

The simplest end of the parsing spectrum involves something along the lines of taking a CSV file and converting it into a stream of usable values. On the most complex end you have parsers that can read a C++ source file and create a usable representation of it for the compiler. The former definitely doesn’t merit anything complicated; the latter requires some of the most powerful language recognition tools that we have. For “heavy duty parsing” the process usually looks something like: formal grammar -> lexer -> parser -> abstract syntax tree. If you’re writing a compiler or something similar, the abstract syntax tree gets used as a unique, structurally validated, and unambiguous input to whatever happens next.

There are very powerful parsers, such as LL(*) parsers, that are almost impossible to write by hand for any sizable grammar. Thankfully, tools such as ANTLR will write them for us, given a formal description of the language. However, this process is fairly heavyweight, and if you’re doing something modest like implementing a small scripting language or an equation evaluator you might be able to get away without it.

## Operators, Precedence, and Associativity

Most of us are used to looking at mathematical equations in infix notation: $3 + 4 \cdot 2 /(1 - 5 )^{2^3}$. Unfortunately, that syntax is very hard for a computer to deal with. We subconsciously recognize the rules of operator associativity and operator precedence for mathematical equations, and as programmers we have to know about operator precedence for the languages that we work with; we’ve learned to parse these things with ease so it’s easy to overlook how much is actually going on. When you throw in parenthetical grouping, the task at hand is definitely non-trivial.

Alternatives such as Reverse Polish Notation write expressions in a fashion that’s unambiguous and denotes order-of-operation without parentheses. However, unless you’re really used to working with it (I’ve seen people beast through computations on RPN calculators like the HP-48), you’ll have to spend some time thinking your way through the expression that you’re writing or reading. As such, it’s a good thing that languages don’t require us to write things that way, but that does leave us with the problem of parsing an expression while taking into account associativity and precedence.
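Part of what makes RPN so machine-friendly is that evaluating it needs nothing more than a value stack: operands push, and each operator pops its two arguments and pushes the result. A sketch (tokens are assumed pre-split; operands are plain doubles):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Evaluate an RPN token stream with a single value stack: operands push,
// operators pop two values and push the result. Assumes well-formed input.
final class RpnEval {
    static double eval(List<String> rpn) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String t : rpn) {
            switch (t) {
                case "+" -> { double b = stack.pop(), a = stack.pop(); stack.push(a + b); }
                case "-" -> { double b = stack.pop(), a = stack.pop(); stack.push(a - b); }
                case "*" -> { double b = stack.pop(), a = stack.pop(); stack.push(a * b); }
                case "/" -> { double b = stack.pop(), a = stack.pop(); stack.push(a / b); }
                case "^" -> { double b = stack.pop(), a = stack.pop(); stack.push(Math.pow(a, b)); }
                default  -> stack.push(Double.parseDouble(t)); // operand
            }
        }
        return stack.pop();
    }
}
```

Note the pop order: the right operand comes off the stack first, which is what makes non-commutative operators like `-` and `/` come out right.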

## Using The Shunting-Yard Algorithm

Dijkstra first described the algorithm in 1961 (as if that guy hadn’t done enough brilliant work already…). It provides a simple means of converting expressions in infix notation to postfix notation. Most of the code that I’ve seen for it outputs the original expression in RPN, but the same procedure can generate an abstract syntax tree as well (in fact, formally speaking, RPN is generated by a post-order traversal of an AST). The algorithm is iterative and runs in $O(n)$, so performance-wise it’s as good as you can do for a parser. The code below assumes that every operator has two operands – modifying it to accept unary operators or functions with parameters is quite simple. I also avoided the cruft of refactoring the code into something more OO to keep things short and simple. You would usually want to make the AST nicer to work with, and it’s easy to specialize nodes as operators to clean up evaluation.
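A sketch of both outputs follows (tokenization is assumed to have happened already, the input must be well-parenthesized, and the names are my own). The RPN conversion is the classic operator-stack loop; the AST builder reuses it, pushing leaf nodes for operands and popping two children per operator:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class ShuntingYard {
    private static final Map<String, Integer> PREC =
            Map.of("+", 1, "-", 1, "*", 2, "/", 2, "^", 3);

    private static boolean rightAssoc(String op) { return op.equals("^"); }

    // Should the operator on the stack be emitted before pushing `incoming`?
    private static boolean popsBefore(String onStack, String incoming) {
        if (!PREC.containsKey(onStack)) return false; // "(" stays put
        int a = PREC.get(onStack), b = PREC.get(incoming);
        return a > b || (a == b && !rightAssoc(incoming));
    }

    // Infix tokens -> RPN tokens. Assumes balanced parentheses.
    static List<String> toRpn(List<String> tokens) {
        List<String> out = new ArrayList<>();
        Deque<String> ops = new ArrayDeque<>();
        for (String t : tokens) {
            if (PREC.containsKey(t)) {
                while (!ops.isEmpty() && popsBefore(ops.peek(), t)) out.add(ops.pop());
                ops.push(t);
            } else if (t.equals("(")) {
                ops.push(t);
            } else if (t.equals(")")) {
                while (!ops.peek().equals("(")) out.add(ops.pop());
                ops.pop(); // discard the "("
            } else {
                out.add(t); // operand
            }
        }
        while (!ops.isEmpty()) out.add(ops.pop());
        return out;
    }

    // AST node: a leaf holds a value, an interior node an operator.
    record Node(String token, Node left, Node right) {}

    // The RPN output order is exactly the AST's post-order, so building the
    // tree is one more stack pass: leaves push, operators pop two children.
    static Node toAst(List<String> tokens) {
        Deque<Node> nodes = new ArrayDeque<>();
        for (String t : toRpn(tokens)) {
            if (PREC.containsKey(t)) {
                Node right = nodes.pop(), left = nodes.pop();
                nodes.push(new Node(t, left, right));
            } else {
                nodes.push(new Node(t, null, null));
            }
        }
        return nodes.pop();
    }
}
```

On the example from earlier, `3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3` comes out as `3 4 2 * 1 5 - 2 3 ^ ^ / +`, and the AST’s root is the lowest-precedence operator, `+`.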