Over the years I’ve built up a suite of unit tests for various types of java collections. When I started doing so, I didn’t know of any good off-the-shelf means of generic collections testing. The JDK itself has some unit/regression tests, but not as many as you would think, and that aside they rely on a custom build system/testing framework.

I’ve used google-guava in a few projects but recently discovered their not so well advertised test support library: guava-testlib. I tried it out out on my most recent project (a custom random access list with $O(1)$ deque operations) and I’m very impressed with the test coverage and flexibility. The documentation is sparse but once you get it working you’ll have a test suite with hundreds to thousands of unit tests generated for your list, map, or set. Guava-testlib also generates tests for the non-standard collections in google-guava.

The library relies on JUnit, and most of what I found online detailing its use pertains to JUnit 3. There was a substantial API break between JUnit 3 and JUnit 4, as JUnit 3 relies on naming conventions and inheritance whereas JUnit 4 uses less intrusive annotations. I struggled for a bit trying to figure out how to get things working without resorting to the old JUnit 3 APIs. I arrived at something that works, but it’s still a little dirty as it indirectly relies on a magic static method named suite() from the JUnit 3 days.

packagecom.indigo.collections;importcom.google.common.collect.testing.*;importcom.google.common.collect.testing.features.CollectionFeature;importcom.google.common.collect.testing.features.CollectionSize;importcom.google.common.collect.testing.features.ListFeature;importjunit.framework.TestSuite;importorg.junit.Test;importorg.junit.runner.RunWith;importorg.junit.runners.Suite;importjava.util.ArrayList;importjava.util.Arrays;importjava.util.List;/** * Your test class must be annotated with {@link RunWith} to specify that it's a * test suite and not a single test. */@RunWith(Suite.class)/** * We need to use static inner classes as JUnit only allows for empty "holder" * suite classes. */@Suite.SuiteClasses({ArrayListTest.GuavaTests.class,ArrayListTest.AdditionalTests.class,})publicclassArrayListTest{/** * Add your additional test cases here. */publicstaticclassAdditionalTests{@TestpublicvoidtestFoo(){}}/** * This class will generate the guava test suite. It needs a public static * magic method called {@link GuavaTests#suite()} to do so. */publicstaticclassGuavaTests{/** * The method responsible for returning the {@link TestSuite} generated * by one of the {@link FeatureSpecificTestSuiteBuilder} classes. * This method must be public static method with no arguments named * "suite()". * * @return An instance of {@link TestSuite} for collection testing. */publicstaticTestSuitesuite(){/** * guava-testlib has a host of test suite builders available, * all descending from {@link FeatureSpecificTestSuiteBuilder}. * The * {@link FeatureSpecificTestSuiteBuilder#usingGenerator(Object)} * is the main entry point in that collections are tested by * implementing {@link TestCollectionGenerator} and providing an * instance of the interface to the test suite builder via the * #usingGenerator(Object) method. Most of suite builder classes * provide a convenience method such as * {@link ListTestSuiteBuilder.using()} that streamline the process * of creating a builder. * */returnListTestSuiteBuilder// The create method is called with an array of elements// that should populate the collection..using(newTestStringListGenerator(){@OverrideprotectedList<String>create(String[]elements){returnnewArrayList<>(Arrays.asList(elements));}})// You can optionally give a name to your test suite. This// name is used by JUnit and other tools during report// generation..named("ArrayList tests")// Guava has a host of "features" in the// com.google.common.collect.testing.features package. Use// them to specify how the collection should behave, and// what operations are supported..withFeatures(ListFeature.GENERAL_PURPOSE,ListFeature.SUPPORTS_ADD_WITH_INDEX,ListFeature.SUPPORTS_REMOVE_WITH_INDEX,ListFeature.SUPPORTS_SET,CollectionFeature.SUPPORTS_ADD,CollectionFeature.SUPPORTS_REMOVE,CollectionFeature.SUPPORTS_ITERATOR_REMOVE,CollectionFeature.ALLOWS_NULL_VALUES,CollectionFeature.GENERAL_PURPOSE,CollectionSize.ANY).createTestSuite();}}}

If you need to find the shortest path for a graph with negative weights, the Bellman-Ford algorithm can do so. Dijkstra’s algorithm has better time complexity, but Bellman-Ford can handle graphs with negative weights, and even be used to detect negative cycles.

Detecting Negative Cycles

Today’s post is short and sweet – I might add code later but the algorithm is quite simple. For graphs with negative weights, Dijkstra’s algorithm fails. Negative weights introduce the potential for negative cycles, in which case a shortest path might not exist as a shorter path could always be constructed by transversing the cycle once again. That breaks Dijkstra. The Bellman-Ford algorithm can detect this when it happens. Unfortunately, the generality comes at the cost of time complexity. Dijkstra’s algorithm runs in $O(|E| + |V|\log(|E|))$. Bellman-Ford takes $O(|E| \cdot |V|))$, where $V$ and $E$ are the sizes of the vertex and edge sets, respectively. There are modifications to Bellman-Ford with slightly better time complexity, but Bellman-Ford is quite simple to implement.

Finding Negative Cycles

Once you know that there is a negative cycle, you still need to find it. There are fast algorithms to do so, but depth-first search (DFS) is arguably the simplest route. You can run DFS on a subgraph that’s produced as a bi product running Bellman-Ford (the shortest path edge set that’s generated along the way) for slightly better time complexity. There’s a break even point between just running depth-first search on the full graph or starting off with Bellman-Ford to check if there’s a negative cycle in the first place. The run-time complexity of DFS is exponential for graphs with a branching factor greater than one so it’s not cheap. For really small graphs you’re almost certainly better off with a vectorized brute force search.

Why do we Care?

Finding a path via DFS, or the shortest path via Bellman-Ford/Dijkstra’s algorithm/A-star has lots of applications in optimization and logistics. Here’s one nice example. Graphs with negative cycles are a special case of the shortest path problem, and once upon a time, currencies were inefficient enough that triangular arbitrage persisted for some time. These days, it’s a purely high frequency game.

Parsing is hard. Parser generators such as ANTLR make it significantly easier by providing a tool to express complex grammars via Extended Backus–Naur Form (a grammar for defining grammars) and generate parsers automatically. However, that’s a pretty heavy weight solution, especially if you just need to do something quick and dirty like implement an expression evaluator. There’s a significantly easier means of parsing simple grammars that describe things like mathematical equations called the Shunting-yard algorithm.

The Shunting-yard algorithm can be extended to handle more complex tasks, but at its core the algorithm parses an expression written in infix notation and applies the rules of operator precedence with optional parentheses to rewrite the original expression as something unambiguous. You can use it to rewrite the parsed expression in Reverse Polish Notation, or to produce an abstract syntax tree (AST). There are lots of examples of the former but I haven’t seen to many of the latter. The code below implements both.

Parsing 101

The simplest end of the parsing spectrum involves something along the lines of taking a CSV file and converting it into a stream of usable values. On the most complex end you have parsers that can read a C++ source file and create a usable representation of it for the compiler. The former definitely doesn’t merit anything complicated, the latter requires some of the most powerful tools for language recognition that we have. For “heavy duty parsing” the process usually looks something like: formal grammar -> lexer -> parser -> abstract syntax tree. If you’re writing a compiler or something similar, the abstract syntax tree gets used as a unique, structurally validated, and unambiguous input to whatever happens next.

There are very powerful parsers such as LL(*) that are almost impossible to write by hand for any sizable grammar. Thankfully, tools such as ANTLR will write them for us, given a formal description of the language. However this process is fairly heavy weight, and if you’re doing something simple like implementing a simple scripting language or an equation evaluator you might be able to get away without it.

Operators, Precedence, and Associativity

Most of us are used to looking at mathematical equations in infix notation: $3 + 4 \cdot 2 /(1 - 5 )^{2^3}$. Unfortunately, that syntax is very hard for a computer to deal with. We subconsciously recognize the rules of operator associativity and operator precedence for mathematical equations, and as programmers we have to know about operator precedence for the languages that we work with; we’ve learned to parse these things with ease so it’s easy to overlook how much is actually going on. When you throw in parenthetical grouping, the task at hand is definitely non-trivial.

Alternatives such as Reverse Polish Notation write expressions in a fashion that’s unambiguous and denotes order-of-operation without parentheses. However, unless you’re really used to working with it (I’ve seen people beast through computations on RPN calculators like the HP-48), you’ll have to spend some time thinking your way through the expression that you’re writing or reading. As such, it’s a good thing that languages don’t require us to write things that way, but that does leave us with the problem of parsing an expression while taking into account associativity and precedence.

Using The Shunting-Yard Algorithm

Dijkstra first described the algorithm in 1961 (as if that guy hadn’t done enough brilliant work already…). It provides a simple means of converting expressions in infix notation to prefix notation. Most of code that I’ve seen for it outputs the original expression in RPN but the same procedure can generate an abstract syntax tree as well (in fact, formally speaking RPN is generated by the post-order traversal of an AST). The algorithm is iterative and runs in $O(n)$ so performance wise it’s as good as you can do for a parser. The code below assumes that every operator has two operands – modifying it to accept unary operators or functions with parameters is quite simple. I also avoided the cruft of refactoring the code into something more OO to keep things short and simple. You would usually want to make the AST nicer to work with, and it’s easy to specialize nodes as operators to clean up evaluation.

packageparsing;/** * A simple interface common to all operators in a grammar. * * @author Kelly Littlepage */publicinterfaceOperator{/*** * * @return <code>true</code> if the operator is right associative, and * <code>false</code> otherwise. By definition, any operator that isn't * right associative is left associative. */booleanisRightAssociative();/*** * Compares the precedence of this operator against another operator. * * @param o The operator to compare against. * * @return -1 if this operator is of lower precedence, 1 if it's of greater * precedence, and 0 if they're of equal precedence. If two operators are of * equal precedence, right associativity and parenthetical groupings must be * used to determine precedence. */intcomparePrecedence(Operatoro);/*** * * @return Gets the symbol used to denote the operator. */chargetSymbol();}

packageparsing;/** * A simple base implementation of the {@link Operator} interface. * * @author Kelly Littlepage */publicclassBaseOperatorimplementsOperator{privatefinalcharsymbol;privatefinalbooleanrightAssociative;privatefinalintprecedence;/*** * Creates a new BaseOperator. * * @param symbol The symbol used to represent the operator. * @param rightAssociative <code>true</code> if the operator is right * associative, and false otherwise. * @param precedence A numerical precedence for the operator relative to * all other operators in use. */publicBaseOperator(charsymbol,booleanrightAssociative,intprecedence){this.symbol=symbol;this.rightAssociative=rightAssociative;this.precedence=precedence;}@OverridepublicbooleanisRightAssociative(){returnrightAssociative;}@OverridepublicintcomparePrecedence(Operatoro){if(oinstanceofBaseOperator){BaseOperatorother=(BaseOperator)o;returnprecedence>other.precedence?1:other.precedence==precedence?0:-1;}else{// Defer the comparison to the second operator reflectivelyreturn-o.comparePrecedence(this);}}@OverridepublicchargetSymbol(){returnsymbol;}@OverridepublicStringtoString(){returnCharacter.toString(symbol);}}

packageparsing;/** * A simple AST node class. * * @author Kelly Littlepage */publicclassASTNode{privatefinalcharvalue;privatefinalASTNodeleftASTNode;privatefinalASTNoderightASTNode;/*** * Constructs a new AST node. There's no explicit specialization for leaf * nodes. Leaves are denoted by nodes where both the left and right node * is null. * * @param value The value held by the node. * @param leftASTNode The left node, or <code>null</code> if there isn't one. * @param rightASTNode The right node, or <code>null</code> if there isn't one. */publicASTNode(charvalue,ASTNodeleftASTNode,ASTNoderightASTNode){this.value=value;this.leftASTNode=leftASTNode;this.rightASTNode=rightASTNode;}/*** * * @return The value held by the node. */publicchargetValue(){returnvalue;}/*** * * @return The left node, or <code>null</code> if there isn't one. */publicASTNodegetLeftASTNode(){returnleftASTNode;}/*** * * @return The right node, or <code>null</code> if there isn't one. */publicASTNodegetRightASTNode(){returnrightASTNode;}}

packageparsing;importjava.util.Collection;importjava.util.HashMap;importjava.util.Map;importjava.util.Stack;/** * A simple implementation of the Shunting Yard algorithm to parse an * infix expression and generate either an abstract syntax tree or another * expression in Reverse Polish notation. This class duplicates code between * {@link #convertInfixNotationToRPN(String)} and * {@link #convertInfixNotationToAST(String)} as the underlying algorithm is * nearly identical. You can easily yield RPN given an AST via post-order * traversal of the tree. * * Error handling is almost non existent, all operators are taken as single * characters. Both of these issues are easily corrected for. * * @see <a href="http://en.wikipedia.org/wiki/Shunting-yard_algorithm"> * http://en.wikipedia.org/wiki/Shunting-yard_algorithm</a> for details of * the algorithm. * * @author Kelly Littlepage */publicclassShuntingYardParser{privatefinalMap<Character,Operator>operators;privatestaticvoidaddNode(Stack<ASTNode>stack,charoperator){finalASTNoderightASTNode=stack.pop();finalASTNodeleftASTNode=stack.pop();stack.push(newASTNode(operator,leftASTNode,rightASTNode));}/*** * Creates a new ShuntingYardParser for the given operators. * * @param operators A collection of operators that should be recognized by * the parser. */publicShuntingYardParser(Collection<Operator>operators){this.operators=newHashMap<>();for(Operatoro:operators){this.operators.put(o.getSymbol(),o);}}/*** * Convert an expression in infix notation to an abstract syntax tree. * * @param input The expression, in infix notation. * * @return An {@link ASTNode} that serves as the root of the AST. */publicASTNodeconvertInfixNotationToAST(finalStringinput){finalStack<Character>operatorStack=newStack<>();finalStack<ASTNode>operandStack=newStack<>();finalchar[]chars=input.toCharArray();main:for(charc:chars){charpopped;switch(c){case' ':break;case'(':operatorStack.push('(');break;case')':while(!operatorStack.isEmpty()){popped=operatorStack.pop();if('('==popped){continuemain;}else{addNode(operandStack,popped);}}thrownewIllegalStateException("Unbalanced right "+"parentheses");default:if(operators.containsKey(c)){finalOperatoro1=operators.get(c);Operatoro2;while(!operatorStack.isEmpty()&&null!=(o2=operators.get(operatorStack.peek()))){if((!o1.isRightAssociative()&&0==o1.comparePrecedence(o2))||o1.comparePrecedence(o2)<0){operatorStack.pop();addNode(operandStack,o2.getSymbol());}else{break;}}operatorStack.push(c);}else{operandStack.push(newASTNode(c,null,null));}break;}}while(!operatorStack.isEmpty()){addNode(operandStack,operatorStack.pop());}returnoperandStack.pop();}/*** * Convert an expression in infix notation to an expression in Reverse * Polish Notation. * * @param input The expression, in infix notation. * * @return An expression in Reverse Polish notation. */publicStringconvertInfixNotationToRPN(finalStringinput){finalStack<Character>operatorStack=newStack<>();finalStringBuildersb=newStringBuilder();finalchar[]chars=input.toCharArray();main:for(charc:chars){charpopped;switch(c){case' ':break;case'(':operatorStack.push('(');break;case')':while(!operatorStack.isEmpty()){popped=operatorStack.pop();if('('==popped){continuemain;}else{sb.append(" ").append(popped);}}thrownewIllegalStateException("Unbalanced right "+"parentheses");default:if(operators.containsKey(c)){finalOperatoro1=operators.get(c);Operatoro2;while(!operatorStack.isEmpty()&&null!=(o2=operators.get(operatorStack.peek()))){if((!o1.isRightAssociative()&&0==o1.comparePrecedence(o2))||o1.comparePrecedence(o2)<0){operatorStack.pop();sb.append(" ").append(o2.getSymbol());}else{break;}}operatorStack.push(c);}else{if(sb.length()>0){sb.append(" ");}sb.append(c);}break;}}while(!operatorStack.isEmpty()){sb.append(" ").append(operatorStack.pop());}returnsb.toString();}}

packageparsing;importjava.util.ArrayList;importjava.util.Collection;importjava.util.Stack;/** * A simple demonstration of the Shunting-Yard algorithm. It's easy to extend * this code to handle arbitrary operators/functions (including unary ones) * and to validate input. * * @author Kelly Littlepage */publicclassShuntingYardDemo{/*** * Evaluates the calculation encoded in the given abstract syntax tree. * This method uses recursion to keep things clean. If you needed to * evaluate a very deep tree you might need to rewrite this method to use * depth first search and evaluate the tree using an explicit stack. * * @param tree The {@link ASTNode} to evaluate. * * @return The result of the computation. */privatestaticdoubleevaluateAST(ASTNodetree){switch(tree.getValue()){case'^':returnMath.pow(evaluateAST(tree.getLeftASTNode()),evaluateAST(tree.getRightASTNode()));case'*':returnevaluateAST(tree.getLeftASTNode())*evaluateAST(tree.getRightASTNode());case'/':returnevaluateAST(tree.getLeftASTNode())/evaluateAST(tree.getRightASTNode());case'+':returnevaluateAST(tree.getLeftASTNode())+evaluateAST(tree.getRightASTNode());case'-':returnevaluateAST(tree.getLeftASTNode())-evaluateAST(tree.getRightASTNode());default:returnDouble.valueOf(Character.toString(tree.getValue()));}}/*** * Evaluates the expression given in Reverse Polish notation. * * @param rpn The expression, in reverse polish notation. * * @return The result of the calculation. */privatestaticdoubleevaluateRPN(Stringrpn){finalStack<String>computation=newStack<>();finalchar[]chars=rpn.toCharArray();for(charc:chars){finalStringv1;finalStringv2;switch(c){case'^':v2=computation.pop();v1=computation.pop();computation.push(Double.toString(Math.pow(Double.valueOf(v1),Double.valueOf(v2))));break;case'*':v2=computation.pop();v1=computation.pop();computation.push(Double.toString(Double.valueOf(v1)*Double.valueOf(v2)));break;case'/':v2=computation.pop();v1=computation.pop();computation.push(Double.toString(Double.valueOf(v1)/Double.valueOf(v2)));break;case'+':v2=computation.pop();v1=computation.pop();computation.push(Double.toString(Double.valueOf(v1)+Double.valueOf(v2)));break;case'-':v2=computation.pop();v1=computation.pop();computation.push(Double.toString(Double.valueOf(v1)-Double.valueOf(v2)));break;case' ':break;default:computation.push(Character.toString(c));}}returnDouble.valueOf(computation.pop());}/*** * A simple demonstration of parsing an infix expression and converting it * to either Reverse Polish Notation or an abstract syntax tree. * */publicstaticvoidmain(String[]args){// Define our basic operators for arithmetic.finalCollection<Operator>operators=newArrayList<>();operators.add(newBaseOperator('^',true,4));operators.add(newBaseOperator('*',false,3));operators.add(newBaseOperator('/',false,3));operators.add(newBaseOperator('+',false,2));operators.add(newBaseOperator('-',false,2));finalShuntingYardParserparser=newShuntingYardParser(operators);finalStringinput="3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3";finalStringrpn=parser.convertInfixNotationToRPN(input);System.out.println("RPN expression: "+rpn);finalASTNodeparseTree=parser.convertInfixNotationToAST(input);finaldoubleresult=evaluateAST(parseTree);System.out.println("Result: "+result);assert0==Double.compare(result,evaluateRPN(rpn));}}