We introduce forest straight-line programs (FSLPs) as a compressed representation for unranked trees (forests). These are compared to other compression formalisms for forests, namely top dags and TSLPs (tree straight-line programs) for fcns (first-child next-sibling) encoding. We show that FSLPs are equally succinct to TSLPs for fcns encodings but converting them to top dags requires a blowup of the alphabet size. All of these formalisms are treated uniformly using an algebraic setting.
We then use FSLPs to implement various data structures and algorithms:
We can answer in polynomial time if two FSLPs produce equal forests modulo associativity and/or modulo commutativity. Allowing linear preprocessing time, we implement a data structure that allows navigation steps on FSLPs (go to the first/last child, the next/previous neighbor or to the parent, or print the symbol at the current location) to occur in constant time. Allowing polynomial time preprocessing, we extend this to include a subtree equality check that works in constant time.
We also study the effects of unorderedness on compression and derive various bounds. In the best case, compression using DAGs can be improved from $n$ to $log n$ when trees are allowed to be reordered, i.e. a ratio of $n / log n$.
For TSLPs/FSLPs instead of DAGs, we show a ratio of $n * log log n / log^2 n$.
We show how to evaluate a form of visibly one-counter automata on FSLPs in polynomial time by implementing them as an algebra.
Finally, we implement an algorithm that produces a top dag and is worst-case optimal, i.e. it always finds a top dag that compresses to at least $n/log n$, while also retaining other beneficial properties from previous top dag compressors.