Thursday, July 24, 2008

String Trees: an interview question


Different people are finding creating a parser quite a hard job. But the main “ring” in it is to understand not some development problem, but is to find a good representation (format) for the serializer and deserializer. This means that the hardest part of the parser is to find good delimiters for the task.
So let’s begin from creating a parser, for a simple binary tree. Let’s suppose that each node of that tree can have two (left and right) child nodes, and also a value. Then let’s choose some simple format for serialization. Here is a simple example.

{LeftChild|Value:RightChild}

This will surely work, if our content (values of nodes) does not contain any characters like “:” or “{”, “}”.

And now let’s look at a simple example:
{{{|4:}|3:{|5:{|2:}}}
Let’s understand the above written example. The root node of the tree has a value of 3. It has a left child, which has value of 4 and has no child nodes – it’s a leaf node. The right child of the root node is a node, with a value of 5. It has no left child, but has a right child, which has a value of 2. And that node is also a leaf node. So now we finally have an appropriate imagination about the example given above.
Now we can create a parser for it very easy. Most starting developers are trying to use recursive functions to create parsers, but they usually miss an important aspect, that if in that case the (for example) tree, which they are going to parse is too big, they will get a “Memory overflow” exception. This means that the best way to create a parser is to use sequential parsing algorithm.

Here is the code for the upper example on C#.
public class TreeNode
{
    public const char NODE_START = '{';
    public const char NODE_END = '}';
    public const char LEFT_SEPARATOR = '|';
    public const char RIGHT_SEPARATOR = ':';

    private int value;
    private TreeNode leftChild;
    private TreeNode rightChild;
    private TreeNode parentNode;

    public static TreeNode Parse(string argString)
    {
        //The format for the input node is:
        //{[leftChild]|value:[rightChild]}

        TreeNode tmpCurrentRoot = null;
        int tmpPosition = 0;
        ParseStage tmpStage = ParseStage.LeftChild;
        while (tmpPosition < argString.Length)
        {
            switch (argString[tmpPosition])
            {
                case NODE_START:
                    //Create a new node with the current parent
                    TreeNode tmpNewNode = new TreeNode(tmpCurrentRoot);
                    if (tmpCurrentRoot != null)
                    {
                        if (tmpStage == ParseStage.LeftChild)
                        {
                            tmpCurrentRoot.LeftChild = tmpNewNode;
                        }
                        else if (tmpStage == ParseStage.RightChild)
                        {
                            tmpCurrentRoot.RightChild = tmpNewNode;
                        }
                    }
                    //Set the pointer of current node to the new created one
                    tmpCurrentRoot = tmpNewNode;
                    tmpStage = ParseStage.LeftChild;

                    tmpPosition++;
                    break;

                case LEFT_SEPARATOR:
                    tmpStage = ParseStage.Value;
                    tmpPosition++;
                    break;

                case RIGHT_SEPARATOR:
                    tmpStage = ParseStage.RightChild;
                    tmpPosition++;
                    break;

                case NODE_END:
                    if (tmpCurrentRoot.ParentNode == null)
                    {
                        return tmpCurrentRoot;
                    }
                    else
                    {
                        if (tmpCurrentRoot == tmpCurrentRoot.ParentNode.LeftChild)
                        {
                            tmpStage = ParseStage.LeftChild;
                        }
                        else
                        {
                            tmpStage = ParseStage.RightChild;
                        }
                        tmpCurrentRoot = tmpCurrentRoot.ParentNode;
                    }
                    tmpPosition++;
                    break;

                default:
                    if (tmpStage == ParseStage.Value)
                    {
                        string tmpValueStr = argString.Substring(tmpPosition, argString.IndexOf(RIGHT_SEPARATOR, tmpPosition) - tmpPosition);
                        tmpCurrentRoot.Value = Int32.Parse(tmpValueStr);
                        tmpPosition += tmpValueStr.Length;
                        tmpStage = ParseStage.RightChild;
                    }
                    break;
            }
        }

        throw new Exception("ERROR DURING PARSING");
    }

    public TreeNode(TreeNode argParentNode)
    {
        this.parentNode = argParentNode;
        leftChild = null;
        rightChild = null;
        value = 0;
    }

    public int Value
    {
        get
        {
            return this.value;
        }
        set
        {
            this.value = value;
        }
    }

    public TreeNode LeftChild
    {
        get
        {
            return this.leftChild;
        }

        set
        {
            this.leftChild = value;
            if (this.leftChild != null)
            {
                this.leftChild.parentNode = this;
            }
        }
    }

    public TreeNode RightChild
    {
        get
        {
            return this.rightChild;
        }
        set
        {
            this.rightChild = value;
            if (this.rightChild != null)
            {
                this.rightChild.ParentNode = this;
            }
        }
    }

    public TreeNode ParentNode
    {
        get
        {
            return this.parentNode;
        }
        set
        {
            this.parentNode = value;
        }
    }

    public override string ToString()
    {
        string tmpResult = String.Empty;
        tmpResult += NODE_START;
        if (this.leftChild != null)
        {
            tmpResult += this.leftChild.ToString();
        }
        tmpResult += LEFT_SEPARATOR;
        tmpResult += this.value;
        tmpResult += RIGHT_SEPARATOR;

        if (this.rightChild != null)
        {
            tmpResult += this.rightChild.ToString();
        }

        tmpResult += NODE_END;
        return tmpResult;
    }

    protected enum ParseStage
    {
        LeftChild,
        Value,
        RightChild
    }
}